CN108198573A - Audio identification methods and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN108198573A CN108198573A CN201711486757.6A CN201711486757A CN108198573A CN 108198573 A CN108198573 A CN 108198573A CN 201711486757 A CN201711486757 A CN 201711486757A CN 108198573 A CN108198573 A CN 108198573A
- Authority
- CN
- China
- Prior art keywords
- audio
- frequency
- identified
- fingerprint
- frequency fingerprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Abstract
The present invention provides an audio identification method, including: selecting a target audio sample from the audio to be identified according to a preset first selection rule; extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density; matching the extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library; and, when the match fails, selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio. In the identification method provided by the invention, a section of audio is first selected as the target audio sample, audio fingerprints are extracted from it, and they are matched against the fingerprints in the pre-established audio-fingerprint library; if the match fails, another section of audio is selected from the audio to be identified as a new target audio sample, until identification of the audio completes. This improves the recognition rate for the audio to be identified.
Description
Technical field
The present invention relates to the technical field of audio identification, and in particular to an audio identification method and device, a storage medium, and an electronic device.
Background
In recent years, audio and video playback devices have become increasingly common in daily life. People listen to music on playback devices, or watch lyrics and subtitles while viewing films, so audio identification technology is being applied ever more widely across many fields.
Audio identification generally uses audio-fingerprint technology. Through research and development, the technical staff found that in existing audio identification, such as song recognition, the same song may exist in multiple different versions, with locally inconsistent audio between versions. For example, a live version may contain audience cheering or people speaking; if the frequency ranges of the fingerprints extracted during identification contain sounds other than the music itself, the recognition rate of the audio drops.
Summary of the invention
The technical problem to be solved by the invention is to provide an audio identification method that identifies audio by acquiring multiple audio samples during the identification process, so as to improve the recognition rate.
The present invention also provides an audio identification device, to ensure that the above method can be realized and applied in practice.
An audio identification method, including:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built by extracting audio fingerprints at a second audio-fingerprint extraction density, and the first extraction density is lower than the second;
when the match fails, selecting a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes:
determining a first time length;
randomly selecting, from the audio to be identified, a section of audio of the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes:
performing audio-energy detection on each audio fragment of the audio to be identified;
selecting, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
Optionally, in the above method, matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes:
down-sampling the multiple extracted audio fingerprints;
matching the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified includes:
selecting the new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
Optionally, the above method further includes:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified, and feeding the audio information back to the client.
An audio identification device, including:
a first selection unit, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
a matching unit, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
a second selection unit, configured to, when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, the above device further includes:
a feedback unit, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
A storage medium, including a stored program, where, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method.
An electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors to perform the above audio identification method.
Compared with the prior art, the present invention has the following advantages:
The present invention provides an audio identification method: a target audio sample is first selected from the audio to be identified according to a preset first selection rule; multiple audio fingerprints are extracted from the target audio sample at a first audio-fingerprint extraction density; the extracted fingerprints are matched against the fingerprints in a pre-established audio-fingerprint library; and when the match fails, a new target audio sample is selected from the audio to be identified, audio fingerprints are extracted from the new sample, and matching continues, until the audio to be identified is matched successfully. The method thus extracts audio samples in stages: when the previous audio sample fails to be identified, a new audio sample is extracted and identification continues, which improves the recognition rate of the audio to be identified.
In addition, in the audio identification method provided by the invention, audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library, which was itself built at the second extraction density, with the first density lower than the second. By using these differentiated extraction densities, fingerprint comparison can maintain a suitable recognition speed while still ensuring the recognition precision of the audio fingerprints, achieving a balance between recognition speed and precision.
Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an audio identification method provided by the invention;
Fig. 2 is another flow diagram of an audio identification method provided by the invention;
Fig. 3 is another flow diagram of an audio identification method provided by the invention;
Fig. 4 is another flow diagram of an audio identification method provided by the invention;
Fig. 5 is an architecture diagram of an audio identification method provided by the invention;
Fig. 6 is a structural diagram of an audio identification device provided by the invention;
Fig. 7 is another structural diagram of an audio identification device provided by the invention;
Fig. 8 is a structural diagram of an electronic device provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the invention.
The present invention can be used in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multi-processor devices, and distributed computing environments including any of the above devices or equipment.
The present invention provides an audio identification method. The executing agent of the method may be a processor in a playback device. The flow is shown in Fig. 1 and includes the following steps:
S101: select a target audio sample from the audio to be identified according to a preset first selection rule.
In the present invention, for the audio to be identified, a target audio sample is selected from it according to the preset first selection rule. The first selection rule may be a random rule, or a specific rule set for the audio to be identified.
S102: extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density.
In the present invention, an audio fingerprint is a numerical feature of the audio: a unique feature that can characterize its acoustic properties. The audio fingerprints involved in the invention are these numerical features, extracted from the target audio sample in the form of identifiers. The fingerprint extraction density refers to the number of numerical features extracted from a section of audio: the more numerical features extracted from a section of audio, the higher the extraction density.
S103: match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library; the pre-established library was built by extracting audio fingerprints at a second audio-fingerprint extraction density, and the first extraction density is lower than the second.
In the present invention, the audio-fingerprint library is established in advance: when building the library, audio fingerprints are extracted at the second extraction density to form the library. During identification, multiple audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library.
S104: when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
In the present invention, when a new target audio sample is selected, it differs from the previously selected target audio sample, and the selection rule used for the new sample may also differ from the rule used for the previous one.
Audio-fingerprint technology extracts, through a specific algorithm, the unique numerical features of a section of audio in the form of identifiers, and is used to locate sound samples or tracks within a massive database. After extensive study, the inventors found that during audio identification, the audio to be identified can fail to be recognized because it is a different version. Meanwhile, in existing fingerprint-based identification methods, using long audio segments and a high fingerprint density can improve the recognition rate, but since the song library is large this hurts operational efficiency and increases indexing time; using short segments and a low fingerprint density is fast, but lowers the recognition rate.
Therefore, the audio identification method provided by the invention adopts a staged, hierarchical identification strategy: when the first target audio sample fails to be identified, a new target audio sample is determined. By extracting audio samples in stages, and extracting a new sample to continue identification when the previous sample fails, the recognition rate of the audio to be identified is improved.
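The staged strategy described above can be sketched as a retry loop; `select_segment`, `match`, and `max_attempts` are hypothetical placeholders for the patent's selection and matching steps, which it does not pin down to concrete code:

```python
def identify(audio, select_segment, match, max_attempts=5):
    """Staged identification loop: pick a target sample, try to match
    its fingerprints, and on failure pick a new, different sample,
    until a match succeeds or the attempts run out."""
    tried = set()
    for _ in range(max_attempts):
        segment = select_segment(audio, exclude=tried)
        if segment is None:          # no untried segment left
            break
        tried.add(segment)
        result = match(audio, segment)
        if result is not None:       # match succeeded: audio identified
            return result
    return None                      # every sampled segment failed
```

Excluding already-tried segments mirrors the requirement that each new target audio sample differ from the previous one.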
In the audio identification method provided by the invention, audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library. In the present invention, the pre-established library was built by extracting audio fingerprints at the second extraction density, and the first extraction density is lower than the second. With these differentiated extraction densities, fingerprint comparison can maintain a suitable recognition speed while still ensuring the recognition precision of the audio fingerprints, achieving a balance between recognition speed and precision.
In the present invention, a section of audio may be randomly selected as the target audio sample. The selection can set a selection duration, the first time length: a section of audio of the first time length is randomly selected from the audio to be identified as the target audio sample, and fingerprint extraction is then performed on it.
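A minimal sketch of this random selection, assuming the audio is a flat sequence of samples; the function and parameter names are illustrative, not from the patent:

```python
import random

def random_target_sample(audio, sample_rate, first_time_length_s):
    """Randomly select a target sample of the "first time length"
    from the audio to be identified."""
    length = int(first_time_length_s * sample_rate)
    if len(audio) <= length:
        return list(audio)           # clip shorter than the window
    start = random.randrange(len(audio) - length + 1)
    return audio[start:start + length]
```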
Besides randomly selecting a section of audio as the target audio sample, the target audio sample may also be selected by the method shown in Fig. 2, which includes the steps:
S201: perform audio-energy detection on each audio fragment of the audio to be identified;
S202: from the audio fragments whose energy exceeds a preset audio-energy threshold, select one audio fragment as the target audio sample.
In the present invention, when selecting the target audio sample, the audio energy of each audio fragment can first be detected, and a fragment with higher energy taken as the target audio sample. An audio sample selected this way has more distinct features, which improves identification efficiency.
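The energy-based selection of Fig. 2 can be sketched as follows; mean-square energy and the loudest-fragment tie-break are assumptions of this sketch, since the patent only requires exceeding a preset threshold:

```python
import numpy as np

def pick_energetic_fragment(samples, fragment_len, energy_threshold):
    """Split the audio into fixed-length fragments, measure each one's
    energy, and return a fragment whose energy exceeds the threshold
    (here the loudest such fragment). Returns None if every fragment
    is below the threshold."""
    best, best_energy = None, energy_threshold
    for start in range(0, len(samples) - fragment_len + 1, fragment_len):
        frag = samples[start:start + fragment_len]
        energy = float(np.mean(np.square(frag)))   # mean-square energy
        if energy > best_energy:
            best, best_energy = frag, energy
    return best
```

Returning `None` when everything is quiet lets the caller fall back to another selection rule, in keeping with S104's retry behaviour.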
Referring to Fig. 3, the process of matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes the steps:
S301: down-sample the multiple extracted audio fingerprints;
S302: match the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
In the present invention, the multiple extracted audio fingerprints can be further down-sampled: from the extracted fingerprints, those with more distinct features are selected and matched against the fingerprints in the pre-established library.
In the identification method provided by the invention, the new target audio sample may be selected from the audio to be identified according to a preset second selection rule different from the first selection rule; the same selection rule may also be used.
On the basis of Fig. 1, as shown in Fig. 4, the audio identification method further includes S105: when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
Fig. 5 shows a schematic diagram of the audio identification method of the present invention. In Fig. 5, the offline library-building stage establishes the fingerprint library for the song catalogue at a certain fingerprint extraction density A, and stores information such as the singer, version, and album number of each song. A fingerprint corresponds to an audio feature (such as spectral information or sub-band energy) or some transformation of an audio feature (such as a hash transformation).
In the online identification stage, audio fingerprints are extracted from a music segment L1 (e.g. the first 30 s) at a fingerprint extraction density B (B < A). The fingerprint extraction density can be understood as the window hop in the audio feature-extraction stage: a large hop corresponds to a low density, and a small hop to a high density.
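The hop-versus-density relationship can be made concrete with a small count; the window and hop values below are illustrative, not taken from the patent:

```python
def fingerprint_count(duration_s, sample_rate, window, hop):
    """Fingerprints produced for one clip: a smaller hop ("window
    move") yields a higher extraction density."""
    total = int(duration_s * sample_rate)
    if total < window:
        return 0
    return (total - window) // hop + 1

# Library built at density A (small hop); query at density B < A (big hop).
count_a = fingerprint_count(30, 44100, window=2048, hop=1024)
count_b = fingerprint_count(30, 44100, window=2048, hop=4096)
```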
Fingerprints can also be down-sampled to some extent in the fingerprint-generation stage, so that only part of the fingerprints are used for retrieval.
The player sends the fingerprint segment to the identification service interface and obtains the recognition result.
If the wrong song is recognized, or recognition fails, the player extracts fingerprints again from a segment L2 of the music (L2 different from L1, e.g. 30 s starting from the 2nd minute) and identification is repeated.
For staged identification, random sampling can be used, or a key-segment selection strategy can be adopted, such as choosing segments with higher audio energy or relatively continuous segments.
Corresponding to the audio identification method described in Fig. 1, an embodiment of the present invention also provides an audio identification device, whose structure is shown in Fig. 6 and includes:
a first selection unit 401, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit 402, configured to extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
a matching unit 403, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
a second selection unit 404, configured to, when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
In the audio identification device provided by the invention, audio samples are extracted in stages: when the previous audio sample fails to be identified, a new audio sample is extracted and identification continues, which improves the recognition rate of the audio to be identified.
On the basis of Fig. 6, as shown in Fig. 7, the first selection unit 401 includes:
a detection sub-unit 405, configured to perform audio-energy detection on each audio fragment of the audio to be identified;
a selection sub-unit 406, configured to select, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
The matching unit 403 includes:
a processing sub-unit 407, configured to down-sample the multiple extracted audio fingerprints;
a matching sub-unit 408, configured to match the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
The identification device further includes:
a feedback unit 409, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
An embodiment of the present invention also provides a storage medium including a stored program, where, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method, which specifically includes:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
when the match fails, selecting a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes: determining a first time length; and randomly selecting, from the audio to be identified, a section of audio of the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes: performing audio-energy detection on each audio fragment of the audio to be identified; and selecting, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
Optionally, in the above method, matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes: down-sampling the multiple extracted audio fingerprints; and matching the down-sampled fingerprints against the fingerprints in the library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified includes: selecting the new target audio sample according to a preset second selection rule different from the first selection rule.
Optionally, the above method further includes: when the match succeeds, obtaining the audio information corresponding to the audio to be identified and feeding the audio information back to the client.
The embodiment of the present invention additionally provides a kind of electronic equipment, and structure diagram is as shown in figure 8, specifically include memory
501 and one either more than one program 502 one of them or more than one program 502 be stored in memory 501
In, and be configured to by one or more than one processor 503 perform the one or more programs 502 include use
In the instruction for carrying out following operation:
selecting a target audio sample from audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
when the match fails, re-selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
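The two-density scheme in these operations can be illustrated with a self-contained Python sketch. The toy hash-based fingerprint, the frame size, and the hop values below are illustrative stand-ins, not the patent's actual fingerprint algorithm; the point is only that a query extracted at a lower density (larger hop) can still hit a reference library built at a higher density (smaller hop):

```python
import hashlib

def fingerprint(frame):
    """Toy fingerprint: hash a frame of samples to a short hex digest.
    A stand-in for a real spectral fingerprint, for illustration only."""
    data = ",".join(f"{x:.3f}" for x in frame).encode()
    return hashlib.md5(data).hexdigest()[:8]

def extract(audio, frame_size, hop):
    """Extract fingerprints at a density controlled by `hop`:
    a larger hop means a lower extraction density."""
    return [fingerprint(audio[i:i + frame_size])
            for i in range(0, len(audio) - frame_size + 1, hop)]

# Reference library: built at the higher, second extraction density (hop=1).
track = [0.1 * i for i in range(32)]
library = {fp: "track-1" for fp in extract(track, frame_size=8, hop=1)}

# Query sample: extracted at the lower, first density (hop=4); any hit
# in the library identifies the audio.
query = track[5:25]
hits = {library[fp] for fp in extract(query, 8, 4) if fp in library}
```

Because the library retains a fingerprint for every frame position, the sparse set extracted from the query still finds matches; this is what allows the first extraction density to be lower than the second.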
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and identical or similar parts may be cross-referenced between them. Since the device embodiments are essentially similar to the method embodiments, their description is relatively brief; for related details, refer to the description of the method embodiments.
Finally, it should be noted that relational terms such as "first" and "second" are used herein merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprising", "including", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further restrictions, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
For convenience of description, the above device is described as divided into various units by function. Of course, when implementing the present invention, the functions of the units may be realized in the same piece, or in multiple pieces, of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The audio identification method and device provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific embodiments and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. An audio identification method, characterized by comprising:
selecting a target audio sample from audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
when the match fails, re-selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
2. The method according to claim 1, characterized in that selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
determining a first time length; and
randomly selecting, from the audio to be identified, an audio segment of the first time length as the target audio sample.
3. The method according to claim 1, characterized in that selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
performing audio energy detection on each audio segment in the audio to be identified; and
selecting, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
4. The method according to claim 1, characterized in that matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises:
performing down-sampling on the multiple extracted audio fingerprints; and
matching the down-sampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
5. The method according to claim 1, characterized in that re-selecting a new target audio sample from the audio to be identified comprises:
re-selecting a new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
6. The method according to claim 1, characterized by further comprising:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified, and feeding the audio information back to a client.
7. An audio identification device, characterized by comprising:
a first selection unit, configured to select a target audio sample from audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
a matching unit, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
a second selection unit, configured to, when the match fails, re-select a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
8. The device according to claim 7, characterized by further comprising:
a feedback unit, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to a client.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the audio identification method according to any one of claims 1 to 6.
10. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the audio identification method according to any one of claims 1 to 6.
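The energy-based sample selection of claim 3 can be sketched as follows. The segment length, the threshold, and the pick-the-loudest tie-break are illustrative assumptions; the claim only requires choosing some segment whose energy exceeds the threshold:

```python
def pick_energetic_segment(audio, segment_len, threshold):
    """Claim-3-style selection sketch: split the audio into fixed-length
    segments, keep those whose energy exceeds `threshold`, and return
    one of them (here, the loudest) as the target audio sample.
    Parameter names and the tie-break rule are illustrative, not from
    the patent.
    """
    segments = [audio[i:i + segment_len]
                for i in range(0, len(audio) - segment_len + 1, segment_len)]
    energetic = [s for s in segments if sum(x * x for x in s) > threshold]
    if not energetic:
        return None  # no segment passes the audio energy detection
    return max(energetic, key=lambda s: sum(x * x for x in s))
```

Selecting by energy biases the sample toward loud, information-rich passages, which tends to yield more distinctive fingerprints than a silent stretch would.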
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711486757.6A CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711486757.6A CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108198573A true CN108198573A (en) | 2018-06-22 |
CN108198573B CN108198573B (en) | 2021-04-30 |
Family
ID=62587013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711486757.6A Active CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108198573B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723235A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN112380382A (en) * | 2020-11-23 | 2021-02-19 | 北京达佳互联信息技术有限公司 | Audio classification method and device and storage medium |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1403783A2 (en) * | 2002-09-24 | 2004-03-31 | Matsushita Electric Industrial Co., Ltd. | Audio signal feature extraction |
US20090307273A1 (en) * | 2008-06-06 | 2009-12-10 | Tecsys Development, Inc. | Using Metadata Analysis for Monitoring, Alerting, and Remediation |
CN102308295A (en) * | 2009-02-10 | 2012-01-04 | 索尼爱立信移动通讯有限公司 | Music profiling |
CN102314875A (en) * | 2011-08-01 | 2012-01-11 | 北京百度网讯科技有限公司 | Audio file identification method and device |
CN103021440A (en) * | 2012-11-22 | 2013-04-03 | 腾讯科技(深圳)有限公司 | Method and system for tracking audio streaming media |
CN103403710A (en) * | 2011-02-10 | 2013-11-20 | 雅虎公司 | Extraction and matching of characteristic fingerprints from audio signals |
CN103455514A (en) * | 2012-06-01 | 2013-12-18 | 腾讯科技(深圳)有限公司 | Updating method and updating device for audio file |
CN103778174A (en) * | 2012-10-19 | 2014-05-07 | 索尼公司 | Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis |
US8805560B1 (en) * | 2011-10-18 | 2014-08-12 | Google Inc. | Noise based interest point density pruning |
CN104077336A (en) * | 2013-05-09 | 2014-10-01 | 腾讯科技(深圳)有限公司 | Method and device for dragging audio file to retrieve audio file information |
CN104184697A (en) * | 2013-05-20 | 2014-12-03 | 百度在线网络技术(北京)有限公司 | Audio fingerprint extraction method and system thereof |
CN104317967A (en) * | 2014-11-17 | 2015-01-28 | 北京航空航天大学 | Two-layer advertisement audio retrieval method based on audio fingerprints |
US20150254342A1 (en) * | 2011-05-30 | 2015-09-10 | Lei Yu | Video dna (vdna) method and system for multi-dimensional content matching |
CN104915403A (en) * | 2015-06-01 | 2015-09-16 | 腾讯科技(北京)有限公司 | Information processing method and server |
CN105138541A (en) * | 2015-07-08 | 2015-12-09 | 腾讯科技(深圳)有限公司 | Audio fingerprint matching query method and device |
CN105874732A (en) * | 2014-01-07 | 2016-08-17 | 高通股份有限公司 | Method and device for identifying a piece of music in audio stream |
CN105975568A (en) * | 2016-04-29 | 2016-09-28 | 腾讯科技(深圳)有限公司 | Audio processing method and apparatus |
CN106055615A (en) * | 2016-05-25 | 2016-10-26 | 广州酷狗计算机科技有限公司 | Method, device and system for obtaining music information |
US20170024441A1 (en) * | 2015-03-10 | 2017-01-26 | Compact Disc Express, Inc. | Systems and methods for continuously detecting and identifying songs in a continuous audio stream |
KR20170027649A (en) * | 2015-09-02 | 2017-03-10 | 레이 왕 | Method and apparatus for synchronous putting of real-time mobile advertisement based on audio fingerprint |
CN106802960A (en) * | 2017-01-19 | 2017-06-06 | 湖南大学 | A kind of burst audio search method based on audio-frequency fingerprint |
Non-Patent Citations (5)
Title |
---|
HAITSMA J, KALKER T: "A highly robust audio fingerprinting system", ISMIR *
KIM S, UNAL E, NARAYANAN S: "Music fingerprint extraction for classical music cover song identification", 2008 IEEE International Conference on Multimedia and Expo *
NIE HUA: "Research on audio-fingerprint-based advertisement detection", China Masters' Theses Full-text Database, Information Science and Technology *
GUO YONGSHUAI: "Research on music retrieval based on audio fingerprints and version identification", China Masters' Theses Full-text Database, Information Science and Technology *
HAN WEI: "Research on compressed-domain audio fingerprints and their robustness", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723235A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN111723235B (en) * | 2019-03-19 | 2023-09-26 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN112380382A (en) * | 2020-11-23 | 2021-02-19 | 北京达佳互联信息技术有限公司 | Audio classification method and device and storage medium |
CN112380382B (en) * | 2020-11-23 | 2024-03-12 | 北京达佳互联信息技术有限公司 | Audio classification method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108198573B (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | Music similarity-based approach to generating dance motion sequence | |
Yang et al. | Toward multi-modal music emotion classification | |
US20120054238A1 (en) | Music search apparatus and method using emotion model | |
CN105956053B (en) | A kind of searching method and device based on the network information | |
JP2009123124A (en) | Musical composition search system, method and program | |
CN104598502A (en) | Method, device and system for obtaining background music information in played video | |
CN109493879B (en) | Music rhythm analysis and extraction method and device | |
CN105335414A (en) | Music recommendation method, device and terminal | |
CN109558500A (en) | Multimedia sequence generation method, medium, device and calculating equipment | |
Wang et al. | Towards time-varying music auto-tagging based on cal500 expansion | |
CN110047515A (en) | A kind of audio identification methods, device, equipment and storage medium | |
CN109117622B (en) | Identity authentication method based on audio fingerprints | |
CN105679324A (en) | Voiceprint identification similarity scoring method and apparatus | |
CN107679196A (en) | A kind of multimedia recognition methods, electronic equipment and storage medium | |
Rocha et al. | Segmentation and timbre-and rhythm-similarity in Electronic Dance Music | |
CN108198573A (en) | Audio identification methods and device, storage medium and electronic equipment | |
CN109756628A (en) | Method and device for playing function key sound effect and electronic equipment | |
Grekow | Audio features dedicated to the detection of four basic emotions | |
CN104978380B (en) | A kind of audio-frequency processing method and device | |
CN109802987B (en) | Content push method for display device, push device and display equipment | |
CN108777804B (en) | Media playing method and device | |
CN103870589B (en) | A kind of voice data switching method and electronic equipment | |
CN105225664A (en) | The generation method and apparatus of Information Authentication method and apparatus and sample sound | |
CN112270929B (en) | Song identification method and device | |
Bargaje | Emotion recognition and emotion based classification of audio using genetic algorithm-an optimized approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||