CN108198573A - Audio identification methods and device, storage medium and electronic equipment - Google Patents
- Publication number
- CN108198573A CN108198573A CN201711486757.6A CN201711486757A CN108198573A CN 108198573 A CN108198573 A CN 108198573A CN 201711486757 A CN201711486757 A CN 201711486757A CN 108198573 A CN108198573 A CN 108198573A
- Authority
- CN
- China
- Prior art keywords
- audio
- frequency
- identified
- fingerprint
- frequency fingerprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
Abstract
The present invention provides an audio identification method, including: selecting a target audio sample from the audio to be identified according to a preset first selection rule; extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density; matching the extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library; and, when the match fails, selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio. In the identification method provided by the invention, a section of audio is first selected as the target audio sample, audio fingerprints are extracted from it, and they are matched against the fingerprints in the pre-established audio-fingerprint library; if the match fails, another section of audio is selected from the audio to be identified as a new target audio sample, until identification of the audio completes. This improves the recognition rate for the audio to be identified.
Description
Technical field
The present invention relates to the technical field of audio identification, and in particular to an audio identification method and device, a storage medium, and an electronic device.
Background
In recent years, audio and video playback devices have become increasingly common in daily life. People listen to music on playback devices, or watch lyrics and subtitles while viewing films, so audio identification technology is being applied ever more widely across many fields.
Audio identification generally uses audio-fingerprint technology. Through research and development, the technical staff found that in existing audio identification, such as song recognition, the same song may exist in multiple different versions, with locally inconsistent audio between versions. For example, a live version may contain audience cheering or people speaking; if the frequency ranges of the fingerprints extracted during identification contain sounds other than the music itself, the recognition rate of the audio drops.
Summary of the invention
The technical problem to be solved by the invention is to provide an audio identification method that identifies audio by acquiring multiple audio samples during the identification process, so as to improve the recognition rate.
The present invention also provides an audio identification device, to ensure that the above method can be realized and applied in practice.
An audio identification method, including:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built by extracting audio fingerprints at a second audio-fingerprint extraction density, and the first extraction density is lower than the second;
when the match fails, selecting a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes:
determining a first time length;
randomly selecting, from the audio to be identified, a section of audio of the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes:
performing audio-energy detection on each audio fragment of the audio to be identified;
selecting, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
Optionally, in the above method, matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes:
down-sampling the multiple extracted audio fingerprints;
matching the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified includes:
selecting the new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
Optionally, the above method further includes:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified, and feeding the audio information back to the client.
An audio identification device, including:
a first selection unit, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
a matching unit, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
a second selection unit, configured to, when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, the above device further includes:
a feedback unit, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
A storage medium, including a stored program, where, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method.
An electronic device, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors to perform the above audio identification method.
Compared with the prior art, the present invention has the following advantages:
The present invention provides an audio identification method: a target audio sample is first selected from the audio to be identified according to a preset first selection rule; multiple audio fingerprints are extracted from the target audio sample at a first audio-fingerprint extraction density; the extracted fingerprints are matched against the fingerprints in a pre-established audio-fingerprint library; and when the match fails, a new target audio sample is selected from the audio to be identified, audio fingerprints are extracted from the new sample, and matching continues, until the audio to be identified is matched successfully. The method thus extracts audio samples in stages: when the previous audio sample fails to be identified, a new audio sample is extracted and identification continues, which improves the recognition rate of the audio to be identified.
In addition, in the audio identification method provided by the invention, audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library, which was itself built at the second extraction density, with the first density lower than the second. By using these differentiated extraction densities, fingerprint comparison can maintain a suitable recognition speed while still ensuring the recognition precision of the audio fingerprints, achieving a balance between recognition speed and precision.
Of course, a product implementing the present invention does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of an audio identification method provided by the invention;
Fig. 2 is another flow diagram of an audio identification method provided by the invention;
Fig. 3 is another flow diagram of an audio identification method provided by the invention;
Fig. 4 is another flow diagram of an audio identification method provided by the invention;
Fig. 5 is an architecture diagram of an audio identification method provided by the invention;
Fig. 6 is a structural diagram of an audio identification device provided by the invention;
Fig. 7 is another structural diagram of an audio identification device provided by the invention;
Fig. 8 is a structural diagram of an electronic device provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the invention.
The present invention can be used in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multi-processor devices, and distributed computing environments including any of the above devices or equipment.
The present invention provides an audio identification method. The executing agent of the method may be a processor in a playback device. The flow is shown in Fig. 1 and includes the following steps:
S101: select a target audio sample from the audio to be identified according to a preset first selection rule.
In the present invention, for the audio to be identified, a target audio sample is selected from it according to the preset first selection rule. The first selection rule may be a random rule, or a specific rule set for the audio to be identified.
S102: extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density.
In the present invention, an audio fingerprint is a numerical feature of the audio: a unique feature that can characterize its acoustic properties. The audio fingerprints involved in the invention are these numerical features, extracted from the target audio sample in the form of identifiers. The fingerprint extraction density refers to the number of numerical features extracted from a section of audio: the more numerical features extracted from a section of audio, the higher the extraction density.
S103: match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library; the pre-established library was built by extracting audio fingerprints at a second audio-fingerprint extraction density, and the first extraction density is lower than the second.
In the present invention, the audio-fingerprint library is established in advance: when building the library, audio fingerprints are extracted at the second extraction density to form the library. During identification, multiple audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library.
S104: when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
In the present invention, when a new target audio sample is selected, it differs from the previously selected target audio sample, and the selection rule used for the new sample may also differ from the rule used for the previous one.
Audio-fingerprint technology extracts, through a specific algorithm, the unique numerical features of a section of audio in the form of identifiers, and is used to locate sound samples or tracks within a massive database. After extensive study, the inventors found that during audio identification, the audio to be identified can fail to be recognized because it is a different version. Meanwhile, in existing fingerprint-based identification methods, using long audio segments and a high fingerprint density can improve the recognition rate, but since the song library is large this hurts operational efficiency and increases indexing time; using short segments and a low fingerprint density is fast, but lowers the recognition rate.
Therefore, the audio identification method provided by the invention adopts a staged, hierarchical identification strategy: when the first target audio sample fails to be identified, a new target audio sample is determined. By extracting audio samples in stages, and extracting a new sample to continue identification when the previous sample fails, the recognition rate of the audio to be identified is improved.
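The staged strategy described above can be sketched as a retry loop; `select_segment`, `match`, and `max_attempts` are hypothetical placeholders for the patent's selection and matching steps, which it does not pin down to concrete code:

```python
def identify(audio, select_segment, match, max_attempts=5):
    """Staged identification loop: pick a target sample, try to match
    its fingerprints, and on failure pick a new, different sample,
    until a match succeeds or the attempts run out."""
    tried = set()
    for _ in range(max_attempts):
        segment = select_segment(audio, exclude=tried)
        if segment is None:          # no untried segment left
            break
        tried.add(segment)
        result = match(audio, segment)
        if result is not None:       # match succeeded: audio identified
            return result
    return None                      # every sampled segment failed
```

Excluding already-tried segments mirrors the requirement that each new target audio sample differ from the previous one.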
In the audio identification method provided by the invention, audio fingerprints are extracted from the target audio sample at the first extraction density and matched against the fingerprints in the pre-established library. In the present invention, the pre-established library was built by extracting audio fingerprints at the second extraction density, and the first extraction density is lower than the second. With these differentiated extraction densities, fingerprint comparison can maintain a suitable recognition speed while still ensuring the recognition precision of the audio fingerprints, achieving a balance between recognition speed and precision.
In the present invention, a section of audio may be randomly selected as the target audio sample. The selection can set a selection duration, the first time length: a section of audio of the first time length is randomly selected from the audio to be identified as the target audio sample, and fingerprint extraction is then performed on it.
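A minimal sketch of this random selection, assuming the audio is a flat sequence of samples; the function and parameter names are illustrative, not from the patent:

```python
import random

def random_target_sample(audio, sample_rate, first_time_length_s):
    """Randomly select a target sample of the "first time length"
    from the audio to be identified."""
    length = int(first_time_length_s * sample_rate)
    if len(audio) <= length:
        return list(audio)           # clip shorter than the window
    start = random.randrange(len(audio) - length + 1)
    return audio[start:start + length]
```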
Besides randomly selecting a section of audio as the target audio sample, the target audio sample may also be selected by the method shown in Fig. 2, which includes the steps:
S201: perform audio-energy detection on each audio fragment of the audio to be identified;
S202: from the audio fragments whose energy exceeds a preset audio-energy threshold, select one audio fragment as the target audio sample.
In the present invention, when selecting the target audio sample, the audio energy of each audio fragment can first be detected, and a fragment with higher energy taken as the target audio sample. An audio sample selected this way has more distinct features, which improves identification efficiency.
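The energy-based selection of Fig. 2 can be sketched as follows; mean-square energy and the loudest-fragment tie-break are assumptions of this sketch, since the patent only requires exceeding a preset threshold:

```python
import numpy as np

def pick_energetic_fragment(samples, fragment_len, energy_threshold):
    """Split the audio into fixed-length fragments, measure each one's
    energy, and return a fragment whose energy exceeds the threshold
    (here the loudest such fragment). Returns None if every fragment
    is below the threshold."""
    best, best_energy = None, energy_threshold
    for start in range(0, len(samples) - fragment_len + 1, fragment_len):
        frag = samples[start:start + fragment_len]
        energy = float(np.mean(np.square(frag)))   # mean-square energy
        if energy > best_energy:
            best, best_energy = frag, energy
    return best
```

Returning `None` when everything is quiet lets the caller fall back to another selection rule, in keeping with S104's retry behaviour.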
Referring to Fig. 3, the process of matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes the steps:
S301: down-sample the multiple extracted audio fingerprints;
S302: match the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
In the present invention, the multiple extracted audio fingerprints can be further down-sampled: from the extracted fingerprints, those with more distinct features are selected and matched against the fingerprints in the pre-established library.
In the identification method provided by the invention, the new target audio sample may be selected from the audio to be identified according to a preset second selection rule different from the first selection rule; the same selection rule may also be used.
On the basis of Fig. 1, as shown in Fig. 4, the audio identification method further includes S105: when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
Fig. 5 shows a schematic diagram of the audio identification method of the present invention. In Fig. 5, the offline library-building stage establishes the fingerprint library for the song catalogue at a certain fingerprint extraction density A, and stores information such as the singer, version, and album number of each song. A fingerprint corresponds to an audio feature (such as spectral information or sub-band energy) or some transformation of an audio feature (such as a hash transformation).
In the online identification stage, audio fingerprints are extracted from a music segment L1 (e.g. the first 30 s) at a fingerprint extraction density B (B < A). The fingerprint extraction density can be understood as the window hop in the audio feature-extraction stage: a large hop corresponds to a low density, and a small hop to a high density.
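The hop-versus-density relationship can be made concrete with a small count; the window and hop values below are illustrative, not taken from the patent:

```python
def fingerprint_count(duration_s, sample_rate, window, hop):
    """Fingerprints produced for one clip: a smaller hop ("window
    move") yields a higher extraction density."""
    total = int(duration_s * sample_rate)
    if total < window:
        return 0
    return (total - window) // hop + 1

# Library built at density A (small hop); query at density B < A (big hop).
count_a = fingerprint_count(30, 44100, window=2048, hop=1024)
count_b = fingerprint_count(30, 44100, window=2048, hop=4096)
```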
Fingerprints can also be down-sampled to some extent in the fingerprint-generation stage, so that only part of the fingerprints are used for retrieval.
The player sends the fingerprint segment to the identification service interface and obtains the recognition result.
If the wrong song is recognized, or recognition fails, the player extracts fingerprints again from a segment L2 of the music (L2 different from L1, e.g. 30 s starting from the 2nd minute) and identification is repeated.
For staged identification, random sampling can be used, or a key-segment selection strategy can be adopted, such as choosing segments with higher audio energy or relatively continuous segments.
Corresponding to the audio identification method described in Fig. 1, an embodiment of the present invention also provides an audio identification device, whose structure is shown in Fig. 6 and includes:
a first selection unit 401, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit 402, configured to extract multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
a matching unit 403, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
a second selection unit 404, configured to, when the match fails, select a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
In the audio identification device provided by the invention, audio samples are extracted in stages: when the previous audio sample fails to be identified, a new audio sample is extracted and identification continues, which improves the recognition rate of the audio to be identified.
On the basis of Fig. 6, as shown in Fig. 7, the first selection unit 401 includes:
a detection sub-unit 405, configured to perform audio-energy detection on each audio fragment of the audio to be identified;
a selection sub-unit 406, configured to select, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
The matching unit 403 includes:
a processing sub-unit 407, configured to down-sample the multiple extracted audio fingerprints;
a matching sub-unit 408, configured to match the down-sampled audio fingerprints against the fingerprints in the pre-established audio-fingerprint library.
The identification device further includes:
a feedback unit 409, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
An embodiment of the present invention also provides a storage medium including a stored program, where, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method, which specifically includes:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio-fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio-fingerprint library, where the pre-established library was built at a second audio-fingerprint extraction density and the first extraction density is lower than the second;
when the match fails, selecting a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes: determining a first time length; and randomly selecting, from the audio to be identified, a section of audio of the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule includes: performing audio-energy detection on each audio fragment of the audio to be identified; and selecting, from the audio fragments whose energy exceeds a preset audio-energy threshold, one audio fragment as the target audio sample.
Optionally, in the above method, matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio-fingerprint library includes: down-sampling the multiple extracted audio fingerprints; and matching the down-sampled fingerprints against the fingerprints in the library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified includes: selecting the new target audio sample according to a preset second selection rule different from the first selection rule.
Optionally, the above method further includes: when the match succeeds, obtaining the audio information corresponding to the audio to be identified and feeding the audio information back to the client.
The embodiment of the present invention additionally provides a kind of electronic equipment, and structure diagram is as shown in figure 8, specifically include memory
501 and one either more than one program 502 one of them or more than one program 502 be stored in memory 501
In, and be configured to by one or more than one processor 503 perform the one or more programs 502 include use
In the instruction for carrying out following operation:
selecting a target audio sample from audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
when the match fails, re-selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
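The two-density scheme in these operations can be illustrated with a self-contained Python sketch. The toy hash-based fingerprint, the frame size, and the hop values below are illustrative stand-ins, not the patent's actual fingerprint algorithm; the point is only that a query extracted at a lower density (larger hop) can still hit a reference library built at a higher density (smaller hop):

```python
import hashlib

def fingerprint(frame):
    """Toy fingerprint: hash a frame of samples to a short hex digest.
    A stand-in for a real spectral fingerprint, for illustration only."""
    data = ",".join(f"{x:.3f}" for x in frame).encode()
    return hashlib.md5(data).hexdigest()[:8]

def extract(audio, frame_size, hop):
    """Extract fingerprints at a density controlled by `hop`:
    a larger hop means a lower extraction density."""
    return [fingerprint(audio[i:i + frame_size])
            for i in range(0, len(audio) - frame_size + 1, hop)]

# Reference library: built at the higher, second extraction density (hop=1).
track = [0.1 * i for i in range(32)]
library = {fp: "track-1" for fp in extract(track, frame_size=8, hop=1)}

# Query sample: extracted at the lower, first density (hop=4); any hit
# in the library identifies the audio.
query = track[5:25]
hits = {library[fp] for fp in extract(query, 8, 4) if fp in library}
```

Because the library retains a fingerprint for every frame position, the sparse set extracted from the query still finds matches; this is what allows the first extraction density to be lower than the second.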
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and identical or similar parts may be cross-referenced between them. Since the device embodiments are essentially similar to the method embodiments, their description is relatively brief; for related details, refer to the description of the method embodiments.
Finally, it should be noted that relational terms such as "first" and "second" are used herein merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprising", "including", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further restrictions, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
For convenience of description, the above device is described as divided into various units by function. Of course, when implementing the present invention, the functions of the units may be realized in the same piece, or in multiple pieces, of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The audio identification method and device provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific embodiments and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (10)
1. An audio identification method, characterized by comprising:
selecting a target audio sample from audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
when the match fails, re-selecting a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
2. The method according to claim 1, characterized in that selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
determining a first time length; and
randomly selecting, from the audio to be identified, an audio segment of the first time length as the target audio sample.
3. The method according to claim 1, characterized in that selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
performing audio energy detection on each audio segment in the audio to be identified; and
selecting, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
4. The method according to claim 1, characterized in that matching the multiple extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises:
performing down-sampling on the multiple extracted audio fingerprints; and
matching the down-sampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
5. The method according to claim 1, characterized in that re-selecting a new target audio sample from the audio to be identified comprises:
re-selecting a new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
6. The method according to claim 1, characterized by further comprising:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified, and feeding the audio information back to a client.
7. An audio identification device, characterized by comprising:
a first selection unit, configured to select a target audio sample from audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
a matching unit, configured to match the multiple extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density; and
a second selection unit, configured to, when the match fails, re-select a new target audio sample from the audio to be identified until the match succeeds, so as to identify the audio to be identified.
8. The device according to claim 7, characterized by further comprising:
a feedback unit, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to a client.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the audio identification method according to any one of claims 1 to 6.
10. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the audio identification method according to any one of claims 1 to 6.
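The energy-based sample selection of claim 3 can be sketched as follows. The segment length, the threshold, and the pick-the-loudest tie-break are illustrative assumptions; the claim only requires choosing some segment whose energy exceeds the threshold:

```python
def pick_energetic_segment(audio, segment_len, threshold):
    """Claim-3-style selection sketch: split the audio into fixed-length
    segments, keep those whose energy exceeds `threshold`, and return
    one of them (here, the loudest) as the target audio sample.
    Parameter names and the tie-break rule are illustrative, not from
    the patent.
    """
    segments = [audio[i:i + segment_len]
                for i in range(0, len(audio) - segment_len + 1, segment_len)]
    energetic = [s for s in segments if sum(x * x for x in s) > threshold]
    if not energetic:
        return None  # no segment passes the audio energy detection
    return max(energetic, key=lambda s: sum(x * x for x in s))
```

Selecting by energy biases the sample toward loud, information-rich passages, which tends to yield more distinctive fingerprints than a silent stretch would.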
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711486757.6A CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711486757.6A CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108198573A true CN108198573A (en) | 2018-06-22 |
CN108198573B CN108198573B (en) | 2021-04-30 |
Family
ID=62587013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711486757.6A Active CN108198573B (en) | 2017-12-29 | 2017-12-29 | Audio recognition method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108198573B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723235A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN112380382A (en) * | 2020-11-23 | 2021-02-19 | 北京达佳互联信息技术有限公司 | Audio classification method and device and storage medium |
Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1403783A2 (en) * | 2002-09-24 | 2004-03-31 | Matsushita Electric Industrial Co., Ltd. | Audio signal feature extraction |
US20090307273A1 (en) * | 2008-06-06 | 2009-12-10 | Tecsys Development, Inc. | Using Metadata Analysis for Monitoring, Alerting, and Remediation |
CN102308295A (en) * | 2009-02-10 | 2012-01-04 | 索尼爱立信移动通讯有限公司 | Music profiling |
CN102314875A (en) * | 2011-08-01 | 2012-01-11 | 北京百度网讯科技有限公司 | Audio file identification method and device |
CN103021440A (en) * | 2012-11-22 | 2013-04-03 | 腾讯科技(深圳)有限公司 | Method and system for tracking audio streaming media |
CN103403710A (en) * | 2011-02-10 | 2013-11-20 | 雅虎公司 | Extraction and matching of characteristic fingerprints from audio signals |
CN103455514A (en) * | 2012-06-01 | 2013-12-18 | 腾讯科技(深圳)有限公司 | Updating method and updating device for audio file |
CN103778174A (en) * | 2012-10-19 | 2014-05-07 | 索尼公司 | Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis |
US8805560B1 (en) * | 2011-10-18 | 2014-08-12 | Google Inc. | Noise based interest point density pruning |
CN104077336A (en) * | 2013-05-09 | 2014-10-01 | 腾讯科技(深圳)有限公司 | Method and device for dragging audio file to retrieve audio file information |
CN104184697A (en) * | 2013-05-20 | 2014-12-03 | 百度在线网络技术(北京)有限公司 | Audio fingerprint extraction method and system thereof |
CN104317967A (en) * | 2014-11-17 | 2015-01-28 | 北京航空航天大学 | Two-layer advertisement audio retrieval method based on audio fingerprints |
US20150254342A1 (en) * | 2011-05-30 | 2015-09-10 | Lei Yu | Video dna (vdna) method and system for multi-dimensional content matching |
CN104915403A (en) * | 2015-06-01 | 2015-09-16 | 腾讯科技(北京)有限公司 | Information processing method and server |
CN105138541A (en) * | 2015-07-08 | 2015-12-09 | 腾讯科技(深圳)有限公司 | Audio fingerprint matching query method and device |
CN105874732A (en) * | 2014-01-07 | 2016-08-17 | 高通股份有限公司 | Method and device for identifying a piece of music in audio stream |
CN105975568A (en) * | 2016-04-29 | 2016-09-28 | 腾讯科技(深圳)有限公司 | Audio processing method and apparatus |
CN106055615A (en) * | 2016-05-25 | 2016-10-26 | 广州酷狗计算机科技有限公司 | Method, device and system for obtaining music information |
US20170024441A1 (en) * | 2015-03-10 | 2017-01-26 | Compact Disc Express, Inc. | Systems and methods for continuously detecting and identifying songs in a continuous audio stream |
KR20170027649A (en) * | 2015-09-02 | 2017-03-10 | 레이 왕 | Method and apparatus for synchronous putting of real-time mobile advertisement based on audio fingerprint |
CN106802960A (en) * | 2017-01-19 | 2017-06-06 | 湖南大学 | A kind of burst audio search method based on audio-frequency fingerprint |
Non-Patent Citations (5)
Title |
---|
HAITSMA J, KALKER T: "A highly robust audio fingerprinting system", ISMIR *
KIM S, UNAL E, NARAYANAN S: "Music fingerprint extraction for classical music cover song identification", 2008 IEEE International Conference on Multimedia and Expo *
NIE HUA: "Research on audio-fingerprint-based advertisement detection", China Masters' Theses Full-text Database, Information Science and Technology *
GUO YONGSHUAI: "Research on music retrieval based on audio fingerprints and version identification", China Masters' Theses Full-text Database, Information Science and Technology *
HAN WEI: "Research on compressed-domain audio fingerprints and their robustness", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723235A (en) * | 2019-03-19 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN111723235B (en) * | 2019-03-19 | 2023-09-26 | 百度在线网络技术(北京)有限公司 | Music content identification method, device and equipment |
CN112380382A (en) * | 2020-11-23 | 2021-02-19 | 北京达佳互联信息技术有限公司 | Audio classification method and device and storage medium |
CN112380382B (en) * | 2020-11-23 | 2024-03-12 | 北京达佳互联信息技术有限公司 | Audio classification method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108198573B (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lee et al. | Music similarity-based approach to generating dance motion sequence | |
Yang et al. | Toward multi-modal music emotion classification | |
US20120054238A1 (en) | Music search apparatus and method using emotion model | |
CN105956053B (en) | A kind of searching method and device based on the network information | |
JP2009123124A (en) | Musical composition search system, method and program | |
CN104598502A (en) | Method, device and system for obtaining background music information in played video | |
CN109493879B (en) | Music rhythm analysis and extraction method and device | |
CN105335414A (en) | Music recommendation method, device and terminal | |
CN109558500A (en) | Multimedia sequence generation method, medium, device and calculating equipment | |
Wang et al. | Towards time-varying music auto-tagging based on cal500 expansion | |
CN110047515A (en) | A kind of audio identification methods, device, equipment and storage medium | |
CN109117622B (en) | Identity authentication method based on audio fingerprints | |
CN105679324A (en) | Voiceprint identification similarity scoring method and apparatus | |
CN107679196A (en) | A kind of multimedia recognition methods, electronic equipment and storage medium | |
Rocha et al. | Segmentation and timbre-and rhythm-similarity in Electronic Dance Music | |
CN108198573A (en) | Audio identification methods and device, storage medium and electronic equipment | |
CN109756628A (en) | Method and device for playing function key sound effect and electronic equipment | |
Grekow | Audio features dedicated to the detection of four basic emotions | |
CN104978380B (en) | A kind of audio-frequency processing method and device | |
CN109802987B (en) | Content push method for display device, push device and display equipment | |
CN108777804B (en) | Media playing method and device | |
CN103870589B (en) | A kind of voice data switching method and electronic equipment | |
CN105225664A (en) | The generation method and apparatus of Information Authentication method and apparatus and sample sound | |
CN112270929B (en) | Song identification method and device | |
Bargaje | Emotion recognition and emotion based classification of audio using genetic algorithm-an optimized approach |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||