CN104123949B - Stutter (card frame) detection method and device - Google Patents

Publication number
CN104123949B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410036425.8A
Other languages
Chinese (zh)
Other versions
CN104123949A (en)
Inventor
邹连平
张文婷
何航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd

Landscapes

  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses an audio stutter (card frame) detection method and device. The method comprises: performing feature detection on an audio signal under test to obtain the feature values of each frame in the audio signal under test; searching the frames for, and marking, frame segments whose feature values are abnormal, wherein the marking information of a frame segment comprises at least one of the following: the time information of the start frame of the frame segment and the frame length of the frame segment; selecting, from those frame segments, the frame segments in which stutter occurs according to whether each frame segment is a silent segment; and outputting the marking information of the frame segments in which stutter occurs. The invention solves the technical problem of low accuracy of audio stutter detection in the prior art, eliminates misjudgments when detecting stutter frame segments, and achieves the technical effect of accurately and efficiently detecting the frame segments in which audio stutters in an audio communication system.

Description

Stutter (card frame) detection method and device
Technical field
The present invention relates to the communications field, and in particular to an audio stutter detection method and device.
Background
With the development of computer and multimedia communication technology, real-time audio communication is used ever more widely in Internet telephony, streaming media, in-game VoIP, and live entertainment audio/video. Because Internet conditions are complex, delay, jitter, and packet loss are unavoidable, and these factors make the audio service unsmooth, that is, they cause audio stutter. At present, however, the industry evaluates audio fluency as part of audio quality through objective evaluation standards such as ITU-T P.862 PESQ and ITU-R BS.1387 PEAQ, which fold stutter and other fluency problems into the overall sound-quality score; as a result, fluency is neither singled out nor clearly defined. Dedicated assessments of audio fluency are rare.
For delay and jitter, the industry has fairly mature solutions based on a jitter buffer (JitterBuffer) that mitigate and absorb delay, but these only partly solve the audio stutter problem. Stutter can also be assessed at this layer, either from the time intervals between the packets stored in the jitter buffer or from whether a current packet is present in the jitter buffer. However, between jitter-buffer processing and final playback, intermediate stages may unavoidably process the audio further, for example by emptying or resetting the jitter buffer (JitterBuffer) data, zeroing the relevant packets, or discarding high-energy audio packets. The audio frame loss caused by these intermediate processing stages seriously affects the accuracy of the stutter assessment. For stutter caused by packet loss, the fairly mature industry approaches compensate for lost audio frames by forward/backward copying or by inter-frame interpolation with overlap. Yet the repaired frames themselves, or their discontinuity with the neighboring frames, can also cause audio stutter, and a PESQ/PEAQ-based assessment again merely folds this concealment-induced stutter into the overall audio quality score.
Lack of audio fluency, that is, stutter, is an extremely important indicator in audio services, and its severity affects the user experience. It is therefore necessary to quantify audio fluency (stutter) as a dedicated indicator, so that the fluency of third-party complete audio solutions can be assessed, or the fluency of an audio product can be compared with that of competing products, which promotes improvement of the product's fluency experience.
Existing audio fluency assessment methods are divided into subjective and objective methods.
In assessing audio fluency, developers may have their own code-based criteria, for example detecting at the jitter-buffer processing layer whether the arrival interval between adjacent audio packets exceeds predetermined thresholds (such as 200ms, 200ms*2, 200ms*3, 200ms*4 ... 200ms*10) to decide whether a stutter has occurred. For an evaluator, however, the audio system under test may be a black box, and frames that are not stutter are easily counted when detecting stutter, so the accuracy of this detection approach is low.
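The arrival-interval check described above can be sketched as follows. This is only a minimal illustration, assuming packet arrival times in milliseconds and the 200 ms base threshold quoted in the text; the function name `count_arrival_gap_stutters` and the notion of a numeric "level" are hypothetical and not taken from any particular product.

```python
def count_arrival_gap_stutters(arrival_ms, base_threshold_ms=200, max_level=10):
    """Return (packet_index, level) for every inter-arrival gap above the base threshold.

    level is the number of thresholds (base*1 ... base*max_level) that the gap exceeds.
    The function name and the level notion are illustrative assumptions.
    """
    events = []
    for i in range(1, len(arrival_ms)):
        gap = arrival_ms[i] - arrival_ms[i - 1]
        level = sum(1 for k in range(1, max_level + 1) if gap > k * base_threshold_ms)
        if level > 0:
            events.append((i, level))
    return events


# Gaps are 20, 20, 260, 20, 680 ms: two gaps exceed 200 ms.
arrivals = [0, 20, 40, 300, 320, 1000]
events = count_arrival_gap_stutters(arrivals)
```

Because this check only sees packet timing, it cannot tell whether the audio that is finally played back actually stutters, which is the limitation the paragraph above points out.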
At present, fluency is mostly assessed by subjective listening. Subjective assessment requires listeners to compare their subjective impressions: on the one hand, the labor cost is high; on the other hand, stuttering audio easily makes listeners irritated or bored, which both leads to misjudgments and greatly reduces the evaluators' efficiency. Existing objective assessment techniques do not quantify audio fluency, that is, the severity of stutter, as a separate indicator but treat it only as part of the overall audio quality assessment, so they cannot specifically reflect the number of stutters per unit time or the stutter duration in an audio communication system. Such assessment of audio product fluency is coarse and inefficient, hardly reflects how severe the lack of fluency is, and does not help verify and improve the fluency experience of audio products in time.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a stutter (card frame) detection method and device, to at least solve the technical problem of low accuracy of audio stutter detection in the prior art.
According to one aspect of the embodiments of the present invention, a stutter detection method is provided, comprising: performing feature detection on an audio signal under test to obtain the feature values of each frame in the audio signal under test; searching the frames for, and marking, frame segments whose feature values are abnormal, wherein the marking information of a frame segment comprises at least one of the following: the time information of the start frame of the frame segment and the frame length of the frame segment; selecting, from those frame segments, the frame segments in which stutter occurs according to whether each frame segment is a silent segment; and outputting the marking information of the frame segments in which stutter occurs.
Optionally, selecting the frame segments in which stutter occurs according to whether the frame segment is a silent segment comprises: if the frame segment is a silent segment, determining whether the frame segment belonging to a silent segment meets a first stutter condition; if it does not meet the first stutter condition, determining that the frame segment is not a frame segment in which stutter occurs; if it meets the first stutter condition, determining that the frame segment is a frame segment in which stutter occurs.
Optionally, determining whether the frame segment belonging to a silent segment meets the first stutter condition comprises: determining whether the number of frames in the frame segment is greater than a first predetermined threshold; if the number of frames is greater than the first predetermined threshold, determining that the frame segment meets the first stutter condition; if the number of frames is less than or equal to the first predetermined threshold, determining that the frame segment does not meet the first stutter condition.
Optionally, determining that the frame segment belonging to a silent segment is not a frame segment in which stutter occurs comprises: detecting characteristic parameters of the frame segment; determining from the detection result whether the frame segment meets the natural-silence condition of the first stutter condition; and if the frame segment meets the natural-silence condition, determining that the frame segment does not meet the first stutter condition.
Optionally, after determining from the detection result whether the frame segment belonging to a silent segment meets the natural-silence condition of the first stutter condition, the method further comprises: if the frame segment does not meet the natural-silence condition, determining whether it meets the audio-impact condition of the first stutter condition; if it meets the audio-impact condition, determining whether the number of frames in the frame segment is greater than a second predetermined threshold; if the number of frames is greater than the second predetermined threshold, determining that the frame segment meeting the audio-impact condition meets the first stutter condition; if the number of frames is less than or equal to the second predetermined threshold, determining that the frame segment meeting the audio-impact condition does not meet the first stutter condition.
Optionally, after determining whether the frame segment belonging to a silent segment meets the audio-impact condition of the first stutter condition, the method further comprises: if the frame segment does not meet the audio-impact condition, determining whether it meets the sharp-drop/time-domain-truncation condition of the first stutter condition; if it does not meet the sharp-drop/time-domain-truncation condition, determining that the frame segment meets the first stutter condition; if it meets the sharp-drop/time-domain-truncation condition, determining whether the number of frames in the frame segment is greater than a third predetermined threshold; if the number of frames is greater than the third predetermined threshold, determining that the frame segment meeting the sharp-drop/time-domain-truncation condition meets the first stutter condition; if the number of frames is less than or equal to the third predetermined threshold, determining that it does not meet the first stutter condition.
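The decision cascade for silent ("quiet") frame segments described in the preceding paragraphs can be sketched as below. This is a hedged illustration only: the results of the natural-silence, audio-impact, and sharp-drop/time-domain-truncation checks are passed in as booleans because the detection algorithms themselves are not given in this passage, and the class name, function name, and threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SilentSegment:
    """A silent frame segment with pre-computed condition results (illustrative type)."""
    frame_count: int
    natural_silence: bool
    audio_impact: bool
    sharp_drop_or_truncation: bool


def meets_first_stutter_condition(seg, second_threshold=5, third_threshold=5):
    """Follow the cascade in the text; the threshold values are placeholders."""
    if seg.natural_silence:
        # Natural silence is not stutter.
        return False
    if seg.audio_impact:
        # Audio-impact segments count only if long enough.
        return seg.frame_count > second_threshold
    if not seg.sharp_drop_or_truncation:
        # Neither impact nor sharp drop: the text treats this as meeting the condition.
        return True
    # Sharp-drop/truncation segments count only if long enough.
    return seg.frame_count > third_threshold


example = SilentSegment(frame_count=10, natural_silence=False,
                        audio_impact=True, sharp_drop_or_truncation=False)
is_stutter = meets_first_stutter_condition(example)
```

Ordering the checks this way rejects natural pauses first, so the frame-count thresholds are only applied to segments that already look suspicious.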
Optionally, selecting the frame segments in which stutter occurs according to whether the frame segment is a silent segment comprises: if the frame segment is not a silent segment, determining whether the frame segment meets a second stutter condition; if the frame segment does not meet the second stutter condition, determining that the frame segment is not a frame segment in which stutter occurs; if the frame segment meets the second stutter condition, determining that the frame segment is a frame segment in which stutter occurs.
Optionally, determining whether the frame segment meets the second stutter condition comprises: determining whether the frame segment meets the stress condition of the second stutter condition; if it does not meet the stress condition, determining whether it meets the magnetization/mechanical-sound condition of the second stutter condition; if it does not meet the magnetization/mechanical-sound condition, determining that the frame segment does not meet the second stutter condition.
Optionally, if the frame segment meets the stress condition or the magnetization/mechanical-sound condition, the method further comprises: determining whether the number of frames in the frame segment is greater than a fourth predetermined threshold; if the number of frames is greater than the fourth predetermined threshold, determining that the frame segment meets the second stutter condition; if the number of frames is less than or equal to the fourth predetermined threshold, determining that the frame segment does not meet the second stutter condition.
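For frame segments that are not silent, the second condition described above reduces to a two-step check followed by a frame-count test. The sketch below assumes the stress and magnetization/mechanical-sound results are already available as booleans; the function name and the default threshold are hypothetical.

```python
def meets_second_stutter_condition(frame_count, stress, magnetized_or_mechanical,
                                   fourth_threshold=5):
    """Non-silent branch: stress or magnetization/mechanical sound, then a length test.

    The fourth_threshold default is a placeholder, not a value from the text.
    """
    if not (stress or magnetized_or_mechanical):
        # Neither sub-condition holds, so the segment is not stutter.
        return False
    return frame_count > fourth_threshold


non_silent_result = meets_second_stutter_condition(
    frame_count=8, stress=False, magnetized_or_mechanical=True)
```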
Optionally, searching the frames for, and marking, frame segments whose feature values are abnormal comprises: if, for each of multiple consecutive frames, at least one feature value is outside its corresponding threshold range, marking the frame segment formed by those consecutive frames as a frame segment whose feature values are abnormal, wherein the threshold ranges corresponding to the individual feature values are the same or different.
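The per-frame test implied here, that a frame is abnormal when at least one feature value lies outside its own threshold range, can be sketched as follows. The feature names and the numeric ranges in `FEATURE_RANGES` are hypothetical placeholders chosen for illustration, not values from the text.

```python
# Hypothetical threshold ranges, one (low, high) pair per feature.
FEATURE_RANGES = {
    "energy_envelope": (0.01, 1.0),
    "spectral_flux": (0.0, 0.5),
}


def frame_is_abnormal(features, ranges=FEATURE_RANGES):
    """Return True if at least one known feature value is outside its range."""
    return any(
        not (low <= features[name] <= high)
        for name, (low, high) in ranges.items()
        if name in features
    )


normal_frame = {"energy_envelope": 0.3, "spectral_flux": 0.1}
abnormal_frame = {"energy_envelope": 0.001, "spectral_flux": 0.1}
flags = [frame_is_abnormal(normal_frame), frame_is_abnormal(abnormal_frame)]
```

Keeping one range per feature matches the statement that the ranges may be the same or different for different feature values.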
Optionally, the feature values comprise at least one of the following: energy envelope value, spectral flux, spectral smoothness, spectral skewness, and spectral kurtosis.
According to another aspect of the embodiments of the present invention, a stutter detection device is also provided, comprising: a detection unit, configured to perform feature detection on an audio signal under test to obtain the feature values of each frame in the audio signal under test; a search-and-mark unit, configured to search the frames for, and mark, frame segments whose feature values are abnormal, wherein the marking information of a frame segment comprises at least one of the following: the time information of the start frame of the frame segment and the frame length of the frame segment; a selection unit, configured to select, from those frame segments, the frame segments in which stutter occurs according to whether each frame segment is a silent segment; and an output unit, configured to output the marking information of the frame segments in which stutter occurs.
Optionally, the selection unit comprises: a first determination module, configured to: when the frame segment is a silent segment, determine whether the frame segment belonging to a silent segment meets the first stutter condition; when it is determined that the frame segment does not meet the first stutter condition, determine that the frame segment is not a frame segment in which stutter occurs; and when it is determined that the frame segment meets the first stutter condition, determine that the frame segment is a frame segment in which stutter occurs.
Optionally, the first determination module comprises: a first determination submodule, configured to determine whether the number of frames in the frame segment belonging to a silent segment is greater than the first predetermined threshold; when the number of frames is greater than the first predetermined threshold, determine that the frame segment meets the first stutter condition; and when the number of frames is less than or equal to the first predetermined threshold, determine that the frame segment does not meet the first stutter condition.
Optionally, the first determination module comprises: a detection submodule, configured to detect characteristic parameters of the frame segment belonging to a silent segment; and a second determination submodule, configured to determine, from the detection result of the detection submodule, whether the frame segment meets the natural-silence condition of the first stutter condition, and, when the frame segment meets the natural-silence condition, determine that the frame segment does not meet the first stutter condition.
Optionally, the first determination module comprises: a third determination submodule, configured to determine, when the frame segment belonging to a silent segment does not meet the natural-silence condition, whether the frame segment meets the audio-impact condition of the first stutter condition; and a fourth determination submodule, configured to determine, when the frame segment meets the audio-impact condition, whether the number of frames in the frame segment is greater than the second predetermined threshold; when the number of frames is greater than the second predetermined threshold, determine that the frame segment meeting the audio-impact condition meets the first stutter condition; and when the number of frames is less than or equal to the second predetermined threshold, determine that it does not meet the first stutter condition.
Optionally, the first determination module comprises: a fifth determination submodule, configured to determine, when the frame segment belonging to a silent segment does not meet the audio-impact condition, whether the frame segment meets the sharp-drop/time-domain-truncation condition of the first stutter condition, and, when the frame segment does not meet the sharp-drop/time-domain-truncation condition, determine that the frame segment meets the first stutter condition; and a sixth determination submodule, configured to determine, when the frame segment meets the sharp-drop/time-domain-truncation condition, whether the number of frames in the frame segment is greater than the third predetermined threshold; when the number of frames is greater than the third predetermined threshold, determine that the frame segment meeting the sharp-drop/time-domain-truncation condition meets the first stutter condition; and when the number of frames is less than or equal to the third predetermined threshold, determine that it does not meet the first stutter condition.
Optionally, the selection unit comprises: a second determination module, configured to: when the frame segment is not a silent segment, determine whether the frame segment meets the second stutter condition; when the frame segment does not meet the second stutter condition, determine that the frame segment is not a frame segment in which stutter occurs; and when the frame segment meets the second stutter condition, determine that the frame segment is a frame segment in which stutter occurs.
Optionally, the second determination module comprises: a seventh determination submodule, configured to determine whether the frame segment meets the stress condition of the second stutter condition; and an eighth determination submodule, configured to determine, when the frame segment does not meet the stress condition, whether the frame segment meets the magnetization/mechanical-sound condition of the second stutter condition, and, when the frame segment does not meet the magnetization/mechanical-sound condition, determine that the frame segment does not meet the second stutter condition.
Optionally, the second determination module comprises: a ninth determination submodule, configured to determine, when the frame segment meets the stress condition or the magnetization/mechanical-sound condition, whether the number of frames in the frame segment is greater than the fourth predetermined threshold; when the number of frames is greater than the fourth predetermined threshold, determine that the frame segment meets the second stutter condition; and when the number of frames is less than or equal to the fourth predetermined threshold, determine that the frame segment does not meet the second stutter condition.
Optionally, the search-and-mark unit comprises: a marking module, configured to, when for each of multiple consecutive frames at least one feature value is outside its corresponding threshold range, mark the frame segment formed by those consecutive frames as a frame segment whose feature values are abnormal, wherein the threshold ranges corresponding to the individual feature values are the same or different.
Optionally, the feature values comprise at least one of the following: energy envelope value, spectral flux, spectral smoothness, spectral skewness, and spectral kurtosis.
In the embodiments of the present invention, the frame segments in which stutter occurs are extracted from the frame segments with abnormal feature values, and the other frame segments are ignored. This eliminates misjudgments when detecting stutter frame segments, solves the technical problem of low accuracy of audio stutter detection in the prior art, and achieves the technical effect of accurately and efficiently detecting the frame segments in which audio stutters in an audio communication system.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of an optional stutter detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another optional stutter detection method according to an embodiment of the present invention;
Fig. 3 is a flowchart of yet another optional stutter detection method according to an embodiment of the present invention;
Fig. 4 is a flowchart of yet another optional stutter detection method according to an embodiment of the present invention;
Fig. 5 is a flowchart of yet another optional stutter detection method according to an embodiment of the present invention;
Fig. 6 is a flowchart of yet another optional stutter detection method according to an embodiment of the present invention;
Fig. 7 is a flowchart of the decision algorithm for the silence condition in an optional stutter detection method according to an embodiment of the present invention;
Fig. 8 is a flowchart of the decision algorithm for the audio-impact condition in an optional stutter detection method according to an embodiment of the present invention;
Fig. 9 is a flowchart of the decision algorithm for the sharp-drop/time-domain-truncation condition in an optional stutter detection method according to an embodiment of the present invention;
Fig. 10 is a flowchart of the decision algorithm for the stress condition in an optional stutter detection method according to an embodiment of the present invention;
Fig. 11 is a flowchart of the decision algorithm for the magnetization/mechanical-sound condition in an optional stutter detection method according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of an optional stutter detection device according to an embodiment of the present invention;
Fig. 13 is a schematic diagram of another optional stutter detection device according to an embodiment of the present invention;
Fig. 14 is a schematic diagram of yet another optional stutter detection device according to an embodiment of the present invention; and
Fig. 15 is a schematic diagram of the output of an optional stutter detection according to an embodiment of the present invention.
Detailed description of the embodiments
First, some of the nouns and terms that appear in the description of the embodiments of the present invention are explained as follows:
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained from them by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings distinguish similar objects and do not necessarily describe a specific order or sequence. Data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion: a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a stutter (card frame) detection method is provided. As shown in Fig. 1, the method comprises:
S102: perform feature detection on the audio signal under test to obtain the feature values of each frame in the audio signal under test;
Optionally, the stutter detection method provided in this embodiment can be, but is not limited to being, applied to an audio system. As shown in Fig. 2, the audio system under test comprises a local test initiator 202, a remote test receiver 204, and a test logic server (TestLogic Server) 206. The output of the audio system under test is recorded, and feature analysis is performed on the recorded audio content to obtain the feature values of each frame. Optionally, the audio file in this embodiment may be an audio file with a header (containing, for example, the sampling rate, the number of channels, and the number of bits per sample), and its format may include, but is not limited to, at least one of the following: wav, wma, mp3.
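As an illustration of the header information mentioned above (sampling rate, number of channels, bits per sample), the sketch below reads it from a WAV file with Python's standard `wave` module. It covers only the wav case, not wma or mp3; the helper name `read_wav_header` is hypothetical, and the in-memory file is built purely so the example is self-contained.

```python
import io
import wave


def read_wav_header(source):
    """Read basic header fields from a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "num_samples": wav.getnframes(),
        }


# Build a tiny in-memory WAV: mono, 16-bit, 16 kHz, 320 samples (20 ms of silence).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 320)
buffer.seek(0)

header = read_wav_header(buffer)
```

With a 20 ms analysis frame, the sampling rate from the header gives the number of samples per frame (320 at 16 kHz in this example).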
Optionally, in this embodiment, time-domain, time-frequency-transform, and frequency-domain feature analysis is performed on the audio signal segment to obtain the feature values of each frame in the corresponding domain, wherein the feature values of each frame include, but are not limited to, at least one of the following: energy envelope value, spectral flux, spectral smoothness, spectral skewness, and spectral kurtosis.
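Two of the listed features can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the energy envelope is taken as the per-frame RMS, and spectral flux as the Euclidean distance between the magnitude spectra of successive frames, which is one common definition and not necessarily the one intended here. All function names are hypothetical, and the naive DFT is only suitable for short demonstration frames.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Cut the signal into non-overlapping frames, dropping any incomplete tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def rms_energy(frame):
    """Root-mean-square level of one frame (used here as the energy envelope value)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, for illustration only)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def frame_features(samples, frame_len):
    """Return one feature dict per frame; flux of the first frame is defined as 0."""
    features, prev = [], None
    for frame in split_frames(samples, frame_len):
        spec = magnitude_spectrum(frame)
        flux = 0.0 if prev is None else math.sqrt(
            sum((a - b) ** 2 for a, b in zip(spec, prev)))
        features.append({"energy_envelope": rms_energy(frame), "spectral_flux": flux})
        prev = spec
    return features


# Two identical tone frames followed by one frame of digital silence.
tone = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
feats = frame_features(tone + tone + [0.0] * 16, 16)
```

In this toy signal the third frame shows both a collapse of the energy envelope and a jump in spectral flux, the kind of abrupt change that threshold ranges on these features are meant to catch.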
Optionally, if at least one of the above feature values of the current frame under detection is abnormal, the current frame is determined to be a frame whose feature values are abnormal.
For example, as shown in Fig. 2, the recording and detection procedure for the audio under test comprises:
1) The local stutter test App and the remote stutter test App each log in to the test logic server (TestLogic Server) 206 and stay online;
2) The local test initiator 202 configures the network delay/jitter/packet-loss parameters for this simulation, turns on the corresponding delay/packet-loss simulation, and notifies the remote test receiver 204 of the current delay/packet-loss model;
3) The local test initiator 202 starts playing the audio codebook in loop-playback mode. The output codebook signal is captured by the audio system under test, passes through its processing flow, and is transmitted to the far end of the audio system under test for playback; the played output is captured by the remote test App and saved as an audio file with a header, in a format such as wav, wma, or mp3;
4) Within the set time, after capturing the audio sent through the simulated network delay/packet-loss environment, the remote test receiver 204 runs automated stutter analysis on the feature values of each frame of the audio in the recording file.
S104: search the frames for, and mark, frame segments whose feature values are abnormal;
Optionally, the marking information of a frame segment comprises at least one of the following: the time information of the start frame of the frame segment and the frame length of the frame segment.
Optionally, in this embodiment, after the feature values of each frame of the audio under test have been detected, the frame segments whose feature values are abnormal are marked, and these frame segments are marked as first stutter frame segments.
Optionally, a frame segment in this embodiment is a segment formed by multiple consecutive frames whose feature values are abnormal.
For example, the audio signal comprises eight frame segments A, B, C, D, E, F, G, and H. After the feature values of each frame are detected, the frame segments found to be abnormal are A, B, C, D, and E; the time information of the start frame of each of these frame segments (for example, time t) and the frame length of the frame segment (for example, frame length N) are then marked.
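The marking step can be sketched as grouping consecutive abnormal frames into segments and recording, for each segment, the start time and the frame length. The sketch assumes a per-frame list of abnormal flags as input, a 20 ms frame duration, and a minimum of two consecutive frames per segment; the function name and the `min_frames` parameter are hypothetical.

```python
def mark_abnormal_segments(abnormal_flags, frame_ms=20, min_frames=2):
    """Group runs of consecutive abnormal frames into marked segments.

    Each segment records the start time t (in seconds) and the frame length N.
    frame_ms and min_frames are illustrative assumptions.
    """
    segments = []
    start = None
    # A trailing False closes any run that reaches the end of the signal.
    for i, flag in enumerate(list(abnormal_flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            length = i - start
            if length >= min_frames:
                segments.append({"start_time_s": start * frame_ms / 1000,
                                 "frame_length": length})
            start = None
    return segments


flags = [False, True, True, True, False, True, False, True, True]
segments = mark_abnormal_segments(flags)
```

Here the isolated abnormal frame at index 5 is dropped because a segment is taken to consist of multiple consecutive frames.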
S106: select, from the frame segments, the frame segments in which stutter occurs according to whether each frame segment is a silent segment;
Optionally, in this embodiment, the way to determine whether a frame segment in the audio signal is a silent segment includes, but is not limited to, voice activity detection (VAD, Voice Activity Detection).
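A very simple energy-based stand-in for this silence decision is sketched below. It is an assumption for illustration only: practical VAD algorithms are considerably more elaborate, and the text does not specify which one is used. The function name and the `energy_threshold` default are hypothetical.

```python
import math


def is_silent_segment(frames, energy_threshold=0.01):
    """Treat a segment as silent when every frame's RMS level is below the threshold.

    frames is a list of frames, each a list of samples in the range [-1, 1].
    The threshold value is a placeholder.
    """
    def rms(frame):
        return math.sqrt(sum(x * x for x in frame) / len(frame))

    return all(rms(frame) < energy_threshold for frame in frames)


quiet = is_silent_segment([[0.0] * 4, [0.001] * 4])
active = is_silent_segment([[0.0] * 4, [0.5, -0.5, 0.5, -0.5]])
```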
Alternatively, to occurring that abnormal frame section judges further, judge whether quiet section, and then therefrom select the frame section occurring card frame.
Alternatively, in the present embodiment, the frame section occurring card frame can be selected from the frame section being designated as the first card frame section, and the frame segment mark selected is designated as the second card frame section.
S108, output the label information of the frame sections in which a card frame occurs.
Optionally, the label information of the frame sections in which a card frame occurs is output. For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. Detecting the feature values of each frame shows that the abnormal frame sections (for example, frame sections with abnormal energy envelope values) are A, B, C, D and E. Further analysis and judgment then show that frame sections C, D and E are real, effective card frames, while frame sections A and B are misjudged. The time information of the start frame (for example, time t) and the frame length (for example, frame length N) of the effective card frame sections C, D and E are then output.
Figure 15 shows the label information of the frame sections in which a card frame occurs. The file "Summery_KaDuninfo" lists 6 audio files (WavFile), the number of frame sections in which a card frame occurs within 5 minutes in each audio file (5Min_KaDunTImes), and the total duration of the card frame phenomenon in each audio file (5MinContinousKaSeconds). Taking audio file "6.wav" as an example, 7 frame sections in which a card frame occurs appear within 5 minutes, with a total duration of 0.76s.
In addition, the file "6_KaDuninfo" in Figure 15 shows the specific card frame information of audio file "6.wav", namely: the sequence number of each frame section in which a card frame occurs (KaDunNo); the timestamp of the start frame of each such frame section (KaPos [Min:Seconds]); the total number of frames in each such frame section (ContinousKaFrames (Frames/20ms)), where each frame lasts 20ms; and the duration of each such frame section (ContinousKaSeconds). Taking the frame section with sequence number 1 as an example, the timestamp of its start frame is 53.439999s, the frame section has 10 frames, and the total duration of the 10 frames is 0.200000s. Figure 15 also shows the specific card frame information of audio files "5.wav" and "4.wav", which is not described again here.
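The durations quoted from Figure 15 follow from the 20ms frame length. The Python sketch below illustrates that arithmetic only; the function names and the tuple layout `(start_time_s, frame_count)` are assumptions made for illustration and are not taken from the patent.

```python
FRAME_MS = 20  # frame duration stated in the text (Frames/20ms)

def section_duration_s(frame_count, frame_ms=FRAME_MS):
    """Duration in seconds of a frame section with the given frame count."""
    return frame_count * frame_ms / 1000.0

def summarize(sections, frame_ms=FRAME_MS):
    """Summarize card frame sections given as (start_time_s, frame_count).

    Returns the number of sections and their total duration, mirroring the
    two summary columns described for Figure 15 (hypothetical layout).
    """
    total = sum(section_duration_s(n, frame_ms) for _, n in sections)
    return {"count": len(sections), "total_seconds": total}

# The section with sequence number 1 in the text has 10 frames of 20 ms each.
print(section_duration_s(10))  # 0.2
```

A section of 10 frames therefore lasts 0.2s, which matches the 0.200000s listed for sequence number 1.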
With the embodiment provided by this application, the feature values of the audio signal are extracted and detected, the abnormal frame sections are marked, the frame sections in which a card frame occurs are selected after further judgment, and the label information of those frame sections is then output, so that the frame sections in which audio stutters in an audio communication system are detected accurately and efficiently.
As an optional scheme, as shown in Figure 3, selecting the frame sections in which a card frame occurs from the frame sections according to whether each frame section is a quiet section comprises:
S302, if the frame section is a quiet section, judge whether the frame section belonging to the quiet section meets the first card frame condition;
Optionally, the first card frame condition in the present embodiment includes but is not limited to at least one of the following: the card frame frame number, the naturally quiet condition, the audio frequency hit condition, and the sharp-pointed downslide/time domain truncation condition.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. If the frame sections judged to belong to quiet sections are A, B, C, D and E, then it is judged whether the card frame frame number of each of A, B, C, D and E meets a preset threshold condition (for example, the frame number is greater than M).
S304, if the frame section belonging to the quiet section does not meet the first card frame condition, judge that the frame section belonging to the quiet section is not a frame section in which a card frame occurs;
For example, among the frame sections A, B, C, D and E belonging to quiet sections, the card frame frame numbers of frame sections D and E do not meet the first card frame condition (for example, the frame number is less than or equal to M). Frame sections D and E are therefore judged not to be frame sections in which a card frame occurs, and are not marked as second card frame sections.
S306, if the frame section belonging to the quiet section meets the first card frame condition, judge that the frame section belonging to the quiet section is a frame section in which a card frame occurs.
For example, among the frame sections A, B, C, D and E belonging to quiet sections, the card frame frame numbers of frame sections A, B and C meet the first card frame condition (for example, the frame number is greater than M). Frame sections A, B and C are therefore judged to be frame sections in which a card frame occurs, and are marked as second card frame sections.
It should be noted that the resolving ability of the human ear is limited and the window applied to each frame is on the order of milliseconds. When the number of consecutive card frames is too small, the human ear can hardly perceive such an extremely short audio region subjectively, so such card frames can be ignored.
With the embodiment provided by this application, the frame sections belonging to quiet sections are judged in finer detail as to whether they meet the first card frame condition, so that the card frame sections that can actually be perceived in an audio communication system are obtained accurately.
As an optional scheme, as shown in Figure 4, judging whether the frame section belonging to the quiet section meets the first card frame condition comprises:
S402, judge whether the frame number of the frame section belonging to the quiet section is greater than the first predetermined threshold;
Optionally, the setting of the first predetermined threshold in the present embodiment is related to the ability of the human ear to recognize audio stuttering; this first predetermined threshold can be obtained by training or determined according to the required product quality level in actual assessment.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. If the frame sections judged to belong to quiet sections are A, B, C, D and E, then it is judged whether the card frame frame number of each of A, B, C, D and E is greater than the first predetermined threshold (for example, whether the frame number is greater than M).
S404, if the frame number is greater than the first predetermined threshold, judge that the frame section belonging to the quiet section meets the first card frame condition;
For example, among the frame sections A, B, C, D and E belonging to quiet sections, the card frame frame numbers of frame sections A, B and C are greater than the first predetermined threshold (for example, greater than M), so frame sections A, B and C are judged to meet the first card frame condition.
S406, if the frame number is less than or equal to the first predetermined threshold, judge that the frame section belonging to the quiet section does not meet the first card frame condition.
For example, among the frame sections A, B, C, D and E belonging to quiet sections, the card frame frame numbers of frame sections D and E are less than or equal to the first predetermined threshold (for example, less than or equal to M), so frame sections D and E are judged not to meet the first card frame condition.
With the embodiment provided by this application, a threshold is set on the frame number of a card frame section, which allows the card frame sections that the human ear can recognize in an audio system to be selected more accurately.
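Steps S402 to S406 reduce to a single comparison per quiet section. The following Python sketch shows that comparison under stated assumptions: the frame counts and the threshold value M = 5 are invented for illustration, and the function name is hypothetical.

```python
def select_by_frame_count(quiet_sections, threshold_m):
    """Return the names of quiet sections whose frame count exceeds M.

    quiet_sections maps a section name to its card frame frame number.
    A section with frame count <= M is too short to be perceived and is
    not kept (steps S404 / S406).
    """
    return [name for name, frames in quiet_sections.items()
            if frames > threshold_m]

# Hypothetical frame counts for the quiet sections A..E of the example.
quiet = {"A": 12, "B": 9, "C": 15, "D": 3, "E": 5}
print(select_by_frame_count(quiet, threshold_m=5))  # ['A', 'B', 'C']
```

With these assumed values, sections D and E fall at or below the threshold and are dropped, as in the worked example of the text.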
As an optional scheme, judging that the frame section belonging to the quiet section is not a frame section in which a card frame occurs comprises:
S1, detect the characteristic parameters of the frame section belonging to the quiet section;
Optionally, the characteristic parameters in the present embodiment include but are not limited to at least one of the following: the length, energy and mean value of the current quiet section.
For example, with reference to Figure 5, after the judgment, the length, energy and mean value of the current frame section belonging to a quiet section in the audio signal under test are detected as characteristic parameters.
As another example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. If the frame sections judged to belong to quiet sections are A, B, C, D and E, then the characteristic parameters of frame sections A, B, C, D and E (for example, the length, energy and mean value of the current frame section) are detected.
S2, judge according to the detection result whether the frame section belonging to the quiet section meets the naturally quiet condition in the first card frame condition;
Optionally, the first card frame condition in the present embodiment includes but is not limited to: the naturally quiet condition. For example, Figure 7 is a flowchart of the decision algorithm for judging whether a frame section in a signal segment of the audio meets the naturally quiet condition. The figure illustrates this decision flow only as an example, and this application does not limit it.
Optionally, whether the frame section belonging to the quiet section meets the naturally quiet condition is judged according to the above detection result.
It should be noted that not every quiet section is a card frame. In an audio call, some silences between exchanges are natural pauses; such natural silence is not caused by audio stuttering and is therefore not treated as an effective card frame (for example, a second card frame section).
S3, if the frame section belonging to the quiet section meets the naturally quiet condition, judge that the frame section does not meet the first card frame condition.
For example, with reference to Figure 5, among the frame sections A, B, C, D and E belonging to quiet sections, frame section E meets the naturally quiet condition in the first card frame condition; that is, the silence of frame section E is normal silence, so frame section E is not marked as a second card frame section.
With the embodiment provided by this application, judging whether the frame sections belonging to quiet sections in the audio signal meet the naturally quiet condition eliminates the cases in which natural silence is misjudged as a card frame, so that the card frames in the audio signal are obtained more effectively and accurately.
As an optional scheme, after judging according to the detection result whether the frame section belonging to the quiet section meets the naturally quiet condition in the first card frame condition, the method further comprises:
S1, if the frame section belonging to the quiet section does not meet the naturally quiet condition, judge whether the frame section belonging to the quiet section meets the audio frequency hit condition in the first card frame condition;
Optionally, the first card frame condition in the present embodiment includes but is not limited to: the audio frequency hit condition. For example, Figure 8 is a flowchart of the decision algorithm for judging whether a frame section in a signal segment of the audio meets the audio frequency hit condition. The figure illustrates this decision flow only as an example, and this application does not limit it.
For example, with reference to Figure 5, among the frame sections A, B, C, D and E belonging to quiet sections, the frame sections that do not meet the naturally quiet condition are A, B, C and D; it is then judged whether frame sections A, B, C and D meet the audio frequency hit condition in the first card frame condition.
It should be noted that an audio frequency hit is caused by a sound hit. If the sound shows no hit phenomenon, the frame section may not be an effective card frame section of the audio system (for example, a second card frame section), so it is necessary to judge the audio frequency hit condition for the audio under test.
S2, if the frame section belonging to the quiet section meets the audio frequency hit condition, judge whether the frame number of the frame section meeting the audio frequency hit condition is greater than the second predetermined threshold;
Optionally, the setting of the second predetermined threshold in the present embodiment is also related to the ability of the human ear to recognize audio stuttering; this second predetermined threshold can be obtained by training or determined according to the required product quality level in actual assessment.
For example, with reference to Figure 5, when the audio signal comprises eight frame sections A, B, C, D, E, F, G and H, the frame sections among the quiet sections A, B, C, D and E that do not meet the naturally quiet condition are A, B, C and D. If the frame sections among them that meet the audio frequency hit condition are A and B, then it is judged whether the card frame frame number of each of A and B is greater than the second predetermined threshold (for example, whether the frame number is greater than P).
S3, if the frame number is greater than the second predetermined threshold, judge that the frame section meeting the audio frequency hit condition meets the first card frame condition; if the frame number is less than or equal to the second predetermined threshold, judge that the frame section meeting the audio frequency hit condition does not meet the first card frame condition.
For example, the frame sections among the quiet sections A, B, C, D and E that do not meet the naturally quiet condition are A, B, C and D, and the frame sections among them that meet the audio frequency hit condition are A and B. If the frame number of frame section B is found to be greater than the second predetermined threshold (for example, greater than P), frame section B is judged to meet the first card frame condition and is marked as a second card frame section. If the frame number of frame section A is found to be less than or equal to the second predetermined threshold (for example, less than or equal to P), frame section A is judged not to meet the first card frame condition and is not marked as a second card frame section.
With the embodiment provided by this application, the frame sections belonging to quiet sections in the audio signal are checked for an audio frequency hit and then checked against the frame number threshold, so that the card frames in the audio signal are obtained more effectively and accurately.
As an optional scheme, after judging whether the frame section belonging to the quiet section meets the audio frequency hit condition in the first card frame condition, the method further comprises:
S1, if the frame section belonging to the quiet section does not meet the audio frequency hit condition, judge whether the frame section belonging to the quiet section meets the sharp-pointed downslide/time domain truncation condition in the first card frame condition;
Optionally, the first card frame condition in the present embodiment includes but is not limited to: the sharp-pointed downslide/time domain truncation condition. For example, Figure 9 is a flowchart of the decision algorithm for judging whether a frame section in a signal segment of the audio meets the sharp-pointed downslide/time domain truncation condition. The figure illustrates this decision flow only as an example, and this application does not limit it.
For example, with reference to Figure 5, among the frame sections A, B, C, D and E belonging to quiet sections, the frame sections that do not meet the naturally quiet condition are A, B, C and D. If the frame sections among A, B, C and D that do not meet the audio frequency hit condition in the first card frame condition are C and D, then it is judged whether frame sections C and D meet the sharp-pointed downslide/time domain truncation condition.
It should be noted that a sharp-pointed downslide/time domain truncation is caused by the time-domain signal being cut off abruptly. If the sudden silence of a frame section is caused neither by an audio frequency hit nor by a sharp-pointed downslide/time domain truncation, the frame section may not be an effective card frame section of the audio system (for example, a second card frame section), so it is necessary to judge the sharp-pointed downslide/time domain truncation condition for the audio under test.
S2, if the frame section belonging to the quiet section does not meet the sharp-pointed downslide/time domain truncation condition, judge that the frame section does not meet the first card frame condition;
For example, with reference to Figure 5, frame sections C and D, which do not meet the audio frequency hit condition in the first card frame condition, are judged against the sharp-pointed downslide/time domain truncation condition. If frame section D does not meet the sharp-pointed downslide/time domain truncation condition, frame section D is not marked as a second card frame section.
S3, if the frame section belonging to the quiet section meets the sharp-pointed downslide/time domain truncation condition, judge whether the frame number of the frame section meeting the sharp-pointed downslide/time domain truncation condition is greater than the third predetermined threshold;
Optionally, the setting of the third predetermined threshold in the present embodiment is also related to the ability of the human ear to recognize audio stuttering; the third predetermined threshold can be obtained by training or determined according to the required product quality level in actual assessment.
For example, with reference to Figure 5, frame sections C and D, which do not meet the audio frequency hit condition in the first card frame condition, are judged against the sharp-pointed downslide/time domain truncation condition. If frame section C meets the sharp-pointed downslide/time domain truncation condition, then it is judged whether the card frame frame number of frame section C is greater than the third predetermined threshold (for example, whether the frame number is greater than Q).
S4, if the frame number is greater than the third predetermined threshold, judge that the frame section meeting the sharp-pointed downslide/time domain truncation condition meets the first card frame condition; if the frame number is less than or equal to the third predetermined threshold, judge that the frame section meeting the sharp-pointed downslide/time domain truncation condition does not meet the first card frame condition.
For example, with reference to Figure 5, the card frame frame number of frame section C, which meets the sharp-pointed downslide/time domain truncation condition, is judged. If the frame number of frame section C is found to be greater than the third predetermined threshold (for example, greater than Q), frame section C is judged to meet the first card frame condition and is marked as a second card frame section. If the frame number of frame section C is found to be less than or equal to the third predetermined threshold (for example, less than or equal to Q), frame section C is judged not to meet the first card frame condition and is not marked as a second card frame section.
With the embodiment provided by this application, the frame sections belonging to quiet sections in the audio signal are checked for a sharp-pointed downslide/time domain truncation and then checked against the frame number threshold, so that the card frames in the audio signal are obtained more effectively and accurately.
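Taken together, the optional schemes for quiet sections form a decision cascade: natural silence first, then audio frequency hit with threshold P, then sharp-pointed downslide/time domain truncation with threshold Q. The Python sketch below is one possible reading of that cascade, not the patent's implementation. The class, the field names, the boolean flags (which stand in for the decision algorithms of Figures 7 to 9) and the numeric values are all assumptions; it also assumes that a section that is neither a hit nor a truncation is rejected, as the worked example for frame section D suggests.

```python
from dataclasses import dataclass

@dataclass
class QuietSection:
    name: str
    frames: int          # card frame frame number of the section
    natural_quiet: bool  # result of the naturally quiet check (Figure 7)
    audio_hit: bool      # result of the audio frequency hit check (Figure 8)
    sharp_drop: bool     # result of the downslide/truncation check (Figure 9)

def meets_first_condition(sec, threshold_p, threshold_q):
    """Decide whether a quiet section counts as a second card frame section."""
    if sec.natural_quiet:
        return False                     # natural pause, not a card frame
    if sec.audio_hit:
        return sec.frames > threshold_p  # second predetermined threshold
    if sec.sharp_drop:
        return sec.frames > threshold_q  # third predetermined threshold
    return False                         # neither hit nor truncation

# Hypothetical values reproducing the A..E example of the text.
sections = [
    QuietSection("A", 3, False, True, False),
    QuietSection("B", 8, False, True, False),
    QuietSection("C", 12, False, False, True),
    QuietSection("D", 20, False, False, False),
    QuietSection("E", 30, True, False, False),
]
kept = [s.name for s in sections if meets_first_condition(s, 5, 10)]
print(kept)  # ['B', 'C']
```

Ordering the checks this way means the frame number threshold is only consulted once a cause for the silence has been identified, which mirrors the order of the steps in the text.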
As an optional scheme, selecting the frame sections in which a card frame occurs from the frame sections according to whether each frame section is a quiet section comprises:
S1, if the frame section is not a quiet section, judge whether the frame section meets the second card frame condition;
Optionally, with reference to Figure 5, the second card frame condition in the present embodiment includes but is not limited to: judgments on the correlation and periodicity of audio features, for example, the stress condition and the magnetization/mechanical sound condition.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. If the frame sections judged not to belong to quiet sections are F, G and H, then it is judged whether frame sections F, G and H are stress.
S2, if the frame section does not meet the second card frame condition, judge that the frame section is not a frame section in which a card frame occurs;
For example, with reference to Figure 5, suppose that among the frame sections F, G and H not belonging to quiet sections, frame sections G and H do not meet the second card frame condition: they are judged not to be stress, and their magnetization/mechanical sound audio components do not exceed the preset ratio. Frame sections G and H are then judged not to be frame sections in which a card frame occurs, and are not marked as second card frame sections.
S3, if the frame section meets the second card frame condition, judge that the frame section is a frame section in which a card frame occurs.
For example, with reference to Figure 5, suppose that among the frame sections F, G and H not belonging to quiet sections, frame section F meets the second card frame condition: it is judged to be stress, and its card frame frame number meets the condition for being perceptible to the human ear. Frame section F is then judged to be a frame section in which a card frame occurs, and is marked as a second card frame section.
With the embodiment provided by this application, the frame sections not belonging to quiet sections are judged as to whether they meet the second card frame condition, so that non-quiet frame sections are also distinguished and the card frame sections that can actually be perceived in an audio communication system are obtained accurately.
As an optional scheme, judging whether the frame section meets the second card frame condition comprises:
S1, judge whether the frame section meets the stress condition in the second card frame condition;
Optionally, the second card frame condition in the present embodiment includes but is not limited to: the stress condition and the magnetization/mechanical sound condition.
For example, with reference to Figure 5, after the frame sections F, G and H are judged not to belong to quiet sections, it is judged whether these frame sections meet the stress condition in the second card frame condition. For example, Figure 10 is a flowchart of the decision algorithm for judging whether a frame section in a signal segment of the audio meets the stress condition. The figure illustrates this decision flow only as an example, and this application does not limit it.
S2, if the frame section does not meet the stress condition, judge whether the frame section meets the magnetization/mechanical sound condition in the second card frame condition;
For example, with reference to Figure 5, if frame sections G and H are judged not to meet the stress condition, it is then judged whether frame sections G and H meet the magnetization/mechanical sound condition in the second card frame condition, that is, whether the magnetization/mechanical sound audio components of frame sections G and H exceed the preset ratio. For example, Figure 11 is a flowchart of the decision algorithm for judging whether a frame section in a signal segment of the audio meets the magnetization/mechanical sound condition. The figure illustrates this decision flow only as an example, and this application does not limit it.
S3, if the frame section does not meet the magnetization/mechanical sound condition in the second card frame condition, judge that the frame section does not meet the second card frame condition.
It should be noted that, with reference to Figure 5, if a frame section not belonging to a quiet section meets neither the stress condition nor the magnetization/mechanical sound condition, it is not a real, effective card frame section but a misjudged frame section, and is therefore not treated as an effective card frame (for example, a second card frame section).
For example, with reference to Figure 5, suppose that of frame sections G and H, which do not meet the stress condition in the second card frame condition, frame section H also does not meet the magnetization/mechanical sound condition; that is, the magnetization/mechanical sound audio components of frame section H do not exceed the preset ratio. Frame section H is then judged not to meet the second card frame condition and is not marked as a second card frame section.
With the embodiment provided by this application, the frame sections not belonging to quiet sections are judged in finer detail as to whether they meet the stress condition and the magnetization/mechanical sound condition in the second card frame condition, so that non-quiet frame sections are distinguished and the card frame sections that can actually be perceived in an audio communication system are obtained accurately.
As an optional scheme, if the frame section meets the stress condition or meets the magnetization/mechanical sound condition, the method further comprises:
S1, judge whether the frame number of the frame section is greater than the fourth predetermined threshold;
Optionally, the setting of the fourth predetermined threshold in the present embodiment is also related to the ability of the human ear to recognize audio stuttering; the fourth predetermined threshold can be obtained by training or determined according to the required product quality level in actual assessment.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H. If the frame section judged to meet the stress condition or the magnetization/mechanical sound condition is G, then it is judged whether the card frame frame number of frame section G is greater than the fourth predetermined threshold (for example, the fourth predetermined threshold is S).
S2, if the frame number is greater than the fourth predetermined threshold, judge that the frame section meets the second card frame condition; if the frame number is less than or equal to the fourth predetermined threshold, judge that the frame section does not meet the second card frame condition.
For example, if the card frame frame number of frame section G is greater than the fourth predetermined threshold (for example, greater than S), frame section G is judged to meet the second card frame condition and is marked as a second card frame section; if the card frame frame number of frame section G is less than or equal to the fourth predetermined threshold (for example, less than or equal to S), frame section G is judged not to meet the second card frame condition and is not marked as a second card frame section.
With the embodiment provided by this application, the frame sections in the audio signal that do not belong to quiet sections and that meet the stress condition or the magnetization/mechanical sound condition are further checked against the frame number threshold, so that the card frames in the audio signal are obtained more effectively and accurately.
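For frame sections that are not quiet sections, the text describes a shorter cascade: the stress condition, then the magnetization/mechanical sound condition, then the fourth predetermined threshold S. A minimal Python sketch of that logic follows. The function name and arguments are hypothetical, and the two boolean inputs stand in for the decision algorithms of Figures 10 and 11, which are not reproduced here.

```python
def meets_second_condition(is_stress, is_mechanical, frames, threshold_s):
    """Decide whether a non-quiet frame section is a second card frame section.

    is_stress / is_mechanical: outcomes of the stress check and of the
    magnetization/mechanical sound check (assumed to be computed elsewhere).
    frames: card frame frame number of the section.
    threshold_s: the fourth predetermined threshold S.
    """
    if not (is_stress or is_mechanical):
        return False             # misjudged section, neither condition holds
    return frames > threshold_s  # must also be long enough to be heard

# Hypothetical example: G meets the mechanical sound condition, H meets neither.
print(meets_second_condition(False, True, frames=9, threshold_s=6))    # True
print(meets_second_condition(False, False, frames=40, threshold_s=6))  # False
```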
As an optional scheme, searching each frame and marking the frame sections whose feature values are abnormal comprises:
S602, if, for each of multiple consecutive frames among the frames, at least one feature value is not within its corresponding threshold range, mark the frame section composed of these consecutive frames as a frame section whose feature values are abnormal;
Optionally, the threshold ranges corresponding to the individual feature values in the present embodiment may be the same or different.
For example, when searching each frame and marking the frame sections whose feature values are abnormal, runs of consecutive frames are searched for in which every frame has at least one feature value outside its corresponding threshold range, and the frame section composed of these consecutive frames is marked as a frame section whose feature values are abnormal.
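Step S602 amounts to a run-length scan over per-frame feature values. The Python sketch below shows one way to do it under stated assumptions: frames are dictionaries of feature values, `ranges` maps each feature name to an inclusive `(low, high)` range, a section needs at least `min_frames` consecutive abnormal frames, and each frame lasts 20ms. The names, the data layout and the threshold values are illustrative and not taken from the patent.

```python
def is_abnormal(frame, ranges):
    """True if at least one feature value lies outside its threshold range."""
    return any(not (lo <= frame[name] <= hi)
               for name, (lo, hi) in ranges.items())

def find_abnormal_sections(frames, ranges, frame_ms=20, min_frames=2):
    """Return (start_time_s, frame_length) for each run of abnormal frames."""
    sections = []
    start = None
    # A trailing None acts as a sentinel so a run at the end is closed too.
    for i, frame in enumerate(list(frames) + [None]):
        abnormal = frame is not None and is_abnormal(frame, ranges)
        if abnormal and start is None:
            start = i                      # a new run begins
        elif not abnormal and start is not None:
            length = i - start             # the run has just ended
            if length >= min_frames:
                sections.append((start * frame_ms / 1000.0, length))
            start = None
    return sections

# Hypothetical envelope values in dB; the range (-60, 0) is an assumption.
frames = [{"env": v} for v in [-20, -90, -95, -92, -25, -91, -30]]
print(find_abnormal_sections(frames, {"env": (-60.0, 0.0)}))  # [(0.02, 3)]
```

The returned pairs correspond to the label information described earlier: the time of the start frame and the frame length of the section.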
As an optional scheme, the feature values in the present embodiment comprise at least one of the following: energy envelope value, frequency spectrum flow (spectral flux), spectral smoothing degree (spectral flatness), spectrum deflection (spectral skewness), spectrum kurtosis.
Optionally, in the present embodiment the above feature values can be computed as follows:
1) Energy envelope value, which represents the change of the short-time energy of the audio. The window function applied comprises at least one of the following: rectangular window, Hamming window, Hanning window, quarter window, Abbado Lay window. The window function of the rectangular window is expressed as follows:
$$ w(n) = \begin{cases} 1, & 0 \le n < N \\ 0, & \text{otherwise} \end{cases} \tag{1} $$
Wherein, the k-th frame after windowing of the audio signal is $X_k(n) = w(n) \cdot x(k \cdot N + n)$, where N is the number of audio samples corresponding to the time window of each frame. The average energy of the k-th frame signal $X_k(n)$ is denoted E(k) and is computed as follows:
$$ E(k) = \frac{1}{N} \sum_{n=1}^{N-1} X_k(n) \cdot X_k(n) \tag{2} $$
The envelope is obtained by taking the square root of the audio energy signal, normalizing it and then taking the logarithm; it is a measure of the change of the short-time energy of the audio. The envelope of the k-th frame of the audio signal is denoted Env(k), as shown in the following formula:
$$ \mathrm{Env}(k) = 20 \cdot \log_{10}\left( \frac{\sqrt{\frac{1}{N} \sum_{n=1}^{N-1} X_k(n) \cdot X_k(n)}}{32768} \right) \tag{3} $$
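The energy and envelope computations of formulas (2) and (3) can be sketched in a few lines of Python. This is an illustration under assumptions rather than the patent's code: it uses a rectangular window, sums over all N samples of the frame, treats 32768 as the full-scale value of 16-bit samples, and adds a small floor (a hypothetical safeguard) so that an all-zero frame does not produce a logarithm of zero.

```python
import math

def frame_energy(frame):
    """Average energy E(k) of one frame of samples (rectangular window)."""
    return sum(x * x for x in frame) / len(frame)

def envelope_db(frame, full_scale=32768.0, floor=1e-12):
    """Envelope Env(k): root of the average energy, normalized, in dB."""
    rms = math.sqrt(frame_energy(frame))
    # The floor is an assumption added here to avoid log10(0) on silence.
    return 20.0 * math.log10(max(rms / full_scale, floor))

# A constant frame at half of full scale sits about 6 dB below 0 dB.
print(round(envelope_db([16384] * 4), 2))  # -6.02
```

A very low envelope value over several consecutive frames is the kind of pattern the abnormal-section search would then flag.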
2) Frequency spectrum flow (spectral flux), which reflects the dynamic characteristics of the audio signal. It can be obtained as the normalized 2-norm of the difference between the spectral vectors of two adjacent frames, as expressed by the following formula:
$$ v_{SF}(n) = \frac{\sqrt{\sum_{k=0}^{\mathcal{K}/2-1} \left( |X(k,n)| - |X(k,n-1)| \right)^2}}{\mathcal{K}/2} \tag{4} $$
Wherein, $0 \le v_{SF}(n) \le A$, where A is a preset spectral amplitude decision threshold. A small $v_{SF}(n)$ indicates that adjacent audio signals are relatively steady, or alternatively that the input signal level is low. When the audio signal undergoes a non-stationary transition or abrupt change, $v_{SF}(n)$ shoots up very high and reaches the abnormal threshold.
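As a sketch of the spectral flux described above, the Python function below takes the magnitude spectra of two adjacent frames and returns the 2-norm of their difference divided by the number of bins. It assumes the caller has already computed the half-spectrum magnitudes |X(k,n)| (for example with an FFT), so the number of bins plays the role of K/2; the function name is hypothetical.

```python
import math

def spectral_flux(mag_cur, mag_prev):
    """2-norm of the magnitude difference of adjacent frames, over K/2 bins."""
    if len(mag_cur) != len(mag_prev):
        raise ValueError("both frames must have the same number of bins")
    diff_sq = sum((a - b) ** 2 for a, b in zip(mag_cur, mag_prev))
    return math.sqrt(diff_sq) / len(mag_cur)

# Identical spectra give zero flux; an abrupt change gives a large value.
print(spectral_flux([1.0, 2.0], [1.0, 2.0]))  # 0.0
print(spectral_flux([3.0, 0.0], [0.0, 4.0]))  # 2.5
```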
3) Spectral smoothing degree (spectral flatness), which identifies the pauses or abrupt changes caused by card frames. It can be obtained by the following formula:
$$ v_{Tf}(n) = \frac{\sqrt[\mathcal{K}/2]{\prod_{k=0}^{\mathcal{K}/2-1} |X(k,n)|}}{\frac{2}{\mathcal{K}} \cdot \sum_{k=0}^{\mathcal{K}/2-1} |X(k,n)|} = \frac{\exp\left( \frac{2}{\mathcal{K}} \cdot \sum_{k=0}^{\mathcal{K}/2-1} \log\left( |X(k,n)| \right) \right)}{\frac{2}{\mathcal{K}} \cdot \sum_{k=0}^{\mathcal{K}/2-1} |X(k,n)|} \tag{5} $$
Wherein, in a smooth audio region $v_{Tf}(n)$ is small, and the smaller it is, the more probable it is that the signal is tonal; at a pause or abrupt change caused by a card frame, $v_{Tf}(n)$ shoots up very high, forming a spike that exceeds the abnormal threshold.
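The spectral smoothing degree is the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum. The following Python sketch computes it through the logarithmic form, which avoids overflow in the product. The magnitudes are assumed to be precomputed, and the small floor `eps` is an assumption added here to keep the logarithm and the division defined for zero-valued bins.

```python
import math

def spectral_flatness(mags, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the magnitudes."""
    n = len(mags)
    # Geometric mean via exp(mean(log)), as in the second form of the formula.
    log_mean = sum(math.log(max(m, eps)) for m in mags) / n
    arith_mean = max(sum(mags) / n, eps)
    return math.exp(log_mean) / arith_mean

# A flat spectrum gives a value near 1; a single dominant peak gives a
# value near 0, which corresponds to a tonal signal.
flat = spectral_flatness([2.0, 2.0, 2.0, 2.0])
peaky = spectral_flatness([100.0, 0.001, 0.001, 0.001])
print(flat > 0.99, peaky < 0.1)  # True True
```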
4) Spectral skewness (spectrum deflection): characterizes the symmetry of the probability density function (PDF) distribution of the audio signal. It is obtained by dividing the third-order central moment of the audio signal statistics by the cube of the standard deviation, as expressed by the following formula:
$$v_{Sk}(n) = \frac{1}{\sigma_x^3(n) \cdot \kappa} \sum_{i=i_s(n)}^{i_e(n)} \left( x(i) - \mu_x(n) \right)^3 \tag{6}$$
Here $\mu_x(n)$ is the mean of the signal over one frame and $\sigma_x(n)$ is the corresponding standard deviation.
5) Spectral kurtosis: characterizes the non-Gaussianity of the PDF distribution of the audio signal. Relative to a Gaussian distribution, it characterizes the flatness of the input signal values. It is obtained by dividing the fourth-order central moment of the audio signal statistics by the fourth power of the standard deviation, as expressed by the following formula:
$$v_K(n) = \frac{1}{\sigma_x^4(n) \cdot \kappa} \sum_{i=i_s(n)}^{i_e(n)} \left( x(i) - \mu_x(n) \right)^4 - 3 \tag{7}$$
Here $\mu_x(n)$ is the mean of the signal over one frame and $\sigma_x(n)$ is the corresponding standard deviation.
It should be noted that the spectral skewness and spectral kurtosis are computed for each frame of the audio signal; these two feature values characterize the degree of distortion of the audio signal. Audio frames whose spectral skewness and spectral kurtosis fall below the preset decision thresholds are classified as distorted audio frames according to the spectral distortion measure.
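Formulas (6) and (7) can be sketched together, since both are normalized central moments of the samples in one frame. The function name and the handling of a constant frame (returning zeros when the standard deviation is zero) are assumptions for illustration.

```python
import math


def skewness_and_kurtosis(frame):
    """Return (skewness, excess kurtosis) of the samples in one frame."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must be non-empty")
    mean = sum(frame) / n
    variance = sum((s - mean) ** 2 for s in frame) / n
    if variance == 0.0:
        return 0.0, 0.0  # assumption: moments of a constant frame set to 0
    std = math.sqrt(variance)
    third = sum((s - mean) ** 3 for s in frame) / n
    fourth = sum((s - mean) ** 4 for s in frame) / n
    # Excess kurtosis subtracts 3 so that a Gaussian signal scores 0.
    return third / std ** 3, fourth / std ** 4 - 3.0
```

A symmetric frame has skewness 0, while a frame with one large positive outlier has positive skewness.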
Optionally, before the card frame detection method in this embodiment is performed, the audio signal is preprocessed, where the preprocessing includes but is not limited to: DC removal, normalization, and channel separation.
Optionally, the above preprocessing may include the following:
1) DC removal: DC component interference at a specific frequency can be removed with a comb filter (notch filter). Alternatively, if an audio section of length $\tau$ is available, the following calculation can be used:
$$x(i) = x_{DC}(i) - \frac{1}{\tau} \sum_{i=0}^{\tau-1} x_{DC}(i) \tag{8}$$
2) Normalization: a simple approach is:
$$x(i) = \frac{x_s(i)}{\max(|x_s(i)|)} \tag{9}$$
Here i ranges from 0 to T, where T is the length of the audio section.
In this embodiment, gain adjustment of the audio section is based on AGC (automatic gain control): different dynamic gain factors are applied according to changes in the microphone volume level to achieve normalization.
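The mean-subtraction form of DC removal in formula (8) and the peak normalization in formula (9) can be sketched as follows. The AGC-based gain adjustment mentioned for this embodiment is not modeled here, and the function names and the handling of an all-zero section are assumptions.

```python
def remove_dc(x_dc):
    """Subtract the mean of the audio section from every sample (formula (8))."""
    if not x_dc:
        return []
    mean = sum(x_dc) / len(x_dc)
    return [s - mean for s in x_dc]


def peak_normalize(x_s):
    """Divide every sample by the largest absolute value (formula (9))."""
    peak = max((abs(s) for s in x_s), default=0.0)
    if peak == 0.0:
        return list(x_s)  # assumption: leave an all-zero section unchanged
    return [s / peak for s in x_s]
```

After both steps the section has zero mean before scaling and a peak magnitude of 1.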
3) Channel separation: the multi-channel audio data is separated by channel, and the audio data of a single channel is finally obtained. A simple down-mixing (DownMixing) procedure can follow this formula:
$$x(i) = \frac{1}{C} \sum_{c=0}^{C-1} x_c(i) \tag{10}$$
Here C is the number of channels.
The down-mixing (DownMixing) procedure in this embodiment may instead use processing based on channel weights:
$$x(i) = \frac{1}{C} \sum_{c=0}^{C-1} w_c(i) \cdot x_c(i) \tag{11}$$
Here $w_c(i)$ is the weight of channel c and C is the number of channels. $w_c(i)$ is the ratio of the average energy of each channel to the sum of the average energies of all channels.
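The two down-mixing variants of formulas (10) and (11) can be sketched as below. The weights here are computed once per channel over the whole section rather than per sample, all channels are assumed to have equal length, and the equal-weight fallback for silent input is an assumption.

```python
def downmix_mean(channels):
    """Formula (10): average the C channels sample by sample."""
    if not channels:
        raise ValueError("at least one channel is required")
    count = len(channels)
    return [sum(ch[i] for ch in channels) / count for i in range(len(channels[0]))]


def energy_weights(channels):
    """Weight of each channel: its average energy over the total of all channels."""
    energies = [sum(s * s for s in ch) / len(ch) for ch in channels]
    total = sum(energies)
    if total == 0.0:
        return [1.0 / len(channels)] * len(channels)  # assumption for silent input
    return [e / total for e in energies]


def downmix_weighted(channels):
    """Formula (11): weighted sum of the channels, scaled by 1/C."""
    weights = energy_weights(channels)
    count = len(channels)
    return [
        sum(weights[c] * channels[c][i] for c in range(count)) / count
        for i in range(len(channels[0]))
    ]
```

With energy weighting, a silent channel contributes nothing to the mixed signal, whereas plain averaging halves the level of the active channel in a two-channel recording.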
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a card frame detection device is further provided. As shown in Figure 12, the device comprises:
1) a detecting unit 1202, configured to perform feature detection on the audio signal under test to obtain the feature value of each frame in the audio signal under test;
Optionally, the card frame detection method provided in this embodiment may be, but is not limited to being, applied to an audio system. As shown in Figure 2, the audio system under test comprises a local test initiating end 202, a remote test receiving end 204, and a test logic server (TestLogic Server) 206. The output of the audio system under test is recorded, and feature analysis and detection are performed on the recorded audio content to obtain the feature value of each frame. Optionally, the audio file in this embodiment may be an audio file with a header (for example, one containing the sampling rate, the number of channels, the number of bits per sample, and similar information), and the format of the audio file may include but is not limited to at least one of: wav, wma, mp3.
Optionally, in this embodiment, time-domain, time-frequency transform, or frequency-domain feature analysis is performed on the audio signal segment to obtain the feature value of each frame in the corresponding domain, where the feature values of each frame include but are not limited to at least one of: energy envelope value, spectral flux, spectral flatness, spectral skewness, spectral kurtosis.
Optionally, if at least one of the above feature values of the current frame under detection is abnormal, the current frame is judged to be a frame with an abnormal feature value.
For example, as shown in Figure 2, the recording and testing process for the audio under test comprises:
1) the local card frame test App and the remote card frame test App each log in to the test logic server (TestLogicServer) 206 and stay online;
2) the local test initiating end 202 configures the network delay/jitter and packet loss for this simulation, enables the corresponding delay/packet-loss simulation, and notifies the peer remote test receiving end 204 of the current delay/packet-loss model;
3) the local test initiating end 202 starts playing the audio codebook with loop playback enabled. The output codebook signal is captured by the audio system under test, passes through its processing flow, and is transmitted to the far end of the audio system under test, where it is played out, captured by the remote test App, and saved in a headered audio format such as wav/wma/mp3;
4) within the set time, after the remote test receiving end 204 has captured the audio sent through the simulated network delay/packet-loss environment, automated card frame analysis is performed on the feature values of each frame of audio in the recording file.
2) a searching and marking unit 1204, configured to search the frames for, and mark, the frame sections whose feature values are abnormal, where the label information of a frame section includes at least one of: the time information of the start frame of the frame section and the frame length of the frame section;
Optionally, in this embodiment, after the feature value of each frame of the audio under test is detected, the frame sections whose feature values are abnormal are marked, and these frame sections are labeled as first card frame sections.
Optionally, a frame section in this embodiment is a section made up of multiple consecutive frames whose feature values are abnormal.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H. After the feature value of each frame is detected, the frame sections with abnormal feature values are found to be A, B, C, D, E. The time information of the start frame of each of these frame sections (for example, time t) and the frame length of the frame section (for example, frame length N) are then marked.
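The searching and marking step can be sketched as follows: a frame is abnormal if at least one of its feature checks is abnormal, and each run of consecutive abnormal frames becomes one frame section labeled with its start time and frame length. The function names, the 20 ms frame duration default, and the millisecond start time are assumptions for illustration.

```python
def frame_is_abnormal(feature_flags):
    """A frame is abnormal when at least one feature value is flagged abnormal."""
    return any(feature_flags)


def mark_abnormal_sections(abnormal_flags, frame_ms=20):
    """Return (start_ms, frame_count) for every run of consecutive abnormal frames."""
    sections = []
    start = None
    for index, flag in enumerate(abnormal_flags):
        if flag and start is None:
            start = index  # a new abnormal run begins here
        elif not flag and start is not None:
            sections.append((start * frame_ms, index - start))
            start = None
    if start is not None:  # close a run that reaches the end of the signal
        sections.append((start * frame_ms, len(abnormal_flags) - start))
    return sections
```

The resulting list plays the role of the first card frame sections, which the selecting unit then filters further.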
3) a selecting unit 1206, configured to select, from the frame sections, the frame sections in which a card frame occurs according to whether each frame section is a quiet section;
Optionally, in this embodiment, the way of judging whether a frame section in the audio signal is a quiet section includes but is not limited to: voice activity detection (VAD, Voice Activity Detection).
Optionally, the frame sections with abnormal feature values are examined further to judge whether each is a quiet section, and the frame sections in which a card frame occurs are then selected from them.
Optionally, in this embodiment, the frame sections in which a card frame occurs may be selected from the frame sections labeled as first card frame sections, and the selected frame sections are labeled as second card frame sections.
4) an output unit 1208, configured to output the label information of the frame sections in which a card frame occurs.
Optionally, the label information of the frame sections in which a card frame occurs is output. For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H. After the feature value of each frame is detected, the frame sections with abnormal feature values (for example, abnormal energy envelope values) are found to be A, B, C, D, E. Further analysis and judgment show that frame sections C, D, E are genuinely effective card frames and that frame sections A, B are misjudged. The time information of the start frame (for example, time t) and the frame length (for example, frame length N) of the effective card frame sections C, D, E are then output.
Figure 15 shows the label information of the frame sections in which a card frame occurs. The file "Summery_KaDuninfo" lists 6 audio files (WavFile), the number of frame sections with a card frame within 5 minutes in each audio file (5Min_KaDunTImes), and the total duration of the card frame phenomenon in each audio file (5MinContinousKaSeconds). For the audio file "6.wav", 7 frame sections with a card frame occur within 5 minutes, with a total duration of 0.76 s.
In addition, the file "6_KaDuninfo" in Figure 15 shows the specific card frame information of the audio file "6.wav": the sequence number of each frame section with a card frame (KaDunNo), the timestamp of the start frame of each such frame section (KaPos [Min:Seconds]), the total number of frames in each such frame section (ContinousKaFrames (Frames/20ms), where each frame lasts 20 ms), and the duration of each such frame section (ContinousKaSeconds). Taking the frame section with sequence number 1 as an example, the timestamp of its start frame is 53.439999 s, the frame section has 10 frames, and the total duration of the 10 frames is 0.200000 s. Figure 15 also shows the specific card frame information of the audio files "5.wav" and "4.wav", which is not repeated here.
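The per-file report described for Figure 15 can be approximated with a small sketch that turns a list of selected frame sections into rows and totals. The row keys reuse the column names quoted from the figure; the function name, the input layout of (start in milliseconds, frame count) pairs, and the 20 ms default are assumptions, and the real file format is not reproduced.

```python
def summarize_card_frames(sections, frame_ms=20):
    """Build report rows plus the section count and total card frame duration.

    sections: list of (start_ms, frame_count) pairs for the selected sections.
    """
    rows = []
    for number, (start_ms, frames) in enumerate(sections, start=1):
        rows.append({
            "KaDunNo": number,
            "KaPos_s": start_ms / 1000.0,
            "ContinousKaFrames": frames,
            "ContinousKaSeconds": frames * frame_ms / 1000.0,
        })
    total_seconds = sum(frames for _, frames in sections) * frame_ms / 1000.0
    return rows, len(sections), total_seconds
```

For instance, a section of 10 frames at 20 ms per frame is reported with a duration of 0.2 s, matching the arithmetic in the example above.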
With the embodiment provided by this application, the feature values of the audio signal are extracted and detected, the frame sections with abnormal feature values are marked, the effective card frames are selected after further judgment, and the label information of the card frame sections is then output, so that the frame sections in which audio stutters in an audio communication system are detected accurately and efficiently.
As an optional solution, as shown in Figure 13, the selecting unit 1206 comprises:
1) a first judging module 1302, configured to: when a frame section is a quiet section, judge whether the frame section belonging to the quiet section meets the first card frame condition; when it is judged that the frame section does not meet the first card frame condition, judge that the frame section is not a frame section in which a card frame occurs; and when it is judged that the frame section meets the first card frame condition, judge that the frame section is a frame section in which a card frame occurs.
Optionally, the first card frame condition in this embodiment includes but is not limited to at least one of: the card frame count, the natural silence condition, the audio hit condition, and the sharp downslide/time-domain truncation condition.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H. It is judged that the frame sections belonging to quiet sections are A, B, C, D, E; it is then judged whether the card frame count of each of frame sections A, B, C, D, E meets a preset threshold condition (for example, the frame count is greater than M).
1) If, among the frame sections A, B, C, D, E belonging to quiet sections, the card frame counts of frame sections D and E do not meet the first card frame condition (for example, the frame count is less than or equal to M), it is judged that frame sections D and E are not frame sections in which a card frame occurs, and frame sections D and E are not labeled as second card frame sections.
2) If, among the frame sections A, B, C, D, E belonging to quiet sections, the card frame counts of frame sections A, B, C meet the first card frame condition (for example, the frame count is greater than M), it is judged that frame sections A, B, C are frame sections in which a card frame occurs, and frame sections A, B, C are labeled as second card frame sections.
It should be noted that, because the resolving ability of the human ear is limited and the window applied to each frame is on the millisecond scale, when the number of consecutive card frames is too small it is difficult for the human ear to subjectively perceive such an extremely short audio region; such card frames can therefore be ignored.
With the embodiment provided by this application, a refined judgment is made on the frame sections belonging to quiet sections to determine whether they meet the first card frame condition, so that the card frame sections that can actually be perceived in the audio communication system are obtained accurately.
As an optional solution, the first judging module 1302 comprises:
1) a first judging submodule, configured to judge whether the frame count of the frame section belonging to the quiet section is greater than a first predetermined threshold; when the frame count is greater than the first predetermined threshold, judge that the frame section belonging to the quiet section meets the first card frame condition; and when the frame count is less than or equal to the first predetermined threshold, judge that the frame section belonging to the quiet section does not meet the first card frame condition.
Optionally, in this embodiment the setting of the first predetermined threshold is related to the ability of the human ear to recognize audio stutter; the first predetermined threshold can be obtained by training or determined according to the product quality requirement level in the actual evaluation.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H, and it is judged that the frame sections belonging to quiet sections are A, B, C, D, E. It is then judged whether the card frame count of each of frame sections A, B, C, D, E is greater than the first predetermined threshold (for example, greater than M).
1) If, among the frame sections A, B, C, D, E belonging to quiet sections, the card frame counts of frame sections A, B, C are greater than the first predetermined threshold (for example, greater than M), it is judged that frame sections A, B, C belonging to quiet sections meet the first card frame condition.
2) If, among the frame sections A, B, C, D, E belonging to quiet sections, the card frame counts of frame sections D, E are less than or equal to the first predetermined threshold (for example, less than or equal to M), it is judged that frame sections D, E belonging to quiet sections do not meet the first card frame condition.
With the embodiment provided by this application, a threshold is set on the frame count of card frame sections, which allows the card frame sections that the human ear can recognize in the audio system to be selected more accurately.
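The frame count check against the first predetermined threshold can be sketched as a simple partition of the quiet frame sections. The function name and the frame counts used in the usage note are hypothetical; only the comparison rule (strictly greater than the threshold M) comes from the text.

```python
def split_by_frame_count(frame_counts, threshold_m):
    """Partition frame sections by whether their frame count exceeds M.

    frame_counts: dict mapping a section name to its number of frames.
    Returns (sections meeting the condition, sections not meeting it).
    """
    meets = [name for name, count in frame_counts.items() if count > threshold_m]
    fails = [name for name, count in frame_counts.items() if count <= threshold_m]
    return meets, fails
```

With hypothetical counts of 12, 9, 8, 3, and 5 frames for sections A to E and M = 5, sections A, B, C meet the condition and sections D, E do not.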
As an optional solution, the first judging module 1302 comprises:
1) a detection submodule, configured to detect the characteristic parameters of the frame section belonging to the quiet section;
Optionally, the characteristic parameters in this embodiment include but are not limited to at least one of: the length, the energy, and the mean of the current quiet section.
For example, with reference to Figure 5, after the judgment, characteristic parameter detection is performed on the length, energy, and mean of the current frame section that belongs to a quiet section in the audio signal under test.
As another example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H, and it is judged that the frame sections belonging to quiet sections are A, B, C, D, E. Characteristic parameter detection is then performed on frame sections A, B, C, D, E (for example, the characteristic parameters are the length, energy, and mean of the current frame section).
2) a second judging submodule, configured to judge, according to the detection result of the detection submodule, whether the frame section belonging to the quiet section meets the natural silence condition in the first card frame condition; and when the frame section belonging to the quiet section meets the natural silence condition, judge that the frame section does not meet the first card frame condition.
Optionally, the first card frame condition in this embodiment includes but is not limited to: natural silence.
Optionally, whether the frame section belonging to the quiet section meets the natural silence condition is judged according to the above detection result.
It should be noted that not every quiet section is a card frame. In an audio call, some of the silence between exchanges is a natural pause; such natural silence is not an audio stutter and is therefore not treated as an effective card frame (for example, a second card frame section).
For example, with reference to Figure 5, among the frame sections A, B, C, D, E belonging to quiet sections, frame section E meets the natural silence condition in the first card frame condition; that is, the silence of frame section E is normal silence, so frame section E is not labeled as a second card frame section.
With the embodiment provided by this application, by judging whether the frame sections belonging to quiet sections in the audio signal meet the natural silence condition, misjudgments of natural silence as card frames are eliminated, so that the card frames in the audio signal are obtained more effectively and accurately.
As an optional solution, the first judging module 1302 comprises:
1) a third judging submodule, configured to: when the frame section belonging to the quiet section does not meet the natural silence condition, judge whether the frame section belonging to the quiet section meets the audio hit condition in the first card frame condition;
Optionally, the first card frame condition in this embodiment includes but is not limited to: the audio hit condition.
For example, with reference to Figure 5, among the frame sections A, B, C, D, E belonging to quiet sections, the frame sections that do not meet the natural silence condition are A, B, C, D; it is then judged whether frame sections A, B, C, D meet the audio hit condition of the first card frame condition.
It should be noted that an audio hit is caused by a hit in the sound. If the sound shows no hit phenomenon, the frame section may not be an effective card frame section (for example, a second card frame section) of the audio system, so it is necessary to judge the audio hit condition for the audio under test.
2) a fourth judging submodule, configured to: when the frame section belonging to the quiet section meets the audio hit condition, judge whether the frame count of the frame section meeting the audio hit condition is greater than a second predetermined threshold;
Optionally, in this embodiment the setting of the second predetermined threshold is also related to the ability of the human ear to recognize audio stutter; the second predetermined threshold can be obtained by training or determined according to the product quality requirement level in the actual evaluation.
For example, with reference to Figure 5, when the audio signal comprises eight frame sections A, B, C, D, E, F, G, H, the frame sections among the quiet frame sections A, B, C, D, E that do not meet the natural silence condition are A, B, C, D, and the frame sections among them that meet the audio hit condition are found to be A, B. It is then judged whether the card frame count of each of frame sections A, B is greater than the second predetermined threshold (for example, P frames).
Optionally, when the frame count is greater than the second predetermined threshold, it is judged that the frame section meeting the audio hit condition meets the first card frame condition; when the frame count is less than or equal to the second predetermined threshold, it is judged that the frame section meeting the audio hit condition does not meet the first card frame condition.
For example, among the frame sections A, B, C, D, E belonging to quiet sections, the frame sections that do not meet the natural silence condition are A, B, C, D, and the frame sections among them that meet the audio hit condition are found to be A, B. If the frame count of frame section B is found to be greater than the second predetermined threshold (for example, greater than P), it is judged that frame section B, which meets the audio hit condition, meets the first card frame condition, and frame section B is labeled as a second card frame section. If the frame count of frame section A is found to be less than or equal to the second predetermined threshold (for example, less than or equal to P), it is judged that frame section A, which meets the audio hit condition, does not meet the first card frame condition, and frame section A is not labeled as a second card frame section.
With the embodiment provided by this application, by judging whether the frame sections belonging to quiet sections in the audio signal are audio hits and further judging whether the frame count meets the threshold setting, the card frames in the audio signal are obtained more effectively and accurately.
As an optional solution, the first judging module 1302 comprises:
1) a fifth judging submodule, configured to: when the frame section belonging to the quiet section does not meet the audio hit condition, judge whether the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition in the first card frame condition; and when the frame section belonging to the quiet section does not meet the sharp downslide/time-domain truncation condition, judge that the frame section does not meet the first card frame condition;
Optionally, the first card frame condition in this embodiment includes but is not limited to: the sharp downslide/time-domain truncation condition.
For example, with reference to Figure 5, among the frame sections A, B, C, D, E belonging to quiet sections, the frame sections that do not meet the natural silence condition are A, B, C, D, and the frame sections among them that do not meet the audio hit condition of the first card frame condition are found to be C, D. It is then judged whether frame sections C, D meet the sharp downslide/time-domain truncation condition.
It should be noted that a sharp downslide/time-domain truncation is caused by the time-domain signal being cut off abruptly. If the sudden silence of a frame section is caused neither by an audio hit nor by a sharp downslide/time-domain truncation, the frame section may not be an effective card frame section (for example, a second card frame section) of the audio system, so it is necessary to judge the sharp downslide/time-domain truncation condition for the audio under test.
As another example, with reference to Figure 5, the sharp downslide/time-domain truncation condition is judged for frame sections C, D, which do not meet the audio hit condition in the first card frame condition. If frame section D is found not to meet the sharp downslide/time-domain truncation condition, frame section D is not labeled as a second card frame section.
2) a sixth judging submodule, configured to: when the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition, judge whether the frame count of the frame section meeting the sharp downslide/time-domain truncation condition is greater than a third predetermined threshold; when the frame count is greater than the third predetermined threshold, judge that the frame section meeting the sharp downslide/time-domain truncation condition meets the first card frame condition; and when the frame count is less than or equal to the third predetermined threshold, judge that the frame section meeting the sharp downslide/time-domain truncation condition does not meet the first card frame condition.
Optionally, in this embodiment the setting of the third predetermined threshold is also related to the ability of the human ear to recognize audio stutter; the third predetermined threshold can be obtained by training or determined according to the product quality requirement level in the actual evaluation.
For example, with reference to Figure 5, the sharp downslide/time-domain truncation condition is judged for frame sections C, D, which do not meet the audio hit condition in the first card frame condition. If frame section C is found to meet the sharp downslide/time-domain truncation condition, it is then judged whether the card frame count of frame section C is greater than the third predetermined threshold (for example, Q frames).
As another example, with reference to Figure 5, the card frame count of frame section C, which meets the sharp downslide/time-domain truncation condition, is judged. If the frame count of frame section C is found to be greater than the third predetermined threshold (for example, greater than Q), it is judged that frame section C meets the first card frame condition, and frame section C is labeled as a second card frame section. If the frame count of frame section C is found to be less than or equal to the third predetermined threshold (for example, less than or equal to Q), it is judged that frame section C does not meet the first card frame condition, and frame section C is not labeled as a second card frame section.
With the embodiment provided by this application, by judging whether the frame sections belonging to quiet sections in the audio signal are sharp downslides/time-domain truncations and further judging whether the frame count meets the threshold setting, the card frames in the audio signal are obtained more effectively and accurately.
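The chain of judgments for a quiet frame section (natural silence, then audio hit, then sharp downslide/time-domain truncation, each followed by its frame count threshold) can be sketched as one decision function. How each condition is actually detected is not modeled: the boolean fields, the class and function names, and the threshold parameters `p_threshold` and `q_threshold` are assumptions standing in for the second and third predetermined thresholds.

```python
from dataclasses import dataclass


@dataclass
class QuietSection:
    frames: int            # number of frames in the section
    natural_silence: bool  # meets the natural silence condition
    audio_hit: bool        # meets the audio hit condition
    sharp_drop: bool       # meets the sharp downslide/time-domain truncation condition


def quiet_section_is_card_frame(section, p_threshold, q_threshold):
    """Decide whether a quiet section meets the first card frame condition."""
    if section.natural_silence:
        return False  # a natural pause is not a card frame
    if section.audio_hit:
        return section.frames > p_threshold  # second predetermined threshold
    if section.sharp_drop:
        return section.frames > q_threshold  # third predetermined threshold
    return False  # neither an audio hit nor a sharp downslide/truncation
```

Sections for which the function returns True correspond to the second card frame sections in the examples above.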
As an optional solution, as shown in Figure 13, the selecting unit 1206 comprises:
1) a second judging module 1304, configured to: when a frame section is not a quiet section, judge whether the frame section meets the second card frame condition; when the frame section does not meet the second card frame condition, judge that the frame section is not a frame section in which a card frame occurs; and when the frame section meets the second card frame condition, judge that the frame section is a frame section in which a card frame occurs.
Optionally, with reference to Figure 5, the second card frame condition in this embodiment includes but is not limited to: judgments of the correlation and periodicity of audio features, for example, the stress condition and the magnetization/mechanical sound condition.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G, H, and it is judged that the frame sections not belonging to quiet sections are F, G, H. It is then judged whether frame sections F, G, H are stress.
As another example, with reference to Figure 5, if frame sections G, H among the non-quiet frame sections F, G, H do not meet the second card frame condition (for example, frame sections G, H are judged not to be stress, and their magnetization/mechanical sound frequency components do not exceed the preset ratio), it is judged that frame sections G, H are not frame sections in which a card frame occurs, and frame sections G, H are not labeled as second card frame sections.
As another example, with reference to Figure 5, if frame section F among the non-quiet frame sections F, G, H meets the second card frame condition (for example, frame section F is judged to be stress, and its card frame count meets the condition of being recognizable by the human ear), it is judged that frame section F is a frame section in which a card frame occurs, and frame section F is labeled as a second card frame section.
With the embodiment provided by this application, the frame sections not belonging to quiet sections are judged against the second card frame condition, so that non-quiet frame sections are distinguished and the card frame sections that can be perceived in the audio communication system are obtained accurately.
As an optional solution, the second judging module 1304 comprises:
1) a seventh judging submodule, configured to judge whether the frame section meets the stress condition in the second card frame condition;
Optionally, the second card frame condition in this embodiment includes but is not limited to: the stress condition and the magnetization/mechanical sound condition.
For example, with reference to Figure 5, after the frame sections F, G, H not belonging to quiet sections are identified, it is judged whether these frame sections meet the stress condition in the second card frame condition.
2) an eighth judging submodule, configured to: when the frame section does not meet the stress condition, judge whether the frame section meets the magnetization/mechanical sound condition in the second card frame condition; and when the frame section does not meet the magnetization/mechanical sound condition in the second card frame condition, judge that the frame section does not meet the second card frame condition.
For example, with reference to Figure 5, if frame sections G, H are judged not to meet the stress condition, it is then judged whether frame sections G, H meet the magnetization/mechanical sound condition in the second card frame condition, that is, whether the magnetization/mechanical sound frequency components of frame sections G, H exceed the preset ratio.
It should be noted that, with reference to Figure 5, if a frame section not belonging to a quiet section does not meet the stress condition and is further judged not to meet the magnetization/mechanical sound condition, the frame section is not a genuinely effective card frame section but a misjudged frame section, and is therefore not treated as an effective card frame (for example, a second card frame section).
For example, with reference to Figure 5, if frame section H, one of the frame sections G, H that do not meet the stress condition in the second card frame condition, also does not meet the magnetization/mechanical sound condition in the second card frame condition (that is, the magnetization/mechanical sound frequency components of frame section H do not exceed the preset ratio), it is judged that frame section H does not meet the second card frame condition, and frame section H is not labeled as a second card frame section.
With the embodiment provided by this application, a refined distinction is made among the frame sections not belonging to quiet sections by judging whether they meet the stress condition and the magnetization/mechanical sound condition in the second card frame condition, so that non-quiet frame sections are distinguished and the card frame sections that can be perceived in the audio communication system are obtained accurately.
As an optional solution, the second judging module 1304 comprises:
1) a ninth judging submodule, configured to judge, when the frame section meets the stress condition or meets the magnetization/mechanical sound condition, whether the number of frames in the frame section is greater than a fourth predetermined threshold; when the number of frames is greater than the fourth predetermined threshold, to judge that the frame section meets the second card frame condition; and when the number of frames is less than or equal to the fourth predetermined threshold, to judge that the frame section does not meet the second card frame condition.
Optionally, the fourth predetermined threshold in this embodiment is likewise related to the human ear's ability to recognize audio stalling. It can be obtained by training, or determined in actual evaluation according to the required product quality level.
For example, the audio signal comprises eight frame sections A, B, C, D, E, F, G and H, and frame section G is judged to meet the stress condition or the magnetization/mechanical sound condition. It is then judged whether the card frame count of frame section G is greater than the fourth predetermined threshold (for example, the fourth predetermined threshold is S).
As another example, if the card frame count of frame section G is greater than the fourth predetermined threshold (the frame count is greater than S), frame section G is judged to meet the second card frame condition and is marked as a second card frame section. If the card frame count of frame section G is less than or equal to the fourth predetermined threshold (the frame count is less than or equal to S), frame section G is judged not to meet the second card frame condition and is not marked as a second card frame section.
In the embodiment provided by the present application, for the frame sections of the audio signal that do not belong to a quiet section and that meet the stress condition or the magnetization/mechanical sound condition, it is further judged whether the frame count satisfies the threshold setting, so that the card frames in the audio signal are obtained more effectively and accurately.
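The decision order described for the seventh, eighth and ninth judging submodules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `FrameSection` fields, the function name and the numeric limits are hypothetical, and the stress check and the mechanical-sound ratio are taken as already computed inputs.

```python
from dataclasses import dataclass


@dataclass
class FrameSection:
    """A non-quiet frame section flagged as abnormal (all fields are illustrative)."""
    name: str
    frame_count: int         # number of frames in the section
    is_stress: bool          # outcome of the stress-condition check (assumed given)
    mechanical_ratio: float  # share of magnetization/mechanical sound components


def meets_second_condition(section: FrameSection,
                           mechanical_ratio_limit: float,
                           fourth_threshold: int) -> bool:
    """Return True if the section would be marked as a second card frame section."""
    # Stress condition first; the mechanical-sound condition is only evaluated
    # when the stress condition fails (short-circuit "or").
    qualifies = (section.is_stress
                 or section.mechanical_ratio > mechanical_ratio_limit)
    if not qualifies:
        return False  # misjudged section, not an effective card frame
    # The section must also be long enough to be recognizable by the human ear.
    return section.frame_count > fourth_threshold
```

With a hypothetical preset ratio of 0.5 and a fourth predetermined threshold of 5 frames, a 12-frame stress section passes, a 12-frame section with a mechanical-sound ratio of 0.1 is rejected as a misjudgment, and a 3-frame stress section is rejected as too short.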
As an optional solution, as shown in FIG. 14, the search and marking unit 1204 comprises:
1) a marking module 1402, configured to, when each of multiple consecutive frames among the frames has at least one feature value outside its corresponding threshold range, mark the frame section formed by these consecutive frames as a frame section in which the feature value is abnormal, wherein the threshold ranges corresponding to the individual feature values are the same or different.
For example, when searching the frames for, and marking, frame sections in which the feature value is abnormal, multiple consecutive frames are sought in which every frame has at least one feature value outside its corresponding threshold range, and the frame section formed by these consecutive frames is marked as a frame section in which the feature value is abnormal.
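The marking step can be illustrated with a short sketch. The feature names, the threshold ranges and the `min_frames` parameter are assumptions made for illustration, and the start index stands in for the time information of the start frame.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]


def is_abnormal(frame: Dict[str, float], ranges: Dict[str, Range]) -> bool:
    """A frame is abnormal if at least one feature lies outside its own range."""
    return any(not (lo <= frame[name] <= hi) for name, (lo, hi) in ranges.items())


def mark_abnormal_sections(frames: List[Dict[str, float]],
                           ranges: Dict[str, Range],
                           min_frames: int = 2) -> List[Tuple[int, int]]:
    """Return (start_index, frame_length) for each run of consecutive abnormal frames."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i, frame in enumerate(frames):
        if is_abnormal(frame, ranges):
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_frames:
                sections.append((start, i - start))
            start = None
    # Close a run that extends to the last frame.
    if start is not None and len(frames) - start >= min_frames:
        sections.append((start, len(frames) - start))
    return sections
```

Each feature carries its own range in `ranges`, which mirrors the statement that the threshold ranges of the individual feature values may be the same or different.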
As an optional solution, the feature value comprises at least one of: energy envelope value, spectral flux, spectral flatness, spectral skewness, spectral kurtosis.
Optionally, in this embodiment the feature values can be computed as follows:
1) Energy envelope value, representing the change of the short-time energy of the audio, wherein the applied window function comprises at least one of: rectangular window, Hamming window, Hanning window, triangular window, Abbado Lay window. The window function of the rectangular window is expressed as follows:

$$ w(n) = \begin{cases} 1, & 0 \le n < N \\ 0, & \text{otherwise} \end{cases} \tag{12} $$

The k-th frame of the audio signal after windowing is $X_k(n) = w(n) \cdot x(k \cdot N + n)$, where N is the number of audio samples in the time window of each frame. The average energy of the k-th frame signal $X_k(n)$ is denoted E(k) and is computed as follows:

$$ E(k) = \frac{1}{N} \sum_{n=1}^{N-1} X_k(n) \cdot X_k(n) \tag{13} $$

The envelope is obtained by taking the square root of the audio energy signal, normalizing it, and taking the logarithm; it is an indicator of the change of the short-time energy of the audio. The envelope of the k-th frame of the audio signal is denoted Env(k) and is given by:

$$ \mathrm{Env}(k) = 20 \cdot \log_{10}\left( \frac{\sqrt{\frac{1}{N} \sum_{n=1}^{N-1} X_k(n) \cdot X_k(n)}}{32768} \right) \tag{14} $$
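As a rough illustration of formulas (12) to (14), the sketch below frames a signal with a rectangular window and computes the envelope in dB. It assumes 16-bit samples (full scale 32768), sums over all N samples of a frame, and returns a hypothetical floor value for all-zero frames; the function names are illustrative.

```python
import math
from typing import List, Sequence


def split_frames(signal: Sequence[float], n: int) -> List[Sequence[float]]:
    """Rectangular windowing: frame k holds samples x(k*N) .. x(k*N + N - 1)."""
    return [signal[k * n:(k + 1) * n] for k in range(len(signal) // n)]


def frame_energy(frame: Sequence[float]) -> float:
    """Average energy E(k) of one frame (assumption: sum over all N samples)."""
    return sum(s * s for s in frame) / len(frame)


def envelope_db(frame: Sequence[float],
                full_scale: float = 32768.0,
                floor_db: float = -120.0) -> float:
    """Env(k): square root of E(k), normalized by full scale, expressed in dB."""
    rms = math.sqrt(frame_energy(frame))
    if rms == 0.0:
        # log10(0) is undefined; the floor value is an assumption of this sketch.
        return floor_db
    return 20.0 * math.log10(rms / full_scale)
```

A frame at full scale yields 0 dB, and quieter frames yield negative values, so a sudden drop of Env(k) between neighbouring frames is one way a stall could show up in this feature.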
2) Spectral flux, reflecting the dynamic characteristics of the audio signal. It can be obtained as the normalized 2-norm of the difference between the spectral vectors of two adjacent frames, as expressed by the following formula:

$$ v_{SF}(n) = \frac{\sqrt{\sum_{k=0}^{\kappa/2-1} \left( |X(k,n)| - |X(k,n-1)| \right)^2}}{\kappa/2} \tag{15} $$

where $0 \le v_{SF}(n) \le A$, and A is a preset spectral amplitude decision threshold. A small $v_{SF}(n)$ indicates that adjacent audio signals are relatively stationary, or alternatively that the input signal level is low. When the audio signal undergoes a non-stationary abrupt transition, $v_{SF}(n)$ rises sharply and reaches the abnormal threshold.
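Formula (15) can be sketched as below. The inputs are assumed to be the magnitude spectra of the current and previous frames, each holding the κ/2 bins |X(k, n)|; how those spectra are obtained (for example, by an FFT) is outside this sketch.

```python
import math
from typing import Sequence


def spectral_flux(mag_cur: Sequence[float], mag_prev: Sequence[float]) -> float:
    """2-norm of the magnitude difference of adjacent frames, divided by kappa/2."""
    if len(mag_cur) != len(mag_prev):
        raise ValueError("both spectra must have the same number of bins")
    half_bins = len(mag_cur)  # corresponds to kappa/2 in formula (15)
    diff_sq = sum((c - p) ** 2 for c, p in zip(mag_cur, mag_prev))
    return math.sqrt(diff_sq) / half_bins
```

Identical adjacent spectra give a flux of zero, while an abrupt spectral change gives a large value that can then be compared with the preset threshold.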
3) Spectral flatness, used to identify the pause or abrupt change caused by card frame. It can be obtained by the following formula:

$$ v_{Tf}(n) = \frac{\sqrt[\kappa/2]{\prod_{k=0}^{\kappa/2-1} |X(k,n)|}}{\frac{2}{\kappa} \cdot \sum_{k=0}^{\kappa/2-1} |X(k,n)|} = \frac{\exp\left( \frac{2}{\kappa} \cdot \sum_{k=0}^{\kappa/2-1} \log |X(k,n)| \right)}{\frac{2}{\kappa} \cdot \sum_{k=0}^{\kappa/2-1} |X(k,n)|} \tag{16} $$

where, in smooth audio regions, the smaller $v_{Tf}(n)$ is, the more likely the signal is tonal. When a pause or abrupt change caused by card frame occurs, $v_{Tf}(n)$ rises sharply, forming a spike that exceeds the abnormal threshold.
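The right-hand form of formula (16), the ratio of the geometric mean to the arithmetic mean of the magnitude spectrum, can be sketched as follows. The `eps` guard against log(0) and the return value for an all-zero spectrum are assumptions of this sketch.

```python
import math
from typing import Sequence


def spectral_flatness(mag: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean over arithmetic mean of the kappa/2 magnitude bins."""
    n = len(mag)  # corresponds to kappa/2 in formula (16)
    arithmetic_mean = sum(mag) / n
    if arithmetic_mean <= 0.0:
        return 0.0  # all-zero spectrum; this return value is an assumption
    # exp(mean(log|X|)) avoids overflow/underflow of the direct product.
    log_mean = sum(math.log(max(m, eps)) for m in mag) / n
    return math.exp(log_mean) / arithmetic_mean
```

A perfectly flat spectrum gives a value of 1, and a spectrum dominated by a single bin gives a value close to 0, which matches the remark that small values indicate tonal content.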
4) Spectral skewness, characterizing the symmetry of the probability density function (PDF) of the audio signal. It can be obtained by dividing the third-order central moment of the audio signal statistics by the cube of the standard deviation, as expressed by the following formula:

$$ v_{Sk}(n) = \frac{1}{\sigma_x^3(n) \cdot \kappa} \sum_{i=i_s(n)}^{i_e(n)} \left( x(i) - \mu_x(n) \right)^3 \tag{17} $$

where $\mu_x(n)$ is the mean of the signal over one frame, and $\sigma_x(n)$ is the corresponding standard deviation.
5) Spectral kurtosis, characterizing the non-Gaussianity of the PDF of the audio signal. Compared with a Gaussian distribution, it characterizes the flatness of the input signal values. It can be obtained by dividing the fourth-order central moment of the audio signal statistics by the fourth power of the standard deviation, as expressed by the following formula:

$$ v_{K}(n) = \frac{1}{\sigma_x^4(n) \cdot \kappa} \sum_{i=i_s(n)}^{i_e(n)} \left( x(i) - \mu_x(n) \right)^4 - 3 \tag{18} $$

where $\mu_x(n)$ is the mean of the signal over one frame, and $\sigma_x(n)$ is the corresponding standard deviation.
It should be noted that the spectral skewness and spectral kurtosis are computed for each frame of the audio signal; these two feature values characterize the degree of distortion of the audio signal. An audio frame whose spectral skewness and spectral kurtosis are below the preset decision threshold is classified as a distorted audio frame according to the spectral distortion measure.
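Formulas (17) and (18) can be sketched together, since both rely on the same mean and standard deviation of one frame. The function name is illustrative, and returning zeros for a constant frame (zero standard deviation) is an assumption of this sketch.

```python
import math
from typing import Sequence, Tuple


def skewness_and_kurtosis(block: Sequence[float]) -> Tuple[float, float]:
    """Third and fourth standardized central moments; kurtosis has 3 subtracted."""
    n = len(block)
    mean = sum(block) / n
    variance = sum((x - mean) ** 2 for x in block) / n
    if variance == 0.0:
        return 0.0, 0.0  # constant frame; this return value is an assumption
    std = math.sqrt(variance)
    skew = sum((x - mean) ** 3 for x in block) / (n * std ** 3)
    kurt = sum((x - mean) ** 4 for x in block) / (n * std ** 4) - 3.0
    return skew, kurt
```

A symmetric frame gives a skewness of zero, and subtracting 3 makes the kurtosis of a Gaussian-distributed frame approach zero, so both values measure departure from a symmetric Gaussian shape.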
Optionally, before the card frame detection method of this embodiment is carried out, the audio signal is also preprocessed, wherein the preprocessing includes but is not limited to: DC removal, normalization, and channel separation.
Optionally, for the above preprocessing, reference may be made to the processing described in Embodiment 1.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be realized through some interfaces, and the indirect coupling or communication connection between units or modules may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a portable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (22)

1. A card frame detection method, characterized by comprising:
performing feature detection on an audio signal to be tested to obtain a feature value of each frame in the audio signal to be tested;
searching the frames for, and marking, frame sections in which the feature value is abnormal, wherein the marking information of a frame section comprises at least one of: time information of the start frame of the frame section and the frame length of the frame section;
selecting, from the frame sections, the frame sections in which card frame occurs according to whether each frame section is a quiet section;
outputting the marking information of the frame sections in which card frame occurs.
2. The method according to claim 1, characterized in that selecting, from the frame sections, the frame sections in which card frame occurs according to whether each frame section is a quiet section comprises:
if the frame section is a quiet section, judging whether the frame section belonging to the quiet section meets a first card frame condition;
if the frame section belonging to the quiet section does not meet the first card frame condition, judging that the frame section belonging to the quiet section is not a frame section in which card frame occurs;
if the frame section belonging to the quiet section meets the first card frame condition, judging that the frame section belonging to the quiet section is a frame section in which card frame occurs.
3. The method according to claim 2, characterized in that judging whether the frame section belonging to the quiet section meets the first card frame condition comprises:
judging whether the number of frames of the frame section belonging to the quiet section is greater than a first predetermined threshold;
if the number of frames is greater than the first predetermined threshold, judging that the frame section belonging to the quiet section meets the first card frame condition; if the number of frames is less than or equal to the first predetermined threshold, judging that the frame section belonging to the quiet section does not meet the first card frame condition.
4. The method according to claim 2, characterized in that judging that the frame section belonging to the quiet section is not a frame section in which card frame occurs comprises:
detecting characteristic parameters of the frame section belonging to the quiet section;
judging, according to the detection result, whether the frame section belonging to the quiet section meets the naturally quiet condition in the first card frame condition;
if the frame section belonging to the quiet section meets the naturally quiet condition, judging that the frame section does not meet the first card frame condition.
5. The method according to claim 4, characterized in that, after judging according to the detection result whether the frame section belonging to the quiet section meets the naturally quiet condition in the first card frame condition, the method further comprises:
if the frame section belonging to the quiet section does not meet the naturally quiet condition, judging whether the frame section belonging to the quiet section meets the audio frequency hit condition in the first card frame condition;
if the frame section belonging to the quiet section meets the audio frequency hit condition, judging whether the number of frames of the frame section meeting the audio frequency hit condition is greater than a second predetermined threshold;
if the number of frames is greater than the second predetermined threshold, judging that the frame section meeting the audio frequency hit condition meets the first card frame condition; if the number of frames is less than or equal to the second predetermined threshold, judging that the frame section meeting the audio frequency hit condition does not meet the first card frame condition.
6. The method according to claim 5, characterized in that, after judging whether the frame section belonging to the quiet section meets the audio frequency hit condition in the first card frame condition, the method further comprises:
if the frame section belonging to the quiet section does not meet the audio frequency hit condition, judging whether the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition in the first card frame condition;
if the frame section belonging to the quiet section does not meet the sharp downslide/time-domain truncation condition, judging that the frame section not meeting the sharp downslide/time-domain truncation condition meets the first card frame condition;
if the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition, judging whether the number of frames of the frame section meeting the sharp downslide/time-domain truncation condition is greater than a third predetermined threshold;
if the number of frames is greater than the third predetermined threshold, judging that the frame section meeting the sharp downslide/time-domain truncation condition meets the first card frame condition; if the number of frames is less than or equal to the third predetermined threshold, judging that the frame section meeting the sharp downslide/time-domain truncation condition does not meet the first card frame condition.
7. The method according to claim 1, characterized in that selecting, from the frame sections, the frame sections in which card frame occurs according to whether each frame section is a quiet section comprises:
if the frame section is not a quiet section, judging whether the frame section meets a second card frame condition;
if the frame section does not meet the second card frame condition, judging that the frame section is not a frame section in which card frame occurs;
if the frame section meets the second card frame condition, judging that the frame section is a frame section in which card frame occurs.
8. The method according to claim 7, characterized in that judging whether the frame section meets the second card frame condition comprises:
judging whether the frame section meets the stress condition in the second card frame condition;
if the frame section does not meet the stress condition, judging whether the frame section meets the magnetization/mechanical sound condition in the second card frame condition;
if the frame section does not meet the magnetization/mechanical sound condition in the second card frame condition, judging that the frame section does not meet the second card frame condition.
9. The method according to claim 8, characterized in that, if the frame section meets the stress condition or meets the magnetization/mechanical sound condition, the method further comprises:
judging whether the number of frames of the frame section is greater than a fourth predetermined threshold;
if the number of frames is greater than the fourth predetermined threshold, judging that the frame section meets the second card frame condition; if the number of frames is less than or equal to the fourth predetermined threshold, judging that the frame section does not meet the second card frame condition.
10. The method according to any one of claims 1 to 9, characterized in that searching the frames for, and marking, frame sections in which the feature value is abnormal comprises: if each of multiple consecutive frames among the frames has at least one feature value outside its corresponding threshold range, marking the frame section formed by the multiple consecutive frames as a frame section in which the feature value is abnormal, wherein the threshold ranges corresponding to the individual feature values are the same or different.
11. The method according to claim 10, characterized in that the feature value comprises at least one of: energy envelope value, spectral flux, spectral flatness, spectral skewness, spectral kurtosis.
12. A card frame detection device, characterized by comprising:
a detecting unit, configured to perform feature detection on an audio signal to be tested to obtain a feature value of each frame in the audio signal to be tested;
a search and marking unit, configured to search the frames for, and mark, frame sections in which the feature value is abnormal, wherein the marking information of a frame section comprises at least one of: time information of the start frame of the frame section and the frame length of the frame section;
a selection unit, configured to select, from the frame sections, the frame sections in which card frame occurs according to whether each frame section is a quiet section;
an output unit, configured to output the marking information of the frame sections in which card frame occurs.
13. The device according to claim 12, characterized in that the selection unit comprises:
a first judging module, configured to judge, when the frame section is a quiet section, whether the frame section belonging to the quiet section meets a first card frame condition; when the frame section belonging to the quiet section is judged not to meet the first card frame condition, to judge that the frame section belonging to the quiet section is not a frame section in which card frame occurs; and when the frame section belonging to the quiet section is judged to meet the first card frame condition, to judge that the frame section belonging to the quiet section is a frame section in which card frame occurs.
14. The device according to claim 13, characterized in that the first judging module comprises:
a first judging submodule, configured to judge whether the number of frames of the frame section belonging to the quiet section is greater than a first predetermined threshold; when the number of frames is greater than the first predetermined threshold, to judge that the frame section belonging to the quiet section meets the first card frame condition; and when the number of frames is less than or equal to the first predetermined threshold, to judge that the frame section belonging to the quiet section does not meet the first card frame condition.
15. The device according to claim 13, characterized in that the first judging module comprises:
a detection submodule, configured to detect characteristic parameters of the frame section belonging to the quiet section;
a second judging submodule, configured to judge, according to the detection result of the detection submodule, whether the frame section belonging to the quiet section meets the naturally quiet condition in the first card frame condition; and when the frame section belonging to the quiet section meets the naturally quiet condition, to judge that the frame section does not meet the first card frame condition.
16. The device according to claim 15, characterized in that the first judging module comprises:
a third judging submodule, configured to judge, when the frame section belonging to the quiet section does not meet the naturally quiet condition, whether the frame section belonging to the quiet section meets the audio frequency hit condition in the first card frame condition;
a fourth judging submodule, configured to judge, when the frame section belonging to the quiet section meets the audio frequency hit condition, whether the number of frames of the frame section meeting the audio frequency hit condition is greater than a second predetermined threshold; when the number of frames is greater than the second predetermined threshold, to judge that the frame section meeting the audio frequency hit condition meets the first card frame condition; and when the number of frames is less than or equal to the second predetermined threshold, to judge that the frame section meeting the audio frequency hit condition does not meet the first card frame condition.
17. The device according to claim 16, characterized in that the first judging module comprises:
a fifth judging submodule, configured to judge, when the frame section belonging to the quiet section does not meet the audio frequency hit condition, whether the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition in the first card frame condition; and when the frame section belonging to the quiet section does not meet the sharp downslide/time-domain truncation condition, to judge that the frame section not meeting the sharp downslide/time-domain truncation condition meets the first card frame condition;
a sixth judging submodule, configured to judge, when the frame section belonging to the quiet section meets the sharp downslide/time-domain truncation condition, whether the number of frames of the frame section meeting the sharp downslide/time-domain truncation condition is greater than a third predetermined threshold; when the number of frames is greater than the third predetermined threshold, to judge that the frame section meeting the sharp downslide/time-domain truncation condition meets the first card frame condition; and when the number of frames is less than or equal to the third predetermined threshold, to judge that the frame section meeting the sharp downslide/time-domain truncation condition does not meet the first card frame condition.
18. The device according to claim 12, characterized in that the selection unit comprises:
a second judging module, configured to judge, when the frame section is not a quiet section, whether the frame section meets a second card frame condition; when the frame section does not meet the second card frame condition, to judge that the frame section is not a frame section in which card frame occurs; and when the frame section meets the second card frame condition, to judge that the frame section is a frame section in which card frame occurs.
19. The device according to claim 18, characterized in that the second judging module comprises:
a seventh judging submodule, configured to judge whether the frame section meets the stress condition in the second card frame condition;
an eighth judging submodule, configured to judge, when the frame section does not meet the stress condition, whether the frame section meets the magnetization/mechanical sound condition in the second card frame condition; and when the frame section does not meet the magnetization/mechanical sound condition in the second card frame condition, to judge that the frame section does not meet the second card frame condition.
20. The device according to claim 19, characterized in that the second judging module comprises:
a ninth judging submodule, configured to judge, when the frame section meets the stress condition or meets the magnetization/mechanical sound condition, whether the number of frames of the frame section is greater than a fourth predetermined threshold; when the number of frames is greater than the fourth predetermined threshold, to judge that the frame section meets the second card frame condition; and when the number of frames is less than or equal to the fourth predetermined threshold, to judge that the frame section does not meet the second card frame condition.
21. The device according to any one of claims 12 to 20, characterized in that the search and marking unit comprises:
a marking module, configured to, when each of multiple consecutive frames among the frames has at least one feature value outside its corresponding threshold range, mark the frame section formed by the multiple consecutive frames as a frame section in which the feature value is abnormal, wherein the threshold ranges corresponding to the individual feature values are the same or different.
22. The device according to claim 21, characterized in that the feature value comprises at least one of: energy envelope value, spectral flux, spectral flatness, spectral skewness, spectral kurtosis.
CN201410036425.8A 2014-01-24 2014-01-24 card frame detection method and device Active CN104123949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410036425.8A CN104123949B (en) 2014-01-24 2014-01-24 card frame detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410036425.8A CN104123949B (en) 2014-01-24 2014-01-24 card frame detection method and device

Publications (2)

Publication Number Publication Date
CN104123949A CN104123949A (en) 2014-10-29
CN104123949B true CN104123949B (en) 2015-08-12

Family

ID=51769336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410036425.8A Active CN104123949B (en) 2014-01-24 2014-01-24 card frame detection method and device

Country Status (1)

Country Link
CN (1) CN104123949B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106847307B (en) * 2016-12-21 2020-07-10 广州酷狗计算机科技有限公司 Signal detection method and device
CN109346061B (en) * 2018-09-28 2021-04-20 腾讯音乐娱乐科技(深圳)有限公司 Audio detection method, device and storage medium
CN110430425B (en) * 2019-07-31 2021-02-05 北京奇艺世纪科技有限公司 Video fluency determination method and device, electronic equipment and medium
CN111770413B (en) * 2020-06-30 2021-08-27 浙江大华技术股份有限公司 Multi-sound-source sound mixing method and device and storage medium
CN112802453B (en) * 2020-12-30 2024-04-26 深圳飞思通科技有限公司 Fast adaptive prediction voice fitting method, system, terminal and storage medium
CN113496705B (en) * 2021-08-19 2024-03-08 杭州华橙软件技术有限公司 Audio processing method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000341322A (en) * 1999-05-25 2000-12-08 Nippon Telegr & Teleph Corp <Ntt> Stream information distributor
CN102610228B (en) * 2011-01-19 2014-01-22 上海弘视通信技术有限公司 Audio exception event detection system and calibration method for the same
CN103475906B (en) * 2012-06-08 2016-08-10 华为技术有限公司 Measuring method and measurement apparatus for media stream

Also Published As

Publication number Publication date
CN104123949A (en) 2014-10-29

Similar Documents

Publication Publication Date Title
CN104123949B (en) card frame detection method and device
US20160210984A1 (en) Voice Quality Evaluation Method and Apparatus
CN107305774A (en) Speech detection method and device
US20060212295A1 (en) Apparatus and method for audio analysis
CN107910014A (en) Test method, device and the test equipment of echo cancellor
EP2927906B1 (en) Method and apparatus for detecting voice signal
US20050228649A1 (en) Method and apparatus for classifying sound signals
CN104143324B (en) A kind of musical tone recognition method
CN111312218B (en) Neural network training and voice endpoint detection method and device
CN102623009A (en) Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN108877823A (en) Sound enhancement method and device
CN105513598A (en) Playback voice detection method based on distribution of information quantity in frequency domain
CN111312286A (en) Age identification method, age identification device, age identification equipment and computer readable storage medium
Vieira et al. A speech quality classifier based on tree-cnn algorithm that considers network degradations
CN111161746B (en) Voiceprint registration method and system
Schlotterbeck et al. What classroom audio tells about teaching: a cost-effective approach for detection of teaching practices using spectral audio features
CN105590629B (en) A kind of method and device of speech processes
CN113823323A (en) Audio processing method and device based on convolutional neural network and related equipment
Li et al. Non-intrusive quality assessment for enhanced speech signals based on spectro-temporal features
CN110210893A (en) Generation method, device, storage medium and the electronic device of report
CN101460994A (en) Speech differentiation
CN109817243A (en) A kind of speech quality detection method and system based on speech recognition and energy measuring
CN110556114B (en) Speaker identification method and device based on attention mechanism
CN116884427A (en) Embedded vector processing method based on end-to-end deep learning voice re-etching model
JP4761391B2 (en) Listening quality evaluation method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant