CN104022967A - Voice decoding apparatus - Google Patents

Info

Publication number
CN104022967A
Authority
CN
China
Prior art keywords
mentioned
grouping
background noise
buffer
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410058259.1A
Other languages
Chinese (zh)
Inventor
伏见涉
铃木茂明
山浦正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitsubishi Electric Corp
Original Assignee
Mitsubishi Electric Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/043 Time compression or expansion by changing speed
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

A voice decoding apparatus capable of reducing the degradation of call quality even when silence compression is applied. The voice decoding apparatus comprises: a jitter absorbing buffer that temporarily accumulates received packets and outputs each packet at a specified output timing; a background noise generation unit that generates audio data of background noise from the background noise data contained in a packet output by the jitter absorbing buffer; a voice decoding unit that decodes the coded voice data contained in a packet output by the jitter absorbing buffer and generates audio data of speech; a speech rate conversion unit that converts the reproduction speed of the audio data decoded by the voice decoding unit; and a control unit that, on the basis of the packet accumulation state in the jitter absorbing buffer, controls the duration of the background noise generated by the background noise generation unit and the reproduction speed converted by the speech rate conversion unit.

Description

Audio decoding apparatus
Technical field
The present invention relates to an audio decoding apparatus that decodes encoded audio and is used in IP telephony and similar applications.
Background art
In voice calls such as IP telephony, speech is encoded and formed into packets, and the packets are sent and received over a network. In packet communication, the interval at which packets are received is often not constant, and deviation in the packet reception interval (jitter) is common. One technique for absorbing such jitter and continuously outputting the decoded audio obtained by decoding the audio codes contained in received packets is described in Patent Document 1, for example.
In the technique described in Patent Document 1, the reproduction speed is increased or decreased according to the amount of received packets stored in a jitter absorbing buffer that temporarily stores received packets. This keeps the amount of received packets stored in the jitter absorbing buffer at an appropriate level while decoded audio is output continuously. Compared with keeping the stored amount at an appropriate level by discarding or duplicating received packets in the jitter absorbing buffer, this reduces the degradation of audio quality.
[Patent Document 1] Japanese Patent No. 3796240
However, conventional audio decoding apparatuses perform control on the premise that packets, produced by encoding and packetizing speech, are sent at fixed time intervals and are stored in the jitter absorbing buffer at the position corresponding to each packet's packet number. Consequently, in a system that applies silence compression, in which packets are not necessarily sent at fixed intervals (for example, the transmission interval becomes longer during silent intervals), appropriate processing cannot be performed and call quality deteriorates.
Summary of the invention
The present invention has been made to solve the problem described above, and its object is to obtain an audio decoding apparatus that can reduce the degradation of call quality even when silence compression is applied.
The audio decoding apparatus of the present invention comprises: a jitter absorbing buffer that temporarily accumulates received packets and outputs each packet at a specified output timing; a background noise generation unit that generates audio data of background noise from the background noise data contained in a packet output from the jitter absorbing buffer; an audio decoding unit that decodes the coded audio data contained in a packet output from the jitter absorbing buffer and generates audio data of speech; a speech rate conversion unit that performs speech rate conversion, which converts the reproduction speed of the audio data decoded by the audio decoding unit; and a control unit that, according to the packet accumulation state in the jitter absorbing buffer, controls the duration of the background noise generated by the background noise generation unit and controls the reproduction speed converted by the speech rate conversion unit.
According to the present invention, the apparatus comprises: a jitter absorbing buffer that temporarily accumulates received packets and outputs each packet at a specified output timing; a background noise generation unit that generates audio data of background noise from the background noise data contained in a packet output from the jitter absorbing buffer; an audio decoding unit that decodes the coded audio data contained in a packet output from the jitter absorbing buffer and generates audio data of speech; a speech rate conversion unit that performs speech rate conversion, which converts the reproduction speed of the audio data decoded by the audio decoding unit; and a control unit that, according to the packet accumulation state in the jitter absorbing buffer, controls the duration of the background noise generated by the background noise generation unit and controls the reproduction speed converted by the speech rate conversion unit. As a result, degradation of call quality can be prevented even when silence compression is applied.
Brief description of the drawings
Fig. 1 is a functional block diagram of the audio decoding apparatus in Embodiment 1 of the present invention.
Fig. 2 is an explanatory diagram showing the relation between packet timestamps and accumulation in the jitter absorbing buffer.
Fig. 3 is a functional block diagram of the audio decoding apparatus in Embodiment 2 of the present invention.
Fig. 4 is a functional block diagram of the audio decoding apparatus in Embodiment 3 of the present invention.
Fig. 5 is a functional block diagram of the audio decoding apparatus in Embodiment 4 of the present invention.
Fig. 6 is a functional block diagram of the audio decoding apparatus in Embodiment 5 of the present invention.
Fig. 7 is an explanatory diagram showing the relation between packet timestamps and accumulation in the jitter absorbing buffer.
Description of reference numerals
1: jitter absorbing buffer; 2: background noise generation unit; 3: audio decoding unit; 4: speech rate conversion unit; 5: output buffer; 6: output buffer monitoring unit; 7: control unit; 71: buffer level monitoring unit; 72: control signal output unit; 73: arrival rate monitoring unit; 8: high-accuracy silence compression unit; 9: voice detection unit; 10: audio encoding unit; 11: silence compression control unit; 12: background noise data detection/insertion unit; 20: audio decoding apparatus; 21: audio encoding apparatus.
Embodiments
Embodiments of the present invention are described below. The following embodiments are examples of the present invention, and the present invention is not limited to them.
Embodiment 1.
Fig. 1 is a functional block diagram showing an audio decoding apparatus according to one embodiment of the present invention.
In Fig. 1, the jitter absorbing buffer 1 temporarily accumulates received packets and outputs each packet at a specified output timing. The background noise generation unit 2 generates audio data of background noise from the background noise data contained in a packet output from the jitter absorbing buffer 1. The audio decoding unit 3 decodes the coded audio data contained in a packet output from the jitter absorbing buffer 1 and generates audio data of speech. The speech rate conversion unit 4 performs speech rate conversion, which converts the reproduction speed of the audio data decoded by the audio decoding unit 3. The output buffer 5 temporarily accumulates the audio data of background noise generated by the background noise generation unit 2 and the audio data of speech generated by the audio decoding unit 3. The output buffer monitoring unit 6 monitors the amount of audio data accumulated in the output buffer 5 and, according to this amount, instructs the jitter absorbing buffer 1 on the output timing of the temporarily accumulated packets. The control unit 7, according to the packet accumulation state in the jitter absorbing buffer 1, controls the duration of the background noise generated by the background noise generation unit 2 and controls the reproduction speed converted by the speech rate conversion unit 4.
In this embodiment, the control unit 7 has a buffer level monitoring unit 71 and a control signal output unit 72. The buffer level monitoring unit 71 monitors the amount of packets remaining in the jitter absorbing buffer 1 (the buffer level) as the packet accumulation state in the jitter absorbing buffer 1. According to the buffer level monitored by the buffer level monitoring unit 71, the control signal output unit 72 outputs a time length control signal, which controls the duration of the background noise generated by the background noise generation unit 2, and a reproduction speed control signal, which controls the reproduction speed converted by the speech rate conversion unit 4.
Next, the operation is described.
In this embodiment, the operation is described for a voice call between two parties, a user and the user's call partner, but the present invention is not limited to this.
First, when the user's call partner speaks, the speech is encoded and formed into packets on the partner side and received on the user side over the network. When the user side receives a packet sent from the partner side, the jitter absorbing buffer 1 temporarily accumulates the received packet. To absorb the fluctuation in packet arrival delay (jitter) and make it possible to output packets at smoothed timing, the jitter absorbing buffer 1 outputs the accumulated packets in order only after packets corresponding to a predetermined initial delay have accumulated. The output timing of the jitter absorbing buffer 1 follows instructions from the output buffer monitoring unit 6.
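The buffering behaviour described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` fields, the class name `JitterBuffer`, and the parameter `initial_delay_packets` are hypothetical, and the sketch assumes packets arrive in sequence order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Packet:
    seq: int        # sequence number assigned by the sender
    timestamp: int  # sender timestamp
    kind: str       # "audio" or "noise" (hypothetical labels)


class JitterBuffer:
    """Holds packets until an initial delay has built up, then releases
    one packet each time the downstream side asks for it."""

    def __init__(self, initial_delay_packets: int = 3) -> None:
        self._queue: Deque[Packet] = deque()
        self._initial_delay = initial_delay_packets
        self._primed = False  # True once the initial delay has accumulated

    def push(self, packet: Packet) -> None:
        """Accumulate a received packet (assumes in-order arrival)."""
        self._queue.append(packet)
        if len(self._queue) >= self._initial_delay:
            self._primed = True

    def pop(self) -> Optional[Packet]:
        """Output one packet when requested; None if nothing may be output yet."""
        if not self._primed or not self._queue:
            return None
        return self._queue.popleft()

    def level(self) -> int:
        """Number of packets currently accumulated."""
        return len(self._queue)
```

In this sketch `pop()` plays the role of the output instruction from the output buffer monitoring unit 6, and `level()` is the quantity a buffer level monitor would observe.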
Packets output from the jitter absorbing buffer 1 are handled separately as background noise packets, which contain background noise data, and audio packets, which contain coded audio data. An audio packet is input to the audio decoding unit 3, and a background noise packet is input to the background noise generation unit 2. Together with the background noise packet, the jitter absorbing buffer 1 passes to the background noise generation unit 2, as the background noise generation time length, the time difference between this background noise packet and the next packet, for example the difference between the timestamp values that indicate the transmission times assigned to the background noise packet and to the next packet.
The operation is described in detail with reference to the figure. Fig. 2 is an explanatory diagram showing the relation between packet timestamps and accumulation in the jitter absorbing buffer.
In Fig. 2, audio packets #1, #2, and #4, each containing coded audio data covering a duration t, and background noise packet #3, containing background noise data, arrive in the order #1, #2, #3, #4 and are temporarily accumulated in the jitter absorbing buffer 1.
Suppose that packet #3, the background noise packet, is assigned sequence number N and timestamp value M. Then packet #1 has sequence number N-2, packet #2 has sequence number N-1, and packet #4 has sequence number N+1; the timestamp value of packet #1 is M-2t and that of packet #2 is M-t. The timestamp value of packet #4 is the time after the noise interval of length T has elapsed, that is, M+T. The background noise generation time length is the difference between the timestamp values of the background noise packet #3 and the next packet #4, that is, (M+T)-M=T.
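The timestamp arithmetic of the Fig. 2 example can be checked with a short sketch. The function name `noise_generation_length` and the numeric values of M, t, and T are assumptions chosen for illustration; timestamp wraparound is not handled.

```python
def noise_generation_length(noise_timestamp: int, next_timestamp: int) -> int:
    """Background noise generation time length: the timestamp difference
    between a background noise packet and the packet that follows it."""
    if next_timestamp < noise_timestamp:
        raise ValueError("next packet must not precede the noise packet")
    return next_timestamp - noise_timestamp


# Hypothetical values: M is the timestamp of noise packet #3, t the duration
# covered by one audio packet, T the length of the noise interval.
M, t, T = 1000, 20, 300
timestamps = {1: M - 2 * t, 2: M - t, 3: M, 4: M + T}

# (M + T) - M == T, as in the text.
length = noise_generation_length(timestamps[3], timestamps[4])
```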
The background noise generation unit 2, having received the background noise packet and the background noise generation time length, generates background noise from the background noise data stored in the background noise packet, continues generating it for the background noise generation time length, and outputs the result to the output buffer 5 as audio data of background noise.
The audio decoding unit 3, having received an audio packet, generates audio data of speech by decoding the coded audio data stored in the audio packet and outputs it to the speech rate conversion unit 4. The audio data of speech processed by the speech rate conversion unit 4 is input to the output buffer 5.
The output buffer monitoring unit 6 monitors whether the output buffer 5 holds audio data (the amount of accumulated audio data). When it judges that there is no input from the background noise generation unit 2 or the speech rate conversion unit 4 (the amount is below a specified amount), it instructs the jitter absorbing buffer 1 on the packet output timing so that one packet accumulated in the jitter absorbing buffer 1 is output.
The buffer level monitoring unit 71 monitors the amount of packets temporarily accumulated in the jitter absorbing buffer 1 (the buffer level). It notifies the control signal output unit 72 of "small" when the buffer level is below a threshold A, of "large" when it is at or above a threshold B, and of "medium" when it is at or above threshold A and at or below threshold B.
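The three-way classification by the buffer monitor can be sketched as below. The function name and the example threshold values are hypothetical. The text assigns a level exactly equal to threshold B to both classes; this sketch assumes it counts as "large".

```python
def classify_buffer_level(level: float, threshold_a: float, threshold_b: float) -> str:
    """Classify the jitter buffer level against thresholds A and B."""
    if threshold_a > threshold_b:
        raise ValueError("threshold A must not exceed threshold B")
    if level < threshold_a:
        return "small"
    if level >= threshold_b:  # assumption: a level equal to B is treated as "large"
        return "large"
    return "medium"


# Hypothetical thresholds, in packets.
A, B = 3, 8
classes = [classify_buffer_level(n, A, B) for n in (1, 3, 5, 8, 12)]
```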
On receiving the notification from the buffer level monitoring unit 71, the control signal output unit 72 outputs a time length control signal and a reproduction speed control signal. The time length control signal (instruction) makes the background noise generation time length shorter as the buffer level of the jitter absorbing buffer 1 increases. The reproduction speed control signal (instruction) makes reproduction proceed at a faster speech rate as the buffer level of the jitter absorbing buffer 1 increases.
For example, following the control content listed in Table 1: when "small" is notified, the control signal output unit 72 sends the background noise generation unit 2 an instruction to extend the background noise generation time length, for example to 1.1 times, and sends the speech rate conversion unit 4 an instruction to reproduce slowly, for example at 0.8 times. When "large" is notified, it sends the background noise generation unit 2 an instruction to shorten the background noise generation time length, for example to 0.9 times, and sends the speech rate conversion unit 4 an instruction to speed up reproduction, for example to 1.2 times. When "medium" is notified, it sends the background noise generation unit 2 an instruction to use the normal background noise generation time length, for example 1.0 times, and sends the speech rate conversion unit 4 an instruction to reproduce at normal speed, for example 1.0 times.
[table 1]
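The control described for Embodiment 1 amounts to a lookup from buffer class to a pair of multipliers. The sketch below uses the example values given in the text (1.1/0.8, 1.0/1.0, 0.9/1.2); the names `CONTROL_TABLE` and `control_signals` are hypothetical.

```python
from typing import Dict, Tuple

# buffer class -> (noise length multiplier, speech rate multiplier),
# using the example values quoted in the description.
CONTROL_TABLE: Dict[str, Tuple[float, float]] = {
    "small": (1.1, 0.8),   # extend noise, reproduce slowly
    "medium": (1.0, 1.0),  # normal length, normal speed
    "large": (0.9, 1.2),   # shorten noise, speed up reproduction
}


def control_signals(level_class: str) -> Tuple[float, float]:
    """Return the linked pair of instructions for one buffer class."""
    return CONTROL_TABLE[level_class]


# Applying the instruction to a hypothetical 300 ms noise interval.
noise_scale, speech_rate = control_signals("large")
scaled_noise_ms = 300 * noise_scale
```

Both multipliers come from a single lookup, which mirrors the linked instructions sent to the background noise generation unit 2 and the speech rate conversion unit 4.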
As described above, according to this embodiment, the control unit 7 sends linked instructions to the background noise generation unit 2 and the speech rate conversion unit 4. That is, according to the packet accumulation state in the jitter absorbing buffer 1, it controls the duration of the background noise generated by the background noise generation unit 2 and controls the reproduction speed converted by the speech rate conversion unit 4. Background noise (silent intervals) and speech (voiced intervals), whose transmission intervals differ, are thus controlled separately, so degradation of call quality can be prevented even when silence compression is applied and packets are not necessarily sent at fixed intervals.
Because the time length control signal, which controls the duration of the background noise generated by the background noise generation unit 2, and the reproduction speed control signal, which controls the reproduction speed converted by the speech rate conversion unit 4, are output according to the buffer level of the jitter absorbing buffer 1 as the packet accumulation state in the jitter absorbing buffer 1, appropriate jitter buffer control can be performed according to that buffer level, and degradation of call quality can be prevented even when silence compression is applied.
The description above divides the buffer level of the jitter absorbing buffer into three classes, "small", "medium", and "large", using threshold A and threshold B, but finer control is possible by subdividing the classes further.
Control changes as the buffer level changes. By setting different thresholds for distinguishing "small", "medium", and "large" depending on the direction in which the buffer level is changing, frequent changes of control caused by the level rising and falling near a threshold can be avoided, and better call quality can be provided. For example, better call quality can be provided by setting threshold C and threshold D for when the buffer level of the jitter absorbing buffer is increasing, and threshold E and threshold F for when it is decreasing.
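One way to realise direction-dependent thresholds is a small state machine with hysteresis. The sketch below is one possible interpretation, not the method fixed by the text: it assumes thresholds C and D are used when moving up a class and thresholds E and F when moving down, with E not above C and F not above D. The class name and example values are hypothetical.

```python
class HysteresisClassifier:
    """Three-class buffer level classifier with separate rising and
    falling thresholds, so small fluctuations do not flip the class."""

    def __init__(self, rise_c: float, rise_d: float,
                 fall_e: float, fall_f: float, state: str = "medium") -> None:
        # Assumed ordering: each falling threshold lies at or below its rising one.
        if not (fall_e <= rise_c and fall_f <= rise_d
                and rise_c <= rise_d and fall_e <= fall_f):
            raise ValueError("inconsistent thresholds")
        self.rise_c, self.rise_d = rise_c, rise_d
        self.fall_e, self.fall_f = fall_e, fall_f
        self.state = state

    def update(self, level: float) -> str:
        if self.state == "small":
            if level >= self.rise_d:
                self.state = "large"
            elif level >= self.rise_c:
                self.state = "medium"
        elif self.state == "medium":
            if level >= self.rise_d:
                self.state = "large"
            elif level < self.fall_e:
                self.state = "small"
        else:  # "large"
            if level < self.fall_e:
                self.state = "small"
            elif level < self.fall_f:
                self.state = "medium"
        return self.state


# Hypothetical thresholds: C=4, D=8 when rising; E=3, F=7 when falling.
clf = HysteresisClassifier(rise_c=4, rise_d=8, fall_e=3, fall_f=7)
trace = [clf.update(x) for x in (3.5, 2, 3.5, 4, 8, 7.5, 6)]
```

A level hovering between 3 and 4 keeps whichever class it already had, which is the chatter-avoidance effect the paragraph describes.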
In addition, when shortening the background noise generation time length, the background noise generation unit 2 can provide better call quality by keeping the background noise generation time from becoming shorter than a certain fixed length.
In the description above, the instruction from the control unit 7 to the background noise generation unit 2 is a ratio, such as extending to 1.1 times or shortening to 0.9 times, but it may instead specify an amount of time to add or remove, for example extending by 100 ms or shortening by 200 ms.
The case with the output buffer 5 and the output buffer monitoring unit 6 has been described, but the output buffer 5 and the output buffer monitoring unit 6 may be omitted. For example, the jitter absorbing buffer 1 may be configured to output packets at output timings with a specified time interval. Alternatively, it may be configured to output packets at output timings that follow the control of the control unit 7, according to the packet accumulation state in the jitter absorbing buffer.
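The two variations just described, a lower bound on the shortened noise length and an instruction given as an absolute time amount instead of a ratio, can be combined in one small helper. This is a sketch under assumptions: the function name, the parameter names, and the floor value used in the example are hypothetical; only the 100 ms and 200 ms amounts come from the text.

```python
def adjust_noise_length(base_ms: float, scale: float = 1.0,
                        offset_ms: float = 0.0, min_ms: float = 0.0) -> float:
    """Apply a ratio and/or an absolute offset to the noise generation time.

    When the result is shorter than the original length, it is not allowed
    to fall below min_ms (or below the original length, if that is smaller).
    """
    adjusted = base_ms * scale + offset_ms
    if adjusted < base_ms:  # shortening: enforce the floor
        adjusted = max(adjusted, min(min_ms, base_ms))
    return adjusted


shortened = adjust_noise_length(300, scale=0.5, min_ms=200)        # floor applies
by_offset = adjust_noise_length(300, offset_ms=-200, min_ms=50)   # shorten by 200 ms
extended = adjust_noise_length(300, offset_ms=100)                # extend by 100 ms
```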
Embodiment 2.
Fig. 3 is a functional block diagram showing an audio decoding apparatus according to one embodiment of the present invention.
In Fig. 3, parts that are the same as or correspond to those of the embodiment above are given the same reference numerals, and their description is omitted.
In Fig. 3, the control unit 7 has the buffer level monitoring unit 71, the control signal output unit 72, and an arrival rate monitoring unit 73. The arrival rate monitoring unit 73 monitors the arrival rate of the packets accumulated in the jitter absorbing buffer 1. In this embodiment, the control signal output unit 72 outputs the time length control signal, which controls the duration of the background noise generated by the background noise generation unit 2, and the reproduction speed control signal, which controls the reproduction speed converted by the speech rate conversion unit 4, according to the buffer level monitored by the buffer level monitoring unit 71 and the arrival rate monitored by the arrival rate monitoring unit 73, which together represent the packet accumulation state in the jitter absorbing buffer.
Next, the operation is described.
In this embodiment, the operation is described for a voice call between two parties, a user and the user's call partner, but the present invention is not limited to this.
First, when the user's call partner speaks, the speech is encoded and formed into packets on the partner side and received on the user side over the network. When the user side receives a packet sent from the partner side, the jitter absorbing buffer 1 temporarily accumulates the received packet. To absorb the fluctuation in packet arrival delay (jitter) and make it possible to output packets at smoothed timing, the jitter absorbing buffer 1 outputs the accumulated packets in order only after packets corresponding to a predetermined initial delay have accumulated. The output timing of the jitter absorbing buffer 1 follows instructions from the output buffer monitoring unit 6.
Packets output from the jitter absorbing buffer 1 are handled separately as background noise packets, which contain background noise data, and audio packets, which contain coded audio data. An audio packet is input to the audio decoding unit 3, and a background noise packet is input to the background noise generation unit 2. Together with the background noise packet, the jitter absorbing buffer 1 passes to the background noise generation unit 2, as the background noise generation time length, the time difference between this background noise packet and the next packet, for example the difference between the timestamp values that indicate the transmission times assigned to the background noise packet and to the next packet.
The background noise generation unit 2, having received the background noise packet and the background noise generation time length, generates background noise from the background noise data stored in the background noise packet, continues generating it for the background noise generation time length, and outputs the result to the output buffer 5 as audio data of background noise.
The audio decoding unit 3, having received an audio packet, generates audio data of speech by decoding the coded audio data stored in the audio packet and outputs it to the speech rate conversion unit 4. The audio data of speech processed by the speech rate conversion unit 4 is input to the output buffer 5.
The output buffer monitoring unit 6 monitors whether the output buffer 5 holds audio data (the amount of accumulated audio data). When it judges that there is no input from the background noise generation unit 2 or the speech rate conversion unit 4 (the amount is below a specified amount), it instructs the jitter absorbing buffer 1 on the packet output timing so that one packet accumulated in the jitter absorbing buffer 1 is output.
The buffer level monitoring unit 71 monitors the amount of packets temporarily accumulated in the jitter absorbing buffer 1 (the buffer level). It notifies the control signal output unit 72 of "small" when the buffer level is below a threshold A, of "large" when it is at or above a threshold B, and of "medium" when it is at or above threshold A and at or below threshold B.
The arrival rate monitoring unit 73 monitors the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1. It notifies the control signal output unit 72 of "low speed" when packets arrive at a rate slower than a threshold α, of "high speed" when they arrive at a rate faster than a threshold β, and of "medium speed" when the rate is neither lower than threshold α nor higher than threshold β.
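The text does not say how the arrival rate is measured. The sketch below assumes a sliding time window counted in milliseconds and a rate expressed in packets per second; the class name, window length, and threshold values are all hypothetical.

```python
from collections import deque
from typing import Deque


class ArrivalRateMonitor:
    """Estimates packets per second over a sliding window and classifies
    the rate against thresholds alpha and beta."""

    def __init__(self, alpha: float, beta: float, window_ms: int = 1000) -> None:
        if alpha > beta:
            raise ValueError("alpha must not exceed beta")
        self.alpha, self.beta, self.window_ms = alpha, beta, window_ms
        self._arrivals: Deque[int] = deque()  # arrival times in ms

    def on_arrival(self, now_ms: int) -> None:
        """Record one arrival and drop arrivals older than the window."""
        self._arrivals.append(now_ms)
        while now_ms - self._arrivals[0] > self.window_ms:
            self._arrivals.popleft()

    def rate(self) -> float:
        """Packets per second within the current window."""
        return len(self._arrivals) * 1000 / self.window_ms

    def classify(self) -> str:
        r = self.rate()
        if r < self.alpha:
            return "low"
        if r > self.beta:
            return "high"
        return "medium"


# Hypothetical thresholds: below 40 packets/s is low, above 60 is high.
monitor = ArrivalRateMonitor(alpha=40, beta=60)
for ms in range(0, 1000, 20):      # steady 50 packets/s
    monitor.on_arrival(ms)
steady = monitor.classify()
for ms in range(1000, 1100):       # burst after a stall clears: one packet per ms
    monitor.on_arrival(ms)
burst = monitor.classify()
```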
On receiving the notifications from the buffer level monitoring unit 71 and the arrival rate monitoring unit 73, the control signal output unit 72 outputs a time length control signal and a reproduction speed control signal. The time length control signal (instruction) makes the background noise generation time length shorter as the buffer level of the jitter absorbing buffer 1 increases, and shorter as the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1 increases. The reproduction speed control signal (instruction) makes reproduction proceed at a faster speech rate as the buffer level of the jitter absorbing buffer 1 increases, and at a faster speech rate as the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1 increases.
For example, instructions are sent to the background noise generation unit 2 and the speech rate conversion unit 4 following the control content listed in Table 2. To the background noise generation unit 2, an instruction of, for example, 1.1 times is sent for "extend", 1.3 times for "extend further", 0.9 times for "shorten", 0.5 times for "shorten further", and 1.0 times for "normal". To the speech rate conversion unit 4, an instruction of, for example, 0.8 times is sent for "slow", 0.6 times for "slower", 1.2 times for "speed up", 1.4 times for "speed up further", and 1.0 times for "normal".
[table 2]
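Table 2 itself is not reproduced in this text, so the assignment of each (buffer level, arrival rate) pair to a control label is unknown. The sketch below is purely an assumed mapping that respects the stated rule (larger level or faster arrival means shorter noise and faster speech): it adds a score for each class and looks up the example multipliers quoted above. All names and the scoring scheme are hypothetical.

```python
from typing import Dict, Tuple

LEVEL_SCORE: Dict[str, int] = {"small": -1, "medium": 0, "large": 1}
RATE_SCORE: Dict[str, int] = {"low": -1, "medium": 0, "high": 1}

# Combined score -> multiplier, using the example values from the description.
NOISE_SCALE: Dict[int, float] = {
    -2: 1.3,  # extend further
    -1: 1.1,  # extend
    0: 1.0,   # normal
    1: 0.9,   # shorten
    2: 0.5,   # shorten further
}
SPEECH_RATE: Dict[int, float] = {
    -2: 0.6,  # slower
    -1: 0.8,  # slow
    0: 1.0,   # normal
    1: 1.2,   # speed up
    2: 1.4,   # speed up further
}


def control_signals(level_class: str, rate_class: str) -> Tuple[float, float]:
    """Assumed mapping: sum the two class scores and look up both multipliers."""
    score = LEVEL_SCORE[level_class] + RATE_SCORE[rate_class]
    return NOISE_SCALE[score], SPEECH_RATE[score]
```

Under this assumed mapping a small buffer with fast arrivals yields normal control, because the two indications cancel; the actual Table 2 may assign that cell differently.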
As mentioned above, according to present embodiment, send the instruction of interlock from control part 7 to background noise generating unit 2 and Speeking speed changing portion 4.; according to the situation of accumulating of the grouping in shake absorbing buffer 1; control the time span of the background noise being generated by background noise generating unit 2; and control the reproduction speed being converted by Speeking speed changing portion 4; control and send interval different background noise (noiseless interval) and voice (between ensonified zone) respectively thus, even if therefore also can prevent that speech quality is deteriorated being applied to not necessarily while sending the compression of soundless part of grouping with fixed intervals.
According to as shake grouping in absorbing buffer 1 accumulate situation, the arrival rate of the surplus of shake absorbing buffer 1 and arrival shake absorbing buffer 1, the time span control signal of the time span of the background noise being generated by background noise generating unit 2 is controlled in output, with the reproduction speed control signal of controlling the reproduction speed being converted by Speeking speed changing portion 4, can carry out appropriate jitter buffer control according to the surplus of shake absorbing buffer 1 thus, even and if stagnate in the reception of grouping temporarily, then stagnate releasing and arrive in the situations of a large amount of groupings quickly, also can buffer can be overflowed to the appropriate jitter buffer control preventing trouble before it happens by monitoring that arrival rate realizes, even also can prevent that speech quality is deteriorated in the time of application compression of soundless part.
In the above description, the remaining amount of the jitter absorbing buffer is divided into three classes, "small", "medium" and "large", using threshold A and threshold B, and the arrival rate is divided into three classes, "low", "medium" and "high", using threshold α and threshold β. Finer control can be achieved by subdividing these classes further.
In addition, the control changes as the remaining amount of the jitter absorbing buffer and the arrival rate change. By setting different thresholds for distinguishing "small", "medium" and "large", and "low", "medium" and "high", depending on the direction in which the remaining amount or the rate is changing, frequent switching of the control caused by small increases and decreases near a threshold can be avoided, and better speech quality can be provided. For example, threshold C and threshold D are set for the case where the remaining amount of the jitter absorbing buffer is increasing, and threshold E and threshold F for the case where it is decreasing. Likewise, threshold γ and threshold δ are set for the case where the arrival rate is increasing, and threshold ε and threshold ζ for the case where it is decreasing.
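The direction-dependent thresholds described above can be sketched as a small state-based classifier. This is only an illustrative sketch: the function name, the tuple layout of the thresholds, and the assumption that E ≤ C and F ≤ D are not taken from the patent text, which names the thresholds but does not give their values or ordering.

```python
LEVELS = ("small", "medium", "large")

def classify_with_hysteresis(value, previous, rising, falling):
    """Return the new class for `value`, given the previously reported class.

    rising  = (C, D): thresholds that must be reached to move up one class.
    falling = (E, F): thresholds the value must drop below to move down.
    Assumes E <= C and F <= D, so a value between E and C keeps its class.
    """
    idx = LEVELS.index(previous)
    # Move up only when the value reaches the "increasing" thresholds.
    while idx < len(LEVELS) - 1 and value >= rising[idx]:
        idx += 1
    # Move down only when the value falls below the "decreasing" thresholds.
    while idx > 0 and value < falling[idx - 1]:
        idx -= 1
    return LEVELS[idx]
```

With hypothetical thresholds C=6, D=12, E=4, F=10, a value of 5 is reported as "small" if the previous class was "small" and as "medium" if it was "medium", so fluctuations between 4 and 6 do not toggle the control. The same function could be applied to the arrival rate with γ, δ, ε and ζ.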
In addition, when the background noise generation unit 2 shortens the background noise generation time, better speech quality can be provided by ensuring that the generation time does not become shorter than a certain fixed length.
In the above description, the instruction from the control unit 7 to the background noise generation unit 2 is expressed as a factor such as 1.1 times or 0.9 times, but it may instead specify an amount of time to add or remove, for example extending by 100 ms or shortening by 200 ms.
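The two forms of instruction (a scale factor or an absolute time offset) and the lower bound on the generation time can be illustrated as follows. This is a minimal sketch under assumptions: the function name, the parameter names, and the 20 ms default floor are hypothetical, since the text only says "a certain fixed length".

```python
def adjust_noise_length(current_ms, scale=None, delta_ms=None, min_ms=20.0):
    """Apply a time length instruction to the background noise duration.

    Exactly one of `scale` (e.g. 1.1 or 0.9) or `delta_ms` (e.g. +100 or
    -200) must be given. The result is never shorter than `min_ms`, which
    stands in for the fixed minimum length (the value is an assumption).
    """
    if (scale is None) == (delta_ms is None):
        raise ValueError("give exactly one of scale or delta_ms")
    if scale is not None:
        target = current_ms * scale
    else:
        target = current_ms + delta_ms
    return max(target, min_ms)
```

For instance, shortening a 100 ms noise interval by 200 ms would yield the floor value rather than a negative duration.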
In addition, the control unit 7 has been described as having the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, but a configuration is also possible in which the buffer remaining amount monitoring unit 71 is omitted and the control signal output unit 72 outputs the time length control signal and the reproduction speed control signal according to the rate of arrival at the jitter absorbing buffer monitored by the arrival rate monitoring unit 73.
In addition, the case with the output buffer 5 and the output buffer monitoring unit 6 has been described, but the output buffer 5 and the output buffer monitoring unit 6 may be omitted. For example, the jitter absorbing buffer 1 may be configured to output packets at output timings having a prescribed time interval. Alternatively, it may be configured to output packets, according to the accumulation state of packets in the jitter absorbing buffer, at output timings that follow the control of the control unit 7.
Embodiment 3.
Fig. 4 is a functional block diagram showing an audio decoding apparatus according to one embodiment of the present invention.
In Fig. 4, parts identical or corresponding to those of the above embodiments are given the same reference numerals, and their description is omitted.
In Fig. 4, the high-accuracy silence compression unit 8 analyzes each received packet. When it detects a silent/noise interval in the coded audio data contained in the packet, it replaces the packet with a background noise packet containing background noise data; when it does not detect a silent/noise interval, it outputs the packet without replacement.
Next, the operation is described.
This embodiment describes the operation for a voice call between a user and the user's call partner, but the invention is not limited to this.
First, when the partner speaks, the speech is encoded and packetized on the partner side and received on the user side via the network. In the encoding on the partner side, silence compression is performed: background noise packets are output in background noise intervals and audio packets are output in speech intervals, and these arrive at the audio decoding apparatus on the user side. However, the silence compression function of the partner's audio encoding apparatus may have low accuracy and output packets as audio packets regardless of whether the interval is actually a background noise interval. Alternatively, the partner's audio encoding apparatus may not implement silence compression at all and output every packet as an audio packet. In either case, the high-accuracy silence compression unit 8 is provided in the user-side audio decoding apparatus so that appropriate jitter absorbing buffer control can be achieved.
When the user side receives a packet sent from the partner side, the high-accuracy silence compression unit 8 analyzes the received packet and detects noise intervals more accurately from the coded data stored in the received audio packet. When a silent/noise interval is detected in the coded audio data contained in the packet, the packet is replaced with a background noise packet containing background noise data and output to the jitter absorbing buffer 1. When no silent/noise interval is detected, the packet is output to the jitter absorbing buffer 1 without replacement. The subsequent operation is the same as in the above embodiments.
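The replace-or-pass-through behaviour of the high-accuracy silence compression unit 8 can be sketched as below. The patent does not specify how a silent/noise interval is detected or how the background noise data is produced, so both are passed in as caller-supplied functions; the `Packet` fields and all names here are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Packet:
    seq: int          # sequence number
    timestamp: int    # timestamp value
    kind: str         # "audio" or "noise" (hypothetical labels)
    payload: bytes

def silence_compress(packet: Packet,
                     is_silence: Callable[[bytes], bool],
                     make_noise_payload: Callable[[bytes], bytes]) -> Packet:
    """Replace an audio packet by a background noise packet when the
    supplied detector reports a silent/noise interval; otherwise return
    the packet unchanged. Sequence number and timestamp are preserved."""
    if packet.kind == "audio" and is_silence(packet.payload):
        return Packet(packet.seq, packet.timestamp, "noise",
                      make_noise_payload(packet.payload))
    return packet
```

The returned packet would then be written to the jitter absorbing buffer 1, so the downstream control sees background noise packets even when the sending side never produced any.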
As described above, according to this embodiment, each received packet is analyzed; when a silent/noise interval is detected in the coded audio data contained in the packet, the packet is replaced with a background noise packet containing background noise data, and when no silent/noise interval is detected, the packet is output without replacement. Background noise (silent intervals) and speech (voiced intervals) are therefore controlled separately regardless of whether the partner's audio encoding apparatus has a silence compression function or how accurate that function is. Appropriate jitter absorbing buffer control can thus be achieved, and degradation of speech quality can be further prevented.
In addition, this embodiment describes the case where the arrival rate monitoring unit 73 monitors the arrival rate of packets input to the high-accuracy silence compression unit 8, but it may instead be configured to monitor the arrival rate of packets between the high-accuracy silence compression unit 8 and the jitter absorbing buffer 1.
In addition, the control unit 7 has been described as having the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, but it may instead be configured to have only one of the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73 and to output the time length control signal and the reproduction speed control signal.
In addition, the case with the output buffer 5 and the output buffer monitoring unit 6 has been described, but the output buffer 5 and the output buffer monitoring unit 6 may be omitted. For example, the jitter absorbing buffer 1 may be configured to output packets at output timings having a prescribed time interval. Alternatively, it may be configured to output packets, according to the accumulation state of packets in the jitter absorbing buffer, at output timings that follow the control of the control unit 7.
Embodiment 4.
Fig. 5 is a functional block diagram showing an audio decoding apparatus according to one embodiment of the present invention.
In Fig. 5, parts identical or corresponding to those of the above embodiments are given the same reference numerals, and their description is omitted.
In Fig. 5, the audio decoding apparatus 20 decodes coded audio data received on the user side. The audio encoding apparatus 21 encodes speech to be transmitted from the user side. The voice detection unit 9 detects whether the user is speaking. In this embodiment, it determines at fixed intervals whether the input audio data is "voice" or "noise" (that is, not voice). When the audio data is "voice", it determines that the user is speaking; when the audio data is "noise", it determines that the user is not speaking.
The audio encoding unit 10 encodes the audio data and outputs coded audio data. The silence compression control unit 11 outputs the coded audio data from the audio encoding unit 10 when the voice detection unit 9 has determined "voice", and intermittently outputs the background noise data from the audio encoding unit 10 when it has determined "noise".
In addition, in this embodiment, the jitter absorbing buffer 1 is configured to return to its initial state when the voice detection unit 9 detects that the user is speaking.
Next, the operation is described.
This embodiment describes the operation for a voice call between a user and the user's call partner, but the invention is not limited to this.
In the audio encoding apparatus 21, audio data is input to the voice detection unit 9 and the audio encoding unit 10. The voice detection unit 9 determines at fixed intervals whether the input audio data is "voice" or "noise" (not voice), and outputs the result to the audio encoding unit 10, to the silence compression control unit 11, and to the jitter absorbing buffer 1 in the audio decoding apparatus 20. When notified of "voice", the audio encoding unit 10 outputs coded data of the input audio data; when notified of "noise", it outputs background noise data. When notified of "voice", the silence compression control unit 11 outputs the coded audio data from the audio encoding unit 10; when notified of "noise", it intermittently outputs the background noise data from the audio encoding unit 10. The determination result of the voice detection unit 9 is also notified to the jitter absorbing buffer 1. When notified of "noise", the jitter absorbing buffer 1 continues normal processing; when notified of "voice", it discards the audio packets accumulated in the jitter absorbing buffer 1 and restarts processing from the initial state.
When audio data classified as "voice" is input to the audio encoding apparatus 21, the user is speaking, and normally the partner is not speaking at that time. It is therefore highly likely that no decoding is needed on the user side, so the audio packets accumulated in the jitter absorbing buffer 1 are discarded and the buffer is returned to its initial state. Then, when the partner starts speaking and decoding starts on the user side, jitter absorbing buffer control can begin from the initial state rather than from a state close to buffer underrun or overflow.
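The reset behaviour described above can be sketched as follows. The class and method names are hypothetical, and the "initial state" is modelled simply as an empty queue, which is an assumption; the patent does not define the initial state further.

```python
from collections import deque

class JitterBuffer:
    """Toy jitter absorbing buffer that resets on local speech."""

    def __init__(self):
        self.packets = deque()

    def push(self, packet):
        # Temporarily accumulate a received packet.
        self.packets.append(packet)

    def on_voice_decision(self, decision):
        """Handle the result notified by the voice detection unit.

        "voice": the local user is speaking, so accumulated packets are
                 discarded and the buffer returns to its initial state.
        "noise": normal processing continues and nothing is discarded.
        """
        if decision == "voice":
            self.packets.clear()
```

In this sketch the decision strings mirror the "voice"/"noise" labels used in the text.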
As described above, according to this embodiment, when audio data classified as "voice" is input to the audio encoding apparatus 21, the audio packets accumulated in the jitter absorbing buffer 1 are discarded and the buffer is returned to its initial state. When the partner starts speaking and decoding starts on the user side, jitter absorbing buffer control can therefore begin from the initial state rather than from a state close to buffer underrun or overflow. More appropriate control can thus be achieved, and degradation of speech quality can be further prevented.
In addition, the audio encoding apparatus 21 does not necessarily need to apply silence compression; it suffices that it has the voice detection unit 9 and that the jitter absorbing buffer 1 obtains its determination result.
In addition, the control unit 7 has been described as having the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, but it may instead be configured to have only one of the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73 and to output the time length control signal and the reproduction speed control signal.
In addition, the case with the output buffer 5 and the output buffer monitoring unit 6 has been described, but the output buffer 5 and the output buffer monitoring unit 6 may be omitted. For example, the jitter absorbing buffer 1 may be configured to output packets at output timings having a prescribed time interval. Alternatively, it may be configured to output packets, according to the accumulation state of packets in the jitter absorbing buffer, at output timings that follow the control of the control unit 7.
Embodiment 5.
Fig. 6 is a functional block diagram showing an audio decoding apparatus according to one embodiment of the present invention.
In Fig. 6, parts identical or corresponding to those of the above embodiments are given the same reference numerals, and their description is omitted.
In Fig. 6, the background noise data detection/insertion unit 12 detects whether a received packet contains background noise data. When it detects that the packet contains background noise data, it inserts into the jitter absorbing buffer 1 a number of packets corresponding to the time length of the silent/noise interval of the background noise data, each having the same per-packet time length as a packet containing coded audio data.
Next, the operation is described.
This embodiment describes the operation for a voice call between a user and the user's call partner, but the invention is not limited to this.
First, when the partner speaks, the speech is encoded and packetized on the partner side and received on the user side via the network.
The background noise data detection/insertion unit 12 detects whether a received packet is a background noise packet containing background noise data. When a background noise packet is detected, it inserts into the jitter absorbing buffer 1 a number of packets corresponding to the time length of the silent/noise interval of the background noise data, each having the same per-packet time length as a packet containing coded audio data.
The detailed operation is described with reference to the drawing. Fig. 7 is an explanatory diagram showing the relation between packet timestamps and accumulation in the jitter absorbing buffer.
In Fig. 7, audio packets #1, #2 and #4, each containing coded audio data of time length t, and background noise packet #3, containing background noise data, arrive in the order #1, #2, #3, #4 and are temporarily accumulated in the jitter absorbing buffer 1. If background noise packet #3 is given sequence number N and timestamp value M, then packet #1 has sequence number N-2, packet #2 has sequence number N-1, and packet #4 has sequence number N+1; the timestamp value of packet #1 is M-2t and that of packet #2 is M-t. The timestamp value of packet #4 is the time after the noise interval length T has elapsed, that is, M+T.
When the background noise data detection/insertion unit 12 detects packet #3, a background noise packet, it stores its sequence number N and timestamp value M, outputs packet #3 to the jitter absorbing buffer 1, and waits for the arrival of the next packet, which has sequence number N+1. When the packet with sequence number N+1, that is, packet #4, arrives, the unit reads its timestamp value M+T and calculates the time length T of the noise interval lying between packet #2 and packet #4. So that background noise packets exist at intervals of t in the same way as audio packets, X background noise packets of time length t, corresponding to the noise interval of length T, are inserted after packet #2 in the jitter absorbing buffer 1, and packet #4 is then output to the jitter absorbing buffer 1. As a result, the jitter absorbing buffer 1 holds an audio packet or a background noise packet for every interval of t.
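The timestamp arithmetic of Fig. 7 can be worked through in a few lines. The sketch below derives T from the two timestamps and lists the timestamps of the t-length background noise packets that cover the interval. The function name is hypothetical, and the choice that the first slot starts at M (so that the received packet #3 corresponds to the first of the X slots) is an assumption; the text only states that X packets corresponding to T are inserted.

```python
def fill_noise_interval(noise_timestamp, next_timestamp, packet_duration):
    """Return timestamps of t-length noise packets covering [M, M+T).

    noise_timestamp : M, timestamp of the background noise packet (#3)
    next_timestamp  : M+T, timestamp of the following audio packet (#4)
    packet_duration : t, per-packet time length of an audio packet
    """
    interval = next_timestamp - noise_timestamp   # T
    if interval <= 0 or packet_duration <= 0:
        raise ValueError("timestamps must increase and t must be positive")
    count = interval // packet_duration           # X = T / t
    return [noise_timestamp + i * packet_duration for i in range(count)]
```

For example, with t = 160 timestamp units, M = 1000 and M+T = 1800, the noise interval spans five packet slots, so the buffer would hold one packet per t between packet #2 and packet #4.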
The buffer remaining amount monitoring unit 71 monitors the amount of packets temporarily accumulated in the jitter absorbing buffer 1. When the remaining amount is less than a threshold A, it notifies the control signal output unit 72 of "small"; when it is equal to or greater than a threshold B, it notifies "large"; and when it is between threshold A and threshold B, it notifies "medium".
The arrival rate monitoring unit 73 monitors the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1. When packets are input at a rate slower than a threshold α, it notifies the control signal output unit 72 of "low"; when packets are input at a rate faster than a threshold β, it notifies "high"; and when the rate is neither lower than threshold α nor higher than threshold β, it notifies "medium".
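The two three-way classifications can be written out directly. This is a minimal sketch: the function names and units are assumptions, and a remaining amount exactly equal to threshold B is treated here as "large", which is one reading of the boundary condition.

```python
def classify_level(level, threshold_a, threshold_b):
    """Classify the buffer remaining amount as small / medium / large."""
    if level < threshold_a:
        return "small"
    if level >= threshold_b:   # boundary choice at B is an assumption
        return "large"
    return "medium"

def classify_rate(rate, alpha, beta):
    """Classify the packet arrival rate as low / medium / high."""
    if rate < alpha:
        return "low"
    if rate > beta:
        return "high"
    return "medium"
```

The pair of labels returned by these functions is what the control signal output unit 72 would receive as notifications.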
On receiving the notifications from the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, the control signal output unit 72 outputs a time length control signal that shortens the background noise generation time more the larger the remaining amount of the jitter absorbing buffer 1 is and the higher the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1 is. It also outputs a reproduction speed control signal that makes reproduction faster the larger the remaining amount of the jitter absorbing buffer 1 is and the higher the arrival rate of packets input to (arriving at) the jitter absorbing buffer 1 is.
The control signal output unit 72, having received the notifications from the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, sends instructions to the jitter absorbing buffer 1 and the speech rate conversion unit 4 according to, for example, the control content listed in Table 2. To the jitter absorbing buffer 1, it sends an instruction to insert, for example, 1 background noise packet for "extend", to insert, for example, 3 background noise packets for "extend further", to delete, for example, 1 background noise packet for "shorten", to delete, for example, 3 background noise packets for "shorten further", and to perform no insertion or deletion for "normal". To the speech rate conversion unit 4, it sends an instruction of, for example, 0.8 times for "slow down", 0.6 times for "slow down further", 1.2 times for "speed up", 1.4 times for "speed up further", and 1.0 times for "normal".
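The example values above can be collected into two lookup tables. This sketch only encodes the label-to-action step; the mapping from ("small"/"medium"/"large", "low"/"medium"/"high") to these labels is given by Table 2, which is not reproduced in this section, so it is deliberately left out. The dictionary and function names and the English label strings are assumptions.

```python
# Background noise packets to insert (+) or delete (-) in the jitter buffer.
NOISE_PACKET_DELTA = {
    "extend": 1,
    "extend further": 3,
    "shorten": -1,
    "shorten further": -3,
    "normal": 0,
}

# Reproduction speed factor sent to the speech rate conversion unit.
SPEED_FACTOR = {
    "slow down": 0.8,
    "slow down further": 0.6,
    "speed up": 1.2,
    "speed up further": 1.4,
    "normal": 1.0,
}

def to_instructions(noise_label, speed_label):
    """Translate the two control labels into concrete example values."""
    return NOISE_PACKET_DELTA[noise_label], SPEED_FACTOR[speed_label]
```

Because both instructions come from one lookup step, the noise length and the speech rate are always adjusted together, which matches the coordinated control the embodiment describes.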
As described above, according to this embodiment, the control unit 7 sends coordinated instructions to the jitter absorbing buffer 1 and the speech rate conversion unit 4 according to the remaining amount of the jitter absorbing buffer and the arrival rate. That is, according to the accumulation state of packets in the jitter absorbing buffer 1, the time length of the background noise generated by the background noise generation unit 2 is controlled, and the reproduction speed converted by the speech rate conversion unit 4 is controlled. Background noise (silent intervals) and speech (voiced intervals), which have different transmission intervals, are thereby controlled separately, so degradation of speech quality can be prevented even when silence compression, under which packets are not necessarily sent at fixed intervals, is applied.
When it is detected that a packet contains background noise data, the time length of the background noise generated by the background noise generation unit 2 is controlled by inserting into the jitter absorbing buffer 1 a number of packets corresponding to the time length of the silent/noise interval of the background noise data, each having the same per-packet time length as a packet containing coded audio data. Control can thus be performed using the number of packets accumulated in the jitter absorbing buffer 1, which simplifies the processing of the background noise generation unit 2.
In addition, even when packet reception stalls temporarily and a large number of packets then arrive at once after the stall clears, monitoring the arrival rate enables jitter buffer control that prevents buffer overflow before it occurs.
In the above description, the remaining amount of the jitter absorbing buffer is divided into three classes, "small", "medium" and "large", using threshold A and threshold B, and the arrival rate is divided into three classes, "low", "medium" and "high", using threshold α and threshold β. Finer control can be achieved by subdividing these classes further.
In addition, the control changes as the remaining amount of the jitter absorbing buffer and the arrival rate change. By setting different thresholds for distinguishing "small", "medium" and "large", and "low", "medium" and "high", depending on the direction in which the remaining amount or the rate is changing, frequent switching of the control caused by small increases and decreases near a threshold can be avoided, and better speech quality can be provided. For example, threshold C and threshold D are set for the case where the remaining amount of the jitter absorbing buffer is increasing, and threshold E and threshold F for the case where it is decreasing. Likewise, threshold γ and threshold δ are set for the case where the arrival rate is increasing, and threshold ε and threshold ζ for the case where it is decreasing.
In addition, this embodiment has been described on the basis of the packetization period, but when one packet contains multiple audio coding frames, control may be based on the time length of the audio coding frame.
In addition, as another operation of the background noise data detection/insertion unit 12, during the period after background noise packet #3 arrives and before audio packet #4 arrives, a background noise packet may be inserted into the jitter absorbing buffer 1 each time the time t elapses.
In addition, when the background noise generation unit 2 shortens the background noise generation time, better speech quality can be provided by ensuring that the generation time does not become shorter than a certain fixed length.
In addition, the control unit 7 has been described as having the buffer remaining amount monitoring unit 71 and the arrival rate monitoring unit 73, but a configuration is also possible in which the arrival rate monitoring unit 73 is omitted and the time length control signal and the reproduction speed control signal are output according to the monitoring result of the buffer remaining amount monitoring unit 71.
In addition, the case with the output buffer 5 and the output buffer monitoring unit 6 has been described, but the output buffer 5 and the output buffer monitoring unit 6 may be omitted. For example, the jitter absorbing buffer 1 may be configured to output packets at output timings having a prescribed time interval. Alternatively, it may be configured to output packets, according to the accumulation state of packets in the jitter absorbing buffer, at output timings that follow the control of the control unit 7.

Claims (7)

1. An audio decoding apparatus, characterized by comprising:
a jitter absorbing buffer that temporarily accumulates received packets and outputs the packets at prescribed output timings;
a background noise generation unit that generates audio data of background noise according to background noise data contained in a packet output from the jitter absorbing buffer;
an audio decoding unit that decodes coded audio data contained in a packet output from the jitter absorbing buffer and generates audio data of speech;
a speech rate conversion unit that performs speech rate conversion for converting the reproduction speed of the audio data decoded by the audio decoding unit; and
a control unit that, according to the accumulation state of packets in the jitter absorbing buffer, controls the time length of the background noise generated by the background noise generation unit and controls the reproduction speed converted by the speech rate conversion unit.
2. The audio decoding apparatus according to claim 1, characterized in that
the control unit comprises:
a buffer remaining amount monitoring unit that monitors the remaining amount of the jitter absorbing buffer as the accumulation state; and
a control signal output unit that, according to the remaining amount monitored by the buffer remaining amount monitoring unit, outputs a time length control signal for controlling the time length of the background noise generated by the background noise generation unit and a reproduction speed control signal for controlling the reproduction speed converted by the speech rate conversion unit.
3. The audio decoding apparatus according to claim 1, characterized in that
the control unit comprises:
an arrival rate monitoring unit that monitors, as the accumulation state, the rate at which the received packets arrive at the jitter absorbing buffer; and
a control signal output unit that, according to the arrival rate monitored by the arrival rate monitoring unit, outputs a time length control signal for controlling the time length of the background noise generated by the background noise generation unit and a reproduction speed control signal for controlling the reproduction speed converted by the speech rate conversion unit.
4. The audio decoding apparatus according to claim 1, characterized in that
the audio decoding apparatus comprises a high-accuracy silence compression unit that analyzes the received packet, replaces the packet with a background noise packet containing background noise data when a silent/noise interval is detected in the coded audio data contained in the packet, and outputs the packet without replacement when no silent/noise interval is detected, and
the jitter absorbing buffer temporarily accumulates the packets output from the high-accuracy silence compression unit.
5. The audio decoding apparatus according to claim 1, characterized in that
the audio decoding apparatus comprises a voice detection unit that detects whether a user is speaking, and
the jitter absorbing buffer returns to its initial state when the voice detection unit detects that the user is speaking.
6. The audio decoding apparatus according to claim 1, characterized in that
the audio decoding apparatus comprises a background noise data detection/insertion unit that detects whether the received packet contains background noise data and, when it detects that the packet contains background noise data, inserts into the jitter absorbing buffer a number of packets corresponding to the time length of the silent/noise interval of the background noise data, each having the same per-packet time length as a packet containing coded audio data.
7. The audio decoding apparatus according to claim 1, characterized by comprising:
an output buffer that temporarily accumulates the audio data of the background noise and the audio data of the speech; and
an output buffer monitoring unit that monitors the amount of the audio data accumulated in the output buffer and, according to that amount, instructs the jitter absorbing buffer on the output timing of the temporarily accumulated packets,
wherein the jitter absorbing buffer outputs the temporarily accumulated packets according to the instruction from the output buffer monitoring unit.
CN201410058259.1A 2013-02-28 2014-02-20 Voice decoding apparatus Pending CN104022967A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013038937A JP2014167525A (en) 2013-02-28 2013-02-28 Audio decoding device
JP2013-038937 2013-02-28

Publications (1)

Publication Number Publication Date
CN104022967A true CN104022967A (en) 2014-09-03

Family

ID=51439541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410058259.1A Pending CN104022967A (en) 2013-02-28 2014-02-20 Voice decoding apparatus

Country Status (4)

Country Link
JP (1) JP2014167525A (en)
KR (1) KR101516113B1 (en)
CN (1) CN104022967A (en)
TW (1) TW201434039A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108924665A (en) * 2018-05-30 2018-11-30 深圳市捷视飞通科技股份有限公司 Reduce method, apparatus, computer equipment and the storage medium of video playing delay

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6399001B2 (en) * 2016-01-07 2018-10-03 ブラザー工業株式会社 Remote conference method and program
JP6451910B1 (en) * 2017-08-02 2019-01-16 オムロン株式会社 Sensor management unit, sensing data distribution system, sensing data evaluation method, and sensing data evaluation program
JP7019117B2 (en) * 2020-02-20 2022-02-14 三菱電機株式会社 Speech speed converter, speech velocity conversion method, program and recording medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1497933A (en) * 2002-09-30 2004-05-19 ������������ʽ���� Network telephone set and voice decoder
US20060171419A1 (en) * 2005-02-01 2006-08-03 Spindola Serafin D Method for discontinuous transmission and accurate reproduction of background noise information
CN1926824A (en) * 2004-05-26 2007-03-07 日本电信电话株式会社 Sound packet reproducing method, sound packet reproducing apparatus, sound packet reproducing program, and recording medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102058714B1 (en) 2011-10-20 2019-12-23 엘지전자 주식회사 Method of managing a jitter buffer, and jitter buffer using same

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN1497933A (en) * 2002-09-30 2004-05-19 Mitsubishi Electric Corp Network telephone set and voice decoder
CN1926824A (en) * 2004-05-26 2007-03-07 日本电信电话株式会社 Sound packet reproducing method, sound packet reproducing apparatus, sound packet reproducing program, and recording medium
US20060171419A1 (en) * 2005-02-01 2006-08-03 Spindola Serafin D Method for discontinuous transmission and accurate reproduction of background noise information

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN108924665A (en) * 2018-05-30 2018-11-30 深圳市捷视飞通科技股份有限公司 Method, apparatus, computer equipment and storage medium for reducing video playing delay
CN108924665B (en) * 2018-05-30 2020-11-20 深圳市捷视飞通科技股份有限公司 Method and device for reducing video playing delay, computer equipment and storage medium

Also Published As

Publication number Publication date
KR101516113B1 (en) 2015-05-04
KR20140108119A (en) 2014-09-05
TW201434039A (en) 2014-09-01
JP2014167525A (en) 2014-09-11

Similar Documents

Publication Publication Date Title
EP0743773B1 (en) Timing recovery scheme for packet speech in multiplexing environment of voice with data applications
CN104022967A (en) Voice decoding apparatus
CN103888381A (en) Device and method used for controlling jitter buffer
WO2007132377A1 (en) Adaptive jitter management control in decoder
JPWO2005117366A1 (en) Audio packet reproduction method, audio packet reproduction apparatus, audio packet reproduction program, and recording medium
CN101636990B (en) Method of transmitting data in a communication system
CN1564984A (en) Network media playout
RU2017111578A (en) AUDIO DATA PRINCIPLE
US8090588B2 (en) System and method for providing AMR-WB DTX synchronization
KR101002405B1 (en) Controlling a time-scaling of an audio signal
JPH01999A (en) Pitch position extraction method
JPH09191296A (en) Method and equipment for synchronizing clock for digital decoder and digital coder
CN107005591B (en) Data processing apparatus, data processing method, and program
US9031678B2 (en) Audio time stretch method and associated apparatus
EP3086319B1 (en) Methods and apparatuses for dtx hangover in audio coding
CN1364287A (en) Method for decreasing the processing capacity required by speech encoding and a network element
KR100332526B1 (en) Systems and methods for communicating desired audio information over a communications medium
JP2001506764A (en) Methods and arrangements in telecommunications systems
US7362770B2 (en) Method and apparatus for using and combining sub-frame processing and adaptive jitter-buffers for improved voice quality in voice-over-packet networks
KR20120036788A (en) Information processing device, method therefor, and program
US5897615A (en) Speech packet transmission system
CN101226744A (en) Method and device for implementing voice decode in voice decoder
US7239999B2 (en) Speed control playback of parametric speech encoded digital audio
US20100274918A1 (en) Stream data multiplexing device and multiplexing method
CN104934040A (en) Duration adjustment method and device for audio signal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140903