CN101243490B

CN101243490B - Method and apparatus for encoding and decoding an audio signal

Info

Publication number: CN101243490B
Application number: CN2006800294367A
Authority: CN
Inventors: 房熙锡; 吴贤午; 金东秀; 林宰显; 郑亮源
Original assignee: LG Electronics Inc
Current assignee: LG Electronics Inc
Priority date: 2005-06-30
Filing date: 2006-06-30
Publication date: 2013-01-09
Anticipated expiration: 2026-06-30
Also published as: KR20070003594A; CN101243490A; CN101243491B; CN101297352A; KR20070003593A; CN101297352B; CN101243491A

Abstract

A method and apparatus for encoding and decoding an audio signal are provided. The present invention includes the following steps:

- receiving an audio signal that includes a downmix signal and a spatial information signal;
- if a header is included in the spatial information signal, extracting configuration information from the header;
- extracting spatial information included in the spatial information signal; and
- converting the downmix signal to a multi-channel signal using the configuration information and the spatial information.

Accordingly, the header can be selectively included in the spatial information signal. If a plurality of headers is included in the spatial information signal, the spatial information can be decoded even when the audio signal is reproduced from a random point.

Description

Method and apparatus for encoding and decoding an audio signal

Technical field

The present invention relates to audio signal processing, and more particularly to an apparatus and method for encoding and decoding an audio signal.

Background art

In general, an audio signal encoding apparatus compresses a multi-channel audio signal into a mono or stereo downmix audio signal rather than compressing each channel of the multi-channel audio signal separately. The encoding apparatus then either transmits the compressed downmix audio signal and the spatial information signal (or ancillary data signal) to a decoding apparatus, or stores them in a storage medium.

In this case, the spatial information signal is extracted while the multi-channel audio signal is being downmixed. It is used to restore the original multi-channel audio signal from the compressed downmix audio signal.

The spatial information signal includes a header and spatial information, and the header includes configuration information. The header is the information used to interpret the spatial information.

The audio signal decoding apparatus decodes the spatial information using the configuration information included in the header. This configuration information is either transmitted to the decoding apparatus together with the spatial information or stored in a storage medium.

The audio signal encoding apparatus multiplexes the encoded downmix audio signal and the spatial information signal into a single bitstream and then transmits the multiplexed signal to a decoding apparatus. Because the configuration information is generally constant, the header containing it is inserted into the bitstream only once. Since the configuration information is sent only once, at the very beginning of the audio signal, it is unavailable when the audio signal is reproduced from an arbitrary point in time. As a result, the audio signal decoding apparatus has trouble decoding the spatial information.

For example, in broadcasting or VOD (video on demand), the audio signal is reproduced from a point in time requested by the user rather than from the beginning. The configuration information transmitted inside the audio signal therefore cannot be used, and the spatial information cannot be decoded.

Summary of the invention

An object of the present invention is to provide a method and apparatus for encoding and decoding an audio signal that allow the audio signal to be decoded by selectively including a header in the frames of the spatial information signal.

Another object of the present invention is to provide a method and apparatus for encoding and decoding an audio signal that include a plurality of headers in the spatial information signal. This lets the audio signal decoding apparatus decode the audio signal even when the audio signal is reproduced from an arbitrary point.

To achieve these and other advantages, and in accordance with the purpose of the present invention as embodied and broadly described herein, a method of decoding an audio signal according to the present invention includes the following steps:

- receiving an audio signal that includes a downmix audio signal and a spatial information signal;
- if a header is included in the spatial information signal, extracting configuration information from the header;
- extracting the spatial information included in the spatial information signal; and
- converting the downmix audio signal into a multi-channel signal using the configuration information and the spatial information.

Description of the drawings

Fig. 1 is a diagram of the structure of an audio signal according to an embodiment of the present invention.

Fig. 2 is a diagram of the structure of an audio signal according to another embodiment of the present invention.

Fig. 3 is a block diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

Fig. 4 is a block diagram of an apparatus for decoding an audio signal according to another embodiment of the present invention.

Fig. 5 is a flowchart of a method of decoding an audio signal according to an embodiment of the present invention.

Fig. 6 is a flowchart of a method of decoding an audio signal according to another embodiment of the present invention.

Fig. 7 is a flowchart of a method of decoding an audio signal according to still another embodiment of the present invention.

Fig. 8 is a flowchart of a method of obtaining the number of bits used to represent position information, according to an embodiment of the present invention.

Fig. 9 is a flowchart of a method of decoding an audio signal according to a further embodiment of the present invention.

Embodiments

Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

To aid understanding of the present invention, the apparatus and method for encoding an audio signal are described first, before the apparatus and method for decoding an audio signal. However, the decoding apparatus and method according to the present invention are not limited to the encoding apparatus and method described below. The present invention is applicable to audio coding schemes that use spatial information to generate multiple channels, as well as to MP3 (MPEG-1/2 Layer III) and AAC (Advanced Audio Coding).

Fig. 1 is a diagram of the structure of an audio signal transferred from an audio signal encoding apparatus to an audio signal decoding apparatus according to an embodiment of the present invention.

Referring to Fig. 1, the audio signal includes an audio descriptor 101, a downmix audio signal 103 and a spatial information signal 105.

Some audio encoding schemes are used to reproduce broadcasts and the like. In that case, the audio signal further includes ancillary data in addition to the audio descriptor 101 and the downmix audio signal 103. The present invention carries the spatial information signal 105 as this ancillary data.

The audio signal may optionally include the audio descriptor 101, so that the audio signal decoding apparatus can learn basic information about the audio codec without analyzing the audio signal. The audio descriptor 101 consists of a small amount of basic information needed for audio decoding, such as:

- the transmission rate of the transmitted audio signal;
- the number of channels;
- the sampling frequency of the compressed data; and
- an identifier indicating the codec currently in use.

The audio signal decoding apparatus can use the audio descriptor 101 to learn which type of codec the audio signal uses. In particular, the audio descriptor 101 tells the decoding apparatus whether the received audio signal is one that restores multiple channels using the spatial information signal 105 and the downmix audio signal 103. Here, "multiple channels" includes virtual surround as well as actual multiple channels. With virtual surround technology, an audio signal in which the spatial information signal 105 is combined with the downmix audio signal 103 can be heard through one or two channels.
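The following Python sketch shows the kind of information the audio descriptor 101 carries and how a decoder might check it. The field names and the codec identifier values are hypothetical, because the text does not specify how these fields are encoded.

```python
from dataclasses import dataclass

# Hypothetical codec identifiers: the patent does not define concrete values.
CODEC_ID_DOWNMIX_ONLY = 0x00
CODEC_ID_SPATIAL_MULTICHANNEL = 0x01


@dataclass(frozen=True)
class AudioDescriptor:
    """Small set of basic decoding information carried by audio descriptor 101."""
    bit_rate: int        # transmission rate of the audio signal, in bit/s
    num_channels: int    # number of channels in the transmitted (downmix) signal
    sample_rate: int     # sampling frequency of the compressed data, in Hz
    codec_id: int        # identifier of the codec currently in use

    def restores_multichannel(self) -> bool:
        """True if the stream pairs a downmix with spatial information to rebuild multiple channels."""
        return self.codec_id == CODEC_ID_SPATIAL_MULTICHANNEL
```

For example, a decoder that reads `AudioDescriptor(128000, 2, 48000, CODEC_ID_SPATIAL_MULTICHANNEL)` would route the stream to spatial decoding instead of plain downmix playback.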

The position of the audio descriptor 101 is independent of the downmix audio signal 103 and the spatial information signal 105 included in the audio signal. For example, the audio descriptor 101 is located in a separate field that describes the audio signal.

When no header is provided for the downmix audio signal 103, the audio signal decoding apparatus can decode the downmix audio signal 103 using the audio descriptor 101.

The downmix audio signal 103 is a signal generated by downmixing multiple channels. It can be generated by a downmixing unit (not shown) included in an audio signal encoding apparatus (not shown), or it can be generated artificially.

The downmix audio signal 103 either includes a header or does not include a header.

When the downmix audio signal 103 includes a header, a header is included in every frame, frame by frame. When it does not include a header, the audio signal decoding apparatus decodes the downmix audio signal 103 using the audio descriptor 101, as mentioned above. Whichever form is used, headers in every frame or no headers at all, the downmix audio signal 103 keeps the same form in the audio signal until the end.

The spatial information signal 105 likewise either includes both a header and spatial information, or includes only spatial information without a header. Its header differs from the header of the downmix audio signal 103 in that it need not be inserted in every frame. In particular, the spatial information signal 105 can mix frames that include a header with frames that do not. Most of the information in the header of the spatial information signal 105 is configuration information, which is used to interpret and decode the spatial information.

Fig. 2 is a diagram of the structure of an audio signal transferred from an audio signal encoding apparatus to an audio signal decoding apparatus according to another embodiment of the present invention.

Referring to Fig. 2, the audio signal includes a downmix audio signal 103 and a spatial information signal 105. The audio signal takes the form of an ES (elementary stream) made up of a sequence of frames.

The downmix audio signal 103 and the spatial information signal 105 may each be transferred to the audio signal decoding apparatus as a separate ES. Alternatively, as shown in Fig. 2, they may be combined into a single ES and transferred to the decoding apparatus.

When the two signals are combined into a single ES and transferred, the spatial information signal 105 is placed in the ancillary data or extension data field of the downmix audio signal 103.

The audio signal may also include signal identification information, which indicates whether the spatial information signal 105 is combined with the downmix audio signal 103.

A frame of the spatial information signal 105 either includes a header 201 and spatial information 203, or includes only spatial information 203. In other words, the spatial information signal 105 can mix frames that contain the header 201 with frames that do not.

In the present invention, the header 201 is inserted into the spatial information signal 105 at least once. Specifically, the audio signal encoding apparatus can insert the header 201 in one of three ways:

- into every frame of the spatial information signal 105;
- periodically, into frames at fixed intervals; or
- non-periodically, into frames at arbitrary intervals.
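The three insertion strategies can be sketched as a per-frame schedule. This is a minimal Python sketch under several assumptions: the mode names, the `period` parameter, and the non-periodic gap of 1 to 2 × `period` frames are all illustrative. Every mode places a header in frame 0, which is one simple way to satisfy the "at least once" requirement.

```python
import random
from typing import List


def header_flags(num_frames: int, mode: str, period: int = 1, seed: int = 0) -> List[bool]:
    """Return one flag per frame of the spatial information signal: True if that frame carries header 201."""
    if num_frames < 0 or period < 1:
        raise ValueError("num_frames must be >= 0 and period >= 1")
    if mode == "every_frame":
        return [True] * num_frames
    if mode == "fixed_interval":
        # Header in frames 0, period, 2 * period, ...
        return [n % period == 0 for n in range(num_frames)]
    if mode == "random_interval":
        # Non-periodic insertion: the gap to the next header varies between 1 and 2 * period frames.
        rng = random.Random(seed)
        flags = [False] * num_frames
        n = 0
        while n < num_frames:
            flags[n] = True
            n += rng.randint(1, 2 * period)
        return flags
    raise ValueError(f"unknown mode: {mode!r}")
```

The trade-off is overhead against access delay. Putting a header in every frame costs the most bits, but a decoder can start at any frame. A longer interval saves bits, but a decoder joining mid-stream may wait longer before its first header arrives.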

The audio signal can include information indicating whether a frame includes the header 201 (hereinafter referred to as "header identification information").

When the spatial information signal 105 includes the header 201, the audio signal decoding apparatus extracts configuration information 205 from the header 201. It then decodes the spatial information 203 transmitted after the header 201 according to that configuration information 205. Because the header 201 is the information used to interpret the spatial information 203, it is transmitted at an earlier stage than the spatial information 203 it describes.

When the current frame of the spatial information signal 105 does not include the header 201, the audio signal decoding apparatus decodes the spatial information 203 using the header 201 transmitted at a previous stage.

In two situations, the previously transmitted header 201 cannot be used:

- the header 201 is lost while the audio signal is being transferred from the encoding apparatus to the decoding apparatus; or
- decoding of an audio signal transmitted as a stream starts from the middle, as in broadcasting.

In these cases, the audio signal decoding apparatus extracts configuration information 205 from a header 201 other than the first header 201 inserted into the audio signal. It can then decode the audio signal using this extracted configuration information 205. The configuration information 205 extracted from such a later header 201 may be identical to, or different from, the configuration information 205 extracted from a header 201 transmitted earlier.
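The decoder-side behaviour described in the last three paragraphs can be sketched in a few lines:

- take the configuration from a header when the frame has one;
- otherwise, reuse the previous configuration;
- when joining mid-stream, wait for the next header.

In this sketch, headers are modelled as plain dictionaries that stand in for configuration information 205. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional

Config = Dict[str, int]  # hypothetical stand-in for configuration information 205


def configs_from(headers: List[Optional[Config]], start: int) -> List[Optional[Config]]:
    """Configuration available to each frame for a decoder that joins the stream at frame `start`.

    headers[n] is the configuration carried by frame n's header 201, or None if frame n has no header.
    A None result means no header has been received yet, so that frame's spatial information
    cannot be decoded.
    """
    current: Optional[Config] = None
    usable: List[Optional[Config]] = []
    for header in headers[start:]:
        if header is not None:
            current = header        # frame carries a header: take its configuration
        usable.append(current)      # no header: reuse the most recently received one
    return usable
```

With headers inserted every P frames, a decoder that joins at an arbitrary frame waits at most P - 1 frames before it can decode spatial information.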

If the header 201 is allowed to change, configuration information 205 is extracted from the new header 201 and decoded, and the spatial information 203 transmitted after that header 201 is then decoded. If the header 201 is meant to remain constant, the decoding apparatus checks whether the new header 201 is identical to the previously transmitted header 201. If the two headers 201 differ, this indicates that an error occurred in the audio signal on its transfer path.

The configuration information 205 extracted from the header 201 of the spatial information signal 105 is the information used to interpret the spatial information 203.

The spatial information signal 105 can include information (hereinafter referred to as "time alignment information") that identifies the delay between the downmix audio signal 103 and the spatial information signal 105. The audio signal decoding apparatus uses this delay when it combines the two signals to generate multiple channels.

The audio signal sent from the encoding apparatus to the decoding apparatus is parsed by a demultiplexing unit (not shown). The demultiplexing unit then separates it into the downmix audio signal 103 and the spatial information signal 105.

The downmix audio signal 103 separated by the demultiplexing unit is decoded, and the decoded downmix audio signal 103 is used together with the spatial information signal 105 to generate multiple channels. In this combining step, the decoding apparatus can use time alignment information (not shown) to synchronize the two signals, to locate the point at which combining starts, and so on. The time alignment information is contained in the configuration information 205 extracted from the header 201 of the spatial information signal 105.
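Here is a minimal Python sketch of how time alignment information might be applied when the two signals are combined. The decoded downmix is re-framed so that each block lines up with one spatial frame. Two details are assumptions for illustration only: the delay is expressed as a signed sample count, and negative delays are padded with silence.

```python
from typing import List, Sequence


def align_downmix_to_spatial_frames(downmix: Sequence[float], frame_length: int,
                                    delay: int) -> List[List[float]]:
    """Cut the decoded downmix into frames whose boundaries coincide with spatial frames.

    delay: hypothetical time alignment value, in samples. A positive value means the first
    spatial frame starts `delay` samples after the first downmix sample; a negative value
    means it starts before the downmix, so the gap is filled with silence.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    if delay >= 0:
        aligned = list(downmix[delay:])               # drop samples before the first spatial frame
    else:
        aligned = [0.0] * (-delay) + list(downmix)    # pad so both signals start together
    return [aligned[s:s + frame_length] for s in range(0, len(aligned), frame_length)]
```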

The spatial information 203 contained in the spatial information signal 105 includes position information 207 of the time slots to which parameters are applied. The spatial parameters (spatial cues) include the following:

- CLD (channel level difference), which indicates the energy difference between audio signals;
- ICC (inter-channel correlation), which indicates how close or similar audio signals are; and
- CPC (channel prediction coefficient), which gives coefficients used to predict an audio signal value from other signals.

Hereinafter, each spatial cue, or a bundle of spatial cues, is referred to as a "parameter".
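The text describes CLD and ICC only qualitatively. The sketch below uses definitions common in parametric multichannel coders: an energy ratio in dB and a normalized cross-correlation. These definitions are assumptions, not quoted from the patent. Practical codecs typically compute them per time/frequency tile, not on raw time-domain samples as here, and CPC is omitted.

```python
import math
from typing import Sequence

_EPS = 1e-12  # guards against division by zero for silent signals


def _energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def channel_level_difference_db(x1: Sequence[float], x2: Sequence[float]) -> float:
    """CLD (assumed definition): energy ratio of two channels, in dB."""
    return 10.0 * math.log10((_energy(x1) + _EPS) / (_energy(x2) + _EPS))


def inter_channel_correlation(x1: Sequence[float], x2: Sequence[float]) -> float:
    """ICC (assumed definition): normalized cross-correlation, 1 for identical waveforms, 0 for orthogonal ones."""
    if len(x1) != len(x2):
        raise ValueError("channels must have the same length")
    cross = sum(a * b for a, b in zip(x1, x2))
    return cross / math.sqrt(_energy(x1) * _energy(x2) + _EPS)
```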

When N parameters exist in a frame of the spatial information signal 105, each of the N parameters is applied to a specific time-slot position within that frame. The information that indicates which time slot in the frame a parameter is applied to is called the position information 207 of the time slot. The audio signal decoding apparatus decodes the spatial information 203 using the position information 207 of the time slot to which each parameter is applied. The parameters themselves are included in the spatial information 203.
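The mapping from position information 207 to time slots can be sketched as follows. Two points are assumptions:

- Slots are numbered starting from 1. This is consistent with the bit-count formulas given for Fig. 8.
- The `hold` option repeats the last parameter in slots that have none. This is an assumed policy for illustration; a real decoder might interpolate instead.

```python
from typing import List, Optional, Sequence, TypeVar

P = TypeVar("P")


def params_per_slot(num_slots: int, positions: Sequence[int], params: Sequence[P],
                    hold: bool = True) -> List[Optional[P]]:
    """Place each parameter at the time slot given by its position information 207.

    positions are 1-based slot numbers and must be strictly increasing. With hold=True,
    slots between two parameter positions reuse the most recent parameter (an assumed
    policy); slots before the first position stay None.
    """
    if len(positions) != len(params):
        raise ValueError("one position is needed per parameter")
    if any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    if positions and not (1 <= positions[0] and positions[-1] <= num_slots):
        raise ValueError("positions must lie in 1..num_slots")
    at_slot = dict(zip(positions, params))
    per_slot: List[Optional[P]] = []
    current: Optional[P] = None
    for slot in range(1, num_slots + 1):
        if slot in at_slot:
            current = at_slot[slot]
            per_slot.append(current)
        else:
            per_slot.append(current if hold else None)
    return per_slot
```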

Fig. 3 is a block diagram of an apparatus for decoding an audio signal according to an embodiment of the present invention.

Referring to Fig. 3, an apparatus for decoding an audio signal according to an embodiment of the present invention includes a receiving unit 301 and an extraction unit 303.

The receiving unit 301 of the audio signal decoding apparatus receives, via input terminal IN1, the audio signal transmitted in ES form by the audio signal encoding apparatus.

The audio signal received by the decoding apparatus includes an audio descriptor 101 and a downmix audio signal 103. It may further include a spatial information signal 105 as ancillary data or extension data.

The extraction unit 303 of the decoding apparatus extracts configuration information 205 from the header 201 included in the received audio signal. It then outputs the extracted configuration information 205 via output terminal OUT1.

The audio signal can include header identification information, which indicates whether a frame includes the header 201.

The audio signal decoding apparatus uses this header identification information to determine whether a frame includes the header 201. If the frame does include the header 201, the decoding apparatus extracts configuration information 205 from it. In the present invention, the spatial information signal 105 includes at least one header 201.
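The following Python sketch shows how header identification information lets a decoder decide, frame by frame, whether to parse a header. It uses a made-up frame layout: a 1-bit flag, followed by a fixed-width configuration field only when the flag is set. This layout is hypothetical and is not the actual bitstream syntax of any standard.

```python
from typing import Optional, Tuple


class BitReader:
    """Reads MSB-first bit fields from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n_bits: int) -> int:
        value = 0
        for _ in range(n_bits):
            byte = self.data[self.pos // 8]          # IndexError if the frame is too short
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_frame_start(data: bytes, config_bits: int = 8) -> Tuple[bool, Optional[int], int]:
    """Hypothetical frame layout: 1-bit header identification flag, then, only when the flag
    is 1, a fixed-width configuration field. Returns (has_header, config, bits_consumed)."""
    reader = BitReader(data)
    has_header = reader.read(1) == 1
    config = reader.read(config_bits) if has_header else None
    return has_header, config, reader.pos
```

A frame without a header costs one flag bit here, and the decoder falls back to the configuration it already holds.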

Fig. 4 is a block diagram of an apparatus for decoding an audio signal according to another embodiment of the present invention.

Referring to Fig. 4, an apparatus for decoding an audio signal according to another embodiment of the present invention includes the following units:

- a receiving unit 301;
- a demultiplexing unit 401;
- a core decoding unit 403;
- a multi-channel generating unit 405;
- a spatial information decoding unit 407; and
- an extraction unit 303.

The receiving unit 301 receives, via input terminal IN2, the audio signal transmitted in bitstream form from the audio signal encoding apparatus. It passes the received audio signal to the demultiplexing unit 401.

The demultiplexing unit 401 separates the audio signal sent by the receiving unit 301 into an encoded downmix audio signal 103 and an encoded spatial information signal 105. It sends the encoded downmix audio signal 103 to the core decoding unit 403 and passes the encoded spatial information signal 105 to the extraction unit 303.

The core decoding unit 403 decodes the encoded downmix audio signal 103 and sends it to the multi-channel generating unit 405. The encoded spatial information signal 105 includes a header 201 and spatial information 203.

If the encoded spatial information signal 105 includes a header 201, the extraction unit 303 extracts configuration information 205 from it. The extraction unit 303 can tell whether a header 201 is present from the header identification information included in the audio signal. Specifically, the header identification information indicates whether a given frame of the spatial information signal 105 includes a header 201. It may be expressed in terms of the frame order or bit order of the audio signal. When a frame includes the header 201, the audio signal carries the configuration information extracted from that header 201.

When the header identification information shows that a frame includes the header 201, the extraction unit 303 extracts configuration information 205 from the header 201 in that frame. The extracted configuration information 205 is then decoded.

The spatial information decoding unit 407 decodes the spatial information 203 included in the frame according to the decoded configuration information 205.

The multi-channel generating unit 405 generates a multi-channel signal using the decoded downmix audio signal 103 and the decoded spatial information 203. It then outputs the generated multi-channel signal via output terminal OUT2.
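The data flow through the units of Fig. 4 can be sketched end to end in Python. Every unit here is a simplified stand-in:

- the core decoder passes samples through unchanged;
- spatial parameters are assumed to arrive already dequantized, as one CLD value (in dB) per frame;
- the multi-channel generating unit performs only an energy-preserving one-to-two split.

The stream layout (`downmix`, `spatial_frames`, `header`, `frame_length`, `cld_db`) is hypothetical.

```python
import math
from typing import Dict, List, Tuple


def demultiplex(stream: Dict) -> Tuple[List[float], List[Dict]]:
    """Demultiplexing unit 401: split the stream into downmix and spatial frames."""
    return stream["downmix"], stream["spatial_frames"]


def core_decode(encoded_downmix: List[float]) -> List[float]:
    """Core decoding unit 403: placeholder; a real decoder would run e.g. AAC decoding here."""
    return list(encoded_downmix)


def one_to_two(mono: List[float], cld_db: float) -> Tuple[List[float], List[float]]:
    """Multi-channel generating unit 405, reduced to an energy-preserving split driven by CLD."""
    ratio = 10.0 ** (cld_db / 10.0)              # linear energy ratio channel1 / channel2
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return [g1 * s for s in mono], [g2 * s for s in mono]


def decode(stream: Dict) -> Tuple[List[float], List[float]]:
    downmix_enc, spatial_frames = demultiplex(stream)
    downmix = core_decode(downmix_enc)
    config = None
    ch1: List[float] = []
    ch2: List[float] = []
    for n, frame in enumerate(spatial_frames):
        if frame.get("header") is not None:       # extraction unit 303
            config = frame["header"]
        if config is None:                          # no configuration yet: skip this frame
            continue
        length = config["frame_length"]
        mono = downmix[n * length:(n + 1) * length]
        cld = frame["cld_db"]                       # unit 407 output, assumed already dequantized
        a, b = one_to_two(mono, cld)
        ch1.extend(a)
        ch2.extend(b)
    return ch1, ch2
```

The split keeps g1² + g2² = 1, so the two output channels together carry the downmix energy, and their energy ratio equals the transmitted CLD.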

Referring to Fig. 5, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in bitstream form by the audio signal encoding apparatus (S501).

As mentioned above, the spatial information signal 105 is either transmitted as an ES separate from the downmix audio signal 103, or combined with the downmix audio signal 103 and transmitted together.

The demultiplexing unit 401 separates the received audio signal into an encoded downmix audio signal 103 and an encoded spatial information signal 105. The encoded spatial information signal 105 includes a header 201 and spatial information 203. If a frame of the spatial information signal 105 includes the header 201, the decoding apparatus identifies the header 201 (S503).

The audio signal decoding apparatus extracts configuration information 205 from the header 201 (S505).

The decoding apparatus then decodes the spatial information 203 using the extracted configuration information 205 (S507).

Referring to Fig. 6, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in bitstream form by the audio signal encoding apparatus (S501).

As mentioned above, the spatial information signal 105 is either transmitted as an ES separate from the downmix audio signal 103, or included in the ancillary data or extension data of the downmix audio signal 103 and transmitted with it.

The demultiplexing unit 401 separates the received audio signal into an encoded downmix audio signal 103 and an encoded spatial information signal 105. The encoded spatial information signal 105 includes a header 201 and spatial information 203. The decoding apparatus determines whether a frame includes the header 201 (S601).

If the frame includes the header 201, the decoding apparatus identifies the header 201 (S503).

The decoding apparatus then extracts configuration information 205 from the header 201 (S505).

The decoding apparatus determines whether this configuration information 205 was extracted from the first header 201 included in the spatial information signal 105 (S603).

If it was, the decoding apparatus decodes the configuration information 205 (S611). It then decodes the spatial information 203 transmitted after the configuration information 205 according to the decoded configuration information 205.

If the header 201 is not the first header 201 extracted from the spatial information signal 105, the decoding apparatus checks whether its configuration information 205 is identical to the configuration information extracted from the first header 201 (S605).

If the two are identical, the decoding apparatus decodes the spatial information 203 using the already decoded configuration information 205 from the first header 201.

If they differ, the decoding apparatus determines whether an error has occurred in the audio signal on the transmission path from the encoding apparatus to the decoding apparatus (S607).

If the configuration information 205 is of a kind that is allowed to change, the difference does not indicate an error. In that case, the decoding apparatus updates the header 201 to the new header 201 (S609). It then decodes the configuration information 205 extracted from the updated header 201 (S611).

The decoding apparatus decodes the spatial information 203 transmitted after the configuration information 205 according to the decoded configuration information 205.

If the configuration information 205 is meant to remain constant but differs from the configuration information extracted from the first header 201, an error has occurred on the transfer path. The decoding apparatus therefore either removes the spatial information 203 in the frame containing the erroneous configuration information 205 or corrects the error in that spatial information 203 (S613).
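The header check of Fig. 6 (steps S601 to S613) can be summarized as a small state machine. This Python sketch makes three assumptions:

- Whether the configuration is declared constant or variable is passed in as `config_is_constant`. The text does not say how the decoder learns this.
- A changed header is compared against the most recently accepted one.
- Erroneous frames are simply discarded rather than corrected.

```python
from typing import Dict, List, Optional, Tuple

Config = Dict[str, int]
Frame = Tuple[Optional[Config], List[float]]  # (configuration from header 201 or None, spatial information 203)


def check_headers(frames: List[Frame], config_is_constant: bool) -> List[Tuple[str, Optional[Config]]]:
    """Return ('decode', config), ('skip', None) or ('discard', None) for each frame."""
    reference: Optional[Config] = None   # configuration from the first (or latest accepted) header
    actions: List[Tuple[str, Optional[Config]]] = []
    for header_config, _spatial_info in frames:
        if header_config is not None:                     # S601, S503, S505
            if reference is None:                         # S603: first header -> S611
                reference = header_config
            elif header_config != reference:              # S605: differs from the reference
                if config_is_constant:                    # S607: a constant configuration changed -> error
                    actions.append(("discard", None))     # S613 (removal; correction not shown)
                    continue
                reference = header_config                 # S609: variable configuration, update header
        if reference is None:
            actions.append(("skip", None))                # no header received yet
        else:
            actions.append(("decode", reference))         # S611 / decode spatial information 203
    return actions
```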

Referring to Fig. 7, the audio signal decoding apparatus receives the spatial information signal 105 transmitted in bitstream form by the audio signal encoding apparatus (S501).

The demultiplexing unit 401 separates the received audio signal into an encoded downmix audio signal 103 and an encoded spatial information signal 105. The position information 207 of the time slots to which parameters are applied is included in the spatial information signal 105.

The decoding apparatus extracts the time-slot position information 207 from the spatial information 203 (S701).

The decoding apparatus uses the extracted position information to adjust the positions of the time slots to which the parameters are applied. Each parameter is thereby applied to its corresponding time slot (S703).

Fig. 8 is a flowchart of a method of obtaining the number of bits used to represent position information, according to an embodiment of the present invention. This is the number of bits allocated to represent the position information 207 of a time slot.

For the time slot to which the first parameter is applied, the number of bits is found in four steps:

1. Subtract the number of parameters from the number of time slots.
2. Add 1 to the result.
3. Take the base-2 logarithm of the sum.
4. Apply the ceiling function to the logarithm.

In other words, the number of bits is ceil(log₂(k - i + 1)), where k is the number of time slots and i is the number of parameters.

Let N be a natural number. The number of bits for the position information of the time slot to which the (N+1)-th parameter is applied is expressed in terms of the position information 207 of the time slot to which the N-th parameter is applied.

The position information 207 for the N-th parameter is obtained as follows (S801). Take the number of time slots lying between the slots of the N-th and (N-1)-th parameters, add it to the position information of the (N-1)-th parameter's slot, and add 1. Correspondingly, the position information for the (N+1)-th parameter is j(N+1) = j(N) + r(N+1) + 1. Here r(N+1) is the number of time slots lying between the slot of the (N+1)-th parameter and the slot of the N-th parameter.

Once the position information for the N-th parameter is known, the number of bits for the position of the (N+1)-th parameter's slot can be obtained (S803). From the number of time slots, subtract the number of parameters applied to the frame and the position information of the N-th parameter's slot, then add (N+1) to the result. The number of bits is therefore ceil(log₂(k - i + N + 1 - j(N))). Here k, i and j(N) are, respectively, the number of time slots, the number of parameters, and the position information 207 of the time slot to which the N-th parameter is applied. Setting j(0) = 0 reduces this expression to ceil(log₂(k - i + 1)), which is the first-parameter case.

When the bit counts are obtained in this way, the number of bits allocated to the position of the (N+1)-th parameter's slot never increases as N grows, and it generally becomes smaller. This is because each placed parameter narrows the range of slots still available to the ones that follow. In other words, the number of bits used for the position information of a time slot depends on N.
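The bit-count formulas of Fig. 8 can be checked with a short Python sketch. It rests on three assumptions:

- slots are numbered 1 to k;
- j(0) = 0;
- the value actually written for each parameter is the gap r(N+1).

The text gives the bit counts and the recursion j(N+1) = j(N) + r(N+1) + 1, but not the field layout. The encode/decode pair is therefore an illustration, not the patent's syntax.

```python
from typing import List


def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed exactly with integers."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1).bit_length()


def position_bits(k: int, i: int, positions: List[int]) -> List[int]:
    """Bits spent on the slot position of each of the i parameters in a k-slot frame.

    positions are 1-based slot numbers j(1) < j(2) < ... < j(i). Entry N (0-based) is
    ceil(log2(k - i + N + 1 - j(N))) with j(0) = 0, so entry 0 is ceil(log2(k - i + 1)).
    """
    bits, j_prev = [], 0
    for n, pos in enumerate(positions):
        bits.append(ceil_log2(k - i + n + 1 - j_prev))
        j_prev = pos
    return bits


def encode_positions(k: int, i: int, positions: List[int]) -> str:
    """Write the gap r(N+1) = j(N+1) - j(N) - 1 of every parameter using the bit counts above."""
    if len(positions) != i:
        raise ValueError("need exactly i positions")
    out, j_prev = [], 0
    for n, pos in enumerate(positions):
        count = k - i + n + 1 - j_prev          # legal slots left for this parameter
        gap = pos - j_prev - 1
        if not 0 <= gap < count:
            raise ValueError("positions must increase and leave room for later parameters")
        width = ceil_log2(count)
        if width:                               # a single legal slot costs zero bits
            out.append(format(gap, f"0{width}b"))
        j_prev = pos
    return "".join(out)


def decode_positions(k: int, i: int, bits: str) -> List[int]:
    """Invert encode_positions using j(N+1) = j(N) + r(N+1) + 1."""
    positions, cursor, j_prev = [], 0, 0
    for n in range(i):
        width = ceil_log2(k - i + n + 1 - j_prev)
        gap = int(bits[cursor:cursor + width], 2) if width else 0
        cursor += width
        j_prev = j_prev + gap + 1
        positions.append(j_prev)
    return positions
```

Take k = 8 slots and i = 4 parameters placed at slots [5, 6, 7, 8]. Here `position_bits` returns [3, 0, 0, 0]: once the first parameter is placed, each later one has only one legal slot left. The whole position list therefore costs 3 bits, compared with 12 bits for a fixed ceil(log₂ 8) = 3-bit code per position.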

Fig. 9 is a flowchart of a method of decoding an audio signal according to a further embodiment of the present invention.

The audio signal decoding apparatus receives an audio signal from the audio signal encoding apparatus (S901). The audio signal includes an audio descriptor 101, a downmix audio signal 103 and a spatial information signal 105.

The decoding apparatus extracts the audio descriptor 101 included in the audio signal (S903). The audio descriptor 101 includes an identifier indicating the audio codec.

Using the audio descriptor 101, the decoding apparatus recognizes that the audio signal includes the downmix audio signal 103 and the spatial information signal 105. Specifically, it can tell that the transmitted audio signal is one that uses the spatial information signal 105 to form multiple channels (S905).

The decoding apparatus then converts the downmix audio signal 103 into a multi-channel signal using the spatial information signal 105. As mentioned above, the header 201 may be included in the spatial information signal 105 at predetermined intervals.

Industrial applicability

As described above, the method and apparatus for encoding and decoding an audio signal according to the present invention can selectively include a header in the spatial information signal.

In addition, when a plurality of headers is included in the spatial information signal, the method and apparatus according to the present invention can decode the spatial information even when the audio signal decoding apparatus reproduces the audio signal from an arbitrary point.

The present invention has been described and illustrated with reference to preferred embodiments. It will nevertheless be apparent to those skilled in the art that various modifications and variations can be made without departing from the spirit and scope of the invention. The present invention is therefore intended to cover the modifications and variations of this invention that fall within the scope of the appended claims and their equivalents.

Claims

1. A method of decoding an audio signal, comprising:

receiving an audio signal including a downmix audio signal and a spatial information signal, the audio signal further including header identification information;

obtaining the header identification information, the header identification information indicating whether a frame of the spatial information signal includes a header;

when the header identification information indicates that the frame of the spatial information signal includes a header, extracting configuration information from the header, and when the header identification information indicates that the frame of the spatial information signal does not include a header, extracting configuration information from a header transmitted at a previous stage;

extracting spatial information from the spatial information signal;

generating a multi-channel signal from the downmix audio signal based on the configuration information and the spatial information, and

wherein the configuration information includes time alignment information,

and wherein, when the spatial information signal is embedded in the downmix audio signal, the time alignment information identifies a time delay between the spatial information signal and the downmix audio signal.

2. the method for claim 1 is characterized in that, further comprises:

Use is included in the positional information that indication one parameter in the described spatial information will be applied to which time slot in the frame, and the parameter that is included in the described spatial signal information is applied to corresponding time slot.

3. the method for claim 1 is characterized in that, described sound signal comprises the signal identification the information whether described spatial signal information of indication is combined with down-mix audio signal.

4. the method for claim 1 is characterized in that, further comprises:

Service time, alignment information identified the reference position of the frame of auxiliary signal.

5. An apparatus for decoding an audio signal, comprising:

a receiving unit that receives an audio signal including a downmix audio signal and a spatial information signal, the audio signal further including header identification information, the header identification information indicating whether a frame of the spatial information signal has a corresponding header;

an extraction unit that, when the header identification information indicates that the frame of the spatial information signal includes a header, extracts configuration information from the header, and when the header identification information indicates that the frame of the spatial information signal does not include a header, extracts configuration information from a header transmitted at a previous stage;

a spatial information decoding unit that extracts spatial information from the spatial information signal;

a multi-channel generating unit that generates a multi-channel signal from the downmix audio signal based on the configuration information and the spatial information, and

wherein the configuration information includes time alignment information, and when the spatial information signal is embedded in the downmix audio signal, the time alignment information identifies a time delay between the spatial information signal and the downmix audio signal.