CN106385660A - Audio signal processing based on object - Google Patents

Audio signal processing based on object

Publication number
CN106385660A
Authority
CN
China
Prior art keywords
cluster
gain
audio
rightarrow
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510484949.8A
Other languages
Chinese (zh)
Other versions
CN106385660B (en
Inventor
陈连武
芦烈
J·布里巴特
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby Laboratories Licensing Corp
Original Assignee
Dolby Laboratories Licensing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corp
Priority to CN201510484949.8A
Priority to US15/749,750 (US10277997B2)
Priority to EP16751763.0A (EP3332557B1)
Priority to PCT/US2016/045512 (WO2017027308A1)
Publication of CN106385660A
Application granted
Publication of CN106385660B
Legal status: Active
Anticipated expiration

Landscapes

  • Stereophonic System (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Embodiments of the invention relate to audio signal processing. A method of processing an audio signal having a plurality of audio objects is disclosed. The method comprises: obtaining an object position for each audio object; and determining cluster positions for grouping the audio objects into clusters, based on the object positions, a plurality of object-to-cluster gains and a set of metrics, wherein the metrics indicate the quality of the cluster positions and the quality of the object-to-cluster gains, each cluster position is the centroid of a corresponding cluster, and each object-to-cluster gain defines the proportion of a corresponding audio object in one of the clusters. The method further comprises determining the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics; and generating cluster signals based on the determined cluster positions and object-to-cluster gains. A corresponding system and computer program product are also disclosed.

Description

Processing object-based audio signals
Technical field
Example embodiments disclosed herein generally relate to object-based audio processing, and more specifically to a method and system for generating cluster signals from an object-based audio signal.
Background
Traditionally, audio content in multichannel formats (such as 5.1 and 7.1) is created by mixing different audio signals in a studio, or generated by recording acoustic signals simultaneously in a real environment. More recently, object-based audio content has become increasingly popular because it carries a number of audio objects and audio beds separately, so that the content can be rendered with improved precision compared with traditional rendering methods. An audio object is an individual audio element that exists for a defined duration and that also carries spatial information, in the form of metadata, describing (for example) the position, velocity and size of the object. An audio bed, or bed, refers to audio channels that are meant to be reproduced at predefined, fixed loudspeaker positions.
For example, a cinema sound track may include many different sound elements corresponding to images on the screen, with dialog, noises and sound effects that emanate from different places on the screen and combine with background music and ambient effects to create the overall auditory experience. Accurate playback requires that sounds be reproduced in a way that corresponds as closely as possible to what is shown on screen with respect to sound source position, intensity, movement and depth.
During transmission of the audio signal, beds and objects can be sent separately and then used by a spatial reproduction system to recreate the artistic intent using multiple loudspeakers in known physical locations. In some situations, tens or even hundreds of individual audio objects may be included in the audio content for rendering. As a result, the advent of such object-based audio data has significantly increased the complexity of rendering audio data within playback systems.
The large number of audio signals present in object-based content poses new challenges for the coding and distribution of such content. In some distribution and transmission systems, the transmission capacity may provide enough bandwidth to send all audio beds and objects with little or no audio compression. In some cases, however, such as Blu-ray discs, broadcast (cable, satellite and terrestrial), mobile (3G and 4G) and over-the-top (OTT) distribution, the available bandwidth is not sufficient to send all of the bed and object information created by the sound mixer. Although audio coding methods (lossy or lossless) may be applied to the audio to reduce the required bandwidth, audio coding may not be enough to reduce the bandwidth required to transmit the audio, particularly over very limited networks such as mobile 3G and 4G networks.
Some existing methods cluster the audio objects so as to reduce the input objects and beds to a smaller set of output clusters, which reduces computational complexity and storage requirements. However, accuracy may be compromised because existing methods allocate the objects only in a relatively coarse manner.
Summary
Example embodiments disclosed herein propose a method and system for processing an audio signal that reduce the number of audio objects by assigning the objects to clusters, while maintaining performance in terms of the accuracy of spatial audio reproduction.
In one aspect, example embodiments disclosed herein provide a method of processing an audio signal having a plurality of audio objects. The method includes obtaining an object position for each audio object, and determining cluster positions for grouping the audio objects into clusters based on the object positions, a plurality of object-to-cluster gains and a set of metrics. The metrics indicate the quality of the cluster positions and the quality of the object-to-cluster gains, each cluster position is the centroid of a corresponding one of the clusters, and each object-to-cluster gain defines the proportion of a corresponding audio object in one of the clusters. The method further includes determining the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics, and generating cluster signals based on the determined cluster positions and object-to-cluster gains.
In another aspect, example embodiments disclosed herein provide a system for processing an audio signal having a plurality of audio objects. The system includes an object position obtaining unit configured to obtain an object position for each audio object, and a cluster position determining unit configured to determine cluster positions for grouping the audio objects into clusters based on the object positions, a plurality of object-to-cluster gains and a set of metrics. The metrics indicate the quality of the cluster positions and the quality of the object-to-cluster gains, each cluster position is the centroid of a corresponding one of the clusters, and each object-to-cluster gain defines the proportion of a corresponding audio object in one of the clusters. The system further includes an object-to-cluster gain determining unit configured to determine the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics, and a cluster signal generating unit configured to generate cluster signals based on the determined cluster positions and object-to-cluster gains.
As will be appreciated from the following description, an object-based audio signal containing audio objects and audio beds is greatly compressed for data streaming, so the computational and bandwidth requirements for such signals are significantly reduced. Accurate generation of a number of clusters makes it possible to reproduce, with high precision, an auditory scene in which listeners can correctly perceive the location of each audio object, so that an immersive reproduction can be achieved. At the same time, the reduced data rate requirement resulting from effective compression allows less compromised fidelity for any playback system, such as loudspeaker arrays and headphones.
Brief description of the drawings
The above and other objects, features and advantages of the example embodiments disclosed herein will become more apparent from the following detailed description with reference to the accompanying drawings. In the drawings, the example embodiments disclosed herein are illustrated by way of example and not limitation, wherein:
Fig. 1 illustrates a flowchart of a method of processing an audio signal according to an example embodiment;
Fig. 2 illustrates an example flow of object-based audio signal processing according to an example embodiment;
Fig. 3 illustrates a system for processing an audio signal according to an example embodiment; and
Fig. 4 illustrates a block diagram of an example computer system suitable for implementing the example embodiments disclosed herein.
Throughout the drawings, the same or corresponding reference numerals refer to the same or corresponding parts.
Detailed description
The principles of the example embodiments disclosed herein will now be described with reference to the various example embodiments shown in the drawings. It should be understood that these embodiments are described only to enable those skilled in the art to better understand and further implement the example embodiments disclosed herein, and are not intended to limit the scope in any manner.
Object-based audio signals are processed by systems and processes that handle audio objects and their corresponding metadata. Information such as position, velocity and width is provided in the metadata. Such object-based audio signals are usually produced by a sound mixer in a studio and are suited to being rendered by different systems with appropriate processors. However, because the embodiments disclosed herein focus mainly on how to assign objects to a reduced number of clusters while maintaining performance in terms of the accuracy of spatial audio reproduction, the mixing and rendering processes are not described in detail.
It is assumed that the audio signal is segmented into individual frames that are subject to the analysis described throughout this disclosure. Such segmentation may be applied to time-domain waveforms, and filter banks or any other transform domain suitable for the example embodiments disclosed herein may also be used.
Fig. 1 illustrates a flowchart of a method 100 of processing an audio signal according to an example embodiment. In step S101, an object position is obtained for each of the audio objects. Object-based audio objects usually contain metadata providing positional information about the objects. Such information is useful for various processing techniques when object-based audio content is to be rendered with higher accuracy.
In step S102, cluster positions for grouping the audio objects into clusters are determined based on the object positions, a plurality of object-to-cluster gains and a set of metrics. The metrics indicate the quality of the determined cluster positions and the quality of the determined object-to-cluster gains. Such quality can be represented, for example, by a cost function, which is described below. A cluster position refers to the centroid of a cluster that groups a number of different audio objects that are spatially close to each other. The cluster positions can be chosen in different ways, including for example: selecting the cluster positions randomly; applying an initial clustering (for example, k-means clustering) to the plurality of audio objects to obtain the cluster positions; and determining the cluster positions for the current time frame of the audio signal based on the cluster positions of the previous time frame. Each object-to-cluster gain defines the proportion in which an audio object is grouped into a corresponding cluster, and together the gains indicate how the audio objects are grouped into the clusters.
In step S103, the object-to-cluster gains are determined based on the object positions, the cluster positions and the set of metrics. Each audio object can be assigned object-to-cluster gains that serve as coefficients. In other words, if the object-to-cluster gain of a particular audio object with respect to one cluster is large, the object may be spatially close to that cluster. Naturally, a large object-to-cluster gain of an audio object with respect to some clusters means that the gains of the same audio object with respect to the other clusters may be relatively small.
Steps S102 and S103 specify that the cluster positions are determined based in part on the object-to-cluster gains, while the object-to-cluster gains are determined based in part on the object positions and the cluster positions, which means that the two determining steps are mutually dependent. The quality of the determination can be indicated by a value associated with the metrics. In general, a tendency of that value to decrease or to converge to a predetermined value can be used to keep the determining process running until the quality is satisfactory. A predefined threshold can be set and compared with the value associated with the metrics. As a result, in some embodiments, the determination of the cluster positions and the determination of the object-to-cluster gains are performed alternately until the value falls below the predefined threshold.
Alternatively, another predefined threshold can be set and compared with the rate of change of the value associated with the metrics. As a result, in some embodiments, the determination of the cluster positions and object-to-cluster gains continues until the rate of change (for example, the rate of decrease) of the value associated with the metrics falls below the predefined threshold.
In one embodiment, a cost function can be used to represent the value associated with the metrics, so that it reflects the quality of the determined cluster positions and the quality of the determined object-to-cluster gains. The calculation of the cost function is therefore explained in detail below.
The cost function comprises several additive terms that take into account various metrics of the clustering process. In one embodiment, the metrics can include (A) a position error between the positions of the audio objects as reconstructed from the cluster signals and the positions of the audio objects in the audio signal; (B) a distance error between the positions of the clusters and the positions of the audio objects; (C) a deviation of the sum of the object-to-cluster gains from unity; (D) a rendering error between rendering the cluster signals to one or more playback systems and rendering the audio objects in the audio signal to the one or more playback systems; and (E) an inter-frame inconsistency of a variable between the current time frame and the previous time frame. The cost function is useful for comparing the signals before and after the audio objects are grouped into the clusters, and can therefore serve as an effective indicator of the clustering quality.
For metric (A), because an input audio object can be reconstructed from the output clusters, the error between the original object position and the reconstructed object position can be used to measure the difference in the spatial position of the object, which describes how accurate the clustering process is with respect to positional information.
The "position error" relates to the spatial position of an audio object after its signal has been distributed across the output cluster positions $\vec{p}_c$, that is, to the spatial position of the audio object before and after the clustering process. In particular, when the original position is denoted by the vector $\vec{p}_o$ (which can be represented, for example, by three Cartesian coordinates), the reconstructed position $\vec{p}'_o$ can be expressed as that of an amplitude-panned source:
$$\vec{p}'_o = \sum_c g_{o,c}\,\vec{p}_c \tag{1}$$
The cost $E_P$ associated with the position error can then be expressed as:
$$E_P = \sum_o w_o \left\| \vec{p}_o \sum_c g_{o,c} - \sum_c g_{o,c}\,\vec{p}_c \right\|^2 \tag{2}$$
where $w_o$ denotes the weight of the o-th object, which can be the energy, loudness or partial loudness of the object, and $g_{o,c}$ denotes the gain for rendering the o-th object to the c-th cluster, in other words the object-to-cluster gain.
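Equation (2) can be evaluated directly from positions, gains and weights. The following is a minimal Python sketch under stated assumptions, not the implementation of the embodiments: the function name `position_error` and the list-of-tuples data layout are chosen here purely for illustration.

```python
from typing import Sequence

Vec = Sequence[float]


def position_error(obj_pos: Sequence[Vec], cluster_pos: Sequence[Vec],
                   gains: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> float:
    """Position cost E_P of equation (2).

    gains[o][c] is the object-to-cluster gain g_{o,c} (hypothetical layout).
    """
    total = 0.0
    for p_o, g_o, w_o in zip(obj_pos, gains, weights):
        gain_sum = sum(g_o)
        sq_norm = 0.0
        for axis in range(len(p_o)):
            # Reconstructed coordinate: sum_c g_{o,c} * p_c, as in equation (1).
            recon = sum(g * p_c[axis] for g, p_c in zip(g_o, cluster_pos))
            diff = p_o[axis] * gain_sum - recon
            sq_norm += diff * diff
        total += w_o * sq_norm
    return total
```

For an object midway between two clusters, equal gains of 0.5 reconstruct the position exactly, whereas sending the whole signal to a single cluster leaves a residual error.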
For metric (B), because rendering an audio object into clusters that are far away from it may introduce large timbre changes, the object-to-cluster distance can be used to measure the timbre change. Timbre changes are expected when an audio object is represented not by a point source (a single cluster) but by a phantom source panned across multiple clusters. It is well known that an amplitude-panned source can have a different timbre from a point source, owing to the comb-filter interaction that may occur when one and the same signal is reproduced by two or more (virtual) loudspeakers.
The "distance error" term, denoted $E_D$, can be derived from the distance between the audio object position $\vec{p}_o$ and the cluster position $\vec{p}_c$; it reflects an increase in cost if an audio object is represented by clusters far from the original object position:
$$E_D = \sum_o w_o \sum_c g_{o,c}^2 \left\| \vec{p}_o - \vec{p}_c \right\|^2 \tag{3}$$
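The distance term of equation (3) can be sketched in a few lines of Python. The function name `distance_error` and the nested-list layout of the gains are assumptions made for this illustration only.

```python
def distance_error(obj_pos, cluster_pos, gains, weights):
    """Distance cost E_D of equation (3).

    gains[o][c] is the object-to-cluster gain g_{o,c} (hypothetical layout).
    """
    total = 0.0
    for p_o, g_o, w_o in zip(obj_pos, gains, weights):
        for g, p_c in zip(g_o, cluster_pos):
            # Squared Euclidean distance between object and cluster positions.
            dist_sq = sum((a - b) ** 2 for a, b in zip(p_o, p_c))
            total += w_o * g * g * dist_sq
    return total
```

Because each gain is squared and multiplied by the squared distance, any signal sent to a distant cluster is penalized, and an object that coincides with the only cluster it feeds incurs no cost.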
For metric (C), the object-to-cluster gain normalization error can be used to measure the change in energy (loudness) before and after the clustering process.
The "deviation" term, denoted $E_N$, relates to gain normalization or, more specifically, to the deviation from unity (one) of the sum of an object's gains over the cluster centroids:
$$E_N = \sum_o w_o \left( 1 - \sum_c g_{o,c} \right)^2 \tag{4}$$
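As a small worked instance of equation (4), assume (purely for illustration; these numbers do not come from the embodiments) a single object with weight $w_o = 2$ whose gains to two clusters are $0.6$ and $0.3$:
$$E_N = w_o \Bigl( 1 - \sum_c g_{o,c} \Bigr)^2 = 2\,(1 - 0.9)^2 = 0.02$$
Gains that sum exactly to one would give $E_N = 0$.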
For metric (D), because different playback systems produce different rendering outputs, one or more reference playback systems may need to be specified for this metric (for example, single-channel quality on a 7.1.4 loudspeaker playback system). By comparing the rendering output of the original objects with the rendering output of the clusters on the specified reference playback system, the single-channel quality of the clustering result can be measured.
The "rendering error" term, denoted $E_R$, relates to the error on a reference playback system, which is used to measure the difference between rendering the original objects to the reference playback system and rendering the clusters to it. The reference playback system can be binaural, 5.1, 7.1.4, 9.1.6 and so on.
$$E_R = \sum_s n_s \sum_o w_o \left( g_{o,s} - \sum_c g_{o,c}\, g_{c,s} \right)^2 \tag{5}$$
where
$$n_s = \frac{1}{\sum_o w_o\, g_{o,s}^2 + a} \tag{6}$$
Here $g_{o,s}$ denotes the gain for rendering the o-th object to the s-th output channel, $g_{c,s}$ denotes the gain for rendering the c-th cluster to the s-th output channel, and $n_s$ normalizes the rendering difference so that the rendering errors in the individual channels are comparable. The parameter $a$ avoids introducing an excessive rendering difference when the signal on the reference playback system is very small or even zero.
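Equations (5) and (6) can be combined in one routine. The sketch below is illustrative only: the function name `rendering_error`, the argument layout and the default value of the regularization parameter `a` are assumptions, since the source does not specify them.

```python
def rendering_error(obj_to_spk, obj_to_cluster, cluster_to_spk, weights, a=1e-3):
    """Rendering cost E_R of equation (5), normalized per channel by n_s of (6).

    obj_to_spk[o][s]     : g_{o,s}, object rendered directly to channel s
    obj_to_cluster[o][c] : g_{o,c}, object-to-cluster gain
    cluster_to_spk[c][s] : g_{c,s}, cluster rendered to channel s
    a                    : small constant from equation (6); default is an assumption
    """
    num_channels = len(cluster_to_spk[0])
    total = 0.0
    for s in range(num_channels):
        energy = sum(w * g_os[s] ** 2 for w, g_os in zip(weights, obj_to_spk))
        n_s = 1.0 / (energy + a)  # equation (6)
        err = 0.0
        for w, g_os, g_oc in zip(weights, obj_to_spk, obj_to_cluster):
            # Gain reaching channel s when the object goes through the clusters.
            via_clusters = sum(g * g_cs[s] for g, g_cs in zip(g_oc, cluster_to_spk))
            err += w * (g_os[s] - via_clusters) ** 2
        total += n_s * err
    return total
```

When each cluster coincides with one loudspeaker and the object-to-cluster gains equal the direct rendering gains, the two rendering paths agree and the cost is zero.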
In one embodiment, the summation over loudspeakers, indexed by $s$, can be performed over one or more loudspeakers of a specific predetermined loudspeaker layout. Alternatively, the clusters and objects are rendered simultaneously to a larger set of loudspeakers covering multiple loudspeaker layouts. For example, if a first layout is a 5-channel layout and a second layout is a two-channel layout, both the clusters and the objects can be rendered in parallel to the 5-channel and two-channel layouts. The error term $E_R$ is then evaluated over all 7 loudspeakers, so that the error term is optimized jointly for the two loudspeaker layouts.
For metric (E), because the clustering process is performed frame by frame, the inter-frame inconsistency of some variables in the clustering process (such as the object-to-cluster gains, the cluster positions and the reconstructed object positions) can be used to measure this metric. In one embodiment, the inter-frame inconsistency of the reconstructed object positions can be used to measure the temporal smoothness of the clustering result.
The "inter-frame inconsistency" term, denoted $E_C$, relates to the inter-frame inconsistency of a particular variable of the reconstructed objects. Let $\vec{p}_o(t)$ and $\vec{p}_o(t-1)$ be the original object positions in frames $t$ and $t-1$, let $\vec{p}'_o(t)$ and $\vec{p}'_o(t-1)$ be the reconstructed object positions in frames $t$ and $t-1$, and let $\vec{q}_o(t)$ be the target reconstructed object position in frame $t$. As defined by equation (1) above, the reconstructed position $\vec{p}'_o$ can be expressed as that of an amplitude-panned source.
To preserve inter-frame smoothness, the target reconstructed object position in frame $t$ can be expressed as the combination of the reconstructed object position in frame $t-1$ and the offset $\Delta_o$ of the object from frame $t-1$ to frame $t$:
$$\vec{q}_o(t) = \vec{p}'_o(t-1) + \Delta_o(t-1, t) = \vec{p}'_o(t-1) + \vec{p}_o(t) - \vec{p}_o(t-1) \tag{7}$$
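The target position of equation (7) is a one-line computation per coordinate. In the sketch below, the function name `target_position` and the tuple representation of positions are illustrative assumptions.

```python
def target_position(prev_recon, prev_pos, cur_pos):
    """Target reconstructed position q_o(t) of equation (7).

    prev_recon : reconstructed position p'_o(t-1)
    prev_pos   : original position p_o(t-1)
    cur_pos    : original position p_o(t)
    """
    # Previous reconstruction shifted by the object's own motion between frames.
    return tuple(r + (c - p) for r, c, p in zip(prev_recon, cur_pos, prev_pos))
```

If the previous reconstruction was offset from the true position, the target keeps the same offset while following the object's motion, which is what discourages sudden jumps between frames.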
The cost $E_C$ associated with the inter-frame inconsistency can then be expressed as:
$$E_C = \sum_o w_o \left\| \vec{q}_o \sum_c g_{o,c} - \sum_c g_{o,c}\,\vec{p}_c \right\|^2 \tag{8}$$
The metrics above can be measured independently, or combined into an overall cost. In one embodiment, the overall cost can be a weighted sum of the cost terms (A) to (E):
$$E = \alpha_P E_P + \alpha_D E_D + \alpha_N E_N + \alpha_R E_R + \alpha_C E_C \tag{9}$$
In another embodiment, the overall cost can also be the maximum of the cost terms:
$$E = \max\{\alpha_P E_P,\ \alpha_D E_D,\ \alpha_N E_N,\ \alpha_R E_R,\ \alpha_C E_C\} \tag{10}$$
where $\alpha_P$, $\alpha_D$, $\alpha_N$, $\alpha_R$ and $\alpha_C$ denote the weights of the cost terms (A) to (E).
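The two combinations in equations (9) and (10) differ only in the final reduction. The following Python sketch shows both; the function name `total_cost`, the dictionary keys and the `mode` switch are hypothetical conveniences, not part of the source.

```python
def total_cost(terms, alphas, mode="sum"):
    """Combine cost terms as in equation (9) (mode="sum") or (10) (mode="max").

    terms  : e.g. {"P": E_P, "D": E_D, "N": E_N, "R": E_R, "C": E_C}
    alphas : weights alpha_P ... alpha_C under the same (hypothetical) keys
    """
    weighted = [alphas[key] * value for key, value in terms.items()]
    if mode == "sum":
        return sum(weighted)
    if mode == "max":
        return max(weighted)
    raise ValueError("mode must be 'sum' or 'max'")
```

The weighted sum lets every metric contribute, while the maximum focuses the optimization on whichever weighted term is currently worst.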
The gains $g_{o,c}$ and the positions $\vec{p}_o$, $\vec{q}_o$ and $\vec{p}_c$ can be written as matrices:
$$G_{OC} = \begin{bmatrix} \vec{g}_1 \\ \vdots \\ \vec{g}_O \end{bmatrix} \tag{11}$$
$$P_O = \begin{bmatrix} \vec{p}_1 \\ \vdots \\ \vec{p}_O \end{bmatrix} \tag{12}$$
$$Q_O = \begin{bmatrix} \vec{q}_1 \\ \vdots \\ \vec{q}_O \end{bmatrix} \tag{13}$$
$$P_C = \begin{bmatrix} \vec{p}_1 \\ \vdots \\ \vec{p}_C \end{bmatrix} \tag{14}$$
The object weights can be written as a diagonal matrix:
$$W_O = \mathrm{diag}(w_1, \ldots, w_O) \tag{15}$$
The individual cost function terms can then be written as follows:
$$E_P = \sum_o w_o \left\| \vec{g}_o \vec{1}_C\, \vec{p}_o - \vec{g}_o P_C \right\|^2 = \left\| W_O^{1/2} \left( \mathrm{diag}(G_{OC} 1_{C \times O})\, P_O - G_{OC} P_C \right) \right\|^2 = \left\| W_O^{1/2} \left( H P_O - G_{OC} P_C \right) \right\|^2 \tag{16}$$
where $H = \mathrm{diag}(G_{OC} 1_{C \times O})$ and $\mathrm{diag}(\cdot)$ denotes the operation that yields a diagonal matrix. $\vec{1}_C$ denotes the all-ones vector with $C \times 1$ elements, that is, a vector of length $C$ whose coefficients all equal +1, and $1_{C \times O}$ denotes the all-ones matrix with $C \times O$ elements.
$$E_D = \sum_o w_o \sum_c g_{o,c}^2 \left\| \vec{p}_o - \vec{p}_c \right\|^2 = \sum_c \sum_o w_o\, g_{o,c}^2 \left\| \vec{p}_o - \vec{p}_c \right\|^2 = \sum_o w_o\, \vec{g}_o \Lambda_o \vec{g}_o^T \tag{17}$$
where $\Lambda_o$ denotes the diagonal matrix with diagonal elements $\left\| \vec{p}_o - \vec{p}_c \right\|^2$.
$$E_N = \sum_o w_o \left( 1 - \sum_c g_{o,c} \right)^2 = \sum_o w_o \left( 1 - 2\,\vec{g}_o \vec{1}_C + \vec{g}_o \vec{1}_C \vec{1}_C^T \vec{g}_o^T \right) \tag{18}$$
$$E_R = \sum_o w_o \sum_s n_s \left( g_{o,s} - \sum_c g_{o,c}\, g_{c,s} \right)^2 = \sum_o w_o \left( \vec{g}_{o \to s} - \vec{g}_o G_{CS} \right) N_S \left( \vec{g}_{o \to s} - \vec{g}_o G_{CS} \right)^T \tag{19}$$
where $N_S$ denotes the diagonal matrix with diagonal elements $n_s$, $\vec{g}_{o \to s}$ denotes the vector of gains for rendering the o-th object to the reference loudspeakers, and $G_{CS}$ denotes the matrix containing the cluster-to-loudspeaker gains.
$$E_C = \sum_o w_o \left\| \vec{g}_o \vec{1}_C\, \vec{q}_o - \vec{g}_o P_C \right\|^2 = \left\| W_O^{1/2} \left( H Q_O - G_{OC} P_C \right) \right\|^2 \tag{20}$$
Using the terms defined above, the details of the determining process are given in the following description.
Returning to Fig. 1, in step S104, the cluster signals to be rendered are generated based on the cluster positions and object-to-cluster gains determined in steps S102 and S103. The number of generated cluster signals is usually much smaller than the number of audio objects contained in the audio content or audio signal, so the computational resources required to render the auditory scene are significantly reduced.
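One plausible reading of step S104, given that each gain defines the proportion of an object in a cluster, is a gain-weighted mixdown of the object signals into each cluster. The sketch below makes that assumption explicit; the text does not state the mixing formula, and the function name `mix_clusters` and the per-frame list layout are hypothetical.

```python
def mix_clusters(obj_signals, gains):
    """Assumed mixdown: cluster_c[n] = sum_o g_{o,c} * x_o[n].

    obj_signals[o] : samples of the o-th object for one frame
    gains[o][c]    : object-to-cluster gain g_{o,c}
    """
    num_clusters = len(gains[0])
    num_samples = len(obj_signals[0])
    out = [[0.0] * num_samples for _ in range(num_clusters)]
    for x_o, g_o in zip(obj_signals, gains):
        for c, g in enumerate(g_o):
            for n, sample in enumerate(x_o):
                out[c][n] += g * sample
    return out
```

With two objects and two clusters, an object with gains (1, 0) appears only in the first cluster signal, while an object with gains (0.5, 0.5) is split equally between both.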
Fig. 2 illustrates an example flow 200 of object-based audio signal processing according to an example embodiment.
Block 210 may produce a large number of audio objects, audio beds and metadata contained in the audio content to be processed according to example embodiments. Block 220 performs the clustering process, which groups the many audio objects into a relatively small number of clusters. At block 230, the cluster signals are output together with newly generated metadata so as to be rendered at block 240, which represents a renderer for a particular audio playback system. In other words, Fig. 2 gives an overview of an ecosystem involving authoring 210, clustering 220, distribution 230 and rendering 240. After clustering, the cluster signals and metadata can be distributed to multiple renderers targeting different loudspeaker playback setups or headphone reproduction.
It is assumed that the audio content is represented by beds (or static objects, or traditional channels) and (dynamic) objects. An object comprises an audio signal and associated metadata indicating spatial rendering information as a function of time. To reduce the data rate of the multiple beds and objects, clustering is applied that takes the beds and objects as input and produces a smaller set of objects (referred to as clusters) to represent the original content in a data-efficient manner.
The clustering process typically includes both determining a set of cluster positions and grouping (or rendering) the objects into the clusters. The two processes are interdependent in a complex way, because the rendering of objects into clusters can depend on the cluster positions, while the overall rendering quality can depend on both the cluster positions and the object-to-cluster gains. It is desirable to optimize the cluster positions and the object-to-cluster gains jointly.
In one embodiment, the optimized object-to-cluster gains and cluster positions can be obtained by minimizing the cost function described above. However, because there is no closed-form solution that yields the optimal object-to-cluster gains and cluster positions together, one example approach is to use an EM-like (expectation-maximization) iterative process to determine the object-to-cluster gains and the cluster positions in turn. In the E-step, given the cluster positions $P_C$, the object-to-cluster gains $G_{OC}$ are determined by minimizing the cost function; in the M-step, given the object-to-cluster gains $G_{OC}$, the cluster positions $P_C$ are determined by minimizing the cost function. A stopping criterion is used to decide whether to continue or stop the iteration.
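The alternation between the E-step and the M-step can be sketched as a small driver loop. This is a schematic illustration only: `alternate`, `update_gains`, `update_positions` and `cost` are hypothetical names, and the caller is assumed to supply update functions that each minimize the cost over one set of variables while the other is held fixed.

```python
def alternate(update_gains, update_positions, cost, positions, iterations=10):
    """EM-like alternation for a fixed number of iterations.

    update_gains(positions) -> gains       (E-step, cluster positions held fixed)
    update_positions(gains) -> positions   (M-step, gains held fixed)
    cost(gains, positions)  -> float       (overall cost E)
    Returns the final gains, final positions and the cost after each iteration.
    """
    history = []
    gains = None
    for _ in range(iterations):
        gains = update_gains(positions)
        positions = update_positions(gains)
        history.append(cost(gains, positions))
    return gains, positions, history
```

As a toy usage example, take the scalar cost (g - 1)^2 + (p - g)^2: minimizing over g for fixed p gives g = (1 + p) / 2, and minimizing over p for fixed g gives p = g. Starting from p = 0, the recorded cost never increases and both variables approach 1.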
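Given the cluster positions $P_C$, the object-to-cluster gains $G_{OC}$ that minimize the cost function $E$ can be obtained at block 222 in Fig. 2 by solving the following equation:
$$\frac{\partial}{\partial G_{OC}} E = \alpha_P \frac{\partial}{\partial G_{OC}} E_P + \alpha_D \frac{\partial}{\partial G_{OC}} E_D + \alpha_R \frac{\partial}{\partial G_{OC}} E_R + \alpha_C \frac{\partial}{\partial G_{OC}} E_C + \alpha_N \frac{\partial}{\partial G_{OC}} E_N = 0 \tag{21}$$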
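where, for metric (A):
$$\frac{\partial}{\partial G_{OC}} E_P = \begin{bmatrix} \frac{\partial}{\partial \vec{g}_1} E_P \\ \frac{\partial}{\partial \vec{g}_2} E_P \\ \vdots \\ \frac{\partial}{\partial \vec{g}_O} E_P \end{bmatrix} = \begin{bmatrix} 2 w_1 \vec{g}_1 \left( \vec{1}_C \vec{p}_1 \vec{p}_1^T \vec{1}_C^T - P_C \vec{p}_1^T \vec{1}_C^T - \vec{1}_C \vec{p}_1 P_C^T + P_C P_C^T \right) \\ 2 w_2 \vec{g}_2 \left( \vec{1}_C \vec{p}_2 \vec{p}_2^T \vec{1}_C^T - P_C \vec{p}_2^T \vec{1}_C^T - \vec{1}_C \vec{p}_2 P_C^T + P_C P_C^T \right) \\ \vdots \\ 2 w_O \vec{g}_O \left( \vec{1}_C \vec{p}_O \vec{p}_O^T \vec{1}_C^T - P_C \vec{p}_O^T \vec{1}_C^T - \vec{1}_C \vec{p}_O P_C^T + P_C P_C^T \right) \end{bmatrix}$$
For metric (B):
$$\frac{\partial}{\partial G_{OC}} E_D = \begin{bmatrix} \frac{\partial}{\partial \vec{g}_1} E_D \\ \frac{\partial}{\partial \vec{g}_2} E_D \\ \vdots \\ \frac{\partial}{\partial \vec{g}_O} E_D \end{bmatrix} = \begin{bmatrix} w_1 \vec{g}_1 \left( \Lambda_1 + \Lambda_1^T \right) \\ w_2 \vec{g}_2 \left( \Lambda_2 + \Lambda_2^T \right) \\ \vdots \\ w_O \vec{g}_O \left( \Lambda_O + \Lambda_O^T \right) \end{bmatrix}$$
For metric (C):
$$\frac{\partial}{\partial G_{OC}} E_N = \begin{bmatrix} \frac{\partial}{\partial \vec{g}_1} E_N \\ \frac{\partial}{\partial \vec{g}_2} E_N \\ \vdots \\ \frac{\partial}{\partial \vec{g}_O} E_N \end{bmatrix} = \begin{bmatrix} -2 w_1 \vec{1}_C^T + 2 w_1 \vec{g}_1 \vec{1}_C \vec{1}_C^T \\ -2 w_2 \vec{1}_C^T + 2 w_2 \vec{g}_2 \vec{1}_C \vec{1}_C^T \\ \vdots \\ -2 w_O \vec{1}_C^T + 2 w_O \vec{g}_O \vec{1}_C \vec{1}_C^T \end{bmatrix}$$
For metric (D):
$$\frac{\partial}{\partial G_{OC}} E_R = \begin{bmatrix} \frac{\partial}{\partial \vec{g}_1} E_R \\ \frac{\partial}{\partial \vec{g}_2} E_R \\ \vdots \\ \frac{\partial}{\partial \vec{g}_O} E_R \end{bmatrix} = \begin{bmatrix} w_1 \left( -2\, \vec{g}_{1 \to s} N_S G_{CS}^T + 2\, \vec{g}_1 G_{CS} N_S G_{CS}^T \right) \\ w_2 \left( -2\, \vec{g}_{2 \to s} N_S G_{CS}^T + 2\, \vec{g}_2 G_{CS} N_S G_{CS}^T \right) \\ \vdots \\ w_O \left( -2\, \vec{g}_{O \to s} N_S G_{CS}^T + 2\, \vec{g}_O G_{CS} N_S G_{CS}^T \right) \end{bmatrix}$$
For metric (E):
$$\frac{\partial}{\partial G_{OC}} E_C = \begin{bmatrix} \frac{\partial}{\partial \vec{g}_1} E_C \\ \frac{\partial}{\partial \vec{g}_2} E_C \\ \vdots \\ \frac{\partial}{\partial \vec{g}_O} E_C \end{bmatrix} = \begin{bmatrix} 2 w_1 \vec{g}_1 \left( \vec{1}_C \vec{q}_1 \vec{q}_1^T \vec{1}_C^T - P_C \vec{q}_1^T \vec{1}_C^T - \vec{1}_C \vec{q}_1 P_C^T + P_C P_C^T \right) \\ 2 w_2 \vec{g}_2 \left( \vec{1}_C \vec{q}_2 \vec{q}_2^T \vec{1}_C^T - P_C \vec{q}_2^T \vec{1}_C^T - \vec{1}_C \vec{q}_2 P_C^T + P_C P_C^T \right) \\ \vdots \\ 2 w_O \vec{g}_O \left( \vec{1}_C \vec{q}_O \vec{q}_O^T \vec{1}_C^T - P_C \vec{q}_O^T \vec{1}_C^T - \vec{1}_C \vec{q}_O P_C^T + P_C P_C^T \right) \end{bmatrix}$$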
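Solving the above equations yields the object-to-cluster gain matrix:
$$G_{OC} = \begin{bmatrix} \vec{g}_1 \\ \vdots \\ \vec{g}_O \end{bmatrix} \tag{22}$$
where each row collects the constant parts $B$ and the gain-dependent parts $\vec{g}_o A$ of the gradient rows above, so that setting $\vec{g}_o A + B = 0$ gives:
$$\vec{g}_o = -\left( \alpha_P B_P + \alpha_D B_D + \alpha_N B_N + \alpha_R B_R + \alpha_C B_C \right) \left( \alpha_P A_P + \alpha_D A_D + \alpha_N A_N + \alpha_R A_R + \alpha_C A_C \right)^{-1} \tag{23}$$
with:
$$B_P = 0$$
$$B_D = 0$$
$$B_N = -2 w_o \vec{1}_C^T$$
$$B_R = w_o \left( -2\, \vec{g}_{o \to s} N_S G_{CS}^T \right)$$
$$B_C = 0$$
$$A_P = 2 w_o \left( \vec{1}_C \vec{p}_o \vec{p}_o^T \vec{1}_C^T - P_C \vec{p}_o^T \vec{1}_C^T - \vec{1}_C \vec{p}_o P_C^T + P_C P_C^T \right)$$
$$A_D = w_o \left( \Lambda_o + \Lambda_o^T \right)$$
$$A_N = 2 w_o \vec{1}_C \vec{1}_C^T$$
$$A_R = w_o \left( 2\, G_{CS} N_S G_{CS}^T \right)$$
$$A_C = 2 w_o \left( \vec{1}_C \vec{q}_o \vec{q}_o^T \vec{1}_C^T - P_C \vec{q}_o^T \vec{1}_C^T - \vec{1}_C \vec{q}_o P_C^T + P_C P_C^T \right)$$
It can thus be seen that the object-to-cluster gains can be determined based on the cluster positions.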
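To make the gain update concrete, the sketch below solves for the gains of a single object using only the distance and normalization terms, i.e. it sets the weighted sum of the gradient rows for metrics (B) and (C) to zero. Restricting the cost to those two terms is a simplification made here for brevity; the function names `solve_linear` and `gains_for_object` and the default weights are assumptions.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gains_for_object(obj_pos, cluster_pos, w=1.0, alpha_d=1.0, alpha_n=1.0):
    """Gains g_o minimizing alpha_d * E_D + alpha_n * E_N for one object.

    Zero gradient: g (alpha_d * 2 w Lambda + alpha_n * 2 w 1 1^T) = alpha_n * 2 w 1^T.
    The system matrix is symmetric, so it can be solved for g as a column vector.
    """
    dist_sq = [sum((a - b) ** 2 for a, b in zip(obj_pos, p_c)) for p_c in cluster_pos]
    c = len(cluster_pos)
    matrix = [[alpha_n * 2.0 * w + (alpha_d * 2.0 * w * dist_sq[i] if i == j else 0.0)
               for j in range(c)] for i in range(c)]
    rhs = [alpha_n * 2.0 * w] * c
    return solve_linear(matrix, rhs)
```

An object that coincides with one cluster receives gain 1 for that cluster and 0 elsewhere. An object midway between two clusters receives equal gains that sum to slightly less than one, reflecting the trade-off between the distance penalty and the normalization penalty.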
Given the object-to-cluster gains $G_{OC}$, a local minimum of the cost function $E$ and the optimal cluster positions $P_C$ can be obtained at block 221 in Fig. 2 by solving the following equation:
$$\frac{\partial}{\partial P_C} E = \alpha_P \frac{\partial}{\partial P_C} E_P + \alpha_D \frac{\partial}{\partial P_C} E_D + \alpha_R \frac{\partial}{\partial P_C} E_R + \alpha_C \frac{\partial}{\partial P_C} E_C + \alpha_N \frac{\partial}{\partial P_C} E_N = 0 \tag{24}$$
However, because there is no closed-form solution to the above equation, gradient descent is used to obtain the optimal cluster positions $P_C$:
$$P_C(i+1) = P_C(i) - \sigma \frac{\partial}{\partial P_C} E \tag{25}$$
where $i$ denotes the gradient descent iteration index and $\sigma$ denotes the learning step (step size). For metrics (A), (B) and (C), the gradient of each cost term can be derived as follows:
$$E_P = \left\| W_O^{1/2} \left( H P_O - G_{OC} P_C \right) \right\|^2 = \mathrm{tr}\left\{ \left( P_O^T H^T W_O^{1/2} - P_C^T G_{OC}^T W_O^{1/2} \right) \left( W_O^{1/2} H P_O - W_O^{1/2} G_{OC} P_C \right) \right\} = \mathrm{tr}\left\{ P_O^T H^T W_O H P_O - P_O^T H^T W_O G_{OC} P_C - P_C^T G_{OC}^T W_O H P_O + P_C^T G_{OC}^T W_O G_{OC} P_C \right\} \tag{26}$$
where $\mathrm{tr}\{\cdot\}$ denotes the matrix trace, which sums the diagonal elements of a matrix.
$$\frac{\partial}{\partial P_C} E_P = -\left( P_O^T H^T W_O G_{OC} \right)^T - G_{OC}^T W_O H P_O + \left( G_{OC}^T W_O G_{OC} + G_{OC}^T W_O G_{OC} \right) P_C \tag{27}$$
$$\frac{\partial}{\partial \vec{p}_c} E_D = -2 \sum_o w_o\, g_{o,c}^2\, \vec{p}_o + 2\, \vec{p}_c \sum_o w_o\, g_{o,c}^2 \tag{28}$$
$$\frac{\partial}{\partial P_C} E_N = 0 \tag{30}$$
$$\frac{\partial}{\partial P_C} E_R = \begin{bmatrix} \frac{\partial}{\partial p_{1x}} E_R, & \frac{\partial}{\partial p_{1y}} E_R, & \frac{\partial}{\partial p_{1z}} E_R \\ \frac{\partial}{\partial p_{2x}} E_R, & \frac{\partial}{\partial p_{2y}} E_R, & \frac{\partial}{\partial p_{2z}} E_R \\ \vdots & & \\ \frac{\partial}{\partial p_{Cx}} E_R, & \frac{\partial}{\partial p_{Cy}} E_R, & \frac{\partial}{\partial p_{Cz}} E_R \end{bmatrix} \tag{31}$$
Wherein pCxRepresent c-th output cluster (from 1 to c) along 3 cartesian coordinate systems The position of x-axis, pCyRepresent c-th and export cluster along the position of the y-axis in 3 cartesian coordinate systems Put, pCzRepresent c-th and export cluster along the position of the z-axis in 3 cartesian coordinate systems.For Module (D), has:
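$$\frac{\partial E_R}{\partial p_{cx}} = 2 \sum_s n_s \sum_o w_o \left( g_{o,s} - \sum_c g_{o,c} g_{c,s} \right) \left( -g_{o,c} \frac{\partial g_{c,s}}{\partial p_{cx}} \right) \tag{32}$$
$$\frac{\partial E_R}{\partial p_{cy}} = 2 \sum_s n_s \sum_o w_o \left( g_{o,s} - \sum_c g_{o,c} g_{c,s} \right) \left( -g_{o,c} \frac{\partial g_{c,s}}{\partial p_{cy}} \right) \tag{33}$$
$$\frac{\partial E_R}{\partial p_{cz}} = 2 \sum_s n_s \sum_o w_o \left( g_{o,s} - \sum_c g_{o,c} g_{c,s} \right) \left( -g_{o,c} \frac{\partial g_{c,s}}{\partial p_{cz}} \right) \tag{34}$$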
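where $g_{c,s}$ denotes the gain with which cluster $c$ is rendered to channel $s$ of the reference playback system, and $\partial g_{c,s} / \partial p_{cx}$, $\partial g_{c,s} / \partial p_{cy}$ and $\partial g_{c,s} / \partial p_{cz}$ denote the gradients of the rendering gain.
For example, for a standard Atmos renderer, the gain can be calculated as follows: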
$$g_{c,s}(p_{cx}, p_{cy}, p_{cz}) = f_{sx}(p_{cx}) \, f_{sy}(p_{cy}) \, f_{sz}(p_{cz}) \tag{35}$$
where $f_{sx}(\cdot)$, $f_{sy}(\cdot)$ and $f_{sz}(\cdot)$ denote the gain functions of the Atmos renderer for the $s$-th channel with respect to the x, y and z positions, respectively. For metric (E):
$$\frac{\partial E_C}{\partial P_C} = -\left( Q_O^T H^T W_O G_{OC} \right)^T - G_{OC}^T W_O H Q_O + \left( G_{OC}^T W_O G_{OC} + G_{OC}^T W_O G_{OC} \right) P_C \tag{36}$$
As can be seen, the cluster positions can be determined based on the object-to-cluster gains.
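The position update of equation (25) can be illustrated with a minimal sketch. It assumes that only the position-error and distance terms are active, that $H$ is the identity, that $W_O$ is diagonal with entries $w_o$, and that $E_D = \sum_o w_o \sum_c g_{o,c}^2 \|\vec{p}_o - \vec{p}_c\|^2$ (a form inferred from the gradient in equation (28)). All function names and the step size are illustrative, not taken from the patent.

```python
def reconstruct(g_row, P_C):
    """Position of one object rebuilt from the clusters: sum_c g[o][c] * p_c."""
    dim = len(P_C[0])
    return [sum(g * p_c[d] for g, p_c in zip(g_row, P_C)) for d in range(dim)]


def cost(P_O, G, P_C, w, alpha_p=1.0, alpha_d=1.0):
    """alpha_p * E_P + alpha_d * E_D for fixed gains G (H assumed to be identity)."""
    total = 0.0
    for p_o, g_row, w_o in zip(P_O, G, w):
        recon = reconstruct(g_row, P_C)
        total += alpha_p * w_o * sum((a - b) ** 2 for a, b in zip(p_o, recon))
        for g, p_c in zip(g_row, P_C):
            total += alpha_d * w_o * g * g * sum((a - b) ** 2 for a, b in zip(p_o, p_c))
    return total


def gradient(P_O, G, P_C, w, alpha_p=1.0, alpha_d=1.0):
    """Gradient of the cost above with respect to every cluster position."""
    dim = len(P_C[0])
    grad = [[0.0] * dim for _ in P_C]
    for p_o, g_row, w_o in zip(P_O, G, w):
        recon = reconstruct(g_row, P_C)
        for c, g in enumerate(g_row):
            for d in range(dim):
                # E_P term: equation (27) with symmetric W_O and H = I (assumption).
                grad[c][d] -= 2.0 * alpha_p * w_o * g * (p_o[d] - recon[d])
                # E_D term: equation (28).
                grad[c][d] -= 2.0 * alpha_d * w_o * g * g * (p_o[d] - P_C[c][d])
    return grad


def descend(P_O, G, P_C, w, sigma=0.05, steps=200):
    """Apply the update of equation (25) a fixed number of times."""
    for _ in range(steps):
        grad = gradient(P_O, G, P_C, w)
        P_C = [[p - sigma * g for p, g in zip(p_c, g_c)]
               for p_c, g_c in zip(P_C, grad)]
    return P_C
```

With each object assigned entirely to its own cluster, the cluster positions in this sketch move toward the object positions; the step size `sigma` has to be small enough for the chosen weights and gains, otherwise the iteration can diverge.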
The cluster positions for this iterative process can be initialized in many ways. For example, random initialization or k-means-based initialization can be used to initialize the cluster positions for each processing frame. However, to avoid converging to different local minima in adjacent frames, the cluster positions obtained for the previous frame can be used to initialize the cluster positions of the current frame. In addition, a hybrid method can be applied, for example one that selects, from the results of several different initialization methods, the cluster positions with the lowest cost to initialize the determination process.
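The hybrid initialization described above can be sketched as follows. The sketch draws several random candidates, optionally adds the previous frame's cluster positions, and keeps the candidate with the lowest cost. The cost used here (weighted squared distance from each object to its nearest cluster) is only a stand-in for the patent's cost function, k-means-based initialization is omitted for brevity, and all names are hypothetical.

```python
import random


def nearest_cluster_cost(P_O, P_C, w):
    """Stand-in cost: weighted squared distance from each object to its nearest cluster."""
    total = 0.0
    for p_o, w_o in zip(P_O, w):
        total += w_o * min(sum((a - b) ** 2 for a, b in zip(p_o, p_c)) for p_c in P_C)
    return total


def init_random(P_O, n_clusters, rng):
    """Random initialization: use the positions of randomly chosen objects."""
    return [list(p) for p in rng.sample(P_O, n_clusters)]


def init_hybrid(P_O, w, n_clusters, previous=None, n_random=4, seed=0):
    """Return the lowest-cost candidate among random draws and the previous frame."""
    rng = random.Random(seed)
    candidates = [init_random(P_O, n_clusters, rng) for _ in range(n_random)]
    if previous is not None:
        # Reusing the previous frame's clusters favours temporal consistency.
        candidates.append([list(p) for p in previous])
    return min(candidates, key=lambda P_C: nearest_cluster_cost(P_O, P_C, w))
```

Selecting by cost alone does not guarantee that the previous frame's positions win; an implementation that prioritizes inter-frame stability might prefer them whenever their cost is close to the best candidate.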
After either of the steps represented by blocks 221 and 222 has been performed, the cost function is evaluated at block 223 to test whether its value is small enough to stop the iteration. The iteration stops when the value of the cost function is below a predefined threshold or when the rate of decrease of the cost function value is very small. The predefined threshold can be set manually by the user in advance. In another embodiment, the steps represented by blocks 221 and 222 can be performed alternately until the value of the cost function or its rate of change reaches a predefined threshold. In some use cases, performing the steps represented by blocks 221 and 222 in Fig. 2 only a predetermined number of times can be sufficient, rather than performing them until the overall error reaches a threshold.
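The alternation and its stopping rules can be sketched generically. The sketch below is a simplification: it evaluates the cost once per pair of updates rather than after each individual step, and it assumes that block 222 is the gain update. The update and cost callables, the thresholds and the toy problem are all hypothetical placeholders, not the patent's cost terms.

```python
def alternate(state, update_positions, update_gains, cost,
              cost_threshold=1e-6, min_decrease=1e-9, max_iters=100):
    """Alternate two update steps until the cost is small, stalls, or the budget is spent."""
    previous = cost(state)
    for iteration in range(1, max_iters + 1):
        state = update_positions(state)   # block 221: gains held fixed
        state = update_gains(state)       # block 222 (assumed): positions held fixed
        current = cost(state)             # block 223: evaluate the cost
        if current < cost_threshold or previous - current < min_decrease:
            return state, current, iteration
        previous = current
    return state, previous, max_iters


# Toy stand-in problem: minimise a quadratic over x and y in turn.
def toy_cost(s):
    x, y = s
    return (x - y) ** 2 + (x - 1.0) ** 2 + (y - 3.0) ** 2


def toy_update_x(s):
    return ((s[1] + 1.0) / 2.0, s[1])     # exact minimiser in x for fixed y


def toy_update_y(s):
    return (s[0], (s[0] + 3.0) / 2.0)     # exact minimiser in y for fixed x


final_state, final_cost, iterations = alternate(
    (0.0, 0.0), toy_update_x, toy_update_y, toy_cost)
```

In the toy problem the cost never falls below `cost_threshold`, so the loop ends through the rate-of-decrease test; the `max_iters` argument corresponds to the fixed-iteration-count variant mentioned in the text.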
It should be understood that the EM-like iterative method described above is merely an example embodiment; other rules can also be applied to jointly estimate the cluster positions and the object-to-cluster gains.
This iterative step or determination process ensures that multiple clusters are generated with improved accuracy, so that immersive reproduction of the audio content can be achieved. At the same time, the reduced data-rate requirement that results from the efficient compression allows less compromise in fidelity on any playback system (such as a loudspeaker array or headphones).
Fig. 3 illustrates a system 300 for processing an audio signal including a plurality of audio objects, according to an example embodiment. As shown, system 300 includes an object position acquiring unit 301 configured to obtain an object position for each audio object, and a cluster position determination unit 302 configured to determine, based on the object positions, a plurality of object-to-cluster gains and a set of metrics, cluster positions for grouping the audio objects into clusters. The metrics indicate the quality of the cluster positions and the quality of the object-to-cluster gains; each of the cluster positions is the centroid of a corresponding cluster; and each of the object-to-cluster gains defines the proportion of a corresponding audio object in one of the clusters. System 300 also includes an object-to-cluster gain determination unit 303 configured to determine the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics, and a cluster signal generation unit 304 configured to generate, based on the determined cluster positions and object-to-cluster gains, the cluster signals to be rendered.
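This passage does not spell out how the cluster signals are formed from the gains. One plausible reading, shown here purely as an assumption, is that each cluster signal is the gain-weighted sum of the object signals, with `G[o][c]` the proportion of object `o` placed in cluster `c`. The function name and data layout are hypothetical.

```python
def mix_clusters(object_signals, G):
    """Mix object signals into cluster signals using object-to-cluster gains.

    object_signals: one list of samples per object (all of equal length).
    G: one row of gains per object, one column per cluster.
    """
    n_clusters = len(G[0])
    n_samples = len(object_signals[0])
    clusters = [[0.0] * n_samples for _ in range(n_clusters)]
    for samples, g_row in zip(object_signals, G):
        for c, g in enumerate(g_row):
            if g == 0.0:
                continue  # this object contributes nothing to cluster c
            for n, x in enumerate(samples):
                clusters[c][n] += g * x
    return clusters
```

For example, an object with gains 0.5 and 0.5 contributes half of its amplitude to each of two clusters, while an object with gains 1 and 0 is carried by the first cluster only.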
In an example embodiment, system 300 may also include an alternating determination unit configured to alternately perform the determination of the cluster positions and the determination of the object-to-cluster gains until a predetermined condition is met. In a further embodiment, the predetermined condition may include at least one of the following: a value associated with the metrics is less than a predefined threshold, and a rate of change of the value associated with the metrics is less than another predefined threshold.
In another example embodiment, the metrics may include at least one of the following: a position error between the positions of the audio objects reconstructed from the cluster signals and the object positions; a distance error between the cluster positions and the object positions; a deviation of the sum of the object-to-cluster gains from one; a rendering error between rendering the cluster signals to one or more playback systems and rendering the audio signal to the one or more playback systems; and an inter-frame inconsistency of a variable between the current time frame and the previous time frame. In a further example embodiment, the variable may include at least one of the object-to-cluster gains, the cluster positions, and the positions of the reconstructed audio objects. Alternatively, the alternating determination unit may be configured to alternately perform the determination of the cluster positions and the determination of the object-to-cluster gains based on a weighted combination of the set of metrics.
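A weighted combination of metrics can be sketched for three of the listed terms. The sketch assumes squared-error forms, per-object weights `w`, and $H$ equal to the identity; the gain-sum term in particular is written as a squared deviation from one, which is an assumption rather than the patent's stated formula. The rendering-error and inter-frame terms are omitted, and the dictionary keys are illustrative.

```python
def position_error(P_O, G, P_C, w):
    """E_P: weighted squared error between objects and their reconstructions."""
    total = 0.0
    for p_o, g_row, w_o in zip(P_O, G, w):
        recon = [sum(g * p_c[d] for g, p_c in zip(g_row, P_C)) for d in range(len(p_o))]
        total += w_o * sum((a - b) ** 2 for a, b in zip(p_o, recon))
    return total


def distance_error(P_O, G, P_C, w):
    """E_D: squared object-to-cluster distances, weighted by squared gains."""
    total = 0.0
    for p_o, g_row, w_o in zip(P_O, G, w):
        for g, p_c in zip(g_row, P_C):
            total += w_o * g * g * sum((a - b) ** 2 for a, b in zip(p_o, p_c))
    return total


def gain_normalization_error(G, w):
    """E_N: squared deviation of each object's gain sum from one (assumed form)."""
    return sum(w_o * (1.0 - sum(g_row)) ** 2 for g_row, w_o in zip(G, w))


def total_cost(P_O, G, P_C, w, alphas):
    """Weighted combination of the three cost terms above."""
    return (alphas["P"] * position_error(P_O, G, P_C, w)
            + alphas["D"] * distance_error(P_O, G, P_C, w)
            + alphas["N"] * gain_normalization_error(G, w))
```

Raising one weight relative to the others shifts the trade-off: a larger `alphas["D"]`, for instance, penalizes spreading an object over distant clusters more heavily than it penalizes reconstruction error.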
In another example embodiment, system 300 may also include a cluster position initialization unit configured to initialize the cluster positions based on at least one of the following: randomly selecting the cluster positions; applying an initial clustering to the plurality of audio objects to obtain the cluster positions; and determining the cluster positions for the current time frame of the audio signal based on the cluster positions for the previous time frame of the audio signal.
For the sake of clarity, some optional components of system 300 are not shown in Fig. 3. However, it should be understood that the features described above with reference to Figs. 1 to 2 all apply to system 300. Furthermore, the components of system 300 may be hardware modules or software unit modules. For example, in some embodiments, system 300 may be implemented partially or entirely in software and/or firmware, for example as a computer program product embodied in a computer-readable medium. Alternatively or additionally, system 300 may be implemented partially or entirely in hardware, for example as an integrated circuit (IC), an application-specific integrated circuit (ASIC), a system on chip (SOC), a field-programmable gate array (FPGA), and so on. The scope of the present invention is not limited in this respect.
Fig. 4 shows a block diagram of an example computer system 400 suitable for implementing the example embodiments disclosed herein. As shown, computer system 400 includes a central processing unit (CPU) 401, which can perform various processes according to a program stored in read-only memory (ROM) 402 or a program loaded from storage section 408 into random access memory (RAM) 403. RAM 403 also stores, as required, the data that CPU 401 needs when performing the various processes. CPU 401, ROM 402 and RAM 403 are connected to one another via bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to I/O interface 405: an input section 406 including a keyboard, a mouse and the like; an output section 407 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. Communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on drive 410 as needed, so that a computer program read from it can be installed into storage section 408 as needed.
In particular, according to the example embodiments disclosed herein, the processes described above with reference to Figs. 1 to 2 may be implemented as computer software programs. For example, the example embodiments disclosed herein include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing method 100. In such embodiments, the computer program may be downloaded and installed from a network via communication section 409, and/or installed from removable medium 411.
In general, the various example embodiments disclosed herein may be implemented in hardware or special-purpose circuits, software, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software executed by a controller, microprocessor or other computing device. While aspects of the example embodiments disclosed herein are illustrated or described as block diagrams, flowcharts, or using some other graphical representation, it will be understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
Moreover, each block in the flowcharts may be regarded as a method step, and/or as an operation resulting from the operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated functions. For example, the example embodiments disclosed herein include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code configured to carry out the methods described above.
In the context of the present disclosure, a machine-readable medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination thereof. More specific examples of a machine-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical storage device, a magnetic storage device, or any suitable combination thereof.
Computer program code for carrying out the methods of the present invention may be written in one or more programming languages. The computer program code may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, such that the program code, when executed by the computer or other programmable data processing apparatus, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a computer, partly on the computer, as a stand-alone software package, partly on the computer and partly on a remote computer, entirely on a remote computer or server, or distributed across one or more remote computers or servers.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve the desired results. In certain circumstances, multitasking or parallel processing may be advantageous. Likewise, although the discussion above contains several specific implementation details, these should not be construed as limiting the scope of any invention or of the claims, but rather as descriptions of particular embodiments of particular inventions. Certain features described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Various modifications and adaptations of the foregoing example embodiments of the present invention will become apparent to those skilled in the relevant art when the foregoing description is read together with the accompanying drawings. Any and all modifications will still fall within the scope of the non-limiting example embodiments of the present invention. Furthermore, having the benefit of the teachings presented in the foregoing description and drawings, those skilled in the art to which these embodiments pertain will think of other example embodiments set forth herein.
Accordingly, the example embodiments disclosed herein may be embodied in any of the forms described herein. For example, the enumerated example embodiments (EEEs) listed below describe some structures, features and functions of some aspects of the present invention.
EEE 1. A method of processing object-based audio data, comprising:
determining a cost function based on a plurality of metrics for combining a first plurality of audio objects into a second plurality of audio objects; and
minimizing the cost function by jointly optimizing the spatial positions and rendering gains of the second plurality of audio objects, so as to combine the first plurality of audio objects into the second plurality of audio objects.
EEE 2. The method according to EEE 1, wherein the plurality of metrics includes at least one of:
spatial representation;
timbre preservation;
loudness preservation;
monophonic quality; and
temporal smoothness.
EEE 3. The method according to EEE 2, wherein spatial representation can be measured by an object reconstruction position error.
EEE 4. The method according to EEE 2, wherein timbre preservation can be measured by an object-to-cluster distance.
EEE 5. The method according to EEE 2, wherein loudness preservation can be measured by an object-to-cluster gain normalization error.
EEE 6. The method according to EEE 2, wherein monophonic quality can be measured by a rendering error on one or more predefined reference playback systems.
EEE 7. The method according to EEE 2, wherein temporal smoothness can be measured by an inter-frame inconsistency of at least one variable in the clustering result.
EEE 8. The method according to EEE 7, wherein the variable can be the object-to-cluster gains, the cluster positions or the reconstructed object positions.
EEE 9. The method according to EEE 1, wherein the cost function can be a combination of the cost terms of the plurality of metrics.
EEE 10. The method according to EEE 9, wherein different weights are applied to the cost terms of the plurality of metrics.
EEE 11. The method according to EEE 10, wherein the different weights are determined in response to human input.
EEE 12. The method according to EEE 11, wherein an EM-like iterative optimization method can be used to minimize the cost function.
EEE 13. The method according to any one of the preceding EEEs, wherein one or more reference loudspeaker setups are determined by human input.
EEE 14. The method according to any one of the preceding EEEs, wherein the reference renderer can be either a loudspeaker renderer or a headphone renderer.

Claims (15)

1. A method of processing an audio signal including a plurality of audio objects, comprising:
obtaining an object position for each of the audio objects;
determining, based on the object positions, a plurality of object-to-cluster gains and a set of metrics, cluster positions for grouping the audio objects into clusters, the metrics indicating the quality of the cluster positions and the quality of the object-to-cluster gains, each of the cluster positions being the centroid of a corresponding one of the clusters, and each of the object-to-cluster gains defining the proportion of a corresponding audio object in one of the clusters;
determining the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics; and
generating cluster signals based on the determined cluster positions and object-to-cluster gains.
2. The method according to claim 1, further comprising:
alternately performing the determination of the cluster positions and the determination of the object-to-cluster gains until a predetermined condition is met.
3. The method according to claim 2, wherein the predetermined condition includes at least one of the following:
a value associated with the metrics is less than a predefined threshold, and
a rate of change of the value associated with the metrics is less than another predefined threshold.
4. The method according to claim 2 or 3, wherein the metrics include at least one of the following:
a position error between the positions of the audio objects reconstructed from the cluster signals and the object positions;
a distance error between the cluster positions and the object positions;
a deviation of the sum of the object-to-cluster gains from one;
a rendering error between rendering the cluster signals to one or more playback systems and rendering the audio signal to the one or more playback systems; and
an inter-frame inconsistency of a variable between the current time frame and the previous time frame.
5. The method according to claim 4, wherein the variable includes at least one of the object-to-cluster gains, the cluster positions and the positions of the reconstructed audio objects.
6. The method according to claim 4, wherein alternately performing the determination of the cluster positions and the determination of the object-to-cluster gains is based on a weighted combination of the set of metrics.
7. The method according to any one of claims 1 to 3, further comprising:
initializing the cluster positions based on at least one of the following:
randomly selecting the cluster positions;
applying an initial clustering to the plurality of audio objects to obtain the cluster positions; and
determining the cluster positions for the current time frame of the audio signal based on the cluster positions for the previous time frame of the audio signal.
8. A system for processing an audio signal including a plurality of audio objects, comprising:
an object position acquiring unit configured to obtain an object position for each of the audio objects;
a cluster position determination unit configured to determine, based on the object positions, a plurality of object-to-cluster gains and a set of metrics, cluster positions for grouping the audio objects into clusters, the metrics indicating the quality of the cluster positions and the quality of the object-to-cluster gains, each of the cluster positions being the centroid of a corresponding one of the clusters, and each of the object-to-cluster gains defining the proportion of a corresponding audio object in one of the clusters;
an object-to-cluster gain determination unit configured to determine the object-to-cluster gains based on the object positions, the cluster positions and the set of metrics; and
a cluster signal generation unit configured to generate cluster signals based on the determined cluster positions and object-to-cluster gains.
9. The system according to claim 8, further comprising:
an alternating determination unit configured to alternately perform the determination of the cluster positions and the determination of the object-to-cluster gains until a predetermined condition is met.
10. The system according to claim 9, wherein the predetermined condition includes at least one of the following:
a value associated with the metrics is less than a predefined threshold, and
a rate of change of the value associated with the metrics is less than another predefined threshold.
11. The system according to claim 9 or 10, wherein the metrics include at least one of the following:
a position error between the positions of the audio objects reconstructed from the cluster signals and the object positions;
a distance error between the cluster positions and the object positions;
a deviation of the sum of the object-to-cluster gains from one;
a rendering error between rendering the cluster signals to one or more playback systems and rendering the audio signal to the one or more playback systems; and
an inter-frame inconsistency of a variable between the current time frame and the previous time frame.
12. The system according to claim 11, wherein the variable includes at least one of the object-to-cluster gains, the cluster positions and the positions of the reconstructed audio objects.
13. The system according to claim 11, wherein the alternating determination unit is further configured to alternately perform the determination of the cluster positions and the determination of the object-to-cluster gains based on a weighted combination of the set of metrics.
14. The system according to any one of claims 8 to 10, further comprising:
a cluster position initialization unit configured to initialize the cluster positions based on at least one of the following:
randomly selecting the cluster positions;
applying an initial clustering to the plurality of audio objects to obtain the cluster positions; and
determining the cluster positions for the current time frame of the audio signal based on the cluster positions for the previous time frame of the audio signal.
15. A computer program product for processing an audio signal including a plurality of audio objects, the computer program product being tangibly stored on a non-transitory computer-readable medium and including machine-executable instructions which, when executed, cause a machine to perform the steps of the method according to any one of claims 1 to 7.
CN201510484949.8A 2015-08-07 2015-08-07 Processing object-based audio signals Active CN106385660B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN201510484949.8A CN106385660B (en) 2015-08-07 2015-08-07 Processing object-based audio signals
US15/749,750 US10277997B2 (en) 2015-08-07 2016-08-04 Processing object-based audio signals
EP16751763.0A EP3332557B1 (en) 2015-08-07 2016-08-04 Processing object-based audio signals
PCT/US2016/045512 WO2017027308A1 (en) 2015-08-07 2016-08-04 Processing object-based audio signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510484949.8A CN106385660B (en) 2015-08-07 2015-08-07 Processing object-based audio signals

Publications (2)

Publication Number Publication Date
CN106385660A (en) 2017-02-08
CN106385660B (en) 2020-10-16

Family

ID=57916386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510484949.8A Active CN106385660B (en) 2015-08-07 2015-08-07 Processing object-based audio signals

Country Status (1)

Country Link
CN (1) CN106385660B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant