CN118176749A - Information processing apparatus and method, and program - Google Patents

Information processing apparatus and method, and program

Info

Publication number
CN118176749A
Authority
CN
China
Prior art keywords
information
control
listener
viewpoint
cvp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202280073688.9A
Other languages
Chinese (zh)
Inventor
畠中光行
知念徹
辻实
户栗康裕
本间弘幸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp
Priority claimed from PCT/JP2022/040596 (WO2023085140A1)
Publication of CN118176749A

Abstract

The present technology relates to an information processing apparatus, method, and program capable of reproducing content based on the intention of a content creator. The information processing apparatus includes a control unit that: generates a plurality of metadata sets each including metadata of a plurality of objects, the metadata containing object position information indicating the positions of the objects as seen from a control viewpoint when the direction from the control viewpoint toward a target position in space is set as the direction of the median plane; generates, for each of a plurality of control viewpoints, control viewpoint information including control viewpoint position information indicating the position of the control viewpoint in space and information indicating which of the plurality of metadata sets is associated with that control viewpoint; and generates content data including the mutually different metadata sets and configuration information containing the control viewpoint information of the plurality of control viewpoints. The present technology is applicable to an information processing apparatus.

Description

Information processing apparatus and method, and program
Technical Field
The present technology relates to an information processing apparatus, an information processing method, and a program, and in particular, to an information processing apparatus, an information processing method, and a program each capable of realizing content reproduction based on the intention of a content creator.
Background
Conventional free viewpoint audio has mainly been used in games, in which the positional relationship with the image displayed during the game (i.e., coincidence between image and sound) is an important factor. Thus, this coincidence is achieved using object audio in an absolute coordinate system (for example, see patent document 1).
On the other hand, in the field of music content, unlike games, the audibility balance has a higher priority than image-sound coincidence in order to improve musicality. Thus, image-sound coincidence is ensured neither for 2-channel stereo content nor for 5.1-channel multi-channel content.
Furthermore, musicality has a higher priority even in commercial 3DoF (degrees of freedom) services. Thus, the content provided by many services in this field includes only sound, and image-sound coincidence is not ensured.
[ Reference List ]
[ Patent literature ]
[ Patent document 1]
PCT patent publication No. WO2019/198540
Disclosure of Invention
[ Technical problem ]
Meanwhile, by adopting the foregoing method of representing the position of the object by coordinates in the absolute coordinate system, it is possible to provide enhanced realism, but it is difficult to create free viewpoint content capable of satisfying the musicality expected by the music creator. In other words, it is difficult to realize content reproduction of free viewpoint content based on the intention of the content creator.
The present technology has been developed in consideration of such a situation, and an object of the present technology is to realize content reproduction based on the intention of a content creator.
[ Solution to the problem ]
An information processing apparatus according to a first aspect of the present technology includes a control unit. The control unit generates a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward the target point in space is designated as a direction toward the median plane. For each control viewpoint of the plurality of control viewpoints, the control unit generates control viewpoint information containing control viewpoint position information indicating a position of the corresponding control viewpoint in space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets. The control unit generates content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information related to the plurality of control viewpoints.
The information processing method or program according to the first aspect of the present technology includes the steps of: generating a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward the median plane; generating, for each of a plurality of control viewpoints, control viewpoint information containing control viewpoint position information indicating a position of the corresponding control viewpoint in space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets; and generating content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints.
According to a first aspect of the present technology, a plurality of metadata sets are generated, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane. For each control viewpoint of the plurality of control viewpoints, control viewpoint information is generated, the control viewpoint information containing control viewpoint position information indicating a position of the corresponding control viewpoint in space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets. Content data including a plurality of metadata sets different from each other and configuration information including control viewpoint information associated with a plurality of control viewpoints is generated.
An information processing apparatus according to a second aspect of the present technology includes an acquisition unit, a listener position information acquisition unit, and a position calculation unit. The acquisition unit acquires object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in space. A listener position information acquisition unit acquires listener position information indicating a listener position in space. The position calculation unit calculates listener reference object position information indicating a position of an object viewed from a listener position based on the listener position information, control viewpoint position information associated with the plurality of control viewpoints, and object position information associated with the plurality of control viewpoints.
The information processing method or program according to the second aspect of the present technology includes the steps of: acquiring object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward the median plane, and control viewpoint position information indicating a position of the control viewpoint in space; acquiring listener position information indicating a listener position in space; and calculating listener reference object position information indicating a position of the object viewed from the listener position based on the listener position information, control viewpoint position information associated with a plurality of control viewpoints, and object position information associated with the plurality of control viewpoints.
According to the second aspect of the present technology, object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in space are acquired. Listener position information indicating a listener position in space is acquired. Listener reference object position information indicating a position of an object viewed from a listener position is calculated based on the listener position information, control viewpoint position information associated with the plurality of control viewpoints, and object position information associated with the plurality of control viewpoints.
Drawings
Fig. 1 is a diagram illustrating 2D audio and 3D audio.
Fig. 2 is a diagram illustrating object positioning.
Fig. 3 is a diagram illustrating the CVP and the target point TP.
Fig. 4 is a diagram illustrating a positional relationship between the target point TP and the CVP.
Fig. 5 is a diagram illustrating object positioning.
Fig. 6 is a diagram illustrating a CVP and an object positioning pattern.
Fig. 7 is a diagram presenting a format example of configuration information.
Fig. 8 is a diagram presenting an example of a frame length coefficient.
Fig. 9 is a diagram showing a format example of CVP information.
Fig. 10 is a diagram presenting a format example of an object metadata set.
Fig. 11 is a diagram describing a positioning example of a CVP in a free view space.
Fig. 12 is a diagram describing a positioning example of a reverberation object.
Fig. 13 is a diagram depicting a configuration example of the information processing apparatus.
Fig. 14 is a flowchart illustrating the content creation process.
Fig. 15 is a diagram describing a configuration example of a server.
Fig. 16 is a flowchart illustrating the allocation process.
Fig. 17 is a diagram describing a configuration example of the client.
Fig. 18 is a flowchart illustrating reproduction audio data generation processing.
Fig. 19 is a diagram illustrating selection of a CVP for interpolation processing.
Fig. 20 is a diagram illustrating a three-dimensional position vector of an object.
Fig. 21 is a diagram illustrating the synthesis of the three-dimensional position vector of the object.
Fig. 22 is a diagram illustrating vector synthesis.
Fig. 23 is a diagram illustrating vector synthesis.
Fig. 24 is a diagram illustrating the contribution ratio of each vector at the time of vector synthesis.
Fig. 25 is a diagram illustrating listener reference object position information according to the orientation of the listener's face.
Fig. 26 is a diagram illustrating listener reference object position information according to the orientation of the listener's face.
Fig. 27 is a diagram illustrating grouping of CVPs.
Fig. 28 is a diagram illustrating CVP group and interpolation processing.
Fig. 29 is a diagram illustrating CVP group and interpolation processing.
Fig. 30 is a diagram illustrating CVP group and interpolation processing.
Fig. 31 is a diagram illustrating CVP group and interpolation processing.
Fig. 32 is a diagram showing a format example of configuration information.
Fig. 33 is a diagram showing a format example of CVP group information.
Fig. 34 is a flowchart illustrating reproduction audio data generation processing.
Fig. 35 is a diagram showing an example of a positioning mode of the CVP.
Fig. 36 is a diagram showing an example of positioning of CVP or the like in a common absolute coordinate system.
Fig. 37 is a diagram presenting an example of listener reference object position information and listener reference gain.
Fig. 38 is a diagram showing an example of a positioning mode of the CVP.
Fig. 39 is a diagram showing an example of positioning of a CVP or the like in a common absolute coordinate system.
Fig. 40 is a diagram presenting an example of listener reference object position information and listener reference gain.
Fig. 41 is a diagram describing a selection example of a CVP used for interpolation processing.
Fig. 42 is a diagram presenting an example of listener reference object position information and listener reference gain.
Fig. 43 is a diagram presenting an example of listener reference object position information and listener reference gain.
Fig. 44 is a diagram presenting an example of configuration information.
Fig. 45 is a flowchart illustrating the contribution coefficient calculation process.
Fig. 46 is a flowchart illustrating normalized contribution coefficient calculation processing.
Fig. 47 is a flowchart illustrating normalized contribution coefficient calculation processing.
Fig. 48 is a diagram illustrating selection of a CVP on the reproduction side.
Fig. 49 is a diagram describing an example of a CVP selection screen.
Fig. 50 is a diagram presenting an example of configuration information.
Fig. 51 is a diagram describing a configuration example of a client.
Fig. 52 is a flowchart illustrating the selective interpolation process.
Fig. 53 is a diagram describing a configuration example of a computer.
Detailed Description
Hereinafter, an embodiment to which the present technology is applied will be described with reference to the drawings.
< First embodiment >
< Present technology >
The present technology is a technology for providing free viewpoint content with artistry.
First, 2D audio and 3D audio will be described with reference to fig. 1.
For example, as depicted in the left part of fig. 1, a sound source in 2D audio can only be placed at a position at the height of the listener's ears. In this case, 2D audio can represent front-back and left-right movement of the sound source.
On the other hand, as depicted in the right part of the figure, an object corresponding to a sound source of 3D audio is allowed to be positioned at a position above or below the height of the listener's ear. Therefore, expression of movement of the sound source (object) in the up-down direction can also be achieved.
In addition, 3DoF content and 6DoF content may be used as the content using 3D audio.
For example, in the case of 3DoF content, a user can watch and listen to the content while rotating his or her head in up-down and left-right directions and in an oblique direction in space. This type of 3DoF content is also referred to as fixed viewpoint content.
On the other hand, in the case of 6DoF content, the user can watch and listen to the content while rotating his or her head in the up-down direction and the left-right direction and the tilt direction in the space, and also move to any position in the space. This type of 6DoF content is also referred to as free view content.
The content to be described below may be audio content including only audio, or content including video and audio accompanying the video. Hereinafter, various contents will be simply referred to as contents without particularly distinguishing the contents. An example of creating 6DoF content (i.e., free viewpoint content) will be described in detail below. In addition, it is assumed hereinafter that audio objects will also be simply referred to as objects.
To increase artistry (musicality) in the process of content creation, some objects are intentionally placed at positions different from where they are actually seen, rather than at the physical positions of the objects.
This manner of object localization can be readily expressed using polar coordinate system object localization techniques. Fig. 2 depicts the difference between the positioning of objects in an absolute coordinate system and in a polar coordinate system in an example of actual band performance.
For example, in the case of the band performance depicted in fig. 2, the objects (audio objects) corresponding to the vocal and the guitar are not located at the physical positions of the vocalist and the guitar, but are positioned in consideration of musicality.
The left part of fig. 2 depicts the physical localization positions of band members playing a musical composition in a three-dimensional space, i.e., localization positions in a physical space (absolute coordinate space). Specifically, the object OV11 is the vocal (singer), the object OD11 is the drums, the object OG11 is the guitar, and the object OB11 is the bass guitar.
Specifically, in this example, the object OV11 (vocal) is positioned at a position shifted rightward from the front (median plane) of the user corresponding to the listener of the content, and the object OG11 (guitar) is positioned near the right end as viewed from the user.
During content creation, the objects (audio objects) corresponding to the individual members (instruments) of the band are positioned as shown in the right part of the figure in consideration of musicality. Note that consideration of musicality here means improving the ease of listening to the musical piece.
The localization of the audio object in polar coordinate space is depicted in the right part of the figure.
Specifically, the object OV21 indicates the localization position of the audio object corresponding to the object OV11, that is, the vocal.
The object OV11 is located on the right side with respect to the median plane. However, considering that the vocalist is the key person in the content, the creator positions the object OV21 corresponding to the vocal at a higher position on the median plane, i.e., at a higher position in the center as viewed from the user, to achieve a clear localization of the object OV21.
Each of the object OG21-1 and the object OG21-2 is an audio object of a string accompaniment sound generated by the guitar, the audio object corresponding to the object OG11. Note that, in the following, each of the object OG21-1 and the object OG21-2 will also be simply referred to as the object OG21 when there is no particular need to distinguish them.
In this example, the objects OG21 are not positioned at the unchanged physical position of the guitar in the three-dimensional space (i.e., at the position of the object OG11), but are instead positioned at both left and right positions in front as viewed from the user in consideration of musicality. In particular, an audio representation that envelops the listener (user) can be achieved by positioning the objects OG21 at left and right positions spaced apart from each other in front of the user. That is, an expression of spaciousness (a sense of envelopment) can be realized.
Each of the object OD21-1 and the object OD21-2 is an audio object corresponding to the object OD11 (drum), and each of the object OB21-1 and the object OB21-2 is an audio object corresponding to the object OB11 (bass guitar).
Note that, in the following, each of the object OD21-1 and the object OD21-2 will also be simply referred to as the object OD21 when there is no particular need to distinguish them. Similarly, each of the object OB21-1 and the object OB21-2 will also be simply referred to as the object OB21 when there is no particular need to distinguish them.
For the purpose of stability, the objects OD21 are positioned at low positions, spaced apart from each other in the left-right direction as viewed from the user. Likewise for the purpose of stability, the objects OB21 are positioned slightly above the drums (objects OD21) near the center.
In this way, the creator locates the respective objects in the polar coordinate space in consideration of musical properties to create free viewpoint content.
Object localization in a polar coordinate system in this way is more suitable for creating free viewpoint content to which artistry (musicality) of creator intention is added than object localization in an absolute coordinate system that determines object positions at unique physical positions. The present technology is a free viewpoint audio technology that can be realized using an object localization mode expressed in a plurality of polar coordinate systems based on the above-described object localization method in the polar coordinate system.
Meanwhile, in order to create 3DoF content based on the polar coordinate system object localization, the creator assumes one listener position within the space, and localizes each object around the center (listener position) located at the listener using the polar coordinate system.
At this time, the metadata of each object mainly includes three elements: an azimuth angle, an elevation angle, and a gain.
The azimuth angle herein is an angle formed in the horizontal direction and indicating the position of the object viewed from the listener. Elevation angle is an angle formed in the vertical direction and indicates the position of an object viewed from the listener. The gain is the gain of the audio data of the object.
The creation tool outputs the above-described metadata and audio data (object audio data) for reproducing the sound of the object corresponding to the metadata as a deliverable for each object.
Here, extending the 3DoF content creation method to a free viewpoint (6DoF), i.e., to free viewpoint content, will be considered.
For example, as shown in fig. 3, it is assumed that a creator of free viewpoint content defines a plurality of control viewpoints (hereinafter, also referred to as CVPs), which are positions of viewpoints that the creator desires to indicate within a free viewpoint space (three-dimensional space).
For example, each of the CVPs is a position that is desired to be designated as a listener position during reproduction of the content. The following assumes that the ith CVP is also denoted as CVPi in particular.
In the example described in fig. 3, three CVPs (control viewpoints) of CVPs 1 to 3 are defined in a free viewpoint space, in which a user corresponding to a listener listens to content.
Here, it is assumed that the coordinate system representing absolute coordinates of absolute positions in the free viewpoint space is a common absolute coordinate system, which is a rectangular coordinate system having an origin O at a predetermined position in the free viewpoint space and an X-axis, a Y-axis, and a Z-axis that intersect each other at right angles, as shown in the center of the figure.
In this example, in the figure, the X-axis represents an axis in the lateral direction, the Y-axis represents an axis in the depth direction, and the Z-axis represents an axis in the longitudinal direction. Further, the position of the origin O in the free viewpoint space is set at any position according to the intention of the creator of the content. Alternatively, for example, the position may be set at the center of a place assumed to be free viewpoint space.
Coordinates representing the respective positions of the CVPs 1 to 3 (i.e., absolute coordinate positions of the respective CVPs) are (X1, Y1, Z1), (X2, Y2, Z2) and (X3, Y3, Z3), respectively, in a common absolute coordinate system herein.
Further, the creator of the content defines one position (one point) in the free viewpoint space as a target point TP that is supposed to be viewed from all CVPs. The target point TP is a position that is a reference for interpolation processing performed for interpolating position information associated with an object. Specifically, it is assumed that virtual listeners located at the respective CVPs each face in a direction toward the target point TP.
In this example, the coordinates (absolute coordinate position) representing the target point TP in the common absolute coordinate system are (xtp, ytp, ztp).
Also, a polar coordinate space (hereinafter also referred to as CVP polar coordinate space) around the center located at the CVP position is formed for each CVP.
The position within each CVP polar coordinate space is represented by coordinates (polar coordinates) in a polar coordinate system (hereinafter also referred to as the CVP polar coordinate system) having an origin O' located at the CVP position (i.e., at the absolute coordinate position of the CVP) and an x-axis, a y-axis, and a z-axis that intersect each other at right angles.
Specifically, in this example, the y-axis positive direction corresponds to a direction extending from the position of the CVP to the target point TP, the x-axis corresponds to an axis in the left-right direction viewed from a virtual listener present at the CVP, and the z-axis corresponds to an axis in the up-down direction viewed from a virtual listener present at the CVP.
When each of the CVPs and the target point TP is set (specified) by the content creator, the yaw angle (Yaw) as an angle in the horizontal direction and the pitch angle (Pitch) as an angle in the vertical direction are determined as information indicating the positional relationship between the corresponding CVP and the target point TP.
The angle "Yaw" in the horizontal direction is a horizontal angle formed by the Y-axis of the common absolute coordinate system and the y-axis of the CVP polar coordinate system. Specifically, the angle "Yaw" is an angle in the horizontal direction with respect to the Y-axis of the common absolute coordinate system, and represents the orientation of the face of a virtual listener existing at the CVP and viewing the target point TP.
Further, the angle "Pitch" in the vertical direction is an angle formed by the y-axis of the CVP polar coordinate system with respect to the X-Y plane containing the X-axis and the Y-axis of the common absolute coordinate system. Specifically, the angle "Pitch" is an angle in the vertical direction with respect to the X-Y plane of the common absolute coordinate system, and represents the orientation of the face of a virtual listener existing at the CVP and viewing the target point TP.
Specifically, it is assumed that the coordinates representing the absolute coordinate position of the target point TP are (xtp, ytp, ztp), and the coordinates representing the absolute coordinate position of a predetermined CVP (i.e., the coordinates in the common absolute coordinate system) are (xcvp, ycvp, zcvp).
In this case, the angle "Yaw" in the horizontal direction and the angle "Pitch" in the vertical direction of the CVP are calculated by the following equation (1). In other words, the relationship indicated by the following equation (1) holds.
[Mathematics 1]
Yaw = -1.0 * atan((xcvp - xtp) / (ycvp - ytp))
Pitch = acos(a / sqrt(a^2 + b^2))
where a = sqrt((xcvp - xtp)^2 + (ycvp - ytp)^2)
      b = zcvp - ztp ... (1)
In fig. 3, for example, the angle "Yaw2" formed by the straight line (represented by a broken line) obtained by projecting the y-axis of the CVP polar coordinate system of the CVP2 onto the X-Y plane and the straight line (represented by a broken line) parallel to the Y-axis of the common absolute coordinate system corresponds to the angle "Yaw" in the horizontal direction calculated by equation (1) for the CVP2. Similarly, the angle "Pitch2" formed by the y-axis of the CVP polar coordinate system of the CVP2 and the X-Y plane corresponds to the angle "Pitch" in the vertical direction calculated by equation (1) for the CVP2.
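For reference, equation (1) can be written out as the short Python sketch below. It is a minimal sketch that follows the equation literally; the use of degrees and the assumption that ycvp differs from ytp are not specified in the text.

```python
import math

def cvp_orientation(cvp, tp):
    """Evaluate equation (1): yaw and pitch of a virtual listener located at a CVP
    and facing the target point TP, both given in the common absolute coordinate
    system. Degrees are used here; the angle unit is an assumption."""
    xcvp, ycvp, zcvp = cvp
    xtp, ytp, ztp = tp
    # Horizontal angle relative to the Y-axis of the common absolute coordinate system.
    yaw = -1.0 * math.degrees(math.atan((xcvp - xtp) / (ycvp - ytp)))
    # Vertical angle relative to the X-Y plane of the common absolute coordinate system.
    a = math.sqrt((xcvp - xtp) ** 2 + (ycvp - ytp) ** 2)
    b = zcvp - ztp
    pitch = math.degrees(math.acos(a / math.sqrt(a ** 2 + b ** 2)))
    return yaw, pitch

# Example: a CVP placed in front of and above a target point at the origin.
print(cvp_orientation((2.0, -5.0, 1.5), (0.0, 0.0, 0.0)))
```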
The transmitting side (generating side) of the free viewpoint content transmits CVP position information indicating the absolute coordinate position of the CVP in the common absolute coordinate system, and CVP orientation information including yaw angle and pitch angle calculated by equation (1) for CVP as configuration information to the receiving side (reproducing side).
Note that coordinates (absolute coordinate values) representing the target point TP in the common absolute coordinate system may be transmitted from the transmitting side to the receiving side as an alternative means for transmitting the CVP orientation information. Specifically, the target point information indicating the target point TP in the common absolute coordinate system (free viewpoint space) may be stored in the configuration information instead of the CVP orientation information. In this case, the receiving side (reproducing side) calculates the yaw angle and pitch angle for each CVP using equation (1) described above based on the received coordinates representing the target point TP.
Further, the CVP orientation information may include not only a yaw angle and a pitch angle of the CVP but also a rotation angle (Roll) with respect to a common absolute coordinate system in the CVP polar coordinate system as a rotation angle around a rotation axis corresponding to the y-axis in the CVP polar coordinate system. Hereinafter, it is assumed that the yaw angle, pitch angle, and rotation angle included in the CVP orientation information will also be referred to as CVP yaw angle information, CVP pitch angle information, and CVP rotation angle information, respectively.
The CVP orientation information will be further described herein while focusing on the CVP3 described in fig. 3.
Fig. 4 presents the positional relationship between the target point TP and the CVP3 described in fig. 3.
When the CVP3 is set (specified), a CVP polar coordinate system, i.e., a polar coordinate space, around the center located at the CVP3 is defined. In this polar coordinate space, the direction toward the target point TP as viewed from the CVP3 corresponds to the direction toward the median plane (the direction in which both the azimuth angle in the horizontal direction and the elevation angle in the vertical direction are 0). In other words, the direction extending from the CVP3 to the target point TP corresponds to the y-axis positive direction in the CVP polar coordinate system around the center located at the CVP3.
Assuming here that the plane that contains the CVP3 (i.e., the origin O' of the CVP polar coordinate system of the CVP3) and is parallel to the X-Y plane containing the X-axis and the Y-axis of the common absolute coordinate system is the X'-Y' plane, a line LN11 is a straight line obtained by projecting the y-axis of the CVP polar coordinate system of the CVP3 onto the X'-Y' plane. Further, the line LN12 is a straight line that is included in the X'-Y' plane and parallel to the Y-axis of the common absolute coordinate system.
In this case, the angle "Yaw3" formed by the line LN11 and the line LN12 corresponds to CVP Yaw information associated with CVP3 and calculated by equation (1), and the angle "Pitch3" formed by the y-axis and the line LN11 corresponds to CVP Pitch information associated with CVP3 and calculated by equation (1).
The CVP orientation information including the CVP Yaw information and the CVP pitch information thus obtained is information indicating a direction in which a virtual listener located at the CVP3 faces (i.e., a direction from the CVP3 toward the target point TP in the free viewpoint space). In other words, the CVP orientation information is considered as information indicating a relative relationship between orientations (directions) in the common absolute coordinate system and the CVP polar coordinate system.
Further, when the CVP polar coordinate system of each CVP is determined, a relative positional relationship expressed by an azimuth angle in the horizontal direction and an elevation angle in the vertical direction, each indicating the position (direction) of the corresponding object viewed from the corresponding CVP, is maintained between the corresponding CVP and each object (audio object).
The relative positions of predetermined objects viewed from the CVP3 will be described with reference to fig. 5. Note that fig. 5 shows the positional relationship between the target point TP and the CVP3 that is the same as the positional relationship shown in fig. 3.
In this example, four audio objects containing object obj1 are positioned within the free viewpoint space.
The state of the entire free viewpoint space is depicted in the left part of the figure. Specifically, in this example, the object obj1 is located near the target point TP.
The creator of the content can determine (specify) the positioning position of an object for each CVP in such a way that the object is located at a different absolute position in the free viewpoint space for each CVP.
For example, the creator can individually specify the positioning position of the object obj1 in the free viewpoint space viewed from the CVP1 and the positioning position of the object obj1 in the free viewpoint space viewed from the CVP 3. These positioning positions do not have to coincide with each other.
The right part of the figure depicts the state when the target point TP and the object obj1 are viewed from the CVP3 in the CVP polar coordinate space of the CVP3. As is apparent from the figure, the object obj1 is located in the left front as viewed from the CVP3.
In this case, a relative positional relationship determined by the azimuth angle azimuth_obj1 in the horizontal direction and the elevation angle elevation_obj1 in the vertical direction is established between the CVP3 and the object obj1. In other words, the relative position of the object obj1 viewed from the CVP3 can be expressed by coordinates (polar coordinates) defined in the CVP polar coordinate system, which include the azimuth angle azimuth_obj1 in the horizontal direction and the elevation angle elevation_obj1 in the vertical direction.
Object localization based on such a polar-coordinate expression is the same localization method as that used for 3DoF content creation. In other words, the present technique enables object localization for 6DoF content using a polar-coordinate expression similar to that of 3DoF content.
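As a minimal illustration of this polar-coordinate expression (the axis orientations and angle sign conventions below are assumptions, since the text does not fix them), an object position given in Cartesian form within the CVP polar coordinate space could be converted to azimuth, elevation, and radius as follows:

```python
import math

def polar_from_cvp_frame(x, y, z):
    """Convert a position expressed in the CVP polar coordinate space
    (assumed here: x to the listener's right, y toward the target point TP,
    z upward) into azimuth and elevation in degrees plus a radius.
    Positive azimuth to the listener's left is an assumed convention."""
    radius = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(-x, y))
    elevation = math.degrees(math.asin(z / radius)) if radius > 0.0 else 0.0
    return azimuth, elevation, radius

# An object to the left front of and slightly above the virtual listener at the CVP.
print(polar_from_cvp_frame(-1.0, 2.0, 0.5))
```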
As described above, the present technology can locate an object in a three-dimensional space for each of a plurality of CVPs set in a free viewpoint space by using the same method as that for 3DoF content. In this way, object positioning patterns corresponding to a plurality of CVPs are formed.
During creation of the free view content, the creator specifies the positioning position of all objects for each CVP set by the creator.
Note that the object positioning pattern of each CVP is not limited to a pattern used by only that one CVP. The same positioning pattern may be applied to a plurality of CVPs. In this way, object positions can be specified for a plurality of CVPs over a wider range within the free viewpoint space while effectively reducing the creation cost.
Fig. 6 presents an example of an association between configuration information (CVP set) for managing a CVP and an object location pattern (object set).
The (N+2) CVPs are depicted in the left part of the figure. Information associated with the (N+2) CVPs is stored in the configuration information. Specifically, for example, the configuration information contains CVP position information and CVP orientation information of each CVP.
On the other hand, N object position patterns (i.e., object positioning patterns) different from each other are presented in the right part of the figure.
For example, the object position pattern information indicated by the character "OBJ position 1" indicates the positioning positions of all objects in the CVP polar coordinate system in the case where the objects are located in one specific positioning pattern determined by the creator or the like.
Thus, for example, the positioning pattern of the object indicated by "OBJ position 1" is different from the positioning pattern of the object indicated by "OBJ position 2".
Further, arrows pointing from respective CVPs indicated in the left part to object position patterns indicated in the right part in the drawing each represent a link relationship between the CVP and the object position pattern. According to the present technology, the combination patterns of the information associated with the respective CVPs and the position information associated with the object exist independently of each other, and are set to manage the relationship between the CVPs and the object positions by linking, as shown in fig. 6.
Specifically, in accordance with the present technique, for example, a set of object metadata is prepared for each positioning mode of an object.
For example, the object metadata set includes object metadata for a corresponding one of the objects. Each object metadata contains object location information associated with an object corresponding to the positioning mode.
The object position information includes polar coordinates or the like indicating a positioning position of the corresponding object in the CVP polar coordinate system, for example, in the case where the object is located in the corresponding positioning pattern. More specifically, for example, the object position information is coordinate information indicated by coordinates (polar coordinates) in a polar coordinate system similar to the CVP polar coordinate system, and indicates the object position viewed from the CVP when the direction from the CVP toward the target point TP in the free viewpoint space is the direction toward the median plane.
Also stored in the configuration information, for each CVP, is a metadata set coefficient indicating the object metadata set corresponding to the object positioning pattern set by the creator for that CVP. The receiving side (reproduction side) then obtains the object metadata set of the corresponding object positioning pattern based on the metadata set coefficients contained in the configuration information.
Linking a CVP and an object positioning pattern (object metadata set) by means of the above-described metadata set coefficient amounts to maintaining mapping information between the CVP and the object positioning pattern. In this way, data management and the readability of the format in terms of implementation are further improved, and a reduction in memory usage can also be achieved.
For example, if an object metadata set is prepared for each object positioning pattern, the creator of the content is allowed, in the creation tool, to use a combined pattern of object position information associated with a plurality of existing objects as an object positioning pattern shared by a plurality of CVPs.
Specifically, for example, the object position pattern indicated by "OBJ position 2" presented in the right part of the figure is designated for the CVP2 presented in the left part of the figure. In this state, the same object position pattern "OBJ position 2" as that of the CVP2 may also be designated for the CVP N+1.
In this case, the object position pattern referenced by the CVP2 and the object position pattern referenced by the CVP N+1 are both constituted by the same pattern "OBJ position 2". Therefore, the relative positioning positions of the objects indicated by "OBJ position 2" viewed from the CVP2 are the same as the relative positioning positions of the objects indicated by "OBJ position 2" viewed from the CVP N+1.
However, the positioning positions in the free viewpoint space of the objects indicated by "OBJ position 2" at the CVP2 are different from the positioning positions in the free viewpoint space of the objects indicated by "OBJ position 2" at the CVP N+1. These positions differ from each other for the following reason. The object position information associated with the object position pattern "OBJ position 2" is represented by polar coordinates in a polar coordinate system. The position of the origin of that polar coordinate system in the free viewpoint space and the direction of its y-axis (the direction toward the median plane) when "OBJ position 2" is referenced at the CVP2 are different from the position and direction when "OBJ position 2" is referenced at the CVP N+1. In other words, the position of the origin and the direction of the y-axis of the CVP polar coordinate system of the CVP2 are different from those of the CVP N+1.
Furthermore, although N object positioning patterns are prepared in this example, the CVPs and the object metadata do not require complicated management even when new object positioning patterns are additionally generated, so they can be handled systematically. As a result, the readability of the data in software improves, and ease of implementation can be achieved.
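The link described above can be pictured with the following sketch; the concrete indices and values are illustrative assumptions, not taken from the figures. Two CVPs reference one object metadata set through their metadata set coefficients, so the shared positioning pattern is stored only once.

```python
# Hypothetical in-memory picture of the link between CVPs and object position patterns.
# The positioning pattern of metadata set 2 ("OBJ position 2") is shared by two CVPs,
# so it is held only once.
object_meta_sets = {
    1: {"obj1": {"azimuth": 30.0, "elevation": 10.0, "radius": 1.0, "gain": 1.0}},
    2: {"obj1": {"azimuth": -45.0, "elevation": 5.0, "radius": 1.0, "gain": 0.8}},
}

# Metadata set coefficient (AssociatedObjectMetaSetIndex) per CVP:
# here CVP2 and CVP6 (standing in for CVP N+1) both reference metadata set 2.
cvp_to_meta_set = {1: 1, 2: 2, 6: 2}

def metadata_for_cvp(cvp_index):
    """Resolve the object metadata set referenced by a CVP."""
    return object_meta_sets[cvp_to_meta_set[cvp_index]]

assert metadata_for_cvp(2) is metadata_for_cvp(6)  # same pattern, different CVPs
```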
According to the present technology, as described above, by using the 3DoF method, 6DoF content (free viewpoint content) can be created only by locating objects in polar coordinates associated with respective CVPs.
The content creator selects the set of audio data corresponding to the objects used at the respective CVP. The audio data may be used as common data for a plurality of CVPs. Note that audio data corresponding to an object and used only at a specific CVP may be added.
Thus, by using audio data common to the respective CVPs to control the position, gain, etc. of the object of each CVP in this way, fewer redundant transmissions may be achieved.
A creation tool for generating 6DoF content (free viewpoint content) according to an operation of a creator outputs two types of data structures called configuration information and object metadata sets in the form of files or binary data.
Fig. 7 is a diagram showing an example of a format (syntax) of configuration information.
In the example presented in fig. 7, the configuration information includes a frame length coefficient "FrameLengthIndex", object number information "NumOfObjects", CVP number information "NumOfControlViewpoints", and metadata set number information "NumOfObjectMetaSets".
The frame length coefficient "FrameLengthIndex" is a coefficient indicating the length of one frame of audio data for reproducing the sound of the object (i.e., indicating the number of samples constituting one frame).
For example, fig. 8 presents the correspondence between respective values of the frame length coefficient "FrameLengthIndex" and the frame length indicated by the frame length coefficient.
According to the present example, for example, in the case where the value of the frame length coefficient is "5", the frame length is set to "1024". In other words, one frame includes 1024 samples.
Returning now to the description of fig. 7, the object number information "NumOfObjects" is information indicating the number of pieces of audio data constituting the content, that is, the number of objects (audio objects). The CVP number information "NumOfControlViewpoints" is information indicating the number of CVPs set by the creator. The metadata set number information "NumOfObjectMetaSets" is information indicating the number of object metadata sets.
Further, the configuration information contains the same number of pieces of CVP information "ControlViewpointInfo(i)" as the number of CVPs indicated by the CVP number information "NumOfControlViewpoints", as information associated with the CVPs. In other words, the CVP information is stored for each CVP set by the creator.
Also stored in the configuration information is coordinate mode information "CoordinateMode[i][j]" for each CVP, which is flag information indicating the description method of the object position information contained in the object metadata of each object.
For example, a value of "0" of the coordinate mode information indicates that the object position information is described using absolute coordinates in the common absolute coordinate system. On the other hand, a value of "1" indicates that the object position information is described using polar coordinates in the CVP polar coordinate system. Note that the following description continues on the assumption that the value of the coordinate mode information is "1".
Further, fig. 9 presents an example of the format (syntax) of the CVP information "ControlViewpointInfo (i)" contained in the configuration information.
In this example, the CVP information contains a CVP coefficient "ControlViewpointIndex[i]" and a metadata set coefficient "AssociatedObjectMetaSetIndex[i]".
The CVP coefficient "ControlViewpointIndex[i]" is coefficient information for identifying the CVP corresponding to the CVP information.
The metadata set coefficient "AssociatedObjectMetaSetIndex[i]" is coefficient information (specification information) indicating the object metadata set specified by the creator for the CVP indicated by the CVP coefficient. In other words, the metadata set coefficient is information indicating the object metadata set associated with the CVP.
Further, the CVP information includes CVP position information and CVP orientation information.
Specifically, stored as the CVP position information are an X-coordinate "CVPosX[i]", a Y-coordinate "CVPosY[i]", and a Z-coordinate "CVPosZ[i]", each representing the position of the CVP in the common absolute coordinate system.
Further, CVP yaw information "CVYaw[i]", CVP pitch information "CVPitch[i]", and CVP roll information "CVRoll[i]" are stored as the CVP orientation information.
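Taken together, the fields of the configuration information in fig. 7 and of the CVP information in fig. 9 can be pictured as the plain data structures below. This is a reading aid only; field widths, ordering, and the exact dimensionality of the coordinate mode flag are assumptions, not the actual bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ControlViewpointInfo:
    cvp_index: int                  # ControlViewpointIndex[i]
    meta_set_index: int             # AssociatedObjectMetaSetIndex[i]
    pos_x: float                    # CVPosX[i] (common absolute coordinate system)
    pos_y: float                    # CVPosY[i]
    pos_z: float                    # CVPosZ[i]
    yaw: float                      # CVYaw[i]
    pitch: float                    # CVPitch[i]
    roll: float                     # CVRoll[i]

@dataclass
class ConfigurationInformation:
    frame_length_index: int         # FrameLengthIndex (e.g., 5 means 1024 samples per frame)
    num_objects: int                # NumOfObjects
    num_control_viewpoints: int     # NumOfControlViewpoints
    num_object_meta_sets: int       # NumOfObjectMetaSets
    cvp_infos: List[ControlViewpointInfo] = field(default_factory=list)
    coordinate_mode: List[List[int]] = field(default_factory=list)  # CoordinateMode[i][j]: 0 = absolute, 1 = polar
```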
Fig. 10 is a diagram presenting an example of an object metadata set, and more particularly, a format (syntax) example of an object metadata set group.
In this example, "NumOfObjectMetaSets" indicates the number of stored object metadata sets. The number of object metadata sets may be acquired from the metadata set number information contained in the configuration information. Further, "ObjectMetaSetIndex[i]" indicates the coefficient of the object metadata set, and "NumOfObjects" indicates the number of objects.
"NumOfChangePoints" indicates the number of change points each corresponding to the time of the change of the content of the corresponding object metadata set.
In this example, object metadata is stored only at the change points, not for the frames between them. Further, for each object in each object metadata set, a frame coefficient "frame_index[i][j][k]" for identifying the change point, object position information "PosA[i][j][k]", "PosB[i][j][k]", and "PosC[i][j][k]", and a gain "Gain[i][j][k]" are stored for each change point of the corresponding object. The gain "Gain[i][j][k]" is the gain of the object (audio data) viewed from the CVP.
The frame coefficient "frame_index[i][j][k]" is a coefficient indicating the frame of the audio data associated with the object and corresponding to the change point. The receiving side (reproduction side) identifies the sampling position of the audio data corresponding to the change point based on the frame coefficient "frame_index[i][j][k]" and the frame length coefficient "FrameLengthIndex" contained in the configuration information.
The positions "PosA [ i ] [ j ] [ k ]," PosB [ i ] [ j ] [ k ], "PosC [ i ] [ j ] [ k ]" indicating the position of the object respectively indicate the angular orientation in the horizontal direction, the angular elevation angle in the vertical direction, and the radius of the horizontal direction indicating the position (polar coordinate) of the object in the CVP polar coordinate system. In other words, information including "PosA [ i ] [ j ] [ k ]," PosB [ i ] [ j ] [ k ], "and" PosC [ i ] [ j ] [ k ] "corresponds to object position information.
It should be noted that the value of the coordinate mode information herein is assumed to be "1". In the case where the value of the coordinate mode information is "0", the "PosA [ i ] [ j ] [ k ]", "PosB [ i ] [ j ] [ k ]" and "PosC [ i ] [ j ] [ k ]" correspond to an X-coordinate, a Y-coordinate and a Z-coordinate, respectively, which represent the position of the object in a common absolute coordinate system.
As described above, the frame coefficients for each object, as well as the object position information and gains associated with the respective object, are stored for each change point in each object metadata set in the format presented in fig. 10.
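In the same spirit, the per-change-point content of the object metadata set in fig. 10 can be pictured as follows, assuming coordinate mode "1" (polar coordinates in the CVP polar coordinate system); this is a simplified sketch rather than the actual syntax.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ChangePoint:
    frame_index: int    # frame_index[i][j][k]: frame at which the values below take effect
    azimuth: float      # PosA[i][j][k]
    elevation: float    # PosB[i][j][k]
    radius: float       # PosC[i][j][k]
    gain: float         # Gain[i][j][k]

@dataclass
class ObjectMetaSet:
    meta_set_index: int                     # ObjectMetaSetIndex[i]
    change_points: List[List[ChangePoint]]  # change_points[j][k]: k-th change point of object j
```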
The position of each object may be constantly fixed or may be dynamically changed in the time direction.
According to the format example described in fig. 10, each time position corresponding to a change in the object position (i.e., each position of the above-described change points) is recorded as a frame coefficient "frame_index[i][j][k]". Further, on the receiving side (reproduction side), for example, the object position information and the gain between change points are obtained by linear interpolation based on the object position information and the gain at each change point.
As described above, dynamic change of the object position can be handled by adopting the format presented in fig. 10, and thus, the necessity of holding data for the entire time (frame) is eliminated. Thus, the file size can be reduced.
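The interpolation between change points mentioned above could be implemented along the following lines; the sketch reuses the ChangePoint structure from the previous example and ignores angle wrap-around and other edge cases for brevity.

```python
def interpolate_at_frame(change_points, frame):
    """Linearly interpolate azimuth, elevation, radius, and gain of one object at the
    given frame from its change points (assumed sorted by frame_index)."""
    if frame <= change_points[0].frame_index:
        cp = change_points[0]
        return cp.azimuth, cp.elevation, cp.radius, cp.gain
    if frame >= change_points[-1].frame_index:
        cp = change_points[-1]
        return cp.azimuth, cp.elevation, cp.radius, cp.gain
    for prev, nxt in zip(change_points, change_points[1:]):
        if prev.frame_index <= frame <= nxt.frame_index:
            t = (frame - prev.frame_index) / (nxt.frame_index - prev.frame_index)

            def lerp(a, b):
                return a + t * (b - a)

            return (lerp(prev.azimuth, nxt.azimuth),
                    lerp(prev.elevation, nxt.elevation),
                    lerp(prev.radius, nxt.radius),
                    lerp(prev.gain, nxt.gain))
```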
Fig. 11 shows an example of the localization of a CVP in free view space during the creation of an actual live content as free view content.
In this example, the space covering the entire live performance venue corresponds to the free viewpoint space. At the live performance venue, an artist as an object performs a musical composition on, for example, the stage ST11. In addition, audience seats are arranged so as to surround the stage ST11 in the live performance venue.
The position of the origin O in the free viewpoint space (i.e., in the common absolute coordinate system) is set according to the intention of the creator of the content. In this example, the origin O is located at the center of the live performance venue.
Further, according to the present example, the creator sets the target point TP at stage ST11 and sets seven CVPs out of CVPs 1 to CVP7 in the live performance venue.
As described above, each direction from each CVP toward the target point TP is designated as a direction toward the median plane of each CVP (CVP polar coordinate system).
Thus, for example, for a user whose viewpoint position (listener position) is set to the CVP1 and who faces the direction of the target point TP, the content picture is presented as if the user were viewing the stage ST11 from directly in front of the stage ST11, as indicated by the arrow Q11.
Also, for example, for a user whose viewpoint position is set to the CVP4 and who faces the direction of the target point TP, the content picture is presented as if the user were viewing the stage ST11 obliquely from in front of the stage ST11, as indicated by the arrow Q12. Further, for a user whose viewpoint position is set to the CVP5 and who faces the direction of the target point TP, for example, the content picture is presented as if the user were viewing the stage ST11 obliquely from behind the stage ST11, as indicated by the arrow Q13.
In the case of creating such free viewpoint content, the creator's object localization work for one CVP is equivalent to the 3DoF content creation work.
In the example depicted in fig. 11, to handle 6DoF, the creator of the free viewpoint content only needs to set another six CVPs and perform the object positioning work for those CVPs, in addition to the object positioning work for one CVP. Thus, according to the present technology, creation of free viewpoint content can be achieved by work similar to that for 3DoF content.
At the same time, the reverberant component in space is initially generated by physical propagation or reflection in space.
Thus, for example, in the case where physical reverberation components arriving through reflection or propagation within a concert venue regarded as the free viewpoint space are treated as objects, the reverberation objects are positioned in the CVP polar coordinate system (CVP polar coordinate space) as depicted in fig. 12. Note that fig. 12 depicts the positional relationship between the target point TP and the CVP3 in a manner similar to that shown in fig. 3.
In fig. 12, reverberant sound generated from a sound source (player) in the vicinity of the target point TP and traveling toward the CVP3 is indicated by a dotted arrow. In other words, the arrows in the figure represent the physical reverberation path. Specifically, four reverberation paths are shown herein.
Thus, if these reverberant sounds are positioned without change as reverberation objects in the CVP polar coordinate space of the CVP3, the respective reverberation objects are located at the positions P11 to P14.
In the case of using such a signal, in which strong reverberation components are concentrated in a narrow region of the free viewpoint space, without change, a higher sense of realism can be provided. However, the original music signal being drowned out by the reverberation generally reduces musicality.
By the creation tool of the present technology employing object localization in the CVP polar coordinate system, the content creator can process the reverberation component as an object (reverberation object), determine an arrival direction considered to be optimal from the musical aspect, and localize the object according to the determination result. In this way, generating a reverberation effect in space can be achieved.
Specifically, for example, in the case where the CVP3 is designated as the listener position, if the reverberation object is located at the positions P11 to P14, the sound mainly including the reverberation sound is concentrated on a narrow area in front of the concert place. Thus, the creator locates these reverberant objects at positions P '11 to P'14 behind the listener present at CVP 3. In other words, the positioning positions of the reverberation objects are shifted from the positions P11 to P14 to the positions P '11 to P'14 to increase musical performance in consideration of the reverberation path or the like.
As described above, by intentionally locating all of the reverberation objects behind the listener, the reverberation components drowning out the music (i.e., making the music itself difficult to listen to) can be avoided.
< Configuration example of information processing apparatus >
An information processing apparatus that provides a creation tool and creates the above-described free viewpoint content will be described below.
Such an information processing apparatus includes, for example, a personal computer or the like, and has the configuration shown in fig. 13.
The information processing apparatus 11 shown in fig. 13 includes an input unit 21, a display unit 22, a recording unit 23, a communication unit 24, a sound output unit 25, and a control unit 26.
For example, the input unit 21 includes a mouse, a keyboard, a touch panel, buttons, switches, and the like, and supplies signals corresponding to operations performed by the content creator to the control unit 26. The display unit 22 displays an image such as a display screen of the content generation tool under the control of the control unit 26.
The recording unit 23 holds various types of data recorded therein, such as audio data of respective objects for content creation, and configuration information and an object metadata set supplied from the control unit 26, and supplies the recorded data to the control unit 26 as needed.
The communication unit 24 communicates with an external device such as a server. For example, the communication unit 24 transmits data supplied from the control unit 26 to a server or the like, and also receives data transmitted from the server or the like and supplies the data to the control unit 26.
The sound output unit 25 includes, for example, a speaker, and outputs sound based on the audio data supplied from the control unit 26.
The control unit 26 controls the overall operation of the information processing apparatus 11. For example, the control unit 26 generates (creates) free viewpoint content based on the signal supplied from the input unit 21 according to the operation performed by the creator.
< Description of content creation Process >
Next, the operation of the information processing apparatus 11 will be described. Specifically, hereinafter, the content creation process performed by the information processing apparatus 11 will be described with reference to a flowchart presented in fig. 14.
For example, when the control unit 26 reads out and executes the program recorded in the recording unit 23, the generation tool that generates the free viewpoint content operates.
After the generation of the tool is started, the control unit 26 supplies predetermined image data to the display unit 22, and causes the display unit 22 to display the display screen of the generation tool. For example, an image of a free viewpoint space or the like is displayed on a display screen.
Further, for example, the control unit 26 reads audio data of each object constituting the free viewpoint content created from now on from the recording unit 23 according to an operation performed by the creator as needed, and supplies the read audio data to the sound output unit 25 to reproduce the sound of the object.
The creator checks an image or the like indicating a free viewpoint space, which is displayed on the display screen as needed, while listening to the sound of the subject, and operates the input unit 21 to perform the content creation operation.
In step S11, the control unit 26 sets the target point TP.
For example, the creator designates an arbitrary position (point) in the free viewpoint space as the object point TP by operating the input unit 21.
When an operation for specifying the target point TP is performed by the creator, a signal corresponding to the operation performed by the creator is supplied from the input unit 21 to the control unit 26. Thus, the control unit 26 designates the creator-designated position contained in the free viewpoint space as the target point TP based on the signal supplied from the input unit 21. That is, the control unit 26 sets the target point TP.
In addition, the control unit 26 may set the position of the origin 0 in the free viewpoint space (common absolute coordinate system) according to the operation of the creator.
In step S12, the control unit 26 sets each of the number of CVPs and the number of object metadata sets to 0. Specifically, the control unit 26 sets the CVP number information "NumOfControlViewPoints" to 0 and sets the metadata set number information "NumOfObjectMetaSets" to 0.
In step S13, the control unit 26 determines whether the editing mode selected by the operation performed by the creator is the CVP editing mode based on the signal received from the input unit 21.
It is assumed here that the editing modes include a CVP editing mode for editing a CVP, an object metadata set editing mode for editing an object metadata set, and a link editing mode for associating (linking) a CVP with an object metadata set.
In the case where it is determined in step S13 that the selected mode is the CVP edit mode, the control unit 26 determines in step S14 whether to change the CVP configuration.
For example, in a case where the creator performs an operation for adding (setting) a new CVP or issuing an instruction to delete an existing CVP in the CVP editing mode, it is determined that the CVP configuration is to be changed.
In the case where it is determined in step S14 that the CVP configuration is not changed, the process returns to step S13 to repeat the above-described process.
On the other hand, in the case where it is determined in step S14 that the CVP configuration is to be changed, the process then proceeds to step S15.
In step S15, the control unit 26 updates the number of CVPs based on the signal supplied from the input unit 21 according to the operation performed by the creator.
For example, in the case where the creator performs an operation for adding (setting) a new CVP, the control unit 26 updates the number of CVPs by adding 1 to the value of the currently held CVP number information "NumOfControlViewPoints". On the other hand, in the case where the creator performs an operation for deleting one existing CVP, for example, the control unit 26 updates the number of CVPs by subtracting 1 from the value of the currently held CVP number information "NumOfControlViewPoints".
In step S16, the control unit 26 edits the CVP according to the creator' S operation.
For example, when an operation for designating (adding) a CVP is performed by the creator, the control unit 26 designates a position designated by the creator as a position of a newly added CVP in the free viewpoint space based on a signal supplied from the input unit 21. In other words, the control unit 26 sets a new CVP. Further, when an operation for deleting any CVP is performed by the creator, the control unit 26 deletes the CVP specified by the creator in the free view space based on the signal supplied from the input unit 21.
After completing the editing of the CVP, the process then returns to step S14 to repeat the above-described process. Specifically, editing of the CVP is newly performed.
Further, in the case where it is determined in step S13 that the selected mode is not the CVP editing mode, in step S17, the control unit 26 determines whether the selected mode is the object metadata set editing mode.
In the case where it is determined in step S17 that the selected mode is the object metadata set editing mode, the control unit 26 determines in step S18 whether to change the object metadata set.
For example, in the case where the creator performs an operation for adding (setting) a new object metadata set or issuing an instruction to delete an existing object metadata set in the object metadata set editing mode, it is determined that the object metadata set is to be changed.
In the case where it is determined in step S18 that the object metadata set is not changed, the process returns to step S13 to repeat the above-described process.
On the other hand, in the case where it is determined in step S18 that the object metadata set is to be changed, the process then proceeds to step S19.
In step S19, the control unit 26 updates the number of object metadata sets based on the signal supplied from the input unit 21 according to the operation performed by the creator.
For example, in the case where the creator performs an operation for adding (setting) a new object metadata set, the control unit 26 updates the number of object metadata sets by adding 1 to the value of the currently held metadata set number information "NumOfObjectMetaSets". On the other hand, in the case where the creator performs an operation for deleting one existing object metadata set, for example, the control unit 26 updates the number of object metadata sets by subtracting 1 from the value of the currently held metadata set number information "NumOfObjectMetaSets".
In step S20, the control unit 26 edits the object metadata set in accordance with the operation of the creator.
For example, when an operation for setting (adding) a new object metadata set is performed by the creator, the control unit 26 generates a new object metadata set based on a signal supplied from the input unit 21.
At this time, for example, the control unit 26 causes the display unit 22 to display an image of the CVP polar coordinate space as needed, and the creator designates a position (point) in the image of the CVP polar coordinate space as an object positioning position of the new object metadata set.
When the creator performs an operation of specifying one or more object positioning positions, the control unit 26 specifies each position specified by the creator in the CVP polar coordinate space as an object positioning position, and generates a new object metadata set.
Further, for example, when an operation for deleting any object metadata set is performed by the creator, the control unit 26 deletes the object metadata set specified by the creator based on a signal supplied from the input unit 21.
After completing editing of the object metadata set, the process then returns to step S18 to repeat the above-described process. Specifically, editing of the object metadata set is newly performed. Note that the existing object metadata set may be changed as an edit of the object metadata set.
Further, in the case where it is determined in step S17 that the selected mode is not the object metadata set editing mode, in step S21, the control unit 26 determines whether the selected mode is the link editing mode.
In the case where it is determined in step S21 that the selected mode is the link edit mode, in step S22, the control unit 26 associates the CVP with the object metadata set according to the operation performed by the creator.
Specifically, for example, the control unit 26 generates a metadata set coefficient "AssociatedObjectMetaSetIndex[i]" representing the object metadata set specified by the creator for the CVP specified by the creator based on the signal supplied from the input unit 21. In this way, a link between the CVP and the object metadata set is achieved.
After the process in step S22 is completed, the process then returns to step S13 to repeat the above-described process.
On the other hand, in the case where it is determined in step S21 that the selected mode is not the link edit mode (i.e., an instruction to end the free viewpoint content creation work is issued), the process proceeds to step S23.
In step S23, the control unit 26 outputs the content data via the communication unit 24.
For example, the control unit 26 generates configuration information based on the setting results of the target point TP, the CVP, and the object metadata set, and the link result between the CVP and the object metadata set.
Specifically, for example, the control unit 26 generates configuration information including the frame length coefficient, the object number information, the CVP number information, the metadata set number information, the CVP information, and the coordinate mode information described with reference to fig. 7 and 9, respectively. At this time, the control unit 26 calculates CVP orientation information by calculation similar to equation (1) above as needed to generate CVP information including a CVP coefficient, a metadata set coefficient, CVP position information, and CVP orientation information.
Further, the control unit 26 generates a plurality of object metadata sets for each change point by the process in step S20 as described with reference to fig. 10, each object metadata set containing a frame coefficient of each object and object position information and gain associated with the corresponding object.
In this way, content data including audio data of a corresponding object, configuration information, and a plurality of sets of object metadata different from each other are generated for one free-viewpoint content. The control unit 26 supplies the generated content data to the recording unit 23 to record the data therein, and supplies the content data to the communication unit 24 as needed.
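For reference, the logical structure of the content data described above can be outlined by the following illustrative sketch in Python. The field names loosely follow the coefficients appearing in the configuration information (for example, NumOfControlViewPoints, NumOfObjectMetaSets, and AssociatedObjectMetaSetIndex[i]); the class and attribute names themselves are hypothetical and do not define any actual bitstream syntax.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CvpInfo:
    cvp_index: int                          # index of this control viewpoint (CVP)
    associated_object_meta_set_index: int   # object metadata set referred to by this CVP
    position: Tuple[float, float, float]    # CVP position in the common absolute coordinate system
    yaw: float                              # CVP orientation (horizontal) toward the target point TP
    pitch: float                            # CVP orientation (vertical) toward the target point TP

@dataclass
class ChangePoint:
    frame_index: int    # frame at which the values below are defined (change point)
    azimuth: float      # object position information seen from the CVP (polar coordinates)
    elevation: float
    radius: float
    gain: float         # gain of the object seen from the CVP

@dataclass
class ObjectMetadataSet:
    per_object_change_points: List[List[ChangePoint]] = field(default_factory=list)

@dataclass
class ConfigurationInformation:
    frame_length: int
    num_objects: int
    num_control_viewpoints: int    # NumOfControlViewPoints
    num_object_meta_sets: int      # NumOfObjectMetaSets
    coordinate_mode: int
    cvp_infos: List[CvpInfo] = field(default_factory=list)

@dataclass
class ContentData:
    configuration: ConfigurationInformation
    object_metadata_sets: List[ObjectMetadataSet] = field(default_factory=list)
    audio_data: List[bytes] = field(default_factory=list)   # one waveform per object, shared by all CVPs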
The communication unit 24 outputs the content data supplied from the control unit 26. Specifically, the communication unit 24 transmits the content data to the server via the network at any time. Note that the content data may be supplied to a recording medium or the like and provided to the server in the form of a recording medium.
After the content data is output, the content creation process ends.
In the above manner, the information processing apparatus 11 provides settings of the target point TP, CVP, object metadata set, and the like according to the operation performed by the creator, and generates content data including audio data, configuration information, and object metadata set.
In this way, the reproduction side can reproduce the free viewpoint content based on the object localization specified by the creator. Thus, content reproduction having musical properties can be achieved based on the intention of the content creator.
< Configuration example of server >
Next, a server that receives a supply of content data of free viewpoint content from the information processing apparatus 11 and distributes the content data to a client will be described.
Such a server has, for example, the configuration depicted in fig. 15.
The server 51 described in fig. 15 is constituted by an information processing apparatus such as a computer. The server 51 includes a communication unit 61, a control unit 62, and a recording unit 63.
The communication unit 61 communicates with the information processing apparatus 11 and the client under the control of the control unit 62. For example, the communication unit 61 receives content data of free viewpoint content transmitted from the information processing apparatus 11 and supplies the content data to the control unit 62, and also transmits an encoded bit stream supplied from the control unit 62 to the client.
The control unit 62 controls the overall operation of the server 51. For example, the control unit 62 has an encoding unit 71. The encoding unit 71 encodes content data of the free-view content to generate an encoded bitstream.
The recording unit 63 holds various types of data recorded therein, such as content data of free viewpoint content supplied from the control unit 62, and supplies the recorded data to the control unit 62 as needed. Further, hereinafter, it is assumed that content data of free viewpoint content received from the information processing apparatus 11 is recorded in the recording unit 63.
< Description of distribution Process >
When a request for distributing free-viewpoint content is issued from a client connected to the server 51 via a network, the server 51 performs distribution processing for distributing free-viewpoint content in response to the request. Hereinafter, the distribution processing performed by the server 51 will be described with reference to a flowchart in fig. 16.
In step S51, the control unit 62 generates an encoded bit stream.
Specifically, the control unit 62 reads out content data of the free viewpoint content from the recording unit 63. Thereafter, the encoding unit 71 of the control unit 62 encodes the audio data, the configuration information, and the plurality of object metadata sets constituting the respective objects of the read content data to generate an encoded bitstream. The control unit 62 supplies the encoded bit stream thus obtained to the communication unit 61.
In this case, the encoding unit 71 encodes the audio data, the configuration information, and the object metadata sets according to an encoding method for MPEG (Moving Picture Experts Group)-I or MPEG-H, for example. In this way, the data transfer amount can be reduced. Furthermore, the audio data of each object is common to all CVPs. Thus, regardless of the number of CVPs, only one piece of audio data needs to be stored for each object.
In step S52, the communication unit 61 transmits the encoded bit stream received from the control unit 62 to the client. After that, the distribution process ends.
Note that described in this example is the case where the encoded audio data, configuration information, and object metadata sets are multiplexed into one encoded bitstream. However, each of the configuration information and the object metadata sets may be transmitted to the client at a timing different from the transmission timing of the audio data. For example, the configuration information and the object metadata sets may be sent to the client first, and then only the audio data may be sent to the client.
In the above manner, the server 51 generates an encoded bitstream containing audio data, configuration information, and an object metadata set, and transmits the generated encoded bitstream to the client. In this way, the client can reproduce the content having musical property and satisfying the intention of the content creator.
< Configuration example of client >
Further, a client that receives the encoded bitstream from the server 51 and generates reproduction audio data for reproducing free viewpoint content has a configuration such as that described in fig. 17.
For example, the client 101 depicted in fig. 17 is constituted by an information processing apparatus such as a personal computer or a smartphone. The client 101 includes a listener position information acquisition unit 111, a communication unit 112, a decoding unit 113, a position calculation unit 114, and a rendering processing unit 115.
The listener position information acquisition unit 111 acquires listener position information indicating an absolute position (i.e., a listener position) of a user corresponding to the listener in the free viewpoint space as information input by the user, and supplies the acquired listener position information to the position calculation unit 114.
For example, the listener position information includes absolute coordinates or the like indicating the listener position in the free viewpoint space (i.e., in a common absolute coordinate system).
Note that the listener position information acquisition unit 111 may also acquire listener orientation information indicating the orientation (direction) of the face of the listener in the free viewpoint space (common absolute coordinate system) and supply the acquired listener orientation information to the position calculation unit 114.
The communication unit 112 receives the encoded bit stream transmitted from the server 51 and supplies the encoded bit stream to the decoding unit 113. Specifically, the communication unit 112 functions as an acquisition unit that acquires audio data, configuration information, and an object metadata set of each object, each encoded and contained in an encoded bitstream.
The decoding unit 113 decodes the encoded bit stream supplied from the communication unit 112, i.e., audio data, configuration information, and object metadata sets of respective objects of respective encodings. The decoding unit 113 supplies the audio data associated with the respective objects and obtained by decoding to the rendering processing unit 115, and also supplies each of the configuration information and the object metadata set obtained by decoding to the position calculating unit 114.
The position calculation unit 114 calculates listener reference object position information indicating the position of the corresponding object viewed from the listener (listener position) based on the listener position information supplied from the listener position information acquisition unit 111 and the configuration information and object metadata set supplied from the decoding unit 113.
Each position of an object indicated by the listener reference object position information is information indicating the relative position of the corresponding object viewed from the listener (listener position), indicated by coordinates (polar coordinates) in a polar coordinate system having an origin (reference) located at the listener position.
For example, listener reference object position information is calculated by interpolation processing based on CVP position information, object position information, and listener position information associated with all CVPs and some of the CVPs. The interpolation process may be any process, such as vector synthesis. Note that the CVP orientation information and the listener orientation information may also be used to calculate listener reference object position information.
Further, the position calculation unit 114 calculates a listener reference gain for each object at the listener position indicated by the listener position information by performing interpolation processing based on the gain obtained for each object and included in the object metadata set supplied from the decoding unit 113. The listener reference gain is the gain of the corresponding object viewed from the listener position.
The position calculation unit 114 supplies the listener reference gain and the listener reference object position information of the corresponding object at the listener position to the rendering processing unit 115.
The rendering processing unit 115 performs rendering processing based on the audio data associated with the respective objects supplied from the decoding unit 113 and the listener reference gain and listener reference object position information supplied from the position calculating unit 114 to generate reproduced audio data.
For example, the rendering processing unit 115 generates reproduced audio data by performing rendering processing in a polar coordinate system specified by MPEG-H, such as VBAP (vector-based amplitude panning). The reproduction audio data is audio data for reproducing free viewpoint content sound containing sound of all objects.
< Description of reproduction Audio data Generation Process >
Next, operations performed by the client 101 will be described. Specifically, the reproduced audio data generation process performed by the client 101 is described hereinafter with reference to a flowchart set forth in fig. 18.
In step S81, the communication unit 112 receives the encoded bit stream transmitted from the server 51, and supplies the encoded bit stream to the decoding unit 113.
In step S82, the decoding unit 113 decodes the encoded bit stream supplied from the communication unit 112.
The decoding unit 113 supplies the audio data associated with the respective objects and obtained by decoding to the rendering processing unit 115, and also supplies each of the configuration information and the object metadata set obtained by decoding to the position calculating unit 114.
Note that the configuration information and the object metadata set may be received at a timing different from the reception timing of the audio data.
In step S83, the listener position information acquisition unit 111 acquires the listener position information at the current time and supplies the acquired listener position information to the position calculation unit 114. Note that the listener position information acquisition unit 111 may also acquire listener orientation information and supply the acquired listener orientation information to the position calculation unit 114.
In step S84, the position calculation unit 114 performs interpolation processing based on the listener position information supplied from the listener position information acquisition unit 111, and the configuration information and the object metadata set supplied from the decoding unit 113.
Specifically, for example, the position calculation unit 114 calculates listener reference object position information by performing vector synthesis as interpolation processing, calculates listener reference gain by performing interpolation processing, and then supplies the listener reference object position information and the listener reference gain thus calculated to the rendering processing unit 115.
Note that when interpolation processing is performed, the current time (sample) may fall between change points, and thus the object position information and gain at the current time may not be stored in the object metadata. In this case, the position calculation unit 114 calculates the object position information and the gain at the CVP at the current time by performing interpolation processing based on the object position information and the gains at a plurality of change points near the current time (such as the change points immediately before and immediately after the current time).
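For example, assuming simple linear interpolation over frame indices between the change point immediately before and the change point immediately after the current time, the object position information and gain at the current time can be estimated as in the following illustrative sketch (the function and variable names are hypothetical).

def interpolate_change_points(frame, prev_point, next_point):
    # Each change point is a tuple: (frame_index, azimuth, elevation, radius, gain).
    f0, az0, el0, rad0, g0 = prev_point
    f1, az1, el1, rad1, g1 = next_point
    if f1 == f0:
        return az0, el0, rad0, g0
    t = (frame - f0) / (f1 - f0)        # 0.0 at the previous change point, 1.0 at the next one
    lerp = lambda a, b: a + t * (b - a)
    return lerp(az0, az1), lerp(el0, el1), lerp(rad0, rad1), lerp(g0, g1)

# Example: change points recorded at frames 0 and 1024; values queried at frame 256.
print(interpolate_change_points(256, (0, 30.0, 0.0, 1.0, 1.0), (1024, 50.0, 10.0, 1.0, 0.5)))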
In step S85, the rendering processing unit 115 performs rendering processing based on the audio data associated with each object supplied from the decoding unit 113 and the listener reference gain and listener reference object position information supplied from the position calculation unit 114.
For example, the rendering processing unit 115 performs gain correction on the audio data of the respective objects based on the listener reference gains of the respective objects.
Thereafter, the rendering processing unit 115 performs a rendering process (such as VBAP) based on the audio data of the corresponding object after the gain correction and listener reference object position information to generate reproduced audio data.
The rendering processing unit 115 outputs the generated reproduced audio data to blocks of a subsequent stage such as a speaker.
In this way, reproduction of the free-viewpoint content (6 DoF content) can be realized at a listening position located at any position within the free-viewpoint space (i.e., from multiple viewpoints).
In step S86, the client 101 determines whether to end the processing. For example, in the case where the reception of the encoded bit stream and the generation of the reproduced audio data are completed for all frames of the free-viewpoint content, it is determined in step S86 that the processing ends.
In the case where it is determined in step S86 that the processing has not ended, the processing returns to step S81 to repeat the above-described processing.
On the other hand, in the case where it is determined in step S86 that the processing ends, the client 101 terminates the operation of the corresponding unit, and the reproduction audio data generation processing ends.
In the above manner, the client 101 performs interpolation processing based on the listener position information, the configuration information, and the object metadata set and calculates the listener reference gain and the listener reference object position information at the listener position.
In this way, content reproduction with musical property can be achieved according to the intention of the content creator based on the listener position, instead of a simple physical relationship between the listener and the object, and thus, the interest of the content can be sufficiently conveyed to the listener.
< Interpolation Process >
Here, a specific example of the interpolation processing performed in step S84 in fig. 18 is described. Here, the case of performing polar vector synthesis will be specifically described.
For example, as shown in fig. 19, it is assumed that any viewpoint position indicated by the position information of the listener in the free viewpoint space (common absolute coordinate system) (i.e., the position of the listener at the current time) is the listener position LP11. Fig. 19 depicts a state of the free viewpoint space in a bird's eye view when viewed from above.
For example, assuming that the predetermined object is an object of interest, listener reference object position information indicating a position PosF of the object of interest at which the origin is located in the polar coordinate system of the listener position LP11 is required to generate reproduced audio data at the listener position LP11 by the rendering process.
Thus, for example, the position calculation unit 114 selects a plurality of CVPs around the listener position LP11 as CVPs for interpolation processing. In this example, three CVPs, CVP0 to CVP2, are selected as the CVPs for interpolation processing.
For example, a predetermined number (three or more) of CVPs located around the listener position LP11 and at the shortest distances from the listener position LP11 may be selected. The CVPs may be selected in any manner. Further, interpolation processing may be performed using all CVPs. In either case, the position calculation unit 114 can identify the position of each CVP in the common absolute coordinate system with reference to the CVP position information contained in the configuration information.
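One simple way of selecting a predetermined number of CVPs closest to the listener position is sketched below for reference; it assumes that the listener position and the CVP positions are given as absolute coordinates in the common absolute coordinate system, and the function name is hypothetical.

import math

def select_nearest_cvps(listener_pos, cvp_positions, num_to_select=3):
    # listener_pos  : (x, y, z) listener position in the common absolute coordinate system
    # cvp_positions : list of (x, y, z) CVP positions from the CVP position information
    def distance(p, q):
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
    order = sorted(range(len(cvp_positions)),
                   key=lambda i: distance(listener_pos, cvp_positions[i]))
    return order[:num_to_select]        # indices of the CVPs used for interpolation processing

# Example with three CVPs placed around a target point at the origin.
print(select_nearest_cvps((1.0, 2.0, 0.0),
                          [(0.0, 5.0, 0.0), (4.0, -3.0, 0.0), (-4.0, -3.0, 0.0)]))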
After selecting CVPs 0 to CVP2 as CVPs for interpolation processing, the position calculation unit 114 obtains an object three-dimensional position vector shown in fig. 20, for example.
The position of the object of interest in the polar coordinate space of the CVP0 is depicted in the left part of fig. 20. In this example, position Pos0 is a locating position of the object of interest as viewed from CVP 0. The position calculation unit 114 calculates a vector V11 having a start point located at the origin O' of the CVP polar coordinate system of the CVP0 and an end point located at the position Pos0 as an object three-dimensional position vector of the CVP 0.
Note that the position of the object of interest in the polar coordinate space of the CVP1 is depicted in the central part of the figure. In this example, the position Pos1 is a positioning position of the object of interest as viewed from the CVP 1. The position calculation unit 114 calculates a vector V12 having a start point located at the origin O' of the CVP polar coordinate system of the CVP1 and an end point located at the position Pos1 as an object three-dimensional position vector of the CVP 1.
Similarly, the position of the object of interest in the polar coordinate space of the CVP2 is depicted in the right part of the figure. In this example, the position Pos2 is a locating position of the object of interest as viewed from the CVP 2. The position calculation unit 114 calculates a vector V13 having a start point located at the origin O' of the CVP polar coordinate system of the CVP2 and an end point located at the position Pos2 as an object three-dimensional position vector of the CVP 2.
Here, a specific calculation method of the three-dimensional position vector of the object will be described.
For example, an absolute coordinate system (rectangular coordinate system) whose origin is at the origin O' of the CVP polar coordinate system of CVPi and whose x, y, and z axes correspond to the x, y, and z axes of the CVP polar coordinate system of CVPi without change is referred to as a CVP absolute coordinate system (CVP absolute coordinate space).
The object three-dimensional position vector is a vector indicated by coordinates in the CVP absolute coordinate system.
For example, suppose that the polar coordinates representing the position Posi of the object of interest in the CVP polar coordinate system of CVPi are (Azi[i], Ele[i], Rad[i]). Azi[i], Ele[i], and Rad[i] correspond to PosA[i][j][k], PosB[i][j][k], and PosC[i][j][k] described with reference to fig. 10. Further, it is assumed that the gain of the object of interest viewed from CVPi is expressed as g[i]. The gain g[i] corresponds to the gain[i][j][k] described with reference to fig. 10.
Furthermore, it is assumed that the absolute coordinates representing the position Posi of the object of interest in the CVP absolute coordinate system of CVPi are (vx [ i ], vy [ i ], vz [ i ]).
In this case, the object three-dimensional position vector of CVPi is (vx [ i ], vy [ i ], vz [ i ]). The object three-dimensional position vector can be obtained by the following equation (2).
[Math 2]
vx[i] = -Rad[i] * sin(Azi[i]) * cos(Ele[i])
vy[i] = Rad[i] * cos(Azi[i]) * cos(Ele[i])
vz[i] = Rad[i] * sin(Ele[i]) ... (2)
The position calculation unit 114 reads, from the CVP information associated with CVPi and contained in the configuration information, metadata set coefficients indicating an object metadata set to be referred to by CVPi. Further, the position calculation unit 114 reads object position information and gain associated with the object of interest at CVPi from object metadata associated with the object of interest and constituting an object metadata set indicated by the read metadata set coefficient.
Thereafter, the position calculation unit 114 calculates equation (2) based on the object position information associated with the object of interest at CVPi to obtain an object three-dimensional position vector (vx [ i ], vy [ i ], vz [ i ]). This calculation of equation (2) is a conversion from polar coordinates to absolute coordinates.
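The conversion of equation (2) can be written directly as in the following illustrative sketch, assuming that the angles are given in degrees and that the radius factor applies to all three components, consistent with the inverse conversion of equation (9) described later; the function name is hypothetical.

import math

def object_position_vector(azi_deg, ele_deg, rad):
    # Conversion from object position information in the CVP polar coordinate system
    # into the object three-dimensional position vector in the CVP absolute coordinate
    # system, following equation (2).
    azi = math.radians(azi_deg)
    ele = math.radians(ele_deg)
    vx = -rad * math.sin(azi) * math.cos(ele)
    vy = rad * math.cos(azi) * math.cos(ele)
    vz = rad * math.sin(ele)
    return vx, vy, vz

# Example: an object at azimuth 30 degrees, elevation 10 degrees, radius 2.0.
print(object_position_vector(30.0, 10.0, 2.0))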
After obtaining vectors V11 to V13 as the three-dimensional position vectors of the object depicted in fig. 20 using equation (2), the position calculation unit 114 calculates vector sums of the vectors V11 to V13, for example, as depicted in fig. 21. Note that the components in fig. 21 that are similar to the corresponding components in fig. 20 are given the same reference numerals, and descriptions of these components will be omitted where appropriate.
In this example, the sum of the vectors V11 to V13 is calculated, and a vector V21 is obtained as a result of the calculation. In other words, the vector V21 is obtained by vector synthesis based on the vectors V11 to V13.
More specifically, the vector V21 is obtained by synthesizing the vectors V11 to V13 (object three-dimensional position vectors) using weights corresponding to the contribution rates of the respective CVPs in the calculation of the vector V21 indicating the position PosF. It should be noted that, for simplicity of description, it is assumed that the contribution rate of each CVP is 1 in fig. 21.
The vector V21 is a vector indicating the position PosF of the object of interest in the absolute coordinate system as viewed from the listener position LP 11. The absolute coordinate system is a coordinate system having an origin located at the listener position LP11 and a y-axis positive direction corresponding to a direction from the listener position LP11 toward the target point TP.
For example, assuming that the absolute coordinates indicating the position PosF in the absolute coordinate system whose origin is located at the listener position LP11 are (vxF, vyF, vzF), the vector V21 is represented as (vxF, vyF, vzF).
Further, assuming that the gain of the object of interest viewed from the listener position LP11 is gF and the contribution rates of CVP0 to CVP2 are dep [0] to dep [2], respectively, the vector (vxF, vyF, vzF) (i.e., the vector V21) and the gain gF can be obtained by the following equation (3).
[ Math 3]
vxF=dep[0]*vx[0]+dep[1]*vx[1]+dep[2]*vx[2]
vyF=dep[0]*vy[0]+dep[1]*vy[1]+dep[2]*vy[2]
vzF=dep[0]*vz[0]+dep[1]*vz[1]+dep[2]*vz[2]
gF=dep[0]*g[0]+dep[1]*g[1]+dep[2]*g[2] ... (3)
The listener reference object position information can be obtained by converting the vector V21 obtained in the above-described manner into polar coordinates indicating the position PosF of the object of interest in the polar coordinate system whose origin is located at the listener position LP 11. Further, the gain gF obtained by equation (3) is a listener reference gain.
According to the present technique, one target point TP common to all CVPs is set. In this way, the desired listening reference object position information and the desired listener reference gain can be obtained by simple calculation.
Vector synthesis will be further described here.
For example, assume that the position LP21 in the free viewpoint space is the listener position as described in the left part of fig. 22. It is also assumed that CVPs 1 to 5 are set in the free view space, and listener reference object position information is obtained using the CVPs 1 to 5. It should be noted that fig. 22 describes an example in which the CVP is located in a two-dimensional plane to simplify the description.
In this example, CVPs 1 to 5 are positioned around the center located on target point TP. Further, the y-axis positive direction of the CVP polar coordinate system of each CVP corresponds to the direction from the corresponding CVP toward the target point TP.
Furthermore, the positions OBP1 to OBP5 are the positions of the same object of interest as viewed from CVP1 to CVP5, respectively. Thus, the polar coordinates of the indicated positions OBP1 to OBP5 are expressed in the CVP polar coordinate system and correspond to the object position information associated with the CVPs 1 to 5, respectively.
In this case, it is assumed that each CVP is axially rotated in such a manner that the y-axis of the CVP polar coordinate system coincides with the vertical direction (i.e., upward direction in the drawing). Furthermore, the object of interest is repositioned in such a way that the origin O' of the CVP polar coordinate system of each CVP after rotation coincides with the origin of one and the same CVP polar coordinate system. In this case, the positions OBP1 to OBP5 of the object of interest at the respective CVPs exhibit a relationship as presented by the right-hand portion of the figure when viewed from the origin of the CVP polar coordinate system. In other words, the object position assuming that each CVP is located at the origin and the median plane is designated as the Y-axis positive direction is described in the right part of the figure.
In the CVP polar coordinate system of each CVP, there is a constraint that the direction toward the median plane coincides with the direction toward the target point TP. Therefore, the positional relationship shown on the right side in the figure can be easily obtained.
Further, it is assumed that the vectors V41 to V45 are vectors each having a start point located at a position (i.e., origin) of the corresponding CVP and each having an end point located at positions OBP1 to OBP5 of the object of interest in the right part of the figure. The vectors V41 to V45 here correspond to the vectors V11 to V13 depicted in fig. 20.
Therefore, as depicted in fig. 23, a vector V51 indicating the position of the object of interest viewed from the listener position LP21 may be obtained by synthesizing the vectors V41 to V45 using the contribution rates of the corresponding CVPs as weights. The vector V51 here corresponds to the vector V21 depicted in fig. 21. Note that, in order to simplify the description in the case of fig. 23, it is assumed that the contribution rate of each CVP is 1.
Further, for example, the contribution ratio of each CVP during vector synthesis may be obtained based on a distance ratio from the listener position to the corresponding CVP in the free viewpoint space (common absolute coordinate system).
Specifically, as shown in fig. 24, for example, it is assumed that the listener position indicated by the listener position information is position F, and three positions of the CVP used for the interpolation process are positions a to C. It is further assumed that the absolute coordinates of position F in the common absolute coordinate system are (xf, yf, zf), and the absolute coordinates of positions A, B and C in the common absolute coordinate system are (xa, ya, za), (xb, yb, zb) and (xc, yc, zc), respectively. Note that absolute coordinates indicating positions of respective CVPs in a common absolute coordinate system may be obtained with reference to CVP position information included in the configuration information.
At this time, the position calculation unit 114 obtains the ratio (distance ratio) of the distance AF from the position F to the position A, the distance BF from the position F to the position B, and the distance CF from the position F to the position C, and designates the reciprocals of the distance ratio as the ratio of the contribution rates (degrees of dependence) of the CVPs at the corresponding positions.
Specifically, supposing that AF:BF:CF = a:b:c holds and that the degrees of dependence of the respective CVPs located at the positions A to C on the listener position (listener reference object position information) are dp(AF), dp(BF), and dp(CF), respectively, the position calculation unit 114 calculates the following equation (4).
[Math 4]
Degree of dependence dp(AF) : dp(BF) : dp(CF) = 1/a : 1/b : 1/c ... (4)
Note that a, b, and c in equation (4) are represented by the following equation (5).
[Math 5]
a = sqrt((xa-xf)^2 + (ya-yf)^2 + (za-zf)^2)
b = sqrt((xb-xf)^2 + (yb-yf)^2 + (zb-zf)^2)
c = sqrt((xc-xf)^2 + (yc-yf)^2 + (zc-zf)^2) ... (5)
Further, the position calculation unit 114 normalizes the degrees of dependence dp (AF), dp (BF), and dp (CF) presented in equation (4) by the calculation of the following equation (6), and obtains ndp (AF), ndp (BF), and ndp (CF) as the degrees of dependence after normalization to acquire the final contribution rate. Note that a, b, and c in equation (6) are also obtained by equation (5).
[Math 6]
ndp(AF) : ndp(BF) : ndp(CF) = (1/a)/t : (1/b)/t : (1/c)/t
where t = 1/a + 1/b + 1/c ... (6)
The contribution rates ndp (AF) to ndp (CF) thus obtained correspond to the contribution rates dep [0] to dep [2] in the equation (3), respectively. As the distance from the listener position to the corresponding CVP decreases, each of the contribution rates of the CVPs approaches 1. Note that the method for obtaining the contribution rate of each CVP is not limited to the method described in the above example, and may be any other method.
The position calculation unit 114 calculates the contribution ratio of each CVP by obtaining the distance ratio from the listener position to each CVP based on the listener position information and the CVP position information.
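The normalized contribution rates of equations (4) to (6) can be computed as in the following illustrative sketch, assuming that the listener position and the selected CVP positions are given as absolute coordinates and that the listener position does not coincide with any CVP; the function name is hypothetical.

import math

def contribution_rates(listener_pos, cvp_positions):
    # Distances a, b, c, ... from the listener position to the selected CVPs (equation (5)).
    distances = [math.sqrt(sum((c - l) ** 2 for c, l in zip(cvp, listener_pos)))
                 for cvp in cvp_positions]
    # Degrees of dependence proportional to the reciprocals of the distances (equation (4)).
    inverse = [1.0 / d for d in distances]
    # Normalization so that the contribution rates sum to 1 (equation (6)).
    t = sum(inverse)
    return [v / t for v in inverse]

# Listener at position F and three CVPs at positions A, B, and C.
print(contribution_rates((0.0, 0.0, 0.0),
                         [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]))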
To summarize the above, first, the position calculation unit 114 selects CVPs for interpolation processing based on the listener position information and the CVP position information contained in the configuration information. Note that the CVPs used for interpolation processing may be some of all of the CVPs, such as CVPs around the listener position, or all CVPs may be used for interpolation processing.
The position calculation unit 114 calculates an object three-dimensional position vector for each selected CVP based on the object position information.
For example, assuming that the object three-dimensional position vector of the jth object viewed from the ith CVPi is expressed as (obj_vector_x [ i ] [ j ], obj_vector_y [ i ] [ j ], obj_vector_z [ i ] [ j ]), the object three-dimensional position vector can be obtained by calculating the following equation (7).
Note that, here, it is assumed that the polar coordinates indicated by the object position information associated with the jth object viewed from the ith CVPi are (Azi[i][j], Ele[i][j], Rad[i][j]).
[Math 7]
Obj_vector_x[i][j] = -Rad[i][j] * sin(Azi[i][j]) * cos(Ele[i][j])
Obj_vector_y[i][j] = Rad[i][j] * cos(Azi[i][j]) * cos(Ele[i][j])
Obj_vector_z[i][j] = Rad[i][j] * sin(Ele[i][j]) ... (7)
Equation (7) presented above is an equation similar to equation (2) above.
Subsequently, the position calculation unit 114 performs calculations similar to those of the above equations (4) to (6) based on the listener position information and the CVP position information of each CVPi included in the configuration information to obtain the contribution rate dp(i) of each CVPi as a weighting factor used during the interpolation processing. The contribution rate dp(i) is a weighting factor determined based on the ratio of the distances from the listener position to the respective CVPi (more specifically, the reciprocal ratio of the distances).
Further, the position calculation unit 114 calculates the following equation (8) based on the object three-dimensional position vector obtained by calculating equation (7), the contribution rate dp (i) of each CVPi, and the gain obj_gain [ i ] [ j ] of the j-th object viewed from the corresponding CVPi. In this way, listener reference object position information (intp_x (j), intp_y (j), intp_z (j)) and listener reference gain intp_gain (j) of the jth object are obtained.
[Math 8]
Intp_x(j) = Σ_{i=0,...,NumOfCvp} dp(i) * Obj_vector_x[i][j]
Intp_y(j) = Σ_{i=0,...,NumOfCvp} dp(i) * Obj_vector_y[i][j]
Intp_z(j) = Σ_{i=0,...,NumOfCvp} dp(i) * Obj_vector_z[i][j]
Intp_gain(j) = Σ_{i=0,...,NumOfCvp} dp(i) * Obj_gain[i][j]
where
i: index of the CVP
j: index of the object
NumOfCvp: number of CVPs ... (8)
The weighted vector sum is obtained by equation (8). Specifically, the sum of the object three-dimensional position vectors each multiplied by the contribution rate dp(i) of the corresponding CVPi is obtained as the listener reference object position information, while the sum of the gains each multiplied by the contribution rate dp(i) of the corresponding CVPi is obtained as the listener reference gain. Equation (8) presented above is an equation similar to equation (3) above.
It should be noted that the listener reference object position information obtained by the equation (8) includes absolute coordinates in an absolute coordinate system having an origin at the listener position and a direction from the listener position toward the target point TP designated as the y-axis positive direction, that is, a direction toward the median plane.
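For reference, the weighted vector sum of equation (8) for a single object can be sketched as follows, assuming that the object three-dimensional position vectors, gains, and contribution rates of the selected CVPs have already been obtained; the names are hypothetical.

def listener_reference(object_vectors, gains, rates):
    # object_vectors : list of (vx, vy, vz) per selected CVP (equation (7))
    # gains          : list of Obj_gain[i][j] per selected CVP
    # rates          : list of normalized contribution rates dp(i)
    intp_x = sum(dp * v[0] for dp, v in zip(rates, object_vectors))
    intp_y = sum(dp * v[1] for dp, v in zip(rates, object_vectors))
    intp_z = sum(dp * v[2] for dp, v in zip(rates, object_vectors))
    intp_gain = sum(dp * g for dp, g in zip(rates, gains))
    return (intp_x, intp_y, intp_z), intp_gain

# Example with two CVPs contributing equally to one object.
print(listener_reference([(0.0, 2.0, 0.0), (1.0, 1.0, 0.0)], [1.0, 0.5], [0.5, 0.5]))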
However, the rendering processing unit 115 performing rendering processing in the polar coordinate system needs listener reference object position information indicated by polar coordinates.
Therefore, the position calculation unit 114 converts the listener reference object position information (intp_x (j), intp_y (j), intp_z (j)) represented by absolute coordinates and obtained by the equation (8) into listener reference object position information (intp_azi (j), intp_ele (j), intp_rad (j)) indicated by polar coordinates by calculating the following equation (9).
[ Math figure 9]
Intp_azi(j)=arctan(Intp_x(j)/Intp_y(j))
Intp_ele(j)=arctan(Intp_z(j)/sqrt(Intp_x(j)*Intp_x(j)+Intp_y(j)*Intp_y(j)))
Intp_rad(j)=sqrt(Intp_x(j)*Intp_x(j)+Intp_y(j)*Intp_y(j)+Intp_z(j)*Intp_z(j))...(9)
The position calculation unit 114 outputs the listener reference object position information (intp_azi (j), intp_ele (j), intp_rad (j)) thus obtained as final listener reference object position information to the rendering processing unit 115.
It should be noted that the listener reference object position information represented by the polar coordinates and obtained by equation (9) includes polar coordinates in a polar coordinate system having an origin located at the listener position and a direction from the listener position toward the target point TP designated as the y-axis positive direction, that is, the direction toward the median plane.
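The conversion of equation (9) can be sketched as follows. Note that a quadrant-aware arctangent (atan2) is used here instead of a plain arctangent so that the full azimuth range is handled, which is an assumption beyond the literal form of equation (9); the function name is hypothetical.

import math

def to_polar(intp_x, intp_y, intp_z):
    # Conversion of the listener reference object position information from absolute
    # coordinates to polar coordinates (equation (9)); angles are returned in degrees.
    horizontal = math.sqrt(intp_x * intp_x + intp_y * intp_y)
    intp_azi = math.degrees(math.atan2(intp_x, intp_y))       # arctan(Intp_x / Intp_y)
    intp_ele = math.degrees(math.atan2(intp_z, horizontal))
    intp_rad = math.sqrt(intp_x * intp_x + intp_y * intp_y + intp_z * intp_z)
    return intp_azi, intp_ele, intp_rad

print(to_polar(-1.0, 1.0, 0.5))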
However, a listener actually located at the listener position does not necessarily face in a direction toward the target point TP. Therefore, in the case where the listener position information acquisition unit 111 acquires the listener orientation information, coordinate system rotation processing or the like may be further performed on the listener reference object position information represented by the polar coordinates and obtained by equation (9) to obtain final listener reference object position information.
In this case, for example, the position calculation unit 114 rotates the position of the object viewed from the listener position by a rotation angle determined based on the target point TP, the listener position information, and the listener orientation information known on the client 101 side. The rotation angle (correction amount) at this time is the angle formed, in the free viewpoint space, by the direction from the listener position toward the target point TP and the orientation (direction) of the face of the listener indicated by the listener orientation information.
It should be noted that the target point TP in the common absolute coordinate system (free viewpoint space) may be calculated by the position calculation unit 114 based on the CVP position information and the CVP orientation information of the plurality of CVPs.
Through the above-described processing, listener reference object position information that is indicated by polar coordinates and that indicates a more accurate position of an object viewed from the listener can be finally obtained.
A specific calculation example of listener reference object position information according to the orientation of the face of the listener will be described with reference to fig. 25 and 26. Note that corresponding parts in fig. 25 and 26 are given the same reference numerals, and description of these parts will be omitted as appropriate.
For example, it is assumed that the target point TP, the respective CVPs, and the listener position LP41 are positioned as shown in fig. 25 while viewing the X-Y plane (common absolute coordinate system) of the free viewpoint space.
Note that in this example, each circle without hatching (diagonal lines) represents a CVP. It is assumed that the angle in the vertical direction indicated by the CVP pitch information constituting the CVP orientation information is 0 degrees for each CVP. In other words, it is assumed that the free viewpoint space is substantially a two-dimensional plane. Further, the target point TP here corresponds to the position of the origin O of the common absolute coordinate system.
Further, it is assumed that a straight line connecting the target point TP and the listener position LP41 is a line LN31, a straight line indicating the orientation of the listener's face indicated by the listener orientation information is a line LN32, and a straight line passing through the listener position LP41 and parallel to the Y-axis of the common absolute coordinate system is a line LN33.
In the case where the Y-axis positive direction has an angle of 0 degrees in the horizontal direction, the angle formed in the horizontal direction and indicating the orientation of the face of the listener (i.e., the angle formed by line LN32 and line LN33) is θcur_az. Further, in the case where the Y-axis positive direction has an angle of 0 degrees in the horizontal direction, the angle formed in the horizontal direction and indicating the direction toward the target point TP as viewed from the arbitrary listener position LP41 (i.e., the angle formed by line LN31 and line LN33) is θtp_az.
In this case, the direction toward the median plane is the direction from the arbitrary listener position LP41 toward the target point TP. Thus, it is sufficient to correct the angle Intp_azi(j) in the horizontal direction of the listener reference object position information of each object by the correction amount θcor_az, which is the angle formed by line LN31 and line LN32. Specifically, the position calculation unit 114 adds the correction amount θcor_az to the angle Intp_azi(j) in the horizontal direction to obtain the angle of the final listener reference object position information in the horizontal direction.
The correction amount θcor_az can be obtained by calculating the following equation (10).
[Math 10]
Correction amount θcor_az = θcur_az + θtp_az ... (10)
Further, for example, it is assumed that when the free viewpoint space (common absolute coordinate system) is viewed in a direction parallel to the X-Y plane, the target point TP and the listener position LP41 are positioned as shown in fig. 26.
It is assumed here that a straight line connecting the target point TP and the listener position LP41 is a line LN41, a straight line indicating the orientation of the listener's face indicated by the listener's orientation information is a line LN42, and a straight line passing through the listener position LP41 and parallel to the X-Y plane in the common absolute coordinate system is a line LN43.
Further, it is assumed that the Z-coordinate constituting the listener position information and indicating the listener position LP41 in the common absolute coordinate system is Rz, and the Z-coordinate indicating the target point TP in the common absolute coordinate system is TPz.
In this case, the absolute value of the angle (elevation angle) in the vertical direction of the target point TP viewed from the listener position LP41 in the free viewpoint space is the angle θtp_el formed by the line LN41 and the line LN43.
The angle (elevation angle) formed in the vertical direction in the free viewpoint space and indicating the orientation of the face of the listener is the angle θcur_el formed by the line LN43 corresponding to the horizontal line and the line LN42 indicating the orientation of the face of the listener. In this case, the angle θcur_el has a positive value when the listener faces upward relative to the horizontal line, and has a negative value when the listener faces downward relative to the horizontal line.
In the present example, it is sufficient to correct the angle Intp_ele(j) in the vertical direction of the listener reference object position information of each object by the correction amount θcor_el, which is the angle formed by the line LN41 and the line LN42. Specifically, the position calculation unit 114 adds the correction amount θcor_el to the angle Intp_ele(j) in the vertical direction to obtain the angle of the final listener reference object position information in the vertical direction.
The correction amount θcor_el can be obtained by calculating the following equation (11).
[Math 11]
Correction amount θcor_el
= θtp_el - θcur_el (when Rz ≥ TPz)
= -θtp_el - θcur_el (when Rz < TPz) ... (11)
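When the listener orientation information is available, the correction amounts of equations (10) and (11) can be applied as in the following illustrative sketch, assuming that all angles are given in degrees and that θtp_az and θtp_el have already been obtained from the listener position and the target point TP; the names are hypothetical.

def corrected_angles(intp_azi, intp_ele,
                     theta_cur_az, theta_tp_az,
                     theta_cur_el, theta_tp_el,
                     rz, tpz):
    # Equation (10): correction amount in the horizontal direction.
    cor_az = theta_cur_az + theta_tp_az
    # Equation (11): correction amount in the vertical direction; the sign depends on
    # whether the listener is at or above the target point TP (Rz >= TPz) or below it.
    if rz >= tpz:
        cor_el = theta_tp_el - theta_cur_el
    else:
        cor_el = -theta_tp_el - theta_cur_el
    return intp_azi + cor_az, intp_ele + cor_el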
Note that the above has been an example of performing vector synthesis as the interpolation processing. Alternatively, the listener reference object position information may be obtained by interpolation processing based on Ceva's theorem, using CVPs around the listener position.
For example, interpolation processing using Ceva's theorem is realized by constructing a triangle from three CVPs surrounding the listener position and performing mapping, based on Ceva's theorem, onto the triangle formed by the object positions corresponding to the three CVPs.
In this case, when the listener position is located in an area outside the triangle formed by the CVPs, the interpolation processing cannot be performed. In contrast, the above-described vector synthesis method can obtain listener reference object position information even in the case where the listener position is located outside the area surrounded by the CVPs. Furthermore, the vector synthesis method can easily obtain listener reference object position information with a smaller amount of processing.
< Second embodiment >
< CVP group >
Meanwhile, for example, the use of the present technology allows the position of a listener to be freely moved within a space enclosed by a building or the like in a live performance venue while reproducing a sound field from a viewpoint intended by a creator for the listener. Further, considering the case where the listener moves out of the live performance venue, a considerable difference may be generated between the sound field outside the live performance venue and the sound field inside the live performance venue. Thus, many sounds generated inside the live performance venue should not be heard outside the live performance venue.
However, according to the method of the first embodiment described above, even when a sound field outside the live performance venue is set, sounds that are generated inside the live performance venue and that are not originally intended to be mixed can be heard during reproduction of the sound field at a position outside the live performance venue, owing to the effect of the synthesis of the object position information defined inside the live performance venue.
Thus, for example, three areas may be provided, including an area inside the live performance venue, an area outside the live performance venue, and a transition area from inside the live performance venue to outside the live performance venue, and the CVP to be used may be separated for each area. In this case, an area where the listener currently exists is selected according to the position of the listener, and the listener reference object position information is obtained using only the CVPs belonging to the selected area. Note that the number of areas to be divided may be set to any number on the creator side or set according to each site of the live performance.
In this way, free viewpoint content audio reproduction can be achieved using appropriate listener reference object position information and listener reference gains while avoiding mixing of sounds between the inside and outside of a live performance venue.
Note that several methods are considered to define a region as a division of free view space. As a general example, a method of using concentric circles, polygons, or the like from a predetermined center coordinate may be employed. Further, for example, any number of small regions having various other shapes may be provided.
A specific example of dividing the free view space into a plurality of areas and selecting a CVP for interpolation processing will be described below.
For example, it is assumed that the free viewpoint space is divided into three regions of a group region R11 to a group region R13 as shown in fig. 27. Note that each small circle in fig. 27 represents a CVP.
The group region R11 is a circular region (space), the group region R12 is an annular region surrounding the outside of the group region R11, and the group region R13 is an annular region surrounding the outside of the group region R12.
In this example, it is assumed that the group region R12 is a region in a transition portion between the group region R11 and the group region R13. Thus, for example, an area (space) inside the live performance venue may be designated as a group area R11, an area outside the live performance venue may be designated as a group area R13, and an area between the inside and outside of the live performance venue may be designated as a group area R12. Note that the respective sets of regions are set in such a manner that overlapping portions (regions) are not generated.
In this example, CVPs used for interpolation processing are divided into groups according to the position of a listener. In other words, the creator groups the CVPs in correspondence with the group areas by specifying the ranges of the group areas.
For example, the grouping is implemented as follows: each of the CVPs located within the free view space belongs to at least any one of the CVP group GP1 corresponding to the group region R11, the CVP group GP2 corresponding to the group region R12, and the CVP group GP3 corresponding to the group region R13. In this case, one CVP may belong to a plurality of different CVP groups.
Specifically, the grouping is achieved as follows: the CVP located in the group region R11 belongs to the CVP group GP1, the CVP located in the group region R12 belongs to the CVP group GP2, and the CVP located in the group region R13 belongs to the CVP group GP3.
Thus, for example, a CVP located at a position P61 within the group region R11 belongs to the CVP group GP1, and a CVP located at a position P62 that is a boundary position between the group region R11 and the group region R12 belongs to both the CVP group GP1 and the CVP group GP 2.
Further, the CVP located at the position P63 belongs to the CVP group GP2 and the CVP group GP3, where the position P63 is the boundary position between the group region R12 and the group region R13, and the CVP located at the position P64 within the group region R13 belongs to the CVP group GP3.
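The grouping described above can be illustrated by the following minimal Python sketch (the helper name, the region radii, and the group names are assumptions for illustration only): each CVP is assigned to every CVP group whose group region contains its position, and a CVP on the boundary between two regions is assigned to both groups, as with the CVPs at the positions P62 and P63.

```python
# Illustrative sketch (assumed helper name, radii, and group names): each CVP
# is assigned to every CVP group whose group region contains its position, and
# a CVP lying on a boundary between two regions belongs to both groups.

import math

def group_memberships(cvp_xy, r_inner, r_outer, eps=1e-9):
    """Classify one CVP against a circular inner region (radius r_inner), an
    annular transition region (r_inner..r_outer), and the region beyond."""
    r = math.hypot(cvp_xy[0], cvp_xy[1])
    groups = []
    if r <= r_inner + eps:
        groups.append("GP1")   # e.g., inside the live performance venue
    if r_inner - eps <= r <= r_outer + eps:
        groups.append("GP2")   # transition region
    if r >= r_outer - eps:
        groups.append("GP3")   # e.g., outside the live performance venue
    return groups

# A CVP exactly on the inner boundary belongs to both GP1 and GP2.
print(group_memberships((1.0, 0.0), r_inner=1.0, r_outer=2.0))  # ['GP1', 'GP2']
```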
For example, if such grouping is applied to a live performance place as the free viewpoint space described in fig. 11, the state described in fig. 28 is generated.
In this example, for example, each of CVPs 1 to 7 indicated by black circles in the figure is contained in a group area (group space) corresponding to the inside of a live performance venue, and each of CVPs indicated by white circles is contained in a group area corresponding to the outside of the live performance venue.
Note that, for example, a specific CVP inside a live performance venue and a specific CVP outside the live performance venue may be linked (associated with each other) to each other in configuration information. In this case, when the listener is located between two CVPs, for example, the listener reference object position information can be obtained by vector synthesis using the two CVPs. Further, in this case, the gain of the predetermined object to be muted may be set to 0.
The example of fig. 28 will be described in further detail with reference to fig. 29 and 30. Note that corresponding parts in fig. 29 and 30 are given the same reference numerals, and description of these parts will be omitted where appropriate.
For example, as shown in fig. 29, it is assumed that a circular area around the center at the origin O of the common absolute coordinate system in the free viewpoint space is designated as an area inside the live performance place.
Specifically, herein, a circular region R31 drawn by a broken line corresponds to a region inside the live performance site, and a region outside the region R31 corresponds to a region outside the live performance site.
In addition, CVPs 1 through 15 are located inside the live performance venue, while CVPs 16 through 23 are located outside the live performance venue.
In this case, for example, a circular area centered at the origin O and having the radius area1_edge is designated as one group region R41, and the CVP group including the CVPs 1 to 15 contained in the group region R41 is designated as the CVP group GPI corresponding to the group region R41. The group region R41 is the region inside the live performance venue.
Further, as shown in the right part of the figure, the region between the boundary of the circle centered at the origin O and having the radius area1_edge and the boundary of the circle centered at the origin O and having the radius area2_edge is designated as a group region R42. The group region R42 is the transition region between the inside and the outside of the live performance venue.
The group consisting of the CVPs 8 to 23, which are contained in the group region R42, is designated as the CVP group GPM corresponding to the group region R42.
Specifically, in this example, CVPs 8 to 15 are located at the boundary between the group region R41 and the group region R42. Thus, CVPs 8 to 15 belong to both CVP group GPI and CVP group GPM.
Further, as shown in fig. 30, the region that lies outside the circle centered at the origin O and having the radius area2_edge, including the boundary of that circle, is designated as a group region R43. The group region R43 is the region outside the live performance venue.
The group consisting of the CVPs 16 to 23, which are contained in the group region R43, is designated as the CVP group GPO corresponding to the group region R43. In particular, in this example, the CVPs 16 to 23 are located at the boundary between the group region R42 and the group region R43. Thus, the CVPs 16 to 23 belong to both the CVP group GPM and the CVP group GPO.
In the case of defining the group area and the CVP group as described above, the position calculation unit 114 performs interpolation processing in the following manner to obtain listener reference object position information and listener reference gain.
Specifically, as shown in the left part of fig. 29, for example, when the listener position is within the group region R41, the position calculation unit 114 performs interpolation processing using some or all of the CVPs 1 to 15 belonging to the CVP group GPI.
Further, as described in the right part of fig. 29, for example, when the listener position is within the group region R42, the position calculation unit 114 performs interpolation processing using some or all of the CVPs 8 to 23 belonging to the CVP group GPM.
Further, as shown in fig. 30, for example, when the listener position is within the group region R43, the position calculation unit 114 performs interpolation processing using some or all of the CVPs 16 to 23 belonging to the CVP group GPO.
Note that the above is an example of defining group areas each having a concentric shape. However, for example, regions R71 and R72 having center positions different from each other and each having a circular shape and including transition regions overlapping each other may be defined as shown in fig. 31.
In this example, CVPs 1 to 7 are contained in the region R71, and CVPs 5, 6, and 8 to 12 are contained in the region R72. Further, the CVP5 and the CVP6 are included in a transition region, which is a region where the region R71 and the region R72 overlap each other.
Here, the region other than the transition region included in the region R71, the region other than the transition region included in the region R72, and the transition region are designated as group regions, respectively.
In this case, for example, when the listener position is within the region included in the region R71 but outside the transition region, interpolation processing is performed using some or all of the CVPs 1 to 7.
Further, when the listener position is located within the transition region, interpolation processing is performed using, for example, CVP5 and CVP 6. Further, when the listener position is within the region included in the region R72 but outside the transition region, interpolation processing is performed using some or all of the CVPs 5, 6, and 8 to 12.
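The overlapping-region case of fig. 31 can be sketched as follows (an illustrative Python sketch with assumed center coordinates and radii, not part of the specification): the listener position is classified into the region included in the region R71 but outside the transition region, the transition region, or the region included in the region R72 but outside the transition region, and the corresponding set of CVPs is returned for the interpolation processing.

```python
# Illustrative sketch (assumed center coordinates and radii): the listener
# position is classified into "R71 only", the transition region where R71 and
# R72 overlap, or "R72 only", and the corresponding CVPs are returned.

import math

def in_circle(p, center, radius):
    return math.hypot(p[0] - center[0], p[1] - center[1]) <= radius

def cvps_for_listener(listener, c71, r71, c72, r72):
    inside_71 = in_circle(listener, c71, r71)
    inside_72 = in_circle(listener, c72, r72)
    if inside_71 and inside_72:
        return ["CVP5", "CVP6"]                    # transition region
    if inside_71:
        return [f"CVP{i}" for i in range(1, 8)]    # CVP1 to CVP7
    if inside_72:
        return ["CVP5", "CVP6"] + [f"CVP{i}" for i in range(8, 13)]
    return []                                      # outside both regions

# A listener in the overlap of the two circles uses only CVP5 and CVP6.
print(cvps_for_listener((0.0, 0.0), c71=(-0.5, 0.0), r71=1.0, c72=(0.5, 0.0), r72=1.0))
```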
As described above, in the case where the creator is allowed to specify a group area (i.e., CVP group), the configuration information has a format presented in fig. 32, for example.
In the example shown in fig. 32, a format substantially similar to that shown in fig. 7 is adopted. The configuration information includes a frame length coefficient "FrameLengthIndex", object number information "NumOfObjects", CVP number information "NumOfControlViewpoints", metadata number setting information "NumOfObjectMetaSets", CVP information "ControlViewpointInfo (i)", and coordinate mode information "CoordinateMode [ i ] [ j ]".
Also, the configuration information presented in fig. 32 further includes a CVP group information present flag "CVP_group_present".
The CVP group information present flag "CVP_group_present" is flag information indicating whether CVP group information "CvpGroupInfo2D()", which is information related to the CVP groups, is included in the configuration information.
For example, in the case where the CVP group information present flag has a value of "1", the CVP group information "CvpGroupInfo2D()" is stored in the configuration information. In the case where the CVP group information present flag has a value of "0", the CVP group information "CvpGroupInfo2D()" is not stored in the configuration information.
Further, for example, the CVP group information "CvpGroupInfo2D()" included in the configuration information has the format presented in fig. 33. Note that the free viewpoint space is described here as a two-dimensional region (space) in order to simplify the description. Needless to say, however, the CVP group information presented in fig. 33 can also be applied to a case where the free viewpoint space is a three-dimensional region (space).
In this example, "numOfCVPGroup" represents the number of CVP groups, i.e., CVP group count. The CVP group information stores therein information of the same number of pieces as the number of CVP groups associated with the CVP group described below.
"Vertex_idx" represents a top count coefficient. The vertex number coefficient is coefficient information indicating the number of vertices of a group area corresponding to the CVP group.
For example, in the case where the value range of the number of vertices is 0to 5, the group area is identified as a polygon area having the number of vertices calculated by adding 3 to the value of the number of vertices coefficient. Further, in the case where the value of the vertex number coefficient is 225, for example, the group area is identified as a circular area.
In the case where the value of the vertex number coefficient is 255, that is, in the case where the shape type of the group area is circular, the normalized X coordinate "center_x [ i ]", the normalized Y coordinate "center_y [ i ]" and the normalized radius "radius [ i ]" are stored in the CVP group information as information for identifying the group area (boundary of the group area) having a circular shape.
For example, the normalized X-coordinate "center_x [ i ]" and the normalized Y-coordinate "center_y [ i ]" are information items indicating the X-coordinate and the Y-coordinate, respectively, of the center of a circle corresponding to a group region in a common absolute coordinate system (free viewpoint space), and the normalized radius "radius [ i ]" is the radius of a circle corresponding to the group region. In this way, it is possible to identify which region corresponds to a group region in the free viewpoint space.
Further, in the case where the value of the vertex number coefficient is any one of the values in the range from 0 to 5, that is, in the case where the group area is a polygonal area, the normalized X coordinate "boundary_pos_x[j]" and the normalized Y coordinate "boundary_pos_y[j]" are stored in the CVP group information for each vertex of the group area.
For example, the normalized X-coordinate "boundary_pos_x [ j ]" and the normalized Y-coordinate "boundary_pos_y [ j ]" are information items indicating the X-coordinate and Y-coordinate of the jth vertex of the polygonal region as a group region in a common absolute coordinate system (free viewpoint space), respectively.
A polygon area that is a group area in the free viewpoint space may be determined based on the normalized X-coordinate and the normalized Y-coordinate of each vertex described above.
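For reference, a minimal Python sketch of how a reproduction side might test whether the listener position lies inside a group area signaled as in fig. 33 is presented below; the circular case uses the center coordinates and radius, and the polygonal case uses an even-odd ray-casting test against the vertex coordinates (the function names and data layout are illustrative assumptions).

```python
# Illustrative sketch (assumed function names and data layout): testing whether
# the listener position lies inside a group area signaled as in fig. 33, i.e.,
# either a circle (center_x, center_y, radius) or a polygon given by the
# normalized coordinates of its vertices.

import math

def inside_circle_area(listener, center_x, center_y, radius):
    return math.hypot(listener[0] - center_x, listener[1] - center_y) <= radius

def inside_polygon_area(listener, boundary_pos_x, boundary_pos_y):
    """Even-odd ray-casting test against the polygon vertices."""
    x, y = listener
    inside = False
    n = len(boundary_pos_x)
    j = n - 1
    for i in range(n):
        xi, yi = boundary_pos_x[i], boundary_pos_y[i]
        xj, yj = boundary_pos_x[j], boundary_pos_y[j]
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside

# A listener at the origin lies inside a unit circle centered on the origin
# and inside a unit square given by its four vertices.
print(inside_circle_area((0.0, 0.0), 0.0, 0.0, 1.0))                      # True
print(inside_polygon_area((0.0, 0.0), [-1, 1, 1, -1], [-1, -1, 1, 1]))    # True
```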
Further, intra-group CVP number information "numOfCVP_ingroup[i]" indicating the number of CVPs belonging to the CVP group, and intra-group CVP coefficients "CvpIndex_ingroup[i][j]" in the same number as that indicated by the intra-group CVP number information, are stored in the CVP group information. The intra-group CVP coefficient "CvpIndex_ingroup[i][j]" is coefficient information for identifying the jth CVP belonging to the ith CVP group.
For example, the value of the intra-group CVP coefficient indicating the predetermined CVP may be equal to the value of the CVP coefficient contained in the CVP information and indicating the predetermined CVP.
As described above, the CVP group information includes the number of CVP groups, the vertex number coefficient indicating the shape type of each group area, the information for identifying each group area, the intra-group CVP number information, and the intra-group CVP coefficients. Specifically, the information for identifying a group area can be regarded as information for identifying the boundary of that group area.
It should be noted that in the case of generating the configuration information of the format presented in fig. 32, the information processing apparatus 11 basically performs the content creation process described with reference to fig. 14 in a similar manner.
In this case, however, the creator performs an operation for specifying a group area and CVPs belonging to the CVP group at any timing (for example, in step S11 and step S16).
In this case, the control unit 26 determines (sets) the group areas and the CVPs belonging to each CVP group according to the operation performed by the creator. Thereafter, in step S23, the control unit 26 generates, as needed, the configuration information presented in fig. 32 containing the CVP group information presented in fig. 33, on the basis of the setting results of the group areas and of the CVPs belonging to the CVP groups.
< Description of reproduction Audio data Generation Process >
Further, in the case where the configuration information has the format presented in fig. 32, for example, the client 101 performs the reproduced audio data generating process presented in fig. 34.
The reproduced audio data generating process performed by the client 101 will be described hereinafter with reference to a flowchart presented in fig. 34.
Note that the processing from step S121 to step S123 is similar to the processing from step S81 to step S83 in fig. 18. Therefore, a description of the process is omitted.
In step S124, the position calculation unit 114 identifies a CVP group corresponding to a group area including the listener position based on the listener position information and the configuration information.
For example, the position calculation unit 114 identifies the group area containing the listener position (hereinafter also referred to as a target group area) on the basis of the information for identifying the area corresponding to each group area, such as the normalized X coordinates and the normalized Y coordinates, contained in the CVP group information included in the configuration information.
Further, when the listener position is located at the boundary position between the plurality of group areas, the plurality of group areas are designated as target group areas.
When the target group area is identified in this way, the identification of the CVP group corresponding to the target group area is regarded as completed.
In step S125, the position calculation unit 114 designates each CVP belonging to the identified CVP group as a target CVP, and acquires an object metadata set associated with the target CVP.
For example, the position calculation unit 114 reads intra-group CVP coefficients of the CVP group corresponding to the target group area from the CVP group information to identify CVPs belonging to the CVP group, that is, target CVPs.
Further, the position calculation unit 114 reads the metadata set coefficients associated with each target CVP from the CVP information to identify an object metadata set corresponding to the target CVP, and reads the identified object metadata set.
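The selection in steps S124 and S125 can be sketched as follows (an illustrative Python sketch with hypothetical data structures, not the actual implementation): the target group areas containing the listener position are found, the CVPs belonging to the corresponding CVP groups become the target CVPs, and the object metadata set associated with each target CVP is looked up for the subsequent interpolation processing.

```python
# Minimal sketch (hypothetical data structures, not the actual implementation)
# of steps S124 and S125: find the target group areas containing the listener,
# take the CVPs of the corresponding CVP groups as target CVPs, and look up
# the object metadata set associated with each target CVP.

import math

def find_target_cvps(listener_pos, cvp_groups, area_contains, cvp_to_metaset):
    """cvp_groups: group id -> list of CVP indices (intra-group CVP coefficients).
    area_contains: group id -> predicate telling whether a position lies in the
    corresponding group area. cvp_to_metaset: CVP index -> metadata set index."""
    target_groups = [g for g, contains in area_contains.items() if contains(listener_pos)]
    target_cvps = sorted({c for g in target_groups for c in cvp_groups[g]})
    metasets = {c: cvp_to_metaset[c] for c in target_cvps}
    return target_groups, target_cvps, metasets

# Toy example: two group areas split at radius 1; a listener at radius 0.5
# selects the inner group and only the CVPs belonging to it.
groups = {"inner": [1, 2, 3], "outer": [3, 4, 5]}
areas = {"inner": lambda p: math.hypot(*p) <= 1.0,
         "outer": lambda p: math.hypot(*p) >= 1.0}
metaset_of = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}
print(find_target_cvps((0.3, 0.4), groups, areas, metaset_of))
# (['inner'], [1, 2, 3], {1: 0, 2: 0, 3: 1})
```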
After the process of step S125 is completed, the processes of step S126 to step S128 are performed. Thereafter, the reproduction audio data generation process ends. The process is similar to the process from step S84 to step S86 in fig. 18. Therefore, a description of the process is omitted.
However, in step S126, interpolation processing is performed using some or all of the target CVPs identified in step S124 and step S125. Specifically, listener reference object position information and listener reference gain are calculated using the CVP position information and object position information associated with the target CVP.
In this way, reproduced audio data for reproducing an appropriate sound field can be obtained according to the position of the listener, such as a case where the listener exists inside the live performance venue and a case where the listener exists outside the live performance venue.
In the above manner, the client 101 performs interpolation processing using an appropriate CVP based on the listener position information, the configuration information, and the object metadata set, and calculates the listener reference gain and the listener reference object position information at the listener position.
In this way, content reproduction having musical properties can be achieved according to the intention of the content creator, and thus, the interest of the content can be sufficiently conveyed to the listener.
< Third embodiment >
< Object position information and gain interpolation processing >
Meanwhile, a plurality of CVPs set in advance by the content creator exist in the free viewpoint space. A specific example has been described above in which the reciprocal ratio of the distances from any current position of the listener (listener position) to the respective CVPs is used in the interpolation processing for obtaining the listener reference object position information and the listener reference gain.
In this example, it is assumed, for instance, that a large object gain is set for a CVP located at a long distance from the listener position.
In this case, even though the CVP is located far from the listener position, it may not be possible to sufficiently reduce the auditory effect that the object gain at that distant CVP exerts on the listener reference gain (i.e., on the sound of the object heard by the listener). The audio image movement of the sound of the object presented to the listener thus becomes unnatural, and the sound quality of the content deteriorates.
Hereinafter, a case where an unnatural audio image moves due to the positional relationship between the listener position and the CVP will also be referred to as case a.
Further, in a case where the gain of an object at a specific CVP is set to 0, the content creator may not care about the object position of that object and may leave it unattended. Specifically, the content creator may neglect to set the object position of an object whose gain is set to 0. As a result, the object position information may be left at an inappropriate value.
However, such neglected object position information is also used in the interpolation processing for obtaining the listener reference object position information. In this case, under the influence of the neglected and inappropriate object position information, the position of the object viewed from the listener may end up at a position not intended by the content creator.
Hereinafter, a case where the position of the object viewed from the listener and indicated by the listener reference object position information ends up, under the influence of such a neglected object position, at a position not intended by the content creator will also be referred to as case B.
By reducing the occurrence of the above-described case a and case B, higher quality content reproduction based on the intention of the content creator can be achieved.
Thus, the third embodiment aims to reduce the occurrence of the above-described case a and case B.
For example, case a, in which the listener reference gain is affected by a large object gain at a CVP located far from the current listener position, is handled by applying, to all CVPs, a sensitivity coefficient that raises the distance to the Nth power for sensitivity adjustment.
In this way, the degree of dependence (contribution rate) on each CVP in the interpolation processing for obtaining the listener reference gain is weighted. Hereinafter, the method of reducing the occurrence of case a by applying the sensitivity coefficient will also be referred to specifically as method SLA1.
By appropriately controlling the sensitivity coefficient, the degree of influence of the CVP far from the current listener position can be further reduced. Thus, occurrence of the case a can be reduced. In this way, for example, when the listener moves between CVPs, a reduction in unnatural gain fluctuations can be achieved.
The value of the sensitivity coefficient, i.e., the value of N, is a Float value or the like. The value of the sensitivity coefficient of each CVP may be written with configuration information as a default value corresponding to the intention of the content creator and transmitted to the client 101, or the value of the sensitivity coefficient may be set on the listener side.
Furthermore, the sensitivity coefficient of each CVP may be a value common to all objects. Alternatively, the sensitivity coefficient may be set individually for each object according to the intention of the content creator for each CVP. Furthermore, a sensitivity coefficient common to all subjects or a sensitivity coefficient of each subject may be set for each group including one or more CVPs.
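The effect of the sensitivity coefficient can be illustrated by the following small numeric sketch in Python (the distances are assumed values, not from the specification): the contribution of each CVP is taken as the reciprocal of the distance raised to the Nth power and then normalized, and raising N sharply reduces the weight of CVPs far from the listener position.

```python
# Small numeric sketch (assumed distances, not from the specification) of
# method SLA1: each CVP contributes 1 / distance**N, normalized so that the
# contributions sum to 1. Raising N suppresses CVPs far from the listener.

def contribution_rates(distances, n):
    coefs = [1.0 / (d ** n) for d in distances]
    total = sum(coefs)
    return [c / total for c in coefs]

distances = [0.5, 0.6, 2.0, 2.2]          # two near CVPs, two far CVPs
for n in (1, 2, 3):
    rates = contribution_rates(distances, n)
    print(n, [round(r, 3) for r in rates])
# With n = 3 the two far CVPs together contribute less than 2 percent in total.
```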
On the other hand, case B, in which listener reference object position information that does not satisfy the intention of the content creator is calculated because object position information associated with a neglected object (such as an object having a gain of 0) is added as an element of the vector sum in the interpolation processing, is handled by adding the gain to the contribution term or by using only objects each having a gain greater than 0. Specifically, the method SLB1 or the method SLB2 described below is applied to reduce the occurrence of case B.
According to the method SLB1, an object having a gain of a predetermined threshold value or less at the CVP is regarded as an object having a gain of 0 (hereinafter also referred to as a mute object). Further, object position information at the CVP for an object regarded as a mute object is not used for interpolation processing. In other words, the CVP is excluded from the target CVP for interpolation processing.
The method SLB2 uses a mute flag indicating whether the gain of an object specified by the content creator or the like is 0, i.e., whether the object is a mute object.
Specifically, the object position information associated with the CVP corresponding to the mute object based on the mute flag is not used for the interpolation process. In other words, the CVP corresponding to the object identified in advance as the unused object is excluded from the target CVPs for interpolation processing.
According to the above-described methods SLB1 and SLB2, by excluding the CVP of the object regarded as the ignored object having the gain 0 from the processing objects, it is possible to realize appropriate interpolation processing using only the CVP corresponding to the object not regarded as the object having the gain 0.
Specifically, the method SLB2 can eliminate the necessity of processing for checking whether the gain of each object is regarded as 0 for all CVPs, as is performed for each frame by the method SLB 1. Therefore, the processing load can be further reduced.
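The following minimal Python sketch (with a hypothetical record layout) illustrates how CVPs could be excluded from the interpolation targets for one object, either because the gain of the object at the CVP is at or below a threshold (method SLB1) or because the mute flag signaled in the configuration information is 1 (method SLB2).

```python
# Illustrative sketch (hypothetical record layout): excluding CVPs from the
# interpolation targets for one object, either because the gain of the object
# at the CVP is at or below a threshold (method SLB1) or because the mute flag
# signaled in the configuration information is 1 (method SLB2).

def select_cvps_slb1(cvp_gains, threshold=0.0):
    """cvp_gains: CVP index -> gain of the object at that CVP."""
    return [cvp for cvp, g in cvp_gains.items() if g > threshold]

def select_cvps_slb2(mute_flags):
    """mute_flags: CVP index -> mute flag value (1 = mute object)."""
    return [cvp for cvp, flag in mute_flags.items() if flag == 0]

gains = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.0}
flags = {1: 0, 2: 0, 3: 0, 4: 1}
print(select_cvps_slb1(gains))  # [1, 2, 3] -> CVP4 excluded by its zero gain
print(select_cvps_slb2(flags))  # [1, 2, 3] -> CVP4 excluded by its mute flag
```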
< CVP positioning mode PTT1>
Next, an example of an actual CVP positioning mode in which the above case a or case B occurs will be described.
First, fig. 35 describes a first CVP positioning mode (hereinafter also referred to as CVP positioning mode PTT 1). In this example, the objects are located on the front side with respect to each of the CVPs.
In fig. 35, each circle given a numerical value represents one CVP. Specifically, the numerical value given inside the circle representing a CVP indicates the ordinal number of that CVP. Hereinafter, the kth CVP given the value k (k = 1, 2, ..., 6) will be referred to in particular as CVPk.
It is assumed that this example focuses on a certain object OBJ71 existing in the free viewpoint space.
For example, for each of CVPs 1 to 6 located in the free view space, object position information and gain associated with the object OBJ71 as viewed from the corresponding CVP are determined.
It is considered here that the listener reference object position information and the listener reference gain associated with the predetermined listener position LP71 are obtained by performing interpolation processing using the above equations (7) to (11) based on the object position information and the gains at the CVPs 1 to 6.
In this case, for example, if the gain of the object OBJ71 has the same value other than 0 for each of the CVPs 1 to 6, the above-described case a and case B do not occur.
On the other hand, for example, if the gain of the object OBJ71 at each of the CVPs 1 to 3 is larger than the gain of the object OBJ71 at each of the CVPs 5 and 6, case a may occur.
This is because, even in a state where the ratio of each of the CVPs 1 to 3 is small (i.e., the contribution rate dp(i) obtained by calculation similar to that of equations (4) to (6) is small because of the long distance from the listener position LP71 to each of the CVPs 1 to 3), the originally large gain significantly affects the listener reference gain.
Further, suppose, for example, that the object OBJ71 is a mute object at the CVP6 but that the horizontal angular azimuth given as its object position information at the CVP6 differs significantly (for example, by -180 degrees) from the angular azimuths at the other CVPs. In this case, case B occurs. This is because the influence of the object position information at the CVP6, which is the CVP closest to the listener position LP71, increases when the listener reference object position information is calculated.
Fig. 36 describes herein an example of positioning of a CVP or the like in a common absolute coordinate system when the positional relationship between the listener and the CVP has the relationship (CVP positioning mode PTT 1) presented in fig. 35 in the free viewpoint space of a substantially two-dimensional plane. Note that components in fig. 36 that are similar to corresponding components in fig. 35 are given the same reference numerals, and descriptions of these components will be omitted where appropriate.
In fig. 36, the horizontal axis and the vertical axis represent the X-axis and the Y-axis in a common absolute coordinate system, respectively. Further, it is assumed that the position (coordinates) in the common absolute coordinate system is represented as (x, y), for example, the listener position LP71 is represented as (0, -0.8).
Fig. 37 presents an example of the object position and gain of the object OBJ71 at each CVP and an example of the listener reference object position information and the listener reference gain of the object OBJ71 at each CVP in such positioning in the common absolute coordinate system when the case a or the case B occurs. Note that the contribution rate dp (i) in the example of fig. 37 is obtained by calculation similar to that of the above equations (4) to (6).
In fig. 37, an example of the object position and gain of the object OBJ71 at the time of occurrence of case a is presented in the column of the character "case a".
Specifically, "azi (0)" represents an angular azimuth as object position information associated with the object OBJ71, and "Gain (0)" represents a Gain of the object OBJ 71.
In this example, the gain (Gain(0)) at each of the CVPs 1 to 3 is "1", and the gain (Gain(0)) at each of the CVPs 5 and 6 is "0.2". In this case, the gain at each of the CVPs 1 to 3, which are far from the listener position LP71, is larger than the gain at each of the CVPs 5 and 6, which are close to the listener position LP71. As a result, the listener reference gain (Gain(0)) at the listener position LP71 is "0.37501".
In this example, the listener position LP71 is located between the CVP5 and the CVP6, each corresponding to a gain of 0.2. Thus, the listener reference gain desirably has a value close to the gain "0.2" at each of the CVP5 and the CVP6. In actuality, however, the listener reference gain has the value "0.37501", which is greater than "0.2", owing to the influence of the CVPs 1 to 3, each of which has a large gain.
Further, an example of the object position and gain of the object OBJ71 at the time of occurrence of case B is presented in the column of the character "case B".
Specifically, "azi (1)" represents an angular azimuth as object position information associated with the object OBJ71, and "Gain (1)" represents a Gain of the object OBJ 71.
In this example, the gain (Gain(1)) and the angular azimuth (azi(1)) at each of the CVPs 1 to 5 are "1" and "0", respectively. Thus, the object OBJ71 is not a mute object at any of the CVPs 1 to 5.
On the other hand, the gain (Gain(1)) and the angular azimuth (azi(1)) at the CVP6 are "0" and "120", respectively. Thus, the object OBJ71 is a mute object at the CVP6.
Further, the angular azimuth (azi (1)) as listener reference object position information at the listener position LP71 is "67.87193".
In this example, the gain (Gain(1)) at the CVP6 is "0". Therefore, the angular azimuth (azi(1)) "120" at the CVP6 should be ignored. In actuality, however, the angular azimuth at the CVP6 is used in the calculation of the listener reference object position information. Therefore, the angular azimuth (azi(1)) at the listener position LP71 becomes "67.87193", which is significantly larger than "0".
< CVP positioning mode PTT2>
Subsequently, fig. 38 describes a second CVP positioning mode (hereinafter also referred to as CVP positioning mode PTT2). In this example, the respective CVPs are located so as to surround the object.
As in the case of fig. 35, each circle given a numerical value represents one CVP in fig. 38. It is assumed that the kth CVP given the value k (k = 1, 2, ..., 8) is also referred to as CVPk.
The example considered here focuses on one object OBJ81, and interpolation processing for the listener position LP81 is performed using the above equations (7) to (11) based on the object position information and gains at the CVPs 1 to 8.
In this case, for example, if the gain of the object OBJ81 has the same value other than 0 for each of the CVPs 1 to 8, the above-described case a and case B do not occur.
On the other hand, if the gain of object OBJ81 at each of CVP1, CVP2, CVP6, and CVP8 is greater than the gain of object OBJ81 at each of CVP3 and CVP4, for example, case a may occur.
This is because, even in a state where the ratio of each of the CVP1, the CVP2, the CVP6, and the CVP8 is small because of the long distance from the listener position LP81 to each of these CVPs, the originally large gain significantly affects the listener reference gain.
Further, suppose, for example, that the object OBJ81 is a mute object at the CVP3 but that the horizontal angular azimuth given as its object position information at the CVP3 differs significantly (for example, by -180 degrees) from the angular azimuths at the other CVPs. In this case, case B occurs. This is because the influence of the object position information at the CVP3, which is the CVP closest to the listener position LP81, increases when the listener reference object position information is calculated.
Fig. 39 herein describes an example of positioning of a CVP or the like in a common absolute coordinate system when the positional relationship between the listener and the CVP has the relationship (CVP positioning mode PTT 2) presented in fig. 38 in the free viewpoint space of a substantially two-dimensional plane. Note that, in fig. 39, components similar to the corresponding components in fig. 38 have the same reference numerals, and descriptions of these components are omitted where appropriate.
In fig. 39, the horizontal axis and the vertical axis represent the X-axis and the Y-axis in the common absolute coordinate system, respectively. Further, for example, assuming that the position (coordinates) in the common absolute coordinate system is represented as (x, y), the listener position LP81 is represented as (-0.1768,0.176777).
Fig. 40 gives an example of the object position and gain of the object OBJ81 at each CVP and an example of the listener reference object position information and listener reference gain of the object OBJ81 at each CVP in such positioning in the common absolute coordinate system when the case a or the case B occurs. Note that the contribution rate dp (i) in the example of fig. 40 is obtained by calculation similar to that of the above equations (4) to (6).
In fig. 40, in the column of the character "case a", an example of the object position and gain of the object OBJ81 at the time of occurrence of case a is presented.
Specifically, "azi (0)" represents an angular azimuth as object position information associated with the object OBJ81, and "Gain (0)" represents a Gain of the object OBJ 81.
In this example, the gain (Gain(0)) at each of the CVP1, CVP2, CVP6, and CVP8 is "1", and the gain (Gain(0)) at each of the CVP3 and CVP4 is "0.2". As a result, the listener reference gain (Gain(0)) at the listener position LP81 is "0.501194".
In this example, the listener position LP81 is located between the CVP3 and the CVP4, each corresponding to a gain of 0.2. Thus, the listener reference gain desirably has a value close to the gain "0.2" at each of the CVP3 and the CVP4. In actuality, however, the listener reference gain has the value "0.501194", which is greater than "0.2", owing to the influence of the CVP1, the CVP2, and the like, each of which has a large gain.
Also, in the column of the character "case B", an example of the object position and gain of the object OBJ81 at the time of occurrence of case B is presented.
Specifically, "azi (1)" represents an angular azimuth as object position information associated with the object OBJ81, and "Gain (1)" represents a Gain of the object OBJ 81.
In this example, the gain (Gain(1)) and the angular azimuth (azi(1)) at each CVP other than the CVP3 are "1" and "0", respectively. Thus, the object OBJ81 is not a mute object at any CVP other than the CVP3.
On the other hand, the gain (Gain(1)) and the angular azimuth (azi(1)) at the CVP3 are "0" and "120", respectively. Thus, the object OBJ81 is a mute object at the CVP3.
Further, the angular azimuth (azi (1)) as listener reference object position information at the listener position LP81 is "20.05743".
In this example, the gain (Gain(1)) at the CVP3 is "0". Therefore, the angular azimuth (azi(1)) "120" at the CVP3 should be ignored. In actuality, however, the angular azimuth at the CVP3 is used in the calculation of the listener reference object position information. Therefore, the angular azimuth (azi(1)) at the listener position LP81 becomes "20.05743", which is significantly larger than "0".
According to the present embodiment, the occurrence of the above-described case a and case B is reduced by the method SLA1, the method SLB1, and the method SLB 2.
In the method SLA1, the sensitivity coefficient is set to N, and the contribution rate dp (i) is obtained based on the inverse of a value obtained by raising the distance from the listener position to the CVP to the nth power. At this time, for example, the content creator may designate any positive real number as the sensitivity coefficient and store the sensitivity coefficient in the configuration information, or in a case where the listener or the like is allowed to change the sensitivity coefficient, the client 101 side may set the sensitivity coefficient.
Further, in the method SLB1, for each object in each frame, it is determined whether the gain of the object at the CVP is 0 or a value regarded as 0. Thereafter, the CVP corresponding to the object having the gain of 0 or the value regarded as 0 is excluded from use in the interpolation process. In other words, the CVP is excluded from the calculation targets of the vector sums in the interpolation process.
In the method SLB2, a mute flag is stored in the configuration information for each CVP, the mute flag signaling, for each object, whether the object is a mute object. Thereafter, the CVP at which the mute flag of the object is 1, that is, the CVP at which the object is a mute object, is excluded from the calculation targets of the vector sum in the interpolation processing.
For example, each of the above-described methods SLB1 and SLB2 selects a CVP for interpolation processing as shown in fig. 41.
Specifically, in the case where the method SLB1 or SLB2 is not applied, for example, all of the CVPs 1 to CVP4 located around the listener position LP91 are used for interpolation processing for obtaining listener reference object position information associated with the listener position LP91, as described in the left part of the figure. In other words, interpolation processing is performed using the object position information associated with each of the CVPs 1 to 4.
On the other hand, in the case of applying the method SLB1 or SLB2, as depicted in the right part of the figure, the CVP4 corresponding to the mute object is excluded from the targets of the interpolation processing. Specifically, the object position information associated with three CVPs among the CVPs 1 to 3 other than the CVP4 is used for interpolation processing for obtaining listener reference object position information associated with the listener position LP 91.
In the case where the method SLA1 and the method SLB1 or SLB2 are executed simultaneously, the results presented in fig. 42 and fig. 43 are obtained for the examples of the interpolation processing depicted in fig. 37 and fig. 40, respectively. Note that descriptions of the portions in fig. 42 and fig. 43 that are the same as the corresponding portions in fig. 37 and fig. 40 will be omitted where appropriate.
Fig. 42 presents an example of a case where the method SLA1 and the method SLB1 or SLB2 are applied to the CVP positioning mode PTT1 depicted in fig. 36 and 37.
In fig. 42, for each of "case a" and "case B", the angular azimuth and gain at each CVP are presented in the portion indicated by the arrow Q71.
Further, in the case where the value of the sensitivity coefficient is changed for "case a" and "case B", the angular azimuth and the listener reference gain as the listener reference object position information are presented in the portion indicated by the arrow Q72. As is apparent from this figure, the occurrence of case a and case B is reduced.
For example, in the case where the value of the sensitivity coefficient is set to "3" for "case a", that is, when attention is paid to the column "1/distance ratio cubed", it can be understood from the figure that the listener reference gain (Gain(0)) at the listener position LP71 is "0.205033".
In this example, it is apparent that, due to the application of the method SLA1, the effect of CVP1 to CVP3, which are each located away from the listener position LP71, is significantly reduced, and the listener reference gain becomes an ideal value close to the gains "0.2" at the CVP5 and the CVP6 located near the listener position LP 71.
Specifically, the listener reference gain at the listener position LP71 located between the CVP5 and the CVP6 has a value close to the gain at each of the CVP5 and the CVP 6. Thus, occurrence of unnatural audio image movement is reduced.
Further, when "case B" is noted, the angular azimuth (azi (1)) as listener reference object position information at the listener position LP71 is "0", regardless of the value of the sensitivity coefficient.
According to the present example, by applying the method SLB1 or the method SLB2, the value "120" of the angular azimuth (azi (1)) at the CVP6 corresponding to the gain "0" is excluded from the interpolation processing. In other words, the angular azimuth "120" at the CVP6 is excluded from the targets of the interpolation process.
Therefore, the angular azimuth (azi(1)) at the listener position LP71 has the same value "0" as the angular azimuths at all the CVPs that are not excluded from the targets. It is thus apparent that appropriate listener reference object position information has been obtained.
Fig. 43 presents an example of a case where the method SLA1 and the method SLB1 or SLB2 are applied to the CVP positioning mode PTT2 depicted in fig. 39 and 40.
In fig. 43, for each of "case a" and "case B", the angular azimuth and gain at each CVP are presented in the portion indicated by the arrow Q81.
Further, in the case where the value of the sensitivity coefficient is changed for "case a" and "case B", the angular azimuth and the listener reference gain as the listener reference object position information are presented in the portion indicated by the arrow Q82. As is apparent from this figure, the occurrence of case a and case B is reduced.
For example, when focusing on the case where the value of the sensitivity coefficient for "case a" is set to "3", it can be understood from the figure that the listener reference gain (Gain(0)) at the listener position LP81 is "0.25492".
In this example, it is apparent that, due to the application of the method SLA1, the effects of CVP1, CVP2, CVP6, and CVP8, which are each located away from the listener position LP81, are significantly reduced, and the listener reference gain becomes an ideal value close to the gain "0.2" at the CVP3 and CVP4 located near the listener position LP 81.
In particular, it is apparent that the value of the listener reference gain at the listener position LP81 located between the CVP3 and the CVP4 approximates the gain of each of the CVP3 and the CVP4 under the sensitivity coefficient control. Thus, occurrence of unnatural audio image movement is reduced.
Further, when "case B" is focused on, the angular azimuth (azi (1)) as listener reference object position information at the listener position LP81 is "0", regardless of the value of the sensitivity coefficient.
According to the present example, by applying the method SLB1 or the method SLB2, the value "120" of the angular azimuth (azi (1)) at the CVP3 corresponding to the gain "0" is excluded from the interpolation processing. In other words, the angular azimuth "120" at CVP3 is excluded from the targets of the interpolation process.
Therefore, the angular azimuth (azi(1)) at the listener position LP81 has the same value "0" as the angular azimuths at all the CVPs that are not excluded from the targets. It is thus apparent that appropriate listener reference object position information has been obtained.
< Format example of configuration information >
Further, in the case of applying the method SLB2, for example, the configuration (information) presented in fig. 44 is stored in the configuration information.
Note that fig. 44 presents a format (syntax) example of a part of the configuration information in the case of applying the method SLB 2.
More specifically, the configuration information includes the configuration presented in fig. 7 in addition to the configuration presented in fig. 44. In other words, the configuration information contains the configuration presented in fig. 44 in a portion of the configuration presented in fig. 7. Alternatively, a portion of the configuration information presented in FIG. 32 may contain the configuration presented in FIG. 44.
According to the example presented in fig. 44, "NumOfControlViewPoints" represents CVP number information, i.e., the CVP number set by the creator, and "numOfObjs" represents the object number.
For each CVP, the configuration information stores mute flags "MuteObjIdx[i][j]" in the same number as the number of objects, that is, one for each combination of a CVP and an object.
The mute flag "MuteObjIdx [ i ] [ j ]" is flag information indicating whether or not the jth object is regarded as a mute object (whether or not the jth object is a mute object) when viewed from the ith CVP, i.e., when the listener position (viewpoint position) is located at the ith CVP. Specifically, a value of "0" of the mute flag "MuteObjIdx [ i ] [ j ]" indicates that the object is not a mute object, and a value of "1" of the mute flag "MuteObjIdx [ i ] [ j ]" indicates that the object is a mute object, i.e., the object is in a mute state.
Note that described herein is an example of storing a mute flag in configuration information as mute information for identifying an object designated as a mute object at the CVP. However, this example need not be employed. For example, "MuteObjIdx [ i ] [ j ]" may be coefficient information indicating an object designated as a mute object.
In this case, "MuteObjIdx [ i ] [ j ]" for all objects need not be stored in the configuration information. In contrast, "MuteObjIdx [ i ] [ j ]" which is an object designated as a mute object only needs to be stored in the configuration information. Also in this example, the client 101 side can specify with reference to "MuteObjIdx [ i ] [ j ]" whether or not each of the objects is specified as a mute object at the CVP.
< Description of contribution coefficient calculation processing >
Next, operations performed by the information processing apparatus 11 and the client 101 in the case where the method SLA1 and the method SLB1 or SLB2 are applied will be described.
For example, in the case of applying the method SLB2, the information processing apparatus 11 performs the content creation process described with reference to fig. 14.
However, in this case, the control unit 26 accepts a designation operation for determining whether or not to designate an object at the CVP as a mute object, for example, at any timing, and generates configuration information containing a mute flag indicating a value corresponding to the designation operation in step S23.
Further, in the case where the sensitivity coefficient is stored in the configuration information, for example, the control unit 26 accepts a designation operation for designating the sensitivity coefficient at any timing, and generates configuration information containing the sensitivity coefficient designated by the designation operation in step S23.
Further, in the case where the method SLA1 and the method SLB1 or SLB2 are applied, the client 101 basically performs the reproduced audio data generating process described with reference to fig. 18 or fig. 34. However, in step S84 in fig. 18, or in step S126 in fig. 34, interpolation processing based on the method SLA1 and the method SLB1 or SLB2 is performed.
Specifically, the client 101 first performs the contribution coefficient calculation processing presented in fig. 45 to calculate a contribution coefficient for obtaining the contribution rate.
The contribution coefficient calculation process by the client 101 will be described hereinafter with reference to a flowchart shown in fig. 45.
In step S201, the position calculation unit 114 initializes a coefficient cvpidx indicating a CVP corresponding to the processing target. In this way, the value of the coefficient cvpidx is set to 0.
In step S202, the position calculation unit 114 determines whether the value of the coefficient cvpidx indicating the CVP to be processed is smaller than the number numOfCVP of all CVPs, that is, whether cvpidx < numOfCVP is satisfied.
Note that the number numOfCVP of CVPs is equal to the number of candidates of CVPs for interpolation processing. Specifically, the number indicated by the number information of CVPs, the number of CVPs satisfying a specific condition such as existence around the listener position, the number of CVPs belonging to the CVP group corresponding to the target group area, and the like are designated as numOfCVP.
In the case where it is determined in step S202 that cvpidx < numOfCVP of CVPs are satisfied, calculation of the contribution coefficients of all CVPs that are candidates for interpolation processing has not been completed. Thus, the process advances to step S203.
In step S203, the position calculation unit 114 calculates the euclidean distance from the listener position to the CVP corresponding to the processing object based on the listener position information and the CVP position information associated with the CVP corresponding to the processing object, and holds the calculation result thus obtained as distance information dist [ cvpidx ]. For example, the position calculation unit 114 calculates the distance information dist [ cvpidx ] by performing a calculation similar to that of the above equation (5).
In step S204, the position calculation unit 114 calculates the contribution coefficient cvp_contri_coef[cvpidx] of the CVP corresponding to the processing target on the basis of the distance information dist[cvpidx] and the sensitivity coefficient WeightRatioFactor.
For example, the sensitivity coefficient WeightRatioFactor may be read from the configuration information, or may be specified by a specification operation performed by a listener or the like on an undepicted input unit or the like. Alternatively, the position calculation unit 114 may calculate the sensitivity coefficient WeightRatioFactor based on the positional relationship between the listener position and the respective CVPs, the gain of the object at each CVP, and the like.
Here, the sensitivity coefficient WeightRatioFactor has a real value of 2 or more, for example. However, the value of the sensitivity coefficient WeightRatioFactor is not limited to this value, and may be any value.
For example, the position calculation unit 114 raises the distance information dist[cvpidx] to a power equal to the sensitivity coefficient WeightRatioFactor and divides 1 by the value thus obtained, i.e., takes the reciprocal of that value, to calculate the contribution coefficient cvp_contri_coef[cvpidx].
Specifically, the contribution coefficient cvp_contri_coef[cvpidx] is obtained by performing the calculation cvp_contri_coef[cvpidx] = 1.0 / pow(dist[cvpidx], WeightRatioFactor). Note here that pow() represents a function for calculating a power.
In step S205, the position calculation unit 114 increments the value of the coefficient cvpidx of the CVP.
After the process in step S205 is completed, the process returns to step S202 to repeat the above-described processing. Specifically, the contribution coefficient cvp_contri_coef[cvpidx] is calculated for the CVP newly designated as the processing target.
In the case where it is determined in step S202 that cvpidx < numOfCVP is not satisfied, all the CVPs have been designated as processing targets and the contribution coefficient cvp_contri_coef[cvpidx] has been calculated for each of them. After the calculation is completed, the contribution coefficient calculation process ends.
In the above manner, the client 101 calculates the contribution coefficient from the distance between the listener position and each CVP. In this way, interpolation processing based on the method SLA1 can be realized, and thus occurrence of unnatural audio image movement can be reduced.
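The calculation of steps S201 to S205 can be sketched in Python as follows (an illustrative sketch; the listener position and the CVP positions are assumed values): for every candidate CVP, the Euclidean distance from the listener position is taken, and the contribution coefficient is obtained as 1.0/pow(dist[cvpidx], WeightRatioFactor).

```python
# Illustrative sketch mirroring steps S201 to S205 (listener and CVP positions
# are assumed values): for every candidate CVP, the Euclidean distance from
# the listener is computed, and the contribution coefficient is
# 1.0 / pow(dist[cvpidx], WeightRatioFactor).

import math

def contribution_coefficients(listener_pos, cvp_positions, weight_ratio_factor):
    coefs = []
    for cvp_pos in cvp_positions:                  # loop over cvpidx
        dist = math.dist(listener_pos, cvp_pos)    # step S203
        coefs.append(1.0 / math.pow(dist, weight_ratio_factor))  # step S204
    return coefs

listener = (0.0, -0.8)
cvps = [(0.0, 1.0), (0.5, -0.9), (-0.5, -0.9)]
print([round(c, 3) for c in contribution_coefficients(listener, cvps, 2.0)])
```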
< Description of normalized contribution coefficient calculation processing >
Further, after completing the contribution coefficient calculation process described with reference to fig. 45, the client 101 then performs normalized contribution coefficient calculation processing based on the method SLB1 or the method SLB2 to obtain a normalized contribution coefficient as a contribution rate.
The normalized contribution coefficient calculation process based on the method SLB2 and performed by the client 101 is described herein first with reference to the flowchart in fig. 46.
The normalized contribution coefficient calculation process based on the method SLB2 here is a normalized contribution coefficient calculation process based on the mute flag contained in the configuration information.
In step S231, the position calculating unit 114 initializes a coefficient cvpidx indicating a CVP corresponding to the processing target. In this way, the value of the coefficient cvpidx is set to 0.
Note that the same CVP as that corresponding to the processing object in the contribution coefficient calculation processing in fig. 45 is processed as the processing object in the normalized contribution coefficient calculation processing. Therefore, the number numOfCVP of CVPs as processing targets is the same as the number of CVPs in the contribution coefficient calculation processing in fig. 45.
In step S232, the position calculation unit 114 initializes the coefficient objidx indicating the object corresponding to the processing target.
In this way, the value of the coefficient objidx is set to 0. Here, the number numOfObjs of objects to be processed is set to the number indicated by the number of objects included in the configuration information, which is the number of all the objects constituting the content. In the following steps, the CVP indicated by the coefficient cvpidx and the object indicated by the coefficient objidx are sequentially processed in order, as viewed from the CVP.
In step S233, the position calculation unit 114 determines whether the value of the coefficient objidx is smaller than the number of all objects numOfObjs, i.e., objidx < numOfObjs.
In the case where it is determined in step S233 that objidx < numOfObjs is satisfied, the position calculation unit 114 initializes the value of the total coefficient variable total_coef in step S234. In this way, the value of the total coefficient variable total_coef of the object corresponding to the processing target and indicated by the coefficient objidx is set to 0.
The total coefficient variable total_coef is a coefficient for normalizing the contribution coefficients cvp_contri_coef[cvpidx] of the respective CVPs for the object corresponding to the processing target and indicated by the coefficient objidx. As described below, the sum of the contribution coefficients cvp_contri_coef[cvpidx] of all the CVPs used in the interpolation processing of one object eventually becomes the total coefficient variable total_coef.
In step S235, the position calculation unit 114 determines whether the value of the coefficient cvpidx indicating the CVP to be processed is smaller than the number numOfCVP of all CVPs, that is, whether cvpidx < numOfCVP is satisfied.
In the case where it is determined in step S235 that cvpidx < numOfCVP is satisfied, the process advances to step S236.
In step S236, the position calculation unit 114 determines whether the value of the mute flag of the object corresponding to the processing target and indicated by the coefficient objidx at the CVP indicated by the coefficient cvpidx is 1, that is, whether the object is a mute object.
In the case where it is determined in step S236 that the value of the mute flag is not 1 (i.e., the object is not a mute object), in step S237, the position calculation unit 114 adds the contribution coefficient corresponding to the CVP of the processing target to the value of the held total coefficient variable to update the total coefficient variable.
Specifically, total_coef += cvp_contri_coef[cvpidx] is calculated. In other words, the contribution coefficient cvp_contri_coef[cvpidx] of the CVP corresponding to the processing target and indicated by the coefficient cvpidx is added to the current value of the total coefficient variable total_coef held by the position calculation unit 114 for the object corresponding to the processing target and indicated by the coefficient objidx, and the result of this addition is designated as the updated total coefficient variable total_coef.
After the process of step S237 is completed, the process advances to step S238.
Further, in the case where it is determined in step S236 that the value of the mute flag is 1 (i.e., the object is a mute object), the processing in step S237 is not performed, and the processing then proceeds to step S238. This is because the CVP corresponding to the processing target object designated as the mute object is excluded from the processing targets of the interpolation processing.
When the processing is performed in step S237, or when the mute flag is determined to be 1 in step S236, the position calculation unit 114 increments the coefficient cvpidx indicating the CVP corresponding to the processing target in step S238.
After the process in step S238 is completed, the process then returns to step S235 to repeat the above-described process.
By repeating the processing from step S235 to step S238, the sum of the contribution coefficients of each CVP that does not correspond to the mute object is obtained for the object corresponding to the processing target, and the sum thus obtained is designated as the final total coefficient variable of the object corresponding to the processing target. The total coefficient variable corresponds to the variable t in the above equation (6).
If it is determined in step S235 that cvpidx < numOfCVP is not satisfied, in step S239, the position calculating unit 114 initializes the coefficient cvpidx indicating the CVP corresponding to the processing target. In this way, the following processing is performed for the object corresponding to the processing target while each CVP is sequentially designated as a new processing target.
In step S240, the position calculation unit 114 determines whether cvpidx < numOfCVP is satisfied.
In the case where it is determined in step S240 that cvpidx < numOfCVP is satisfied, the process proceeds to step S241.
In step S241, the position calculation unit 114 determines whether the value of the mute flag of the object corresponding to the processing target indicated by the coefficient objidx at the CVP indicated by the coefficient cvpidx is 1.
In the case where it is determined in step S241 that the value of the mute flag is not 1 (i.e., the object is not a mute object), in step S242, the position calculation unit 114 calculates a normalized contribution coefficient contri_norm_ratio[objidx][cvpidx].
For example, contri_norm_ratio[objidx][cvpidx] = cvp_contri_coef[cvpidx]/total_coef is calculated to normalize the contribution coefficient, and the contribution coefficient thus normalized is designated as the normalized contribution coefficient.
In other words, the position calculation unit 114 achieves normalization by dividing the contribution coefficient cvp_contri_coef[cvpidx] of the CVP corresponding to the processing target and indicated by the coefficient cvpidx by the total coefficient variable total_coef of the object corresponding to the processing target and indicated by the coefficient objidx. In this way, for the object corresponding to the processing target and indicated by the coefficient objidx, the normalized contribution coefficient contri_norm_ratio[objidx][cvpidx] of the CVP corresponding to the processing target and indicated by the coefficient cvpidx is obtained.
According to the present embodiment, the normalized contribution coefficient contri_norm_ratio[objidx][cvpidx] is used as the contribution rate dp(i) (i.e., the contribution degree of the CVP) in equation (8). In other words, the normalized contribution coefficient is used as the weight of each CVP for each object in the interpolation processing.
Note that, in equation (8), the same contribution rate dp(i) common to all objects is normally used for the same CVP. However, in the present embodiment, CVPs corresponding to mute objects are excluded from the interpolation processing. Thus, even for the same CVP, the normalized contribution coefficient (contribution rate dp(i)) is obtained for each object.
In this case, the normalized contribution coefficient is calculated on the basis of the contribution coefficient obtained, in the contribution coefficient calculation processing in fig. 45, from the distance information raised to the power of the sensitivity coefficient. Thus, interpolation processing based on the method SLA1 is achievable.
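For reference, a minimal Python sketch of such a contribution coefficient is given below. The reciprocal form is an assumption taken from the weight description in configuration (25) later in this document; the function name, the distance computation, and the guard against a zero distance are illustrative and do not reproduce the exact processing of fig. 45.

    import math

    def cvp_contribution_coefficient(listener_pos, cvp_pos, sensitivity):
        # Assumed form consistent with the description above: the reciprocal of
        # the listener-to-CVP distance raised to the power of the sensitivity
        # coefficient. The guard against a zero distance is an added safeguard.
        dist = math.dist(listener_pos, cvp_pos)
        return 1.0 / max(dist, 1e-12) ** sensitivity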
After the process in step S242 is completed, the process then proceeds to step S244.
Further, in the case where it is determined in step S241 that the value of the mute flag is 1 (i.e., the object is a mute object), the processing in step S242 is not performed, and the processing then proceeds to step S243.
In step S243, the position calculation unit 114 sets the normalized contribution coefficient contri_norm_ratio[objidx][cvpidx] of the CVP corresponding to the processing target and indicated by the coefficient cvpidx to 0 for the object corresponding to the processing target and indicated by the coefficient objidx.
In this way, the CVP corresponding to the object designated as the mute object is excluded from the targets of the interpolation processing. Thus, interpolation processing based on the method SLB2 can be realized.
After the process in step S242 or step S243 is completed, the position calculating unit 114 increments a coefficient cvpidx indicating the CVP corresponding to the processing object in step S244.
After the process in step S244 is completed, the process then returns to step S240 to repeat the above-described process.
By repeating the processing from step S240 to step S244, the normalized contribution coefficient of each CVP is obtained for the object corresponding to the processing object.
In addition, in the case where it is determined in step S240 that cvpidx < numOfCVP is not satisfied, in step S245, the position calculation unit 114 increments the coefficient objidx indicating the object corresponding to the processing target. Thus, a new object that has not yet been selected as the processing target is designated as the processing target.
After the process in step S245 is completed, the process then returns to step S233 to repeat the above-described process.
Further, in the case where it is determined in step S233 that objidx < numOfObjs is not satisfied, the calculation of the normalized contribution coefficient (i.e., the contribution rate dp(i)) of each CVP has been completed for all the objects. Thus, the normalized contribution coefficient calculation process ends.
As described above, the client 101 calculates the normalized contribution coefficient of each CVP for each object based on the mute flag of each object. In this way, interpolation processing based on the method SLB2 can be realized. Thus, appropriate listener reference object location information can be acquired.
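The two loops described above can be summarized in the following Python sketch. This is an illustrative sketch of the method SLB2 processing, not the actual implementation of the position calculation unit 114; the function name and the guard against a zero total coefficient are assumptions.

    def normalized_contribution_coefficients(cvp_contri_coef, mute_flag):
        # cvp_contri_coef[cvpidx]: contribution coefficient of each CVP, assumed
        # to have been obtained by the contribution coefficient calculation
        # processing in fig. 45.
        # mute_flag[objidx][cvpidx]: 1 if the object is a mute object at the CVP.
        num_of_cvp = len(cvp_contri_coef)
        num_of_objs = len(mute_flag)
        contri_norm_ratio = [[0.0] * num_of_cvp for _ in range(num_of_objs)]
        for objidx in range(num_of_objs):
            # Steps S234 to S238: sum the contribution coefficients of the CVPs
            # at which this object is not a mute object (variable t of equation (6)).
            total_coef = 0.0
            for cvpidx in range(num_of_cvp):
                if mute_flag[objidx][cvpidx] != 1:
                    total_coef += cvp_contri_coef[cvpidx]
            # Steps S239 to S244: normalize; CVPs at which the object is a mute
            # object keep a normalized contribution coefficient of 0 (step S243).
            for cvpidx in range(num_of_cvp):
                if mute_flag[objidx][cvpidx] != 1 and total_coef > 0.0:
                    contri_norm_ratio[objidx][cvpidx] = (
                        cvp_contri_coef[cvpidx] / total_coef)
        return contri_norm_ratio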
Although the normalized contribution coefficient calculation process based on the method SLB2 has been described above, a similar process is also performed as the normalized contribution coefficient calculation process based on the method SLB1.
With reference to the flowchart in fig. 47, a normalized contribution coefficient calculation process based on the method SLB1 and performed by the client 101 will be described hereinafter.
In the normalized contribution coefficient calculation process based on the method SLB1 shown in fig. 47, that is, in steps S271 to S285, a process similar to the process from step S231 to step S245 of the normalized contribution coefficient calculation process described with reference to fig. 46 is basically performed.
However, in steps S276 and S281, it is not determined whether the value of the mute flag is 1; instead, it is determined whether the gain of the object corresponding to the processing target and indicated by the coefficient objidx, as viewed from the CVP indicated by the coefficient cvpidx, is regarded as 0.
Specifically, in the case where the value of the gain of the object is equal to or smaller than a predetermined threshold value, the gain of the object is regarded as 0.
In the case where it is determined in step S276 that the gain is not regarded as 0, the CVP is not the CVP corresponding to the mute object. Accordingly, the process advances to step S277 to update the total coefficient variable.
On the other hand, in the case where it is determined in step S276 that the gain is regarded as 0, the CVP is a CVP corresponding to the mute object. Accordingly, the corresponding CVP is excluded from the processing target of the interpolation processing, and then the processing proceeds to step S278.
Further, in the case where it is determined in step S281 that the gain is not regarded as 0, the CVP is not the CVP corresponding to the mute object. Thus, the process proceeds to step S282 to calculate a normalized contribution coefficient.
On the other hand, in the case where it is determined in step S281 that the gain is regarded as 0, the CVP is a CVP corresponding to the mute object. Therefore, the process proceeds to step S283, and the CVP is excluded from the targets of the interpolation process by setting the normalized contribution coefficient to 0.
According to the normalized contribution coefficient calculation process based on the method SLB1 as described above, the interpolation process based on the method SLB1 can be realized. Thus, appropriate listener reference object location information can be acquired.
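The corresponding determination for the method SLB1 differs only in the mute test, as in the following sketch. The threshold value is not specified above and is therefore an assumed placeholder; the rest mirrors the method SLB2 sketch.

    GAIN_ZERO_THRESHOLD = 1e-4  # assumed value; the text only states "a predetermined threshold value"

    def normalized_contribution_coefficients_slb1(cvp_contri_coef, gains,
                                                  threshold=GAIN_ZERO_THRESHOLD):
        # gains[objidx][cvpidx]: gain of the object when viewed from the CVP.
        # A gain at or below the threshold is regarded as 0 (steps S276 and S281).
        result = []
        for obj_gains in gains:
            total_coef = sum(c for c, g in zip(cvp_contri_coef, obj_gains)
                             if g > threshold)
            result.append([
                c / total_coef if (g > threshold and total_coef > 0.0) else 0.0
                for c, g in zip(cvp_contri_coef, obj_gains)])
        return result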
In step S84 of fig. 18 or in step S126 of fig. 34, after the normalized contribution coefficient calculation process based on the method SLB1 or the method SLB2 is completed, the position calculation unit 114 performs interpolation processing using the obtained normalized contribution coefficients.
Specifically, the position calculation unit 114 calculates equation (7) to obtain an object three-dimensional position vector, and calculates equation (8) using the normalized contribution coefficient contri_norm_ratio[objidx][cvpidx] obtained by the above-described processing in place of the contribution rate dp(i). In other words, the interpolation processing of equation (8) is performed using the normalized contribution coefficient.
Further, the position calculation unit 114 calculates equation (9) based on the calculation result of equation (8), and performs correction by calculating the correction amounts obtained by equation (10) and equation (11) as needed.
In this way, the final listener reference object position information and the final listener reference gain to which the method SLA1 and the method SLB1 or SLB2 have been applied are obtained.
Thus, the occurrence of the above case A or case B is reduced. In particular, it is possible to reduce the occurrence of unnatural audio image movements and to acquire suitable listener reference object position information.
In essence, by using either the method SLB1 or the method SLB2, the position calculation unit 114 performs interpolation processing based on the CVP position information, the object position information, the gains of the objects, and the listener position information associated with the CVPs corresponding to objects that are not mute objects, and calculates the listener reference object position information and the listener reference gain.
At this time, according to the method SLB2, the position calculation unit 114 identifies the CVPs corresponding to the object that is not a mute object based on the mute flag serving as mute information. On the other hand, according to the method SLB1, the position calculation unit 114 identifies the CVPs corresponding to the object that is not a mute object based on the gain of the object viewed from the CVP (i.e., based on the result of the determination of whether the gain is equal to or smaller than the threshold).
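As an illustration of how the normalized contribution coefficients are then used, the following Python sketch performs a weighted vector synthesis for one object. It assumes that equation (8) amounts to a weighted sum of per-CVP object position vectors (already converted into a common coordinate system by equation (7)) and that the listener reference gain is interpolated with the same weights; the corrections of equations (9) to (11) are omitted, so this is not the exact computation described above.

    def interpolate_listener_reference(obj_vecs, obj_gains, weights):
        # obj_vecs[cvpidx]: assumed 3-D position vector of the object for each CVP.
        # obj_gains[cvpidx]: gain of the object when viewed from each CVP.
        # weights[cvpidx]: normalized contribution coefficient of each CVP for
        # this object; excluded CVPs have weight 0 and therefore drop out.
        pos = [0.0, 0.0, 0.0]
        gain = 0.0
        for w, vec, g in zip(weights, obj_vecs, obj_gains):
            pos = [p + w * v for p, v in zip(pos, vec)]
            gain += w * g
        return pos, gain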
< Fourth embodiment >
< Object position information and gain interpolation processing >
Meanwhile, in order to perform interpolation processing for obtaining listener reference object position information and listener reference gain, the reproduction side (i.e., listener side) may intentionally select a CVP for the interpolation processing.
In this case, the listener can enjoy the content in a limited manner by using only the CVPs from which the listener desires to listen. For example, content reproduction or the like may be realized using only CVPs at which all the artists as objects are localized at positions close to the listener.
Specifically, as shown in fig. 48, for example, it is assumed that a stage ST11, a target point TP, and the respective CVPs are located in the free viewpoint space at positions similar to those in the example described in fig. 11, and that a listener (user) is allowed to select the CVPs used for the interpolation processing. Note that components in fig. 48 that are similar to corresponding components in fig. 11 have the same reference numerals, and descriptions of these components are omitted where appropriate.
According to the example described in fig. 48, it is assumed that CVPs 1 to 7 are defined as CVPs constituting the original CVP configuration, i.e., CVPs set by the content creator, for example, as described in the left part of the figure.
In this case, for example, as shown in the right part of the figure, it is assumed that the listener selects CVP1, CVP3, CVP4, and CVP6 located near the stage ST11 from the above CVPs 1 to 7.
In this case, at the time of actual reproduction of the content, the listener feels as if the artist as an object is located near the listener, as compared with the case where all CVPs are used for interpolation processing.
Further, for example, when the listener selects a CVP, a CVP selection screen as described in fig. 49 may be displayed on the client 101.
According to this example, the left part of the figure depicts a CVP selection screen DSP11 displayed as a screen containing a plurality of arranged viewpoint images, one for each of the CVPs depicted in fig. 48, the viewpoint images each indicating a state in which the target point TP (i.e., the stage ST11) is viewed from the corresponding CVP.
For example, the viewpoint images SPC11 to SPC14 are viewpoint images formed when the CVP5, the CVP7, the CVP2, and the CVP6 are respectively designated as viewpoint positions (listener positions). Further, a message "select reproduction viewpoint" prompting the selection of a CVP is also displayed in the CVP selection screen DSP11.
When the CVP selection screen DSP11 thus configured is displayed, the listener (user) selects a viewpoint image corresponding to a favorite CVP to select a CVP for interpolation processing. As a result, for example, the display of the CVP selection screen DSP11 presented in the left part of the figure is updated, and the CVP selection screen DSP12 presented in the right part of the figure is displayed.
In the CVP selection screen DSP12, each viewpoint image of a CVP that is not selected by the listener is displayed in a display form different from that of the viewpoint images of the selected CVPs, such as a light gray display.
For example, the CVPs 5, 7, and 2 corresponding to the viewpoint images SPC11 to SPC13 are not selected here, and the viewpoint images of these CVPs are presented in gray display. Further, the viewpoint images corresponding to the CVPs 6, 1, 3, and 4 selected by the listener are displayed without being changed from their display in the CVP selection screen DSP11.
By displaying such a CVP selection screen, the listener is allowed to perform an appropriate CVP selection operation while visually checking the state viewed from each CVP. Further, an image of the entire venue depicted in fig. 48 (i.e., the entire free viewpoint space) may also be displayed in the CVP selection screen.
Further, in the case where the listener is allowed to select a CVP for interpolation processing, information as to whether the CVP is selectable may be stored in the configuration information. In this case, the intention of the content creator may be transmitted to the listener (client 101) side.
In the case where the information on whether a CVP is selectable is stored in the configuration information, the listener selects CVPs only from among the CVPs permitted to be selected on the reproduction side, and the selection or non-selection of each such CVP by the listener is detected (specified). Thereafter, in the case where a CVP that is permitted to be selected but is not selected by the listener exists (hereinafter, such a CVP will also be referred to as a non-selected CVP), the interpolation processing for calculating the listener reference object position information and the listener reference gain is performed after the non-selected CVP is excluded.
< Format example of configuration information >
According to this embodiment, the information presented in fig. 50 is stored in the configuration information as information indicating whether the CVP is selectable, i.e., the selection possibility information.
Note that fig. 50 presents an example of the format (syntax) of a portion of the configuration information.
More specifically, the configuration information includes the configuration presented in fig. 7 in addition to the configuration presented in fig. 50. In other words, the configuration information includes the configuration presented in fig. 50 in a portion of the configuration presented in fig. 7. Alternatively, a portion of the configuration information presented in FIG. 32 may contain the configuration presented in FIG. 50, or the information presented in FIG. 44 may be further stored in the configuration information.
According to the example presented in fig. 50, "CVPelectAllowPresentFlag" represents a CVP select information present flag. The CVP selection information present flag is flag information indicating whether information associated with a CVP selectable on the listening side (i.e., whether the listening side is allowed to select a CVP) exists in the configuration information.
The value "0" of the CVP selection information present flag indicates that information associated with the selectable CVP is not included (stored) in the configuration information.
Further, a value of "1" of the CVP selection information present flag indicates that information associated with a selectable CVP is included in the configuration information.
In the case where the value of the CVP selection information present flag is "1", the configuration information also stores "numOfAllowedCVP" indicating the number of CVPs selectable by the listener and coefficient information "AllowedCVPIdx[i]" indicating the CVPs selectable by the listener.
For example, the coefficient information "AllowedCVPIdx[i]" represents the value or the like of the CVP coefficient "ControlViewpointIndex[i]" presented in fig. 9 and indicating a CVP that the listener is allowed to select. Further, as many pieces of coefficient information "AllowedCVPIdx[i]" as the number of selectable CVPs indicated by "numOfAllowedCVP" are stored in the configuration information.
As described above, according to the example presented in fig. 50, the CVP selection information present flag "CVPelectAllowPresentFlag", the number of selectable CVPs "numOfAllowedCVP", and the coefficient information "AllowedCVPIdx[i]" are included in the configuration information as the selection possibility information indicating whether each CVP used for calculating the listener reference object position information and the listener reference gain is selectable.
Using this type of configuration information, the client 101 may identify which CVPs, among the CVPs that make up the content, are allowed to be selected.
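As a simple illustration of how the client 101 might hold this part of the configuration information, the following Python sketch models the fields of fig. 50. Field widths and the parsing itself are not shown because they are not specified here; the class and method names are assumptions.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class CVPSelectionInfo:
        # "CVPelectAllowPresentFlag" in fig. 50.
        cvp_select_allow_present_flag: int = 0
        # "AllowedCVPIdx[i]" in fig. 50: CVP coefficients the listener may select.
        allowed_cvp_idx: List[int] = field(default_factory=list)

        @property
        def num_of_allowed_cvp(self) -> int:
            # "numOfAllowedCVP" in fig. 50.
            return len(self.allowed_cvp_idx)

        def is_selectable(self, cvp_index: int) -> bool:
            # True if the CVP indicated by the given coefficient may be chosen
            # by the listener for the interpolation processing.
            return (self.cvp_select_allow_present_flag == 1
                    and cvp_index in self.allowed_cvp_idx)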
Note that this embodiment mode can be combined with any one or more of the above-described first to third embodiment modes.
Also, in the case where the configuration information includes the configuration presented in fig. 50, the information processing apparatus 11 performs the content creation process described with reference to fig. 14.
In this case, however, the control unit 26 accepts a designation operation for designating whether or not the CVP is permitted to be selected, for example, at any timing such as step S16. Thereafter, the control unit 26 generates configuration information containing any necessary information selected from the CVP selection information present flag, the number of selectable CVPs, and coefficient information indicating the selectable CVPs according to a specified operation in step S23.
< Configuration example of client >
Further, for example, in the case where a CVP for interpolation processing is selectable on the client 101 (listener) side, the client 101 has the configuration described in fig. 51. Note that portions in fig. 51 that are similar to corresponding portions in fig. 17 are given the same reference numerals, and a description of these portions will be omitted where appropriate.
The configuration of the client 101 shown in fig. 51 has a configuration including an input unit 201 and a display unit 202 newly added to the configuration shown in fig. 17.
For example, the input unit 201 includes input devices such as a touch panel, a mouse, a keyboard, and buttons, and supplies signals corresponding to input operations performed by the listener (user) to the position calculation unit 114.
The display unit 202 includes a display, and displays various types of images, such as a CVP selection screen, according to instructions issued from the position calculation unit 114 or the like.
< Description of Selective interpolation Process >
Also, in the case where the CVP for interpolation processing is selectable on the client 101 side as needed, the client 101 basically performs the reproduced audio data generating processing described with reference to fig. 18 or fig. 34.
However, in step S84 in fig. 18 or in step S126 in fig. 34, the selective interpolation processing presented in fig. 52 is performed to obtain listener reference object position information and listener reference gain.
The selective interpolation process performed by the client 101 will be described hereinafter with reference to a flowchart presented in fig. 52.
In step S311, the position calculation unit 114 acquires the configuration information from the decoding unit 113.
In step S312, the position calculation unit 114 determines whether the number of selectable CVPs is greater than 0, that is, whether numOfAllowedCVP >0 is satisfied, based on the configuration information.
In the event that determination is made in step S312 that numOfAllowedCVP >0 is satisfied (i.e., there is a listener selectable CVP), in step S313, the position calculation unit 114 presents the selectable CVP and accepts the listener' S selection of the CVP.
For example, based on coefficient information "AllowedCVPIdx [ i ]" indicating selectable CVPs and included in the configuration information, the position calculation unit 114 generates a CVP selection screen that presents the CVP indicated by the coefficient information as a selectable CVP, and causes the display unit 202 to display the generated CVP selection screen. In this case, for example, the display unit 202 displays a CVP selection screen shown in fig. 49.
The listener (user) operates the input unit 201 to select a desired CVP as the CVP for interpolation processing while viewing the CVP selection screen displayed on the display unit 202.
Thereafter, a signal corresponding to the selection operation performed by the listener is supplied from the input unit 201 to the position calculation unit 114. Thus, the position calculation unit 114 updates the screen on the display unit 202 according to the signal supplied from the input unit 201. Thus, for example, the display on the display unit 202 is updated from the display described in the left part of fig. 49 to the display described in the right part of fig. 49.
Note that the selection of the CVP by the listener on the CVP selection screen may be implemented before the content reproduction, or may be performed any number of times at any time during the content reproduction.
In step S314, the position calculation unit 114 determines whether or not the CVP excluded from the interpolation process exists among selectable CVPs, that is, whether or not there is a CVP not selected by the listener, based on the signal supplied from the input unit 201 according to the selection operation of the listener.
In the case where it is determined in step S314 that there is an excluded CVP, the process then advances to step S315.
In step S315, the position calculation unit 114 performs interpolation processing using the non-selectable CVP and the listener-selected CVP to obtain listener reference object position information and listener reference gain.
More specifically, interpolation processing is performed based on CVP position information, object position information, gain of an object, and the like associated with a plurality of CVPs including an unselected CVP and a CVP selected by a listener, and also based on the listener position information.
Note here that a non-selectable CVP is a CVP for which no coefficient information "AllowedCVPIdx[i]" is included in the configuration information. In other words, a non-selectable CVP is a CVP that is not designated as selectable by the selection possibility information contained in the configuration information.
Thus, in step S315, after excluding CVPs that are not selected by the listener (i.e., non-selected CVPs) from among all CVPs, interpolation processing is performed using all remaining CVPs.
Specifically, for example, interpolation processing is performed, in a manner similar to the first embodiment or the third embodiment, using all CVPs except the non-selected CVPs to obtain the listener reference object position information and the listener reference gain.
Note that it is not essential to exclude the non-selected CVPs from all CVPs. For example, interpolation processing may be performed using the CVPs remaining after the non-selected CVPs are excluded from CVPs satisfying a specific condition (such as being located around the listening position), or using the CVPs remaining after the non-selected CVPs are excluded from the CVPs belonging to the CVP group corresponding to the target group area.
After the process of step S315 is completed, the selective interpolation process ends.
Further, in the case where it is determined in step S312 that numOfAllowedCVP >0 is not satisfied (i.e., there is no optional CVP), or in the case where it is determined in step S314 that there is no excluded CVP, the process then proceeds to step S316.
In step S316, the position calculation unit 114 performs interpolation processing using all the CVPs to obtain listener reference object position information and listener reference gains. Thereafter, the selective interpolation process ends.
In step S316, interpolation processing similar to the interpolation processing performed in step S315 is performed except that the CVP used for the interpolation processing is different. Note that in step S316, interpolation processing may be similarly performed using a CVP that satisfies a specific condition or using a CVP belonging to a CVP group corresponding to the target group area.
In the above manner, the client 101 selectively performs interpolation processing using the CVP selected according to the selection of the listener or the like. In this way, the client 101 can also reproduce content reflecting the preference of the listener (user) while reflecting the intention of the content creator.
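The branching of the selective interpolation process in fig. 52 can be summarized in the following sketch, which determines the set of CVPs handed to the interpolation processing. The function name and the set representation are assumptions; only the selection logic follows the steps above.

    def cvps_for_interpolation(all_cvps, selectable_cvps, listener_selected):
        # all_cvps: every CVP of the content.
        # selectable_cvps: CVPs indicated by AllowedCVPIdx[] in the configuration.
        # listener_selected: CVPs chosen on the CVP selection screen (a subset
        # of selectable_cvps).
        selectable = set(selectable_cvps)
        selected = set(listener_selected)
        if not selectable:
            # Step S312 "no": there is no selectable CVP, so all CVPs are used (step S316).
            return set(all_cvps)
        excluded = selectable - selected
        if not excluded:
            # Step S314 "no": nothing was excluded, so all CVPs are used (step S316).
            return set(all_cvps)
        # Step S315: non-selectable CVPs plus the CVPs selected by the listener.
        return (set(all_cvps) - selectable) | selected

For the example of fig. 48, if all of CVP1 to CVP7 are selectable and the listener selects CVP1, CVP3, CVP4, and CVP6, the function returns only those four CVPs.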
< Configuration example of computer >
Meanwhile, the above-described series of processes may be performed by hardware or software. In the case where the series of processes is performed by software, a program constituting the software is installed in a computer. Examples of the computer herein include a computer incorporated in dedicated hardware and a computer capable of executing various types of functions by installing various types of programs, such as a general-purpose personal computer.
Fig. 53 is a block diagram describing a configuration example of hardware of a computer that executes the above-described series of processes under a program.
In the computer, a CPU (central processing unit) 501, a ROM (read only memory) 502, and a RAM (random access memory) 503 are connected to each other via a bus 504.
The input/output interface 505 is further connected to a bus 504. The input unit 506, the output unit 507, the recording unit 508, the communication unit 509, and the drive 510 are connected to the input/output interface 505.
The input unit 506 includes a keyboard, a mouse, a microphone, an imaging element, and the like. The output unit 507 includes a display, a speaker, and the like. The recording unit 508 includes a hard disk, a nonvolatile memory, and the like. The communication unit 509 includes a network interface and the like. The drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, and a semiconductor memory.
According to the computer configured as above, the CPU 501 loads a program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504, and executes the loaded program, for example, to execute the above-described series of processes.
For example, a program executed by a computer (CPU 501) may be recorded in a removable recording medium 511 such as a package medium, and provided in this form. Further, the program may be provided via a wired or wireless transmission medium such as a local area network, the internet, and digital satellite broadcasting.
The program of the computer can be installed into the recording unit 508 from the removable recording medium 511 attached to the drive 510 via the input/output interface 505. Alternatively, the program may be received by the communication unit 509 via a wired or wireless transmission medium and installed in the recording unit 508. Instead, the program may be installed in advance in the ROM 502 or the recording unit 508.
Note that the program executed by the computer may be a program in which processes are executed in time series in the order described in the present specification, or may be a program in which processes are executed in parallel or at necessary timing such as timing when a call is made.
Further, the embodiments of the present technology are not limited to the above-described embodiments, and may be modified in various ways without departing from the subject matter of the present technology.
For example, the present technology may have a configuration of cloud computing in which a plurality of devices operating in cooperation with each other through a network share and process one function.
Furthermore, the various steps described in the flowcharts above may be performed by one apparatus or may be shared and performed by a plurality of apparatuses.
Further, in the case where a plurality of processes are included in one step, the plurality of processes included in one step may be performed by one apparatus or may be shared and performed by a plurality of apparatuses.
Further, the present technology may have the following configuration.
(1)
An information processing apparatus comprising:
A control unit configured to
Generating a plurality of metadata sets, each metadata set comprising metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane,
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets, and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
(2)
The information processing apparatus according to (1), wherein
The metadata includes a gain of the object.
(3)
The information processing apparatus according to (1) or (2), wherein
The control viewpoint information includes control viewpoint orientation information indicating a direction from the control viewpoint toward the target point in the space or target point information indicating the target point in the space.
(4)
The information processing apparatus according to any one of (1) to (3), wherein
The configuration information includes at least any one of object number information indicating the number of objects constituting the content, control viewpoint number information indicating the number of control viewpoints, and metadata set number information indicating the number of metadata sets.
(5)
The information processing apparatus according to any one of (1) to (4), wherein
The configuration information contains control viewpoint group information associated with a control viewpoint group including control viewpoints included in a predetermined group area in the space, and
For one or more control viewpoint groups, the control viewpoint group information includes information indicating the control viewpoints belonging to the control viewpoint group and information for identifying the group area corresponding to the control viewpoint group.
(6)
The information processing apparatus according to (5), wherein
The configuration information includes information indicating whether the control viewpoint group information is included.
(7)
The information processing apparatus according to (5) or (6), wherein
The control viewpoint group information includes at least one of information indicating the number of control viewpoints belonging to the control viewpoint group or information indicating the number of control viewpoint groups.
(8)
The information processing apparatus according to any one of (1) to (7), wherein
The configuration information contains mute information for identifying the object designated as a mute object viewed from any one of the control viewpoints.
(9)
The information processing apparatus according to any one of (1) to (8), wherein
The configuration information includes selection possibility information concerning whether a control viewpoint used for calculating listener reference object position information indicating a position of the object viewed from a listener position or for calculating a gain of the object viewed from the listener position is selectable.
(10)
An information processing method performed by an information processing apparatus, comprising:
Generating a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane;
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets; and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
(11)
A program for causing a computer to execute:
Generating a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane;
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets; and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
(12)
An information processing apparatus comprising:
an acquisition unit that acquires object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
A listener position information acquisition unit configured to acquire listener position information indicating a listener position in the space; and
A position calculation unit configured to calculate listener reference object position information indicating a position of the object viewed from the listener position based on the listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
(13)
The information processing apparatus according to (12), wherein
The acquisition unit acquires a metadata set including metadata of a plurality of objects and containing the object position information, the control viewpoint position information, and specification information indicating the metadata set associated with the control viewpoint, and
The position calculation unit calculates the listener reference object position information based on the object position information in the metadata set indicated by the specification information contained in a plurality of metadata sets different from each other.
(14)
The information processing apparatus according to (12) or (13), wherein
The position calculation unit calculates listener reference object position information by performing interpolation processing based on the listener position information, control viewpoint position information associated with the plurality of control viewpoints, and object position information associated with the plurality of control viewpoints.
(15)
The information processing apparatus according to (14), wherein
The interpolation process includes vector synthesis.
(16)
The information processing apparatus according to (15), wherein
The position calculation unit performs the vector synthesis by using weights obtained based on the listener position information and the control viewpoint position information associated with the plurality of control viewpoints.
(17)
The information processing apparatus according to any one of (14) to (16), wherein
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint corresponding to the object that is not a mute object and based on the object position information.
(18)
The information processing apparatus according to (17), wherein
The acquisition unit further acquires mute information for identifying the object designated as the mute object when viewed from the control viewpoint, and
The position calculation unit identifies the control viewpoint corresponding to the object that is not the mute object based on the mute information.
(19)
The information processing apparatus according to (17), wherein
The acquisition unit further acquires, for each of the plurality of control viewpoints, a gain of the object when viewed from the control viewpoint, and
The position calculation unit identifies the control viewpoint corresponding to the object that is not the mute object based on the gain.
(20)
The information processing apparatus according to any one of (14) to (19), wherein
The acquisition unit further acquires selection possibility information on whether the control viewpoint for calculating the listener reference object position information is selectable, and
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint selected by the listener from among the control viewpoints that can be selected with reference to the selection possibility information and based on the object position information.
(21)
The information processing apparatus according to (20), wherein
The position calculation unit performs the interpolation processing based on the control viewpoint position information and the object position information associated with the control viewpoint that cannot be selected with reference to the selection possibility information, and based on the control viewpoint position information and the object position information associated with the control viewpoint selected by the listener.
(22)
The information processing apparatus according to any one of (12) to (21), wherein
The listener position information acquisition unit acquires listener orientation information indicating the orientation of a listener in the space, and
The position calculation unit calculates listener reference object position information based on listener orientation information, listener position information, control viewpoint position information associated with a plurality of control viewpoints, and object position information associated with the plurality of control viewpoints.
(23)
The information processing apparatus according to (22), wherein
The acquisition unit further acquires control viewpoint orientation information indicating a direction from the control viewpoint toward the target point in the space, for each of the plurality of control viewpoints, and
The position calculation unit calculates the listener reference object position information based on the control viewpoint orientation information, the listener position information, the control viewpoint position information associated with the plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
(24)
The information processing apparatus according to any one of (12) to (23), wherein
The acquisition unit further acquires, for each of the plurality of control viewpoints, a gain of the object when viewed from the control viewpoint, and
The position calculation unit calculates a gain of the object when viewed from the listener position by performing interpolation processing based on the listener position information, the control viewpoint position information associated with the plurality of control viewpoints, and the gains viewed from the plurality of control viewpoints.
(25)
The information processing apparatus according to (24), wherein
The position calculation unit performs the interpolation processing based on a weight obtained from the reciprocal of the distance from the listener position to the control viewpoint raised to the power of a predetermined sensitivity coefficient.
(26)
The information processing apparatus according to (25), wherein
A sensitivity coefficient is set for each control viewpoint or for each object as viewed from the control viewpoint.
(27)
The information processing apparatus according to any one of (24) to (26), wherein
The acquisition unit further acquires selection possibility information on whether the control viewpoint for calculating the gain of the object when viewed from the listener position is selectable, and
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint that the listener selects from among the control viewpoints that can be selected with reference to the selection possibility information and based on the gain.
(28)
The information processing apparatus according to (27), wherein
The position calculation unit performs the interpolation process based on the control viewpoint position information associated with the control viewpoint that cannot be selected with reference to the selection possibility information and based on the gain, and based on the control viewpoint position information associated with the control viewpoint selected by the listener and based on the gain.
(29)
The information processing apparatus according to any one of (12) to (28), further comprising:
a rendering processing unit configured to perform rendering processing based on the audio data of the object and the listener reference object position information.
(30)
The information processing apparatus according to any one of (12) to (29), wherein
The listener reference object position information includes information indicating a position of the object and indicated by coordinates in a polar coordinate system having an origin located at the listener position.
(31)
The information processing apparatus according to any one of (12) to (30), wherein
The acquisition unit further acquires control viewpoint group information associated with a control viewpoint group including control viewpoints included in a predetermined group area included in the space, and for one or more control viewpoint groups, the control viewpoint group information includes information indicating the control viewpoints belonging to the control viewpoint group and information for identifying the group area corresponding to the control viewpoint group, and
The position calculation unit calculates the listener reference object position information based on the control viewpoint position information associated with the control viewpoints belonging to the control viewpoint group corresponding to the group region including the listener position and based on the object position information and the listener position information.
(32)
The information processing apparatus according to (31), wherein
The position calculation unit acquires configuration information including control viewpoint information that is associated with the plurality of control viewpoints and includes the control viewpoint position information, and the configuration information includes information indicating whether the control viewpoint group information is included, and
the configuration information contains the control viewpoint group information according to the information indicating whether the control viewpoint group information is included.
(33)
The information processing apparatus according to (31) or (32), wherein
The control viewpoint group information includes at least one of information indicating the number of control viewpoints belonging to the control viewpoint group or information indicating the number of control viewpoint groups.
(34)
The information processing apparatus according to any one of (12) to (33), wherein
The acquisition unit acquires configuration information including:
Control viewpoint information associated with the plurality of control viewpoints and containing the control viewpoint position information, and
At least any one of object number information indicating the number of the objects constituting the content, control viewpoint number information indicating the number of the control viewpoints, and metadata set number information indicating the number of metadata sets each including metadata of a plurality of objects, the metadata containing the object position information.
(35)
An information processing method performed by an information processing apparatus, comprising:
acquiring object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
Acquiring listener position information indicating a listener position in the space; and
Listener reference object position information indicating a position of the object viewed from the listener position is calculated based on listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with a plurality of control viewpoints.
(36)
A program for causing a computer to execute:
acquiring object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
Acquiring listener position information indicating a listener position in the space; and
Listener reference object position information indicating a position of the object viewed from the listener position is calculated based on the listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with a plurality of control viewpoints.
[ List of reference numerals ]
11. Information processing apparatus
21. Input unit
22. Display unit
24. Communication unit
26. Control unit
51. Server device
61. Communication unit
62. Control unit
71. Coding unit
101. Client terminal
111. Listener position information acquisition unit
112. Communication unit
113. Decoding unit
114. Position calculation unit
115. Rendering processing unit

Claims (36)

1. An information processing apparatus comprising:
A control unit configured to generate a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of the object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane,
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets, and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
2. The information processing apparatus according to claim 1, wherein,
The metadata includes a gain of the object.
3. The information processing apparatus according to claim 1, wherein,
The control viewpoint information includes control viewpoint orientation information indicating a direction from the control viewpoint toward the target point in the space or target point information indicating the target point in the space.
4. The information processing apparatus according to claim 1, wherein,
The configuration information includes at least any one of object number information indicating the number of the objects constituting the content, control viewpoint number information indicating the number of the control viewpoints, and metadata set number information indicating the number of the metadata sets.
5. The information processing apparatus according to claim 1, wherein,
The configuration information contains control viewpoint group information associated with a control viewpoint group including control viewpoints included in a predetermined group area in the space, and
For one or more control viewpoint groups, the control viewpoint group information includes information indicating the control viewpoints belonging to the control viewpoint group and information for identifying the group areas corresponding to the control viewpoint group.
6. The information processing apparatus according to claim 5, wherein,
The configuration information includes information indicating whether the control viewpoint group information is included.
7. The information processing apparatus according to claim 5, wherein,
The control viewpoint group information includes at least one of information indicating the number of control viewpoints belonging to the control viewpoint group and information indicating the number of control viewpoint groups.
8. The information processing apparatus according to claim 1, wherein,
The configuration information contains mute information for identifying the object designated as a mute object when viewed from any one of the control viewpoints.
9. The information processing apparatus according to claim 1, wherein,
The configuration information includes selection possibility information about whether the control viewpoint is selectable, the control viewpoint being used to calculate listener reference object position information indicating a position of the object viewed from a listener position or calculate a gain of the object viewed from the listener position.
10. An information processing method performed by an information processing apparatus, comprising:
Generating a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane;
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets; and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
11. A program for causing a computer to execute:
Generating a plurality of metadata sets, each metadata set including metadata associated with a plurality of objects, the metadata containing object position information indicating a position of an object viewed from a control viewpoint when a direction from the control viewpoint toward a target point in space is designated as a direction toward a median plane;
For each control viewpoint of a plurality of control viewpoints, generating control viewpoint information including control viewpoint position information indicating a position of the corresponding control viewpoint in the space and information indicating a metadata set associated with the corresponding control viewpoint among the plurality of metadata sets; and
Content data including the plurality of metadata sets different from each other and configuration information including the control viewpoint information associated with the plurality of control viewpoints is generated.
12. An information processing apparatus comprising:
an acquisition unit that acquires object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
A listener position information acquisition unit configured to acquire listener position information indicating a listener position in the space; and
A position calculation unit configured to calculate listener reference object position information indicating a position of the object viewed from the listener position based on the listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
13. The information processing apparatus according to claim 12, wherein,
The acquisition unit acquires a metadata set including metadata of a plurality of objects and containing the object position information, the control viewpoint position information, and specification information indicating the metadata set associated with the control viewpoint, and
The position calculation unit calculates the listener reference object position information based on the object position information in the metadata set that is indicated by the specification information from among the plurality of metadata sets different from each other.
14. The information processing apparatus according to claim 12, wherein,
The position calculation unit calculates the listener reference object position information by performing interpolation processing based on the listener position information, control viewpoint position information associated with the plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
15. The information processing apparatus according to claim 14, wherein,
The interpolation processing includes vector synthesis.
16. The information processing apparatus according to claim 15, wherein,
The position calculation unit performs the vector synthesis by using weights obtained based on the listener position information and the control viewpoint position information associated with the plurality of control viewpoints.
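A minimal sketch of the interpolation by vector synthesis of claims 14 to 16, assuming weights derived from the inverse of the listener-to-viewpoint distances (the claims state only that the weights are obtained from the listener position information and the control viewpoint position information, so this particular weighting is an assumption):

```python
import numpy as np

def interpolate_object_position(listener_pos, cvp_positions, object_pos_per_cvp, eps=1e-6):
    """Listener reference object position by weighted vector synthesis.

    listener_pos       : (3,) listener position in the content space
    cvp_positions      : (N, 3) control viewpoint positions
    object_pos_per_cvp : (N, 3) object position vector as seen from each control viewpoint
    Returns a (3,) vector interpreted as the object position seen from the listener.
    """
    listener_pos = np.asarray(listener_pos, dtype=float)
    cvp_positions = np.asarray(cvp_positions, dtype=float)
    object_pos_per_cvp = np.asarray(object_pos_per_cvp, dtype=float)

    # Weight each control viewpoint by the inverse of its distance to the listener.
    dists = np.linalg.norm(cvp_positions - listener_pos, axis=1)
    weights = 1.0 / np.maximum(dists, eps)
    weights /= weights.sum()            # normalise so the weights sum to one

    # Vector synthesis: weighted sum of the per-viewpoint object vectors.
    return (weights[:, None] * object_pos_per_cvp).sum(axis=0)
```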
17. The information processing apparatus according to claim 14, wherein,
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint corresponding to the object that is not a mute object and based on the object position information.
18. The information processing apparatus according to claim 17, wherein,
The acquisition unit further acquires mute information for identifying the object designated as the mute object when viewed from the control viewpoint, and the position calculation unit identifies the control viewpoint corresponding to the object that is not the mute object based on the mute information.
19. The information processing apparatus according to claim 17, wherein,
The acquisition unit further acquires, for each of the plurality of control viewpoints, a gain of the object when viewed from the control viewpoint, and
The position calculation unit identifies the control viewpoint corresponding to the object that is not the mute object based on the gain.
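For illustration, the control viewpoints usable for the interpolation of claim 17 could be filtered as follows, using either explicit mute information (claim 18) or a zero gain (claim 19); the argument names are assumptions made for this sketch:

```python
def active_viewpoint_indices(num_viewpoints, mute_flags=None, gains=None):
    """Indices of the control viewpoints at which a given object is NOT a mute object.

    mute_flags : optional per-viewpoint booleans, True where the object is designated
                 as a mute object (mute information)
    gains      : optional per-viewpoint gains; a gain of zero is treated as marking
                 a mute object when no explicit mute information is available
    """
    indices = []
    for i in range(num_viewpoints):
        muted_by_flag = mute_flags is not None and mute_flags[i]
        muted_by_gain = gains is not None and gains[i] == 0.0
        if not (muted_by_flag or muted_by_gain):
            indices.append(i)
    return indices
```

The interpolation processing would then use only the control viewpoint position information and object position information belonging to the returned indices.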
20. The information processing apparatus according to claim 14, wherein,
The acquisition unit further acquires selection possibility information on whether the control viewpoint for calculating the listener reference object position information is selectable, and
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint selected by the listener from among the control viewpoints that can be selected with reference to the selection possibility information and based on the object position information.
21. The information processing apparatus according to claim 20, wherein,
The position calculation unit performs the interpolation processing based on the control viewpoint position information and the object position information associated with the control viewpoint that cannot be selected with reference to the selection possibility information, and based on the control viewpoint position information and the object position information associated with the control viewpoint selected by the listener.
22. The information processing apparatus according to claim 12, wherein,
The listener position information acquisition unit acquires listener orientation information indicating the orientation of a listener in the space, and
The position calculation unit calculates the listener reference object position information based on the listener orientation information, the listener position information, the control viewpoint position information associated with the plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
23. The information processing apparatus according to claim 22, wherein,
The acquisition unit further acquires control viewpoint orientation information indicating a direction from the control viewpoint toward the target point in the space, for each of the plurality of control viewpoints, and
The position calculation unit calculates the listener reference object position information based on the control viewpoint orientation information, the listener position information, the control viewpoint position information associated with the plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
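A simplified, yaw-only sketch of how listener orientation information and control viewpoint orientation information could enter the calculation, re-expressing an object vector given in a control viewpoint's frame in the listener's own frame (the claims do not prescribe this decomposition, and a full implementation would also handle elevation):

```python
import math
import numpy as np

def yaw_matrix(yaw_deg):
    """Rotation about the vertical axis; yaw-only as a simplification."""
    r = math.radians(yaw_deg)
    return np.array([[math.cos(r), -math.sin(r), 0.0],
                     [math.sin(r),  math.cos(r), 0.0],
                     [0.0,          0.0,         1.0]])

def to_listener_frame(obj_vec_in_cvp_frame, cvp_yaw_deg, listener_yaw_deg):
    """Rotate an object vector from a control viewpoint's frame into the listener's frame.

    cvp_yaw_deg      : horizontal direction from the control viewpoint toward the target point
    listener_yaw_deg : horizontal orientation of the listener (listener orientation information)
    """
    world_vec = yaw_matrix(cvp_yaw_deg) @ np.asarray(obj_vec_in_cvp_frame, dtype=float)
    return yaw_matrix(-listener_yaw_deg) @ world_vec
```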
24. The information processing apparatus according to claim 12, wherein,
The acquisition unit further acquires, for each of the plurality of control viewpoints, a gain of the object when viewed from the control viewpoint, and
The position calculation unit calculates a gain of the object when viewed from the listener position by performing interpolation processing based on the listener position information, the control viewpoint position information associated with the plurality of control viewpoints, and the gains of the object when viewed from the plurality of control viewpoints.
25. The information processing apparatus according to claim 24, wherein,
The position calculation unit performs the interpolation processing based on a weight obtained as the reciprocal of the distance from the listener position to the control viewpoint raised to the power of a predetermined sensitivity coefficient.
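Read this way, each control viewpoint i contributes a weight of the form w_i = 1 / d_i^alpha, where d_i is the distance from the listener position to that control viewpoint and alpha is the sensitivity coefficient; larger values of alpha make nearby viewpoints dominate more strongly. A minimal sketch of the gain interpolation under that reading (function and parameter names are assumptions):

```python
import numpy as np

def interpolate_gain(listener_pos, cvp_positions, cvp_gains, alpha=1.0, eps=1e-6):
    """Gain of an object at the listener position, interpolated from per-viewpoint gains.

    cvp_gains : per-control-viewpoint gains of the object
    alpha     : sensitivity coefficient; it could equally be given per control
                viewpoint or per object rather than as a single scalar
    """
    listener_pos = np.asarray(listener_pos, dtype=float)
    cvp_positions = np.asarray(cvp_positions, dtype=float)
    cvp_gains = np.asarray(cvp_gains, dtype=float)

    dists = np.linalg.norm(cvp_positions - listener_pos, axis=1)
    weights = 1.0 / np.power(np.maximum(dists, eps), alpha)   # w_i = 1 / d_i**alpha
    weights /= weights.sum()
    return float(np.dot(weights, cvp_gains))
```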
26. The information processing apparatus according to claim 25, wherein,
A sensitivity coefficient is set for each control viewpoint or for each object viewed from the control viewpoint.
27. The information processing apparatus according to claim 24, wherein,
The acquisition unit further acquires selection possibility information on whether the control viewpoint for calculating the gain of the object when viewed from the listener position is selectable, and
The position calculation unit performs the interpolation processing based on the control viewpoint position information associated with the control viewpoint that the listener selects from among the control viewpoints that can be selected with reference to the selection possibility information and based on the gain.
28. The information processing apparatus according to claim 27, wherein,
The position calculation unit performs the interpolation processing based on the control viewpoint position information and the gain associated with the control viewpoint that cannot be selected with reference to the selection possibility information, and based on the control viewpoint position information and the gain associated with the control viewpoint selected by the listener.
29. The information processing apparatus according to claim 12, further comprising:
A rendering processing unit configured to perform rendering processing based on the audio data of the object and the listener reference object position information.
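As a toy stand-in only (not the rendering processing actually claimed), the stub below shows where the listener reference object position information and interpolated gain would be consumed; a practical renderer would instead use a 3D panner such as VBAP or a binaural renderer:

```python
import math
import numpy as np

def render_stereo(mono_audio, azimuth_deg, gain):
    """Pan one object's mono audio to stereo from its listener-reference azimuth and gain.

    Positive azimuth is assumed to be toward the listener's left; equal-power panning.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    pan = math.radians((clamped + 90.0) / 2.0)        # 0..pi/2 across right..left
    left, right = math.sin(pan), math.cos(pan)
    samples = np.asarray(mono_audio, dtype=float) * gain
    return np.stack([samples * left, samples * right], axis=0)
```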
30. The information processing apparatus according to claim 12, wherein,
The listener reference object position information includes information indicating the position of the object, expressed by coordinates in a polar coordinate system whose origin is located at the listener position.
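A small sketch of converting an object vector relative to the listener into such polar coordinates; the axis and sign conventions (x toward the listener's left, y straight ahead, z up, positive azimuth to the left) are assumptions, since the claim only requires a polar coordinate system whose origin is the listener position:

```python
import math

def cartesian_to_listener_polar(rel_x, rel_y, rel_z):
    """Return (azimuth_deg, elevation_deg, radius) for a listener-relative vector."""
    radius = math.sqrt(rel_x**2 + rel_y**2 + rel_z**2)
    if radius == 0.0:
        return 0.0, 0.0, 0.0
    azimuth = math.degrees(math.atan2(rel_x, rel_y))      # 0 deg straight ahead
    elevation = math.degrees(math.asin(rel_z / radius))
    return azimuth, elevation, radius
```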
31. The information processing apparatus according to claim 12, wherein,
The acquisition unit further acquires control viewpoint group information associated with a control viewpoint group including the control viewpoints located in a predetermined group region in the space; for each of one or more control viewpoint groups, the control viewpoint group information includes information indicating the control viewpoints belonging to the control viewpoint group and information for identifying the group region corresponding to the control viewpoint group, and
The position calculation unit calculates the listener reference object position information based on the control viewpoint position information associated with the control viewpoints belonging to the control viewpoint group corresponding to the group region including the listener position and based on the object position information and the listener position information.
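For illustration, assuming each group region is an axis-aligned box (the claims only require information for identifying the group region, so the box representation is an assumption), the control viewpoints to use for a given listener position could be selected like this:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ControlViewpointGroup:
    region_min: Tuple[float, float, float]   # lower corner of the group region
    region_max: Tuple[float, float, float]   # upper corner of the group region
    viewpoint_indices: List[int]             # control viewpoints belonging to this group

def viewpoints_for_listener(groups: List[ControlViewpointGroup], listener_pos):
    """Control viewpoints of the group whose region contains the listener position."""
    for g in groups:
        inside = all(lo <= p <= hi
                     for p, lo, hi in zip(listener_pos, g.region_min, g.region_max))
        if inside:
            return g.viewpoint_indices
    return []   # listener outside every group region
```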
32. The information processing apparatus according to claim 31, wherein,
The position calculation unit acquires configuration information including control viewpoint information that is associated with the plurality of control viewpoints and includes the control viewpoint position information, and the configuration information includes information indicating whether the control viewpoint group information is included, and
The configuration information contains the control viewpoint group information in accordance with the information indicating whether the control viewpoint group information is contained.
33. The information processing apparatus according to claim 31, wherein,
The control viewpoint group information includes at least one of information indicating the number of control viewpoints belonging to the control viewpoint group and information indicating the number of control viewpoint groups.
34. The information processing apparatus according to claim 12, wherein,
The acquisition unit acquires configuration information including:
Control viewpoint information associated with the plurality of control viewpoints and containing the control viewpoint position information, and
At least one of object number information indicating the number of the objects constituting the content, control viewpoint number information indicating the number of the control viewpoints, and metadata set number information indicating the number of metadata sets each including metadata of a plurality of objects containing the object position information.
35. An information processing method performed by an information processing apparatus, comprising:
Acquiring object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
Acquiring listener position information indicating a listener position in the space; and
Calculating listener reference object position information indicating a position of the object viewed from the listener position based on the listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
36. A program for causing a computer to execute:
Acquiring object position information indicating a position of an object viewed from the control viewpoint when a direction from the control viewpoint toward a target point in a space is specified as a direction toward a median plane, and control viewpoint position information indicating a position of the control viewpoint in the space;
Acquiring listener position information indicating a listener position in the space; and
Calculating listener reference object position information indicating a position of the object viewed from the listener position based on the listener position information, the control viewpoint position information associated with a plurality of control viewpoints, and the object position information associated with the plurality of control viewpoints.
CN202280073688.9A 2021-11-12 2022-10-31 Information processing apparatus and method, and program Pending CN118176749A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2021-184495 2021-11-12
JP2022030292 2022-02-28
JP2022-030292 2022-02-28
PCT/JP2022/040596 WO2023085140A1 (en) 2021-11-12 2022-10-31 Information processing device and method, and program

Publications (1)

Publication Number Publication Date
CN118176749A (en)

Family

ID=91360916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202280073688.9A Pending CN118176749A (en) 2021-11-12 2022-10-31 Information processing apparatus and method, and program

Country Status (1)

Country Link
CN (1) CN118176749A (en)


Legal Events

Date Code Title Description
PB01 Publication