CN113949884A - Multi-view video data processing method, device and storage medium - Google Patents

Info

Publication number
CN113949884A
CN113949884A
Authority
CN
China
Prior art keywords
viewpoint
image
slave
image frame
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111035779.7A
Other languages
Chinese (zh)
Inventor
王荣刚
王振宇
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN202111035779.7A
Priority to PCT/CN2021/134319 (WO2023029252A1)
Publication of CN113949884A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/21805Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6587Control parameters, e.g. trick play commands, viewpoint selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a multi-view video data processing method, device, and storage medium. When the display device switches viewpoints, it sends the switched-to target viewpoint to the decoding device. The decoding device supplies the display device with the image of the slave viewpoint corresponding to the target viewpoint, cropped from image frames in the slave image frame sequence, so that the display device can immediately display the target viewpoint's picture at low resolution. When the picture switching condition is met, the display device sends a picture-switching-condition-met instruction to the decoding device, which then supplies the image in the main image frame sequence corresponding to the target viewpoint and/or the cropped slave-viewpoint image, so that the display device can display the target viewpoint's picture at high resolution. The invention enables the video to recover quickly from low resolution to high resolution after a viewpoint switch and preserves picture clarity during long viewing sessions.

Description

Multi-view video data processing method, device and storage medium
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method, a device, and a storage medium for processing multi-view video data.
Background
The free viewpoint technique enables video to be watched from a freely chosen viewing angle. Current free-viewpoint applications allow viewers to watch video from a continuum of viewpoints within a certain range. The viewer can set the position and angle of the viewpoint rather than being limited to the single fixed angle of one camera, and can therefore watch the video from a full 360-degree range of free viewing angles.
Current free-viewpoint applications usually use a spatial-domain stitching method to stitch the single-channel videos of multiple viewpoints together; when a user switches viewpoints in the application, it displays the single-channel video of the switched-to viewpoint from the stitched result. However, stitching reduces the resolution of each viewpoint's single-channel video, so the picture resolution available to the free-viewpoint application is insufficient and the finally rendered viewpoint picture is not sharp.
Disclosure of Invention
The embodiments of the present application provide a multi-view video data processing method, device, and storage medium, aiming to solve the technical problem that, after single-channel videos of multiple viewpoints are stitched by a spatial-domain stitching method, the picture resolution available for viewpoint display is insufficient and the finally rendered viewpoint picture is of low resolution.
An embodiment of the present application provides a multi-view video data processing method applied to a decoding device, the method comprising the following steps:
when a display instruction sent by display equipment is received, acquiring a current viewpoint corresponding to the display instruction;
sending, to the display device, first video data required by the display device to display a current viewpoint picture corresponding to the current viewpoint; the first video data comprises an image in a main image frame sequence received over the main transmission path corresponding to the current viewpoint, and/or an image of the slave viewpoint corresponding to the current viewpoint cropped from a slave image frame in the slave image frame sequence received over the slave transmission path; the images in the main image frame sequence and in the slave image frames each comprise viewpoint pictures and/or viewpoint depth pictures, and the resolution of the images in the main image frame sequence is greater than that of the images in the slave image frames;
when a viewpoint switching instruction sent by the display equipment is received, a target viewpoint corresponding to the viewpoint switching instruction is obtained;
sending second video data required by the display equipment to display a first target viewpoint picture corresponding to the target viewpoint to the display equipment; the second video data comprises a slave viewpoint image corresponding to the target viewpoint cut from an image frame in the slave image frame sequence;
when receiving a picture switching condition meeting an instruction sent by the display equipment, sending third video data required by the display equipment to display a second target viewpoint picture corresponding to the target viewpoint to the display equipment; the third video data comprises images in a main image frame sequence received by a main transmission path corresponding to the target viewpoint and/or images of a slave viewpoint corresponding to the target viewpoint and intercepted by slave image frames in the slave image frame sequence.
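The three decoder-side phases above (initial display, viewpoint switch, switching condition met) can be sketched as a minimal dispatch loop. This is an illustrative sketch only; the class, enum, and return values are assumptions for exposition, not part of the patent's claims.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Instruction(Enum):
    DISPLAY = auto()               # initial display request from the display device
    SWITCH_VIEWPOINT = auto()      # viewer moved to a new (target) viewpoint
    SWITCH_CONDITION_MET = auto()  # picture-switching condition satisfied

@dataclass
class DecoderState:
    current_viewpoint: int = 0

def select_video_data(state, instruction, viewpoint=None):
    """Return which data the decoding device sends to the display device."""
    if instruction is Instruction.DISPLAY:
        state.current_viewpoint = viewpoint
        # first video data: high-resolution image from the main image frame sequence
        return ("main", state.current_viewpoint)
    if instruction is Instruction.SWITCH_VIEWPOINT:
        state.current_viewpoint = viewpoint
        # second video data: low-resolution image cropped from the slave frame
        # sequence, available immediately because the slave path carries all viewpoints
        return ("slave_crop", state.current_viewpoint)
    if instruction is Instruction.SWITCH_CONDITION_MET:
        # third video data: back to the high-resolution main sequence of the target
        return ("main", state.current_viewpoint)
    raise ValueError(instruction)
```

The key design point is that a viewpoint switch is answered from the always-present slave sequence, so display is never delayed waiting for a new main path.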
In an embodiment, before the step of sending, to the display device, first video data required by the display device to display a current viewpoint picture corresponding to the current viewpoint, the method further includes:
acquiring a view point identification of a slave view point corresponding to the current view point and the arrangement information of the slave image frame sequence;
determining the position information of the image of the slave viewpoint corresponding to the current viewpoint in the slave image frame according to the arrangement information and the viewpoint identification;
and intercepting an image corresponding to the position information from a slave image frame in the slave image frame sequence.
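The cropping step above can be sketched as a slice out of the decoded stitched frame, driven by the arrangement information. The `(x, y, w, h)` tuple layout of the position information is an assumption for illustration; the patent only specifies that arrangement information maps viewpoint identifiers to positions.

```python
def crop_slave_viewpoint(frame, arrangement, viewpoint_id):
    """Crop one viewpoint's tile out of a decoded slave image frame.

    frame: 2-D list of pixel values (rows); arrangement: dict mapping a
    viewpoint identifier to its (x, y, width, height) in the stitched image.
    """
    x, y, w, h = arrangement[viewpoint_id]
    return [row[x:x + w] for row in frame[y:y + h]]
```

For example, with a 4x6 frame and a P2 tile recorded at position (3, 1) with size 2x2, the crop returns the 2x2 block starting at column 3, row 1.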
In an embodiment, the step of sending, to the display device, first video data required by the display device to display a current viewpoint picture corresponding to the current viewpoint includes:
judging whether the current viewpoint is a virtual viewpoint;
and when the current viewpoint is not the virtual viewpoint, sending the image in the main image frame sequence received by the main transmission path corresponding to the current viewpoint to the display device.
In an embodiment, after the step of determining whether the current viewpoint is a virtual viewpoint, the method further includes:
when the current viewpoint is a virtual viewpoint, sending an image in a main image frame sequence received by a main transmission path corresponding to the current viewpoint and an image of a slave viewpoint corresponding to the current viewpoint, which is intercepted from a slave image frame in a slave image frame sequence received by a slave transmission path, to the display device; the slave viewpoint corresponding to the current viewpoint comprises a slave viewpoint adjacent to the current viewpoint.
In an embodiment, the sending, to the display device, second video data required by the display device to display a first target viewpoint picture corresponding to the target viewpoint includes:
judging whether the target viewpoint is a virtual viewpoint or not;
transmitting, to the display device, an image of a same slave viewpoint as the target viewpoint, which is cut from an image frame in the slave image frame sequence, when the target viewpoint is not a virtual viewpoint.
In an embodiment, after the step of sending, to the display device, the image of the slave viewpoint identical to the target viewpoint cropped from an image frame in the slave image frame sequence when the target viewpoint is not a virtual viewpoint, the step of sending, upon receiving the picture-switching-condition-met instruction from the display device, the third video data required to display the second target viewpoint picture includes:
and when receiving a picture switching condition meeting instruction sent by the display equipment, sending an image in a main image frame sequence received by a main transmission path corresponding to the target viewpoint to the display equipment.
In an embodiment, after the step of determining whether the target viewpoint is a virtual viewpoint, the method further includes:
transmitting, to the display device, an image of a slave viewpoint adjacent to the target viewpoint, which is cut out from a slave image frame in the slave image frame sequence, when the target viewpoint is a virtual viewpoint.
In an embodiment, after the step of sending, to the display device, the image of a slave viewpoint adjacent to the target viewpoint cropped from an image frame in the slave image frame sequence when the target viewpoint is a virtual viewpoint, the step of sending, upon receiving the picture-switching-condition-met instruction from the display device, the third video data required to display the second target viewpoint picture further includes:
and when receiving a picture switching condition meeting an instruction sent by the display equipment, sending an image in a main image frame sequence received by a main transmission path corresponding to the target viewpoint and an image of a slave viewpoint adjacent to the target viewpoint and cut from a slave image frame in the slave image frame sequence to the display equipment.
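For a virtual viewpoint, the decoder sends the cropped images of the real slave viewpoints adjacent to it, from which the display device can synthesize the virtual view. A minimal sketch of selecting those neighbors is shown below; modeling viewpoint positions as scalar coordinates along the camera arc is an illustrative assumption.

```python
def adjacent_viewpoints(real_positions, virtual_pos):
    """Return the two real viewpoints nearest to a virtual viewpoint.

    real_positions: dict mapping viewpoint id to a position along the camera
    arc; virtual_pos: position of the requested virtual viewpoint.
    """
    ranked = sorted(real_positions, key=lambda v: abs(real_positions[v] - virtual_pos))
    return sorted(ranked[:2])  # the two nearest neighbors, in id order
```

The decoder would then crop these neighbors' tiles from the slave frame and send both to the display device.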
In addition, the present invention also provides a multi-view video data processing method applied to an encoding device, the multi-view video data processing method comprising:
acquiring images of all viewpoints shot by all cameras, wherein different cameras shoot images corresponding to different viewpoints, and the images comprise at least one of viewpoint images and viewpoint depth map images;
stitching the images of all the viewpoints, and encoding the stitched images by shooting time and at a first resolution to generate a slave image frame sequence;
encoding the image of each viewpoint according to the shooting time and the second resolution to generate a main image frame sequence of each viewpoint, wherein the resolution of the image in the main image frame sequence is greater than that of the image in the spliced image;
when a viewpoint selecting instruction sent by decoding equipment is received, acquiring a viewpoint selected by the decoding equipment according to the viewpoint selecting instruction;
transmitting a main image frame sequence of a viewpoint selected by the decoding apparatus to the decoding apparatus from a main transmission path of the viewpoint, while transmitting the slave image frame sequence to the decoding apparatus from a slave transmission path.
In an embodiment, the step of stitching the images of the viewpoints and encoding the stitched images according to the shooting time and the first resolution to generate the sequence of slave image frames includes:
splicing the images of the viewpoints by adopting a preset arrangement mode to generate spliced images and arrangement information of the images of the viewpoints in the spliced images, wherein the arrangement information at least comprises viewpoint identifications of the viewpoints and position information of the images of the viewpoints in the spliced images;
sequencing the spliced images according to the shooting time to generate a spliced image sequence;
and coding the spliced image sequence according to a first resolution, and marking the coded spliced image sequence by adopting the arrangement information to obtain the slave image frame sequence.
In an embodiment, the step of marking the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence includes:
and inserting the arrangement information into the sequence head of the coded spliced image sequence to obtain the slave image frame sequence.
In an embodiment, the step of labeling the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence further includes:
and inserting the arrangement information into each spliced image in the coded spliced image sequence to obtain the slave image frame sequence.
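The two labeling variants above, arrangement information once in the sequence header versus repeated in every stitched image, can be sketched as follows. Serializing the arrangement information as JSON and joining with a `|` delimiter are illustrative assumptions; the patent does not specify a wire format.

```python
import json

def label_sequence(encoded_frames, arrangement, per_frame=False):
    """Attach arrangement information to an encoded stitched image sequence.

    per_frame=False: insert it once, as a sequence header (first variant).
    per_frame=True: prepend it to every stitched frame (second variant).
    """
    header = json.dumps(arrangement).encode()
    if per_frame:
        return [header + b"|" + f for f in encoded_frames]
    return [header] + list(encoded_frames)
```

The header-only variant costs less bandwidth, while the per-frame variant lets a decoder that joins mid-stream recover the tile layout immediately.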
Further, to achieve the above object, the present invention also provides a decoding device comprising: a memory, a processor and a multi-view video data processing program stored on the memory and executable on the processor, the multi-view video data processing program when executed by the processor implementing the steps of the multi-view video data processing method described above.
Further, to achieve the above object, the present invention also provides an encoding device comprising: a memory, a processor and a multi-view video data processing program stored on the memory and executable on the processor, the multi-view video data processing program when executed by the processor implementing the steps of the multi-view video data processing method described above.
Further, to achieve the above object, the present invention also provides a storage medium having stored thereon a multi-view video data processing program which, when executed by a processor, implements the steps of the above-described multi-view video data processing method.
The technical scheme of the multi-view video data processing method, device and storage medium provided in the embodiments of the present application has at least the following technical effects or advantages:
When the viewpoint of the display device is switched, the display device sends the switched-to target viewpoint to the decoding device. The decoding device supplies the display device with the image of the slave viewpoint corresponding to the target viewpoint, cropped from an image frame in the slave image frame sequence, so that the display device can display the target viewpoint's picture at low resolution. When the picture switching condition is met, the display device sends a picture-switching-condition-met instruction to the decoding device, which then supplies the image in the main image frame sequence corresponding to the target viewpoint and/or the cropped slave-viewpoint image, so that the display device can display the target viewpoint's picture at high resolution. This scheme solves the technical problem that, after single-channel videos of multiple viewpoints are stitched by a spatial-domain stitching method, the picture resolution available for viewpoint display is insufficient and the finally rendered viewpoint picture is of low resolution. It thereby achieves zero-delay display during viewpoint switching, quick recovery of the video from low resolution to high resolution after switching, and guaranteed clarity when watching the video for a long time.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a multi-view video data processing method according to a first embodiment of the present invention;
FIG. 3 is a schematic view of a camera arrangement in multi-view video capture;
FIG. 4 is a flowchart illustrating a multi-view video data processing method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for processing multi-view video data according to a third embodiment of the present invention;
FIG. 6 is a diagram illustrating a current viewpoint as a real viewpoint;
FIG. 7 is a diagram illustrating a current viewpoint as a virtual viewpoint;
FIG. 8 is a flowchart illustrating a fourth embodiment of a multi-view video data processing method according to the present invention;
FIG. 9 is a diagram illustrating a target viewpoint as a real viewpoint;
FIG. 10 is a diagram illustrating a target viewpoint as a virtual viewpoint;
FIG. 11 is a flowchart illustrating a multi-view video data processing method according to a fifth embodiment of the present invention;
FIG. 12 is a schematic view of a predetermined arrangement;
FIG. 13 is a schematic view of another predetermined arrangement.
Detailed Description
For a better understanding of the above technical solutions, exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that fig. 1 is a schematic structural diagram of a hardware operating environment of a decoding device or an encoding device.
As shown in fig. 1, the decoding apparatus or the encoding apparatus may include: a processor 1001, such as a CPU, a memory 1005, a user interface 1003, a network interface 1004, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
It will be understood by those skilled in the art that the configuration shown in fig. 1 does not constitute a limitation on the decoding device or encoding device, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in fig. 1, the memory 1005, which is a storage medium, may include an operating system, a network communication module, a user interface module, and a multi-view video data processing program. The operating system is a program that manages and controls the hardware and software resources of the decoding device or encoding device and supports the execution of the multi-view video data processing program and other software or programs.
In the decoding device or encoding device shown in fig. 1, the user interface 1003 is mainly used for connecting a terminal and exchanging data with it; the network interface 1004 is mainly used for connecting to a background server and exchanging data with it; and the processor 1001 may be used to call the multi-view video data processing program stored in the memory 1005.
In the present embodiment, the decoding apparatus or the encoding apparatus includes: a memory 1005, a processor 1001 and a multi-view video data processing program stored on said memory 1005 and executable on said processor, wherein:
Applied to a decoding device, the processor 1001, when calling the multi-view video data processing program stored in the memory 1005, performs the following operations:
when a display instruction sent by display equipment is received, acquiring a current viewpoint corresponding to the display instruction;
sending, to the display device, first video data required by the display device to display a current viewpoint picture corresponding to the current viewpoint; the first video data comprises an image in a main image frame sequence received over the main transmission path corresponding to the current viewpoint, and/or an image of the slave viewpoint corresponding to the current viewpoint cropped from a slave image frame in the slave image frame sequence received over the slave transmission path; the images in the main image frame sequence and in the slave image frames each comprise viewpoint pictures and/or viewpoint depth pictures, and the resolution of the images in the main image frame sequence is greater than that of the images in the slave image frames;
when a viewpoint switching instruction sent by the display equipment is received, a target viewpoint corresponding to the viewpoint switching instruction is obtained;
sending second video data required by the display equipment to display a first target viewpoint picture corresponding to the target viewpoint to the display equipment; the second video data comprises a slave viewpoint image corresponding to the target viewpoint cut from an image frame in the slave image frame sequence;
when receiving a picture switching condition meeting an instruction sent by the display equipment, sending third video data required by the display equipment to display a second target viewpoint picture corresponding to the target viewpoint to the display equipment; the third video data comprises images in a main image frame sequence received by a main transmission path corresponding to the target viewpoint and/or images of a slave viewpoint corresponding to the target viewpoint and intercepted by slave image frames in the slave image frame sequence.
Applied to the encoding apparatus, the processor 1001, when calling the multi-view video data processing program stored in the memory 1005, performs the following operations:
acquiring images of all viewpoints shot by all cameras, wherein different cameras shoot images corresponding to different viewpoints, and the images comprise at least one of viewpoint images and viewpoint depth map images;
splicing the images of all the viewpoints, and coding the spliced images according to the shooting time and the first resolution ratio to generate a slave image frame sequence;
encoding the image of each viewpoint according to the shooting time and the second resolution to generate a main image frame sequence of each viewpoint, wherein the resolution of the image in the main image frame sequence is greater than that of the image in the spliced image;
when a viewpoint selecting instruction sent by decoding equipment is received, acquiring a viewpoint selected by the decoding equipment according to the viewpoint selecting instruction;
transmitting a main image frame sequence of a viewpoint selected by the decoding apparatus to the decoding apparatus from a main transmission path of the viewpoint, while transmitting the slave image frame sequence to the decoding apparatus from a slave transmission path.
Embodiments of the present invention provide embodiments of a method for processing multi-view video data, and it should be noted that although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order different from that shown or described herein.
As shown in fig. 2, in a first embodiment of the present application, a multi-view video data processing method of the present application is applied to a decoding apparatus, and includes the steps of:
step S210: and when a display instruction sent by display equipment is received, acquiring a current viewpoint corresponding to the display instruction.
In this embodiment, the display device is a device equipped with a multi-view video playing application, such as a smart phone, a tablet, a smart television, a computer, and the like, and a user can select different views to watch videos on the display device through the multi-view video playing application, where the videos may be live videos, such as a basketball game live broadcast, a football game live broadcast, and the like, or recorded videos, such as a badminton recorded video and the like.
This embodiment is described taking a live video as an example, such as a live broadcast of a basketball event. When shooting such a live video, a plurality of cameras must be arranged around the event venue, each camera responsible for shooting the event from one angle; the image shot by each camera is the image of one viewpoint, and the image corresponding to each viewpoint comprises at least one of a viewpoint image and a viewpoint depth map image. As shown in fig. 3, the numbers 1-9 represent the P1-P9 viewpoints, and each viewpoint is served by a corresponding camera, P1-P9. These 9 cameras shoot the basketball event, each responsible for one viewpoint: the images shot by the P1 camera are the images of the P1 viewpoint, the images shot by the P2 camera are the images of the P2 viewpoint, and so on, through the P9 camera shooting the images of the P9 viewpoint.
After the P1-P9 cameras capture the images of the P1-P9 viewpoints, the encoding device encodes them. The encoding device first stitches the images of the viewpoints together in a preset arrangement and generates the stitched image along with arrangement information describing each viewpoint's image within it. Each stitched image is composed of the images shot at the same moment from the P1-P9 viewpoints, and can be understood as one large image containing 9 small images. The arrangement information includes at least the viewpoint identifier of each viewpoint and the position information of each viewpoint's image in the stitched image: the viewpoint identifier indicates which viewpoint each small image belongs to (for example, an image identified as P9 is the image of the P9 viewpoint), and the position information indicates where each small image is placed within the large image. After the stitched images and their arrangement information are generated, the stitched images are sorted by shooting time to form a stitched image sequence; for example, if n stitched images are generated, ordering image 1, image 2, ..., image n by shooting time yields the stitched image sequence. The stitched image sequence is then encoded at a preset first resolution, and the encoded sequence is labeled with the arrangement information, thereby producing the slave image frame sequence.
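The stitching step above can be sketched as placing each viewpoint's image into a grid and recording its tile position as arrangement information. The grid shape, tile size, and the `(x, y, w, h)` tuple layout are illustrative assumptions; for instance, 9 viewpoints of 1280x720 tiles in a 3x3 grid yield a 3840x2160 stitched frame, each tile well below a 1920x1080 main-sequence image.

```python
def stitch(tiles, cols, tile_w, tile_h):
    """Stitch per-viewpoint images into one grid image.

    tiles: dict mapping viewpoint id to an image (list of pixel rows), all of
    size tile_w x tile_h. Returns (stitched image, arrangement info), where
    arrangement info maps each id to its (x, y, w, h) in the stitched image.
    """
    ids = sorted(tiles)
    rows_total = -(-len(ids) // cols)  # ceiling division: number of grid rows
    canvas = [[None] * (cols * tile_w) for _ in range(rows_total * tile_h)]
    arrangement = {}
    for i, vid in enumerate(ids):
        x, y = (i % cols) * tile_w, (i // cols) * tile_h
        arrangement[vid] = (x, y, tile_w, tile_h)
        for r in range(tile_h):  # copy the tile's rows into the canvas
            canvas[y + r][x:x + tile_w] = tiles[vid][r]
    return canvas, arrangement
```

The arrangement dictionary returned here is exactly what the decoding device later needs to crop a single viewpoint's tile back out of a decoded slave frame.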
Further, while the slave image frame sequence is generated, the images shot at the P1 viewpoint to the P9 viewpoint are also separately encoded according to the shooting time and a preset second resolution to generate a main image frame sequence for each viewpoint: the images shot at the P1 viewpoint are encoded to generate the main image frame sequence of the P1 viewpoint, the images shot at the P2 viewpoint are encoded to generate the main image frame sequence of the P2 viewpoint, and so on up to the P9 viewpoint, yielding 9 main image frame sequences in total. The main image frame of each viewpoint is an encoded image. The first resolution is the total resolution of the stitched image, and the resolution of each viewpoint's image within the stitched image is less than the second resolution; that is, the resolution of each viewpoint's image in the stitched image is less than the resolution of the images in that viewpoint's main image frame sequence.
Further, after generating the slave image frame sequence and the main image frame sequence of each viewpoint, the encoding device acquires the viewpoint selected by the decoding device from the received viewpoint selection instruction, transmits the main image frame sequence of the selected viewpoint to the decoding device through that viewpoint's main transmission path, and simultaneously transmits the slave image frame sequence to the decoding device through the slave transmission path. The encoding device transmits each viewpoint's main image frame sequence over its own independent transmission path, and likewise transmits the slave image frame sequence to the decoding device over an independent transmission path. For ease of understanding, the independent path carrying a viewpoint's main image frame sequence is referred to as a main transmission path, and the independent path carrying the slave image frame sequence is referred to as the slave transmission path. With 9 viewpoints there are therefore 9 main transmission paths and 1 slave transmission path: each main transmission path carries the main image frame sequence of its corresponding viewpoint (for example, the main transmission path corresponding to the P1 viewpoint transmits the main image frame sequence of the P1 viewpoint), and the slave transmission path carries the slave image frame sequence.
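The path topology above — one main transmission path per viewpoint plus a single shared slave transmission path — can be sketched as follows; the path names and the helper function are assumptions for illustration.

```python
# One main transmission path per viewpoint, one shared slave path.
MAIN_PATHS = {f"P{i}": f"main-path-{i}" for i in range(1, 10)}
SLAVE_PATH = "slave-path"

def paths_for(selected_viewpoint: str):
    """Return the two paths the encoding device transmits on once the
    decoding device has selected `selected_viewpoint`: that viewpoint's
    main path (carrying its main image frame sequence) and the single
    slave path (carrying the slave image frame sequence)."""
    return MAIN_PATHS[selected_viewpoint], SLAVE_PATH
```

With 9 viewpoints this yields 9 main paths and 1 slave path, matching the example in the text; only the selected viewpoint's main path is active at a time, while the slave path is always transmitted alongside it.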
The viewpoint selection instruction is generated by the decoding device according to either a display instruction or a picture switching condition satisfaction instruction sent by the display device. For example, suppose the display device currently needs to display the viewpoint picture corresponding to the P2 viewpoint. The display device generates a display instruction including the P2 viewpoint and sends it to the decoding device; after extracting the P2 viewpoint from the display instruction, the decoding device generates a viewpoint selection instruction including the P2 viewpoint and sends it to the encoding device. The encoding device obtains the P2 viewpoint — the viewpoint selected by the decoding device — from the received viewpoint selection instruction, and then transmits the main image frame sequence of the P2 viewpoint to the decoding device through the main transmission path corresponding to the P2 viewpoint while simultaneously transmitting the slave image frame sequence through the slave transmission path.
For another example, when the user switches the current P2 viewpoint to the target P4 viewpoint through the display device, the viewpoint picture corresponding to the P4 viewpoint becomes the picture to be displayed. Once the display device determines that the picture switching condition corresponding to the P4 viewpoint is satisfied, it generates a picture switching condition satisfaction instruction including the P4 viewpoint and sends it to the decoding device. The decoding device extracts the P4 viewpoint from that instruction, generates a viewpoint selection instruction including the P4 viewpoint, and sends it to the encoding device. The encoding device obtains the P4 viewpoint — the viewpoint selected by the decoding device — from the received viewpoint selection instruction, and then transmits the main image frame sequence of the P4 viewpoint to the decoding device through the main transmission path corresponding to the P4 viewpoint.
Specifically, if a user opens the multi-view video playing application on the display device and just starts to watch a live video of a basketball event, the application generally takes a default viewpoint as the current viewpoint and displays the current viewpoint picture (a video picture of the basketball event) corresponding to it. Before displaying the current viewpoint picture, the display device generates a display instruction including the current viewpoint and sends it to the decoding device, which obtains the current viewpoint from the display instruction upon receiving it. For example, if the default viewpoint is the P2 viewpoint, then the P2 viewpoint is the current viewpoint, the display device needs to display the viewpoint picture corresponding to the P2 viewpoint, and the viewpoint the decoding device obtains from the display instruction is the P2 viewpoint.
Step S220: sending, to the display device, first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint.
The decoding device obtains the current viewpoint from the display instruction, generates a viewpoint selection instruction including the current viewpoint, and sends it to the encoding device. The encoding device obtains the current viewpoint from the received viewpoint selection instruction, transmits the main image frame sequence of the current viewpoint to the decoding device through the corresponding main transmission path, and simultaneously transmits the slave image frame sequence through the slave transmission path; the decoding device thus receives both the main image frame sequence of the current viewpoint and the slave image frame sequence. After receiving them, the decoding device extracts, from the main image frame sequence of the current viewpoint and/or the slave image frame sequence, the first video data required by the display device to display the current viewpoint picture, and then sends the first video data to the display device.
Specifically, the first video data includes an image from the main image frame sequence received through the main transmission path corresponding to the current viewpoint and/or an image of the slave viewpoint corresponding to the current viewpoint, cut out from a slave image frame in the slave image frame sequence received through the slave transmission path. The images in the main image frame sequence and in the slave image frames each include a viewpoint picture and/or a viewpoint depth map picture, and the resolution of the images in the main image frame sequence is greater than that of the images in the slave image frames. A slave image frame in the slave image frame sequence is a stitched image and contains the images of all viewpoints; if the stitched image is composed of the images of the P1 viewpoint to the P9 viewpoint, then the slave image frame contains the images of the P1 viewpoint to the P9 viewpoint. The viewpoint corresponding to each image within a slave image frame may be referred to as a slave viewpoint. For example, if the current viewpoint is the P1 viewpoint, the image of the P1 viewpoint in the slave image frame is the image of the slave viewpoint corresponding to the current viewpoint, and the decoding device cuts the image of the P1 viewpoint out of the slave image frames in the slave image frame sequence.
Further, after receiving the first video data, the display device displays the current viewpoint picture corresponding to the current viewpoint according to the first video data. When the display device does so, the current viewpoint picture seen by the user is of higher resolution; that is, the user watches the live video of the basketball event at a higher resolution.
Step S230: when a viewpoint switching instruction sent by the display device is received, acquiring the target viewpoint corresponding to the viewpoint switching instruction.
Step S240: sending, to the display device, second video data required by the display device to display a first target viewpoint picture corresponding to the target viewpoint.
In this embodiment, the multi-view video playing application has a viewpoint switching function: the user can select a target viewpoint in the application to watch the live video of the basketball event from that viewpoint. Specifically, the user selects the target viewpoint on the video playing interface of the display device. After acquiring the target viewpoint, the display device determines that the user needs to switch viewpoints and must be shown the second video data required to display the first target viewpoint picture corresponding to the target viewpoint; the display device therefore generates a viewpoint switching instruction from the target viewpoint and sends it to the decoding device, which receives the instruction and extracts the target viewpoint from it. For example, if the current viewpoint is the P1 viewpoint and the user selects the P2 viewpoint, the target viewpoint the decoding device obtains from the viewpoint switching instruction is the P2 viewpoint.
After the decoding device acquires the target viewpoint, it cuts the image of the slave viewpoint corresponding to the target viewpoint out of the slave image frames in the slave image frame sequence to obtain the second video data, and sends the second video data to the display device, which displays the first target viewpoint picture corresponding to the target viewpoint accordingly. For example, if the target viewpoint is the P2 viewpoint, the second video data includes the image of the slave viewpoint corresponding to the P2 viewpoint cut from the slave image frames; if that image is image F2, the display device displays image F2. Since the resolution of each slave viewpoint's image in the slave image frames is smaller than the resolution of the images in any viewpoint's main image frame sequence, the live video picture of the P2 viewpoint that the user sees while image F2 is displayed has a lower resolution than the live video picture of the P1 viewpoint displayed before the switch.
Step S250: when a picture switching condition satisfaction instruction sent by the display device is received, sending, to the display device, third video data required by the display device to display a second target viewpoint picture corresponding to the target viewpoint.
After acquiring the target viewpoint, the decoding device sends a viewpoint selection instruction including the target viewpoint to the encoding device. The encoding device obtains the target viewpoint from the received viewpoint selection instruction, transmits the main image frame sequence of the target viewpoint to the decoding device through the corresponding main transmission path, and simultaneously transmits the slave image frame sequence through the slave transmission path; the decoding device thus receives both the main image frame sequence of the target viewpoint and the slave image frame sequence. After receiving them, the decoding device extracts, from the main image frame sequence of the target viewpoint and/or the slave image frame sequence, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint; that is, the decoding device prepares the third video data in advance.
In this embodiment, the second video data required to display the first target viewpoint picture is cut from the slave image frames in the slave image frame sequence, so the resolution of the first target viewpoint picture is relatively low and the picture quality presented to the user is relatively poor. Therefore, after the first target viewpoint picture has been displayed for a period of time, the display device resumes displaying a higher-resolution viewpoint picture corresponding to the target viewpoint; if the viewpoint is not changed again, the display device keeps displaying the higher-resolution viewpoint picture.
Resuming the higher-resolution viewpoint picture corresponding to the target viewpoint requires the third video data needed to display the second target viewpoint picture. The third video data includes an image from the main image frame sequence received through the main transmission path corresponding to the target viewpoint and/or an image of the slave viewpoint corresponding to the target viewpoint cut from the slave image frames in the slave image frame sequence. The third video data is distinct from the second video data even where both contain images cut from the slave image frames: the second video data is transmitted to the display device before the third video data, and the third video data is transmitted only when the decoding device receives the picture switching condition satisfaction instruction sent by the display device.
The moment at which the display device resumes the higher-resolution viewpoint picture is determined by the picture switching condition. The picture switching condition can be understood as the moment when the display device has finished displaying the second video data for the first target viewpoint picture and must, at the next instant, continue with the third video data for the second target viewpoint picture, i.e., the moment at which display of the second target viewpoint picture is due. For example, if the display of the second video data for the first target viewpoint picture ends at 10 min 00 s, then from 10 min 01 s onward the display continues with the third video data required to display the second target viewpoint picture.
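The hand-over instant in the example above amounts to simple time arithmetic. A minimal sketch, with second-granularity timestamps assumed purely for illustration:

```python
def switch_instant(last_second_video_time_s: int) -> int:
    """Given the time (in seconds) at which the last frame of the second
    video data is displayed, return the instant from which the display
    device continues with the third video data (the higher-resolution
    second target viewpoint picture)."""
    return last_second_video_time_s + 1

# Second video data ends at 10 min 00 s = 600 s; the third video data
# picks up at 10 min 01 s = 601 s.
resume_at = switch_instant(10 * 60)
```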
Further, when the display device judges that the picture switching condition is satisfied, it generates a picture switching condition satisfaction instruction and sends it to the decoding device. The decoding device acquires the third video data according to this instruction and sends it to the display device, which displays the second target viewpoint picture corresponding to the target viewpoint; that is, the display device switches from the lower-resolution first target viewpoint picture back to the higher-resolution second target viewpoint picture, so that the user again sees a higher-resolution viewpoint picture, i.e., a higher-resolution live video of the basketball event. If the viewpoint is not changed again, the display device keeps displaying the higher-resolution viewpoint picture.
Based on the above steps S210 to S250, the behavior of the display device in this embodiment can be illustrated with the following example:
For example, when the user has just opened the multi-view video playing application to watch a live video of a basketball event and the default current viewpoint is the P1 viewpoint, the display device displays the viewpoint picture of the P1 viewpoint according to the first video data sent by the decoding device; at this point the resolution of that picture is relatively high, and the user watches the live video at a relatively high resolution. If the user switches from the P1 viewpoint to the P2 viewpoint, the display device displays the first target viewpoint picture of the P2 viewpoint according to the second video data sent by the decoding device; at this point the resolution of that picture is lower, and the user watches the live video at a lower resolution. After the second video data has been displayed, the display device continues with the second target viewpoint picture of the P2 viewpoint according to the third video data sent by the decoding device; at this point the resolution is higher again, so the live video the user sees is restored from low resolution to high resolution, and thereafter, as long as the user does not switch away from the P2 viewpoint, the display device keeps showing the higher-resolution live video.
With the above technical solution, when watching a live or recorded video, the viewpoint switch is displayed with zero delay, the video is quickly restored from low resolution to high resolution after the switch, and clarity is guaranteed for long-term viewing.
As shown in fig. 4, in the second embodiment of the present application, based on the first embodiment, step S210 further includes the following steps:
step S110: acquiring the viewpoint identifier of the slave viewpoint corresponding to the current viewpoint and the arrangement information of the slave image frame sequence.
Step S120: determining, according to the arrangement information and the viewpoint identifier, the position information of the image of the slave viewpoint corresponding to the current viewpoint within the slave image frame.
Step S130: cutting the image corresponding to the position information out of a slave image frame in the slave image frame sequence.
Before generating the slave image frame sequence, the encoding device first stitches the images of all viewpoints together in the preset arrangement, generating the stitched images and the arrangement information of each viewpoint's image within them; it then encodes the stitched image sequence according to the shooting time and the preset first resolution and marks the encoded stitched image sequence with the arrangement information, thereby obtaining the slave image frame sequence. Marking can be done in either of two ways: the arrangement information may be inserted into the sequence header of the encoded stitched image sequence, or it may be inserted into each stitched image of the encoded sequence. In the first case, the decoding device only needs to read the arrangement information once, from the sequence header of the slave image frame sequence, to know the position of every slave viewpoint's image in every slave image frame; in the second case, the decoding device must read the arrangement information in each slave image frame to obtain those positions. The arrangement information at least includes the viewpoint identifier of each viewpoint and the position information of each viewpoint's image in the stitched image.
When the image of each viewpoint includes only a viewpoint picture, the format of an arrangement information entry is {x, y, w, h, view_id}, where x and y are the coordinates of the top-left pixel of the viewpoint picture in the stitched image, w and h are the width and height of the viewpoint picture, and view_id is the viewpoint identifier of the picture; x, y, w and h together constitute the position information. When the image of each viewpoint includes both a viewpoint picture and a viewpoint depth map picture, the format is {x, y, w, h, view_id, is_depth}, where x and y are the coordinates of the top-left pixel of the viewpoint picture or viewpoint depth map picture in the stitched image, w and h are its width and height, view_id is the corresponding viewpoint identifier, and is_depth marks whether the picture is a viewpoint depth map picture; again x, y, w and h constitute the position information. The images of all viewpoints are arranged identically in the stitched image and in the slave image frame, so the stitched image and the slave image frame are the same, and their arrangement information is also the same.
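Both entry formats quoted above can be parsed into one uniform record, as sketched below; the tuple input and dict output are illustrative choices, not mandated by the patent.

```python
def parse_arrangement_entry(entry):
    """Accept an entry of the form (x, y, w, h, view_id) or
    (x, y, w, h, view_id, is_depth) and normalise it: entries without
    the is_depth flag are plain viewpoint pictures."""
    x, y, w, h, view_id = entry[:5]
    is_depth = bool(entry[5]) if len(entry) == 6 else False
    return {"x": x, "y": y, "w": w, "h": h,
            "view_id": view_id, "is_depth": is_depth}

# A texture tile and its depth-map tile for the same viewpoint:
texture = parse_arrangement_entry((0, 0, 640, 360, "P1"))
depth = parse_arrangement_entry((640, 0, 640, 360, "P1", 1))
```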
In this embodiment, when acquiring the image of the slave viewpoint corresponding to the current viewpoint, the decoding device acquires the viewpoint identifier of that slave viewpoint and the arrangement information of the slave image frame sequence. For example, if the current viewpoint is the P2 viewpoint, the corresponding viewpoint identifier is P2. After obtaining the viewpoint identifier and the arrangement information, the decoding device looks up the matching viewpoint identifier in the arrangement information and thereby determines the position information of the image of the slave viewpoint corresponding to the current viewpoint within the slave image frame. It then cuts the image at that position out of a slave image frame in the slave image frame sequence. If the current viewpoint is the P2 viewpoint, the slave viewpoint identified as P2 in the slave image frame is the slave viewpoint corresponding to the P2 viewpoint.
For example, given that the viewpoint identifier corresponding to the current viewpoint is P2, the decoding device finds the entry whose viewpoint identifier is P2 in the arrangement information, reads from that entry the position information of the P2 viewpoint's image, and accordingly cuts the image of the P2 viewpoint out of the slave image frame.
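Steps S110 to S130 can be sketched end-to-end as follows; the list-of-rows frame representation and the helper name `extract_view` are assumptions for illustration.

```python
def extract_view(slave_frame, arrangement, view_id, want_depth=False):
    """Find the arrangement entry for `view_id` (steps S110/S120), then
    cut the region it names out of the slave image frame (step S130).
    The frame is modelled as a row-major list of pixel rows."""
    for e in arrangement:
        if e["view_id"] == view_id and e.get("is_depth", False) == want_depth:
            x, y, w, h = e["x"], e["y"], e["w"], e["h"]
            return [row[x:x + w] for row in slave_frame[y:y + h]]
    raise KeyError(f"no entry for viewpoint {view_id}")

# A toy 2x4 "stitched" slave frame holding two 2x2 tiles, P1 then P2:
frame = [["a", "a", "b", "b"],
         ["a", "a", "b", "b"]]
arrangement = [{"x": 0, "y": 0, "w": 2, "h": 2, "view_id": "P1"},
               {"x": 2, "y": 0, "w": 2, "h": 2, "view_id": "P2"}]
p2_tile = extract_view(frame, arrangement, "P2")
```

Whether the arrangement list comes from the sequence header or from each slave image frame, the lookup itself is the same; only how often it must be read differs.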
As shown in fig. 5, in the third embodiment of the present application, based on the first embodiment, the step S220 includes the following steps:
step S221: judging whether the current viewpoint is a virtual viewpoint, if so, executing step S223; if not, step S222 is performed.
Step S222: and sending the image in the main image frame sequence received by the main transmission path corresponding to the current viewpoint to the display device.
Step S223: and sending the image in the main image frame sequence received by the main transmission path corresponding to the current viewpoint and the image in the slave viewpoint corresponding to the current viewpoint and intercepted from the slave image frame in the slave image frame sequence received by the slave transmission path to the display device.
In this embodiment, when a user opens the multi-view video playing application on the display device and just starts to watch a live video of a basketball event, the current (default) viewpoint is either a real viewpoint or a virtual viewpoint. A real viewpoint is a viewpoint that physically exists: the viewpoint corresponding to each camera is a real viewpoint, e.g., the P1 viewpoint to the P9 viewpoint corresponding to the P1 camera to the P9 camera. A virtual viewpoint is a viewpoint that does not physically exist, i.e., a viewpoint between two adjacent real viewpoints; for example, a viewpoint between the P1 viewpoint and the P2 viewpoint is a virtual viewpoint. If the current viewpoint is a real viewpoint, the display device directly displays the images in the main image frame sequence of the current viewpoint. If the current viewpoint is a virtual viewpoint, the current viewpoint picture must be synthesized from the images of the slave viewpoints corresponding to the current viewpoint in the slave image frames, and the images in the main image frame sequence of the real viewpoint closest to the current viewpoint are then displayed. The slave viewpoints corresponding to the current viewpoint include the slave viewpoints adjacent to it; the image of each such slave viewpoint includes a viewpoint picture and a viewpoint depth map picture, and the current viewpoint picture is synthesized from those viewpoint pictures and viewpoint depth map pictures.
Specifically, after receiving the display instruction sent by the display device, the decoding device obtains the current viewpoint from it and judges whether the current viewpoint is a virtual viewpoint. If the current viewpoint is a real viewpoint, the images in the main image frame sequence received through the corresponding main transmission path are sent to the display device as the first video data, and the display device displays the current viewpoint picture accordingly. If the current viewpoint is a virtual viewpoint, the images in the main image frame sequence received through the corresponding main transmission path, together with the images of the slave viewpoints corresponding to the current viewpoint cut from the slave image frames received through the slave transmission path, are sent to the display device as the first video data; the display device synthesizes the current viewpoint picture from the viewpoint pictures and viewpoint depth map pictures in those slave viewpoint images and then displays it, i.e., displays the images in the main image frame sequence of the current viewpoint. The displayed main image frame sequence is that of the real viewpoint closest to the current viewpoint, and in either case the current viewpoint picture the user sees is of higher resolution.
For example, as shown in fig. 6, 1-9 respectively represent the P1 viewpoint to the P9 viewpoint, and viewpoint a represents the default, i.e. current, viewpoint. In fig. 6, viewpoint a falls on the P5 viewpoint, so viewpoint a is the P5 viewpoint and is a real viewpoint: the decoding device sends the images in the main image frame sequence received through the main transmission path of the P5 viewpoint to the display device as the first video data, and the display device displays them. In fig. 7, viewpoint a falls between the P5 viewpoint and the P6 viewpoint, so viewpoint a is a virtual viewpoint: the decoding device sends, as the first video data, the images in the main image frame sequence received through the main transmission path of the P5 viewpoint together with the images of the P5 and P6 viewpoints cut from the slave image frames received through the slave transmission path, and the display device synthesizes the current viewpoint picture from the viewpoint pictures and viewpoint depth map pictures of the P5 and P6 viewpoints and then displays the images in the main image frame sequence of the P5 viewpoint.
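The real-versus-virtual decision for viewpoint a can be sketched by modelling viewpoint positions as numbers along the P1..P9 axis; this numeric representation and the helper names are assumptions for illustration.

```python
import math

REAL_POSITIONS = set(range(1, 10))  # the P1 .. P9 camera positions

def is_virtual(pos: float) -> bool:
    """A viewpoint is virtual when it falls between two adjacent real
    viewpoints rather than on a camera position."""
    return pos not in REAL_POSITIONS

def adjacent_slave_views(pos: float):
    """For a virtual viewpoint, the two neighbouring real viewpoints
    whose viewpoint pictures and viewpoint depth map pictures are used
    to synthesize the picture."""
    return (f"P{math.floor(pos)}", f"P{math.ceil(pos)}")

# Viewpoint a on P5 (fig. 6) is real; between P5 and P6 (fig. 7) it is
# virtual and would be synthesized from the P5 and P6 slave views.
```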
As shown in fig. 8, in the fourth embodiment of the present application, based on the first embodiment, step S240 includes the steps of:
step S241: judging whether the target viewpoint is a virtual viewpoint; if so, executing step S243, and if not, executing step S242.
Step S242: sending, to the display device, the image of the slave viewpoint identical to the target viewpoint, cut from the slave image frames in the slave image frame sequence.
Step S243: sending, to the display device, the images of the slave viewpoints adjacent to the target viewpoint, cut from the slave image frames in the slave image frame sequence.
In this embodiment, after the user selects a target viewpoint to switch to on the video playing interface of the display device, the selected target viewpoint is either a real viewpoint or a virtual viewpoint. If the target viewpoint is a real viewpoint, the display apparatus directly displays the image of the identical slave viewpoint taken from the slave image frame. If the target viewpoint is a virtual viewpoint, the viewpoint picture of the target viewpoint must first be synthesized from the viewpoint pictures and viewpoint depth map pictures in the images of the slave viewpoints adjacent to the target viewpoint in the slave image frame before it can be displayed.
Specifically, after receiving a viewpoint switching instruction sent by the display device, the decoding device acquires the target viewpoint from the instruction and determines whether it is a virtual viewpoint. If the target viewpoint is a real viewpoint, the decoding device sends the image of the slave viewpoint identical to the target viewpoint, cut out from an image frame in the slave image frame sequence, to the display device as the second video data, and the display device displays that image according to the second video data. If the target viewpoint is a virtual viewpoint, the decoding device sends the images of the slave viewpoints adjacent to the target viewpoint, cut out from the image frames in the slave image frame sequence, to the display device as the second video data; the display device synthesizes the first target viewpoint picture corresponding to the target viewpoint from the viewpoint pictures and viewpoint depth map pictures in those adjacent slave-viewpoint images and displays it according to the second video data. It should be noted that the resolution of the first target viewpoint picture displayed by the display apparatus is lower than that of the current viewpoint picture corresponding to the current viewpoint, i.e., the viewpoint picture seen by the user at this stage has a lower resolution.
For example, as shown in fig. 9, viewpoint B represents the target viewpoint; in fig. 9 it falls on the P6 viewpoint, so viewpoint B is the P6 viewpoint, a real viewpoint. The decoding apparatus then transmits the image of the P6 viewpoint, cut out from the image frames in the slave image frame sequence, to the display apparatus as the second video data, and the display apparatus displays that image according to the second video data. As shown in fig. 10, viewpoint C represents the target viewpoint; in fig. 10 it falls between the P6 and P7 viewpoints, so viewpoint C is a virtual viewpoint. The decoding apparatus then transmits the images of the P6 and P7 viewpoints, cut out from the image frames in the slave image frame sequence, to the display apparatus as the second video data; the display apparatus synthesizes the first target viewpoint picture corresponding to viewpoint C from the viewpoint pictures and viewpoint depth map pictures in the P6 and P7 viewpoint images, where the P7 viewpoint is closest to viewpoint C, and displays the picture according to the second video data.
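Steps S241-S243 can be summarized in a short sketch. This is illustrative only: viewpoints are assumed to be positions 1-9, and `second_video_data` is a hypothetical name for the crop-selection logic.

```python
REAL_VIEWPOINTS = set(range(1, 10))  # P1..P9 as positions 1..9

def second_video_data(target):
    """Steps S241-S243: pick the slave-frame crops sent after a viewpoint switch."""
    if target in REAL_VIEWPOINTS:
        return [int(target)]      # S242: the identical slave viewpoint only
    lo = int(target)
    return [lo, lo + 1]           # S243: the two adjacent real viewpoints
```

With viewpoint B on P6 the decoder sends only the P6 crop; with viewpoint C between P6 and P7 it sends both crops so the display can synthesize the picture.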
Further, according to the fourth embodiment, when the target viewpoint is not a virtual viewpoint, after the step of transmitting to the display device the image of the slave viewpoint identical to the target viewpoint cut out from the image frames in the slave image frame sequence, the step of transmitting to the display device, upon receiving a picture-switching-condition-satisfied instruction sent by the display device, the third video data required for the display device to display the second target viewpoint picture corresponding to the target viewpoint includes:
and when receiving a picture-switching-condition-satisfied instruction sent by the display device, sending, to the display device, the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint.
Specifically, if the target viewpoint is a real viewpoint, the decoding device sends the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint to the display device as the third video data, and the display device displays the second target viewpoint picture corresponding to the target viewpoint according to the third video data. At this point the resolution of the viewpoint picture seen by the user is higher; that is, the display device reverts from displaying the lower-resolution first target viewpoint picture to displaying the higher-resolution second target viewpoint picture.
For example, as shown in fig. 9, viewpoint B represents the target viewpoint; in fig. 9 it falls on the P6 viewpoint, so viewpoint B is the P6 viewpoint, a real viewpoint. The decoding apparatus then transmits the images in the main image frame sequence corresponding to the P6 viewpoint to the display apparatus as the third video data, and the display apparatus displays the second target viewpoint picture corresponding to the P6 viewpoint, that is, the images in the main image frame sequence corresponding to the P6 viewpoint, according to the third video data. The second target viewpoint picture displayed at this time has a higher resolution.
Further, according to the fourth embodiment, when the target viewpoint is a virtual viewpoint, after the step of transmitting to the display device the images of the slave viewpoints adjacent to the target viewpoint cut out from the image frames in the slave image frame sequence, the step of transmitting to the display device, upon receiving a picture-switching-condition-satisfied instruction sent by the display device, the third video data required for the display device to display the second target viewpoint picture corresponding to the target viewpoint further includes:
and when receiving a picture-switching-condition-satisfied instruction sent by the display device, sending, to the display device, the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint together with the images of the slave viewpoints adjacent to the target viewpoint cut out from the slave image frames in the slave image frame sequence.
Specifically, if the target viewpoint is a virtual viewpoint, the decoding apparatus transmits, as the third video data, the images in the main image frame sequence received over the main transmission path corresponding to the target viewpoint together with the images of the slave viewpoints adjacent to the target viewpoint cut out from the slave image frames in the slave image frame sequence, and the display apparatus displays the second target viewpoint picture corresponding to the target viewpoint according to the third video data. At this point the resolution of the viewpoint picture seen by the user is higher; that is, the display device reverts from displaying the lower-resolution first target viewpoint picture to displaying the higher-resolution second target viewpoint picture.
For example, as shown in fig. 10, viewpoint C represents the target viewpoint; in fig. 10 it falls between the P6 and P7 viewpoints, so viewpoint C is a virtual viewpoint, with the P7 viewpoint closest to it. The decoding apparatus transmits, as the third video data, the images in the main image frame sequence received over the main transmission path corresponding to the P7 viewpoint together with the images of the P6 and P7 viewpoints cut out from the slave image frames. The display apparatus synthesizes the second target viewpoint picture corresponding to viewpoint C from the viewpoint pictures and viewpoint depth map pictures in the P6 and P7 viewpoint images and displays it according to the third video data, i.e., displays the images in the main image frame sequence of the P7 viewpoint. The second target viewpoint picture corresponding to viewpoint C displayed by the display device at this time has a higher resolution.
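The restore step once the switching condition is met can be sketched the same way as the earlier dispatch; again, viewpoint positions and the name `third_video_data` are illustrative assumptions, not the patent's implementation.

```python
REAL_VIEWPOINTS = set(range(1, 10))  # P1..P9 as positions 1..9

def third_video_data(target):
    """Once the switching condition is met, revert to the high-resolution stream."""
    if target in REAL_VIEWPOINTS:
        return {"main": int(target), "slave_crops": []}
    lo, hi = int(target), int(target) + 1
    nearest = lo if target - lo <= hi - target else hi
    # Main stream of the nearest real viewpoint plus the adjacent crops,
    # so the display can synthesize the target picture at full resolution.
    return {"main": nearest, "slave_crops": [lo, hi]}
```

For viewpoint B on P6 only the P6 main stream is needed; for viewpoint C at roughly 6.7 the P7 main stream is sent along with the P6 and P7 crops, as in fig. 10.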
As shown in fig. 11, in a fifth embodiment of the present application, a multi-view video data processing method of the present application, applied to an encoding apparatus, includes the steps of:
step S310: images of respective viewpoints photographed by respective cameras are acquired.
In this embodiment, when video is shot, a plurality of cameras need to be arranged at the shooting site in advance. Each camera is responsible for shooting from one angle, and the image shot by each camera is the image of one viewpoint; that is, images corresponding to different viewpoints are shot by different cameras. The shot video can be a live video, such as a live basketball or football game, or a recorded video, such as a recorded badminton match.
The present embodiment is described taking a live video as an example, e.g., a live basketball game. When shooting a live basketball game, a plurality of cameras need to be arranged around the venue; each camera shoots the game from one angle, and the image it shoots is the image of one viewpoint. As shown in fig. 3, 1-9 represent the P1 to P9 viewpoints, and each viewpoint is provided with a corresponding camera, namely cameras P1 to P9. These are the nine cameras shooting the game, each responsible for the images of one viewpoint: the images shot by the P1 camera are the images of the P1 viewpoint, the images shot by the P2 camera are the images of the P2 viewpoint, and so on up to the P9 camera and the P9 viewpoint. The image corresponding to each viewpoint includes at least one of a viewpoint picture and a viewpoint depth map picture. A viewpoint depth map picture, also called a range image, is an image whose pixel values are the distances (depths) from the image capture device (such as a camera) to the points in the scene. Specifically, the encoding apparatus acquires the images of the respective viewpoints shot by the respective cameras, i.e., the acquired image of each viewpoint includes at least one of a viewpoint picture and a viewpoint depth map picture.
Step S320: and splicing the images of the viewpoints, and encoding the spliced images according to the shooting time and the first resolution to generate a slave image frame sequence.
Specifically, step S320 includes:
stitching the images of the viewpoints in a preset arrangement mode to generate a stitched image and the arrangement information of each viewpoint's image in the stitched image;
sorting the stitched images according to the shooting time to generate a stitched image sequence;
and encoding the stitched image sequence according to the first resolution, and marking the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence.
The arrangement information at least comprises the viewpoint identifier of each viewpoint and the position information of each viewpoint's image in the stitched image.
In the present embodiment, after acquiring the images shot at the P1 to P9 viewpoints by cameras P1 to P9, the encoding apparatus encodes them. The encoding device first stitches the images of the viewpoints together in the preset arrangement mode and generates the stitched image and the arrangement information of each viewpoint's image within it. Each stitched image is composed of the images shot at the same moment from the P1 to P9 viewpoints, and can be understood as one large image made up of nine small images.
The arrangement information at least includes viewpoint identifiers of the viewpoints and position information of the images of the viewpoints in the stitched images, the preset arrangement mode is as shown in fig. 12 and 13, and if the images of the viewpoints in the stitched images include viewpoint pictures, the arrangement mode of the images of the viewpoints in the stitched images is the mode corresponding to fig. 12; if the images of the viewpoints in the stitched image include a viewpoint picture and a viewpoint depth map picture, the arrangement manner of the images of the viewpoints in the stitched image is the manner corresponding to fig. 13.
When the image of each viewpoint comprises only a viewpoint picture, the format of the arrangement information is {x, y, w, h, view_id}, where x and y are the coordinates of the top-left pixel of the viewpoint picture in the stitched image, w and h are the width and height of the viewpoint picture, and view_id is the viewpoint identifier corresponding to the viewpoint picture; x, y, w and h together form the position information. When the image of each viewpoint comprises a viewpoint picture and a viewpoint depth map picture, the format is {x, y, w, h, view_id, is_depth}, where x and y are the coordinates of the top-left pixel of the viewpoint picture or viewpoint depth map picture in the stitched image, w and h are its width and height, view_id is the corresponding viewpoint identifier, and is_depth marks whether the picture is a viewpoint depth map picture. The viewpoint identifier indicates which viewpoint each image in the stitched image belongs to; for example, if an image's viewpoint identifier is P9, it is the image corresponding to the P9 viewpoint. The position information indicates where in the stitched image each viewpoint's image is arranged.
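The arrangement-information record and the cropping it enables can be sketched as follows. This is a minimal model, assuming a stitched frame represented as a row-major 2-D array; the names `Tile` and `crop` are illustrative, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Tile:
    x: int              # column of the top-left pixel in the stitched image
    y: int              # row of the top-left pixel
    w: int              # picture width
    h: int              # picture height
    view_id: str        # e.g. "P5"
    is_depth: bool = False  # True when the tile is a viewpoint depth map picture

def crop(stitched, tile):
    """Cut one viewpoint's picture out of a row-major stitched frame."""
    return [row[tile.x:tile.x + tile.w] for row in stitched[tile.y:tile.y + tile.h]]
```

Given a tile's {x, y, w, h}, the decoder can recover that viewpoint's picture from any slave image frame without decoding the other eight tiles' positions anew.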
After generating the stitched images and the arrangement information of each viewpoint's image within them, the encoding device sorts the stitched images according to the shooting time to generate the stitched image sequence. Suppose n stitched images are generated; in shooting-time order they are image 1, image 2, image 3, ..., image n, and sorting images 1 through n yields the stitched image sequence. The encoding device then encodes the stitched image sequence according to the preset first resolution and marks the encoded sequence with the arrangement information, thereby obtaining the slave image frame sequence.
Further, marking the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence includes: inserting the arrangement information into the sequence header of the encoded stitched image sequence, or inserting the arrangement information into each stitched image of the encoded sequence; either yields the slave image frame sequence. If the arrangement information is inserted into the sequence header, the decoding device only needs to read it once from the sequence header to obtain the position of each slave-viewpoint image in every image frame of the slave image frame sequence; if it is inserted into each stitched image, the decoding device must read the arrangement information from every image frame. Marking the encoded stitched image sequence with the arrangement information allows the decoding device to cut the required images out of the stitched image conveniently, improving image cropping efficiency.
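The two marking variants can be contrasted in a small sketch. The structure below is a hypothetical container format chosen for illustration; the patent does not specify concrete field names.

```python
def mark_sequence(frames, arrangement, per_frame=False):
    """Attach arrangement info in the sequence header or in every frame."""
    if per_frame:
        # Per-frame variant: the decoder must read the info from each frame.
        return {"frames": [{"arrangement": arrangement, "data": f} for f in frames]}
    # Header variant: one read of the sequence header covers the whole sequence.
    return {"header": {"arrangement": arrangement}, "frames": frames}
```

The header variant is cheaper for the decoder when the layout never changes within a sequence, which is the efficiency point the paragraph above makes.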
Step S330: and encoding the image of each viewpoint according to the shooting time and the second resolution to generate a main image frame sequence of each viewpoint.
In the present embodiment, while generating the slave image frame sequence, the encoding apparatus also separately encodes the images shot at each of the P1 to P9 viewpoints according to the shooting time and a preset second resolution, generating a main image frame sequence for each viewpoint. That is, the images shot at the P1 viewpoint are encoded into the main image frame sequence of the P1 viewpoint, the images shot at the P2 viewpoint into the main image frame sequence of the P2 viewpoint, and so on up to the P9 viewpoint, yielding nine main image frame sequences in total.
Specifically, step S330 includes:
sequencing the images of each viewpoint according to the shooting time to generate an image sequence of each viewpoint;
and coding the image sequence of each viewpoint according to a second resolution to obtain the main image frame sequence.
Suppose the encoding device acquires n images for each of the P1 to P9 viewpoints. Taking the P1 viewpoint as an example, the device sorts its n images according to the shooting time, generating the image sequence of the P1 viewpoint: image 1, image 2, image 3, ..., image n. It then encodes this image sequence according to the preset second resolution to obtain the main image frame sequence corresponding to the P1 viewpoint. The n images of each of the P2 to P9 viewpoints are encoded in the same way and are not described again here.
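The per-viewpoint grouping and time ordering can be sketched as follows; the record shape `(view_id, shoot_time, image)` and the function name are illustrative assumptions.

```python
def per_view_sequences(shots):
    """Group (view_id, shoot_time, image) records into time-ordered sequences."""
    grouped = {}
    for view_id, t, image in shots:
        grouped.setdefault(view_id, []).append((t, image))
    # Sort each viewpoint's frames by shooting time, then drop the timestamps.
    return {v: [img for _, img in sorted(frames)] for v, frames in grouped.items()}
```

Each resulting list is the image sequence that would then be encoded at the second resolution into that viewpoint's main image frame sequence.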
It is to be noted that, in the present embodiment, the first resolution is the total resolution of the stitched image, and the resolution of each viewpoint's image within the stitched image is smaller than the second resolution; that is, the resolution of each viewpoint's image in the stitched image is smaller than the resolution of the images in that viewpoint's main image frame sequence.
Further, the first image in the image sequence corresponding to each viewpoint is encoded as an I frame, also called an intra-coded picture. An I frame is usually the first frame of each GOP (group of pictures, a structure used in MPEG video compression); it is moderately compressed and serves as a reference point for random access. For multi-view video, viewpoint switching can only be performed at an I frame, so the present embodiment encodes the first image in each viewpoint's image sequence as an I frame when encoding the images of each viewpoint. Because the first image in each viewpoint's main image frame sequence is an I frame, random viewpoint switching can be realized, allowing the high-resolution main image frame sequence corresponding to the target viewpoint to be switched in and displayed. After the user switches viewpoints on the display device side, the display device can quickly restore and display the high-resolution viewpoint picture according to the slave image frame sequence and the main image frame sequence of each viewpoint provided by the encoding device, maintaining the definition of the video the user watches over time.
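The I-frame placement and its consequence for switch points can be sketched as a frame-type schedule. The GOP length and the "P" label for all non-I frames are simplifying assumptions for illustration.

```python
def frame_types(n_frames, gop=8):
    """First frame of every GOP is an I frame; the rest are predicted frames."""
    return ["I" if i % gop == 0 else "P" for i in range(n_frames)]

def switch_points(types):
    # Viewpoint switching is only possible at I frames (random-access points).
    return [i for i, t in enumerate(types) if t == "I"]
```

Because index 0 is always an I frame, a freshly selected main image frame sequence can be decoded and displayed from its very first frame, which is what makes the rapid high-resolution restore described above possible.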
Step S340: and when a viewpoint selecting instruction sent by the decoding equipment is received, acquiring the viewpoint selected by the decoding equipment according to the viewpoint selecting instruction.
Step S350: transmitting the main image frame sequence of the viewpoint selected by the decoding apparatus to the decoding apparatus through the main transmission path of that viewpoint, while transmitting the slave image frame sequence to the decoding apparatus through the slave transmission path.
In this embodiment, after the encoding device generates the slave image frame sequence and the main image frame sequence of each view, the encoding device acquires the view selected by the decoding device according to the received view selection instruction sent by the decoding device, and then transmits the main image frame sequence of the view selected by the decoding device to the decoding device through the main transmission path of the view, and simultaneously transmits the slave image frame sequence to the decoding device through the slave transmission path.
The encoding apparatus transmits each generated main image frame sequence to the decoding apparatus through its own independent transmission path, and likewise transmits the generated slave image frame sequence to the decoding apparatus through an independent transmission path. For ease of understanding, the independent path carrying each viewpoint's main image frame sequence is called the main transmission path, and the independent path carrying the slave image frame sequence is called the slave transmission path. With 9 viewpoints there are thus 9 main transmission paths and 1 slave transmission path; each main transmission path carries the main image frame sequence of its corresponding viewpoint (for example, the main transmission path corresponding to the P1 viewpoint carries the main image frame sequence of the P1 viewpoint), and the slave transmission path carries the slave image frame sequence.
The viewpoint selecting instruction is generated by the decoding device according to either a display instruction or a picture-switching-condition-satisfied instruction sent by the display device. For example, suppose the display device currently needs to display the viewpoint picture of the P2 viewpoint. The display device generates a display instruction containing the P2 viewpoint and sends it to the decoding device. After obtaining the P2 viewpoint from the display instruction, the decoding device generates a viewpoint selecting instruction containing the P2 viewpoint and sends it to the encoding device. The encoding device obtains the P2 viewpoint, the viewpoint selected by the decoding device, from the received instruction, and then transmits the main image frame sequence of the P2 viewpoint to the decoding device through the main transmission path corresponding to the P2 viewpoint while transmitting the slave image frame sequence through the slave transmission path.
For another example, when the user switches from the current viewpoint P2 to the target viewpoint P4 through the display device, the viewpoint picture of the P4 viewpoint becomes the picture to be displayed. Once the display device determines that the picture switching condition for the P4 viewpoint is satisfied, it generates a picture-switching-condition-satisfied instruction containing the P4 viewpoint and sends it to the decoding device. After obtaining the P4 viewpoint from that instruction, the decoding device generates a viewpoint selecting instruction containing the P4 viewpoint and sends it to the encoding device. The encoding device obtains the P4 viewpoint, the viewpoint selected by the decoding device, from the received instruction, and then transmits the main image frame sequence of the P4 viewpoint to the decoding device through the main transmission path corresponding to the P4 viewpoint.
According to the technical scheme, the slave image frame sequence and the main image frame sequence corresponding to each viewpoint required by viewpoint image display are provided for the display device, so that after a user switches viewpoints on the display device, the display device can quickly restore and display a high-resolution viewpoint image according to the slave image frame sequence and the main image frame sequence corresponding to each viewpoint provided by the encoding device, and the definition of the video watched by the user for a long time is maintained.
Based on the above steps S310 to S350, the present embodiment describes the encoding apparatus according to the following example, which specifically includes:
For example, after the images of the P1 to P9 viewpoints shot by the nine cameras are encoded, the main image frame sequences corresponding to the P1 to P9 viewpoints are the main image frame sequence F1, the main image frame sequence F2, ..., and the main image frame sequence F9, and the slave image frame sequence is F0. Each stitched image in the slave image frame sequence F0 contains the images of the P1 to P9 viewpoints, denoted image f1 through image f9. The main transmission path of the P1 viewpoint is path 1, that of the P2 viewpoint is path 2, ..., that of the P9 viewpoint is path 9, and the slave transmission path is path 0. If the viewpoint selected by the decoding apparatus is the P1 viewpoint, the encoding apparatus transmits the main image frame sequence F1 to the decoding apparatus through path 1; if it is the P5 viewpoint, the encoding apparatus transmits the main image frame sequence F5 through path 5; if it is a virtual viewpoint lying between the P3 and P4 viewpoints and closest to the P4 viewpoint, the encoding apparatus transmits the main image frame sequence F4 through path 4, and also transmits the slave image frame sequence F0 to the decoding apparatus through path 0.
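The path selection in this example can be sketched as a small mapping. The representation of a virtual viewpoint as a fractional position (e.g. 3.8 between P3 and P4) and the function name `encoder_paths` are illustrative assumptions.

```python
MAIN_PATHS = {f"P{i}": i for i in range(1, 10)}  # path n carries main sequence Fn
SLAVE_PATH = 0                                   # path 0 carries slave sequence F0

def encoder_paths(selected):
    """Map the viewpoint selected by the decoder to transmission paths."""
    if isinstance(selected, str):                # a real viewpoint such as "P5"
        return {"main_path": MAIN_PATHS[selected], "slave_path": SLAVE_PATH}
    # A virtual position such as 3.8 between P3 and P4: the main stream of
    # the nearest real viewpoint is sent alongside the slave sequence.
    return {"main_path": round(selected), "slave_path": SLAVE_PATH}
```

So selecting P5 routes sequence F5 over path 5, while a virtual viewpoint nearest P4 routes F4 over path 4, with F0 on path 0 in both cases, matching the example above.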
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (15)

1. A multi-view video data processing method applied to a decoding apparatus, the multi-view video data processing method comprising:
when a display instruction sent by display equipment is received, acquiring a current viewpoint corresponding to the display instruction;
sending first video data required by the display device to display a current viewpoint picture corresponding to the current viewpoint to the display device; the first video data comprises an image in a main image frame sequence received by a main transmission path corresponding to the current viewpoint and/or an image of a slave viewpoint corresponding to the current viewpoint intercepted from a slave image frame in a slave image frame sequence received by a slave transmission path, the images in the main image frame sequence and the images in the slave image frame both comprise viewpoint pictures and/or viewpoint depth map pictures, and the resolution of the images in the main image frame sequence is greater than that of the images in the slave image frame;
when a viewpoint switching instruction sent by the display equipment is received, a target viewpoint corresponding to the viewpoint switching instruction is obtained;
sending second video data required by the display equipment to display a first target viewpoint picture corresponding to the target viewpoint to the display equipment; the second video data comprises a slave viewpoint image corresponding to the target viewpoint cut from an image frame in the slave image frame sequence;
when receiving a picture switching condition meeting instruction sent by the display device, sending third video data required by the display device to display a second target viewpoint picture corresponding to the target viewpoint to the display device; the third video data comprises images in a main image frame sequence received by a main transmission path corresponding to the target viewpoint and/or images of a slave viewpoint corresponding to the target viewpoint intercepted from slave image frames in the slave image frame sequence.
2. The method of claim 1, wherein before the step of sending, to the display device, the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint, the method further comprises:
acquiring a viewpoint identification of the slave viewpoint corresponding to the current viewpoint and arrangement information of the slave image frame sequence;
determining, according to the arrangement information and the viewpoint identification, position information of the image of the slave viewpoint corresponding to the current viewpoint in the slave image frame;
and cropping an image corresponding to the position information from a slave image frame in the slave image frame sequence.
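The cropping step of claim 2 amounts to a lookup of the viewpoint identification in the arrangement information followed by a rectangular cut from the stitched slave frame. A minimal sketch, assuming the arrangement information maps each viewpoint identification to an (x, y, width, height) rectangle (the concrete format is an assumption; the claim does not specify one):

```python
# Hedged sketch of claim 2: find the viewpoint's position in the
# arrangement information, then cut that region out of the slave frame.
# Frames are modelled as lists of pixel rows for illustration.

def crop_view(slave_frame, arrangement, view_id):
    """Cut the region belonging to view_id out of the stitched slave frame."""
    x, y, w, h = arrangement[view_id]          # position information entry
    return [row[x:x + w] for row in slave_frame[y:y + h]]
```

With a 4x4 mosaic holding four 2x2 viewpoint tiles, looking up a viewpoint identification returns exactly its tile and nothing else.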
3. The method of claim 1, wherein the sending, to the display device, of the first video data required by the display device to display the current viewpoint picture corresponding to the current viewpoint comprises:
determining whether the current viewpoint is a virtual viewpoint;
and when the current viewpoint is not a virtual viewpoint, sending the image in the main image frame sequence received over the main transmission path corresponding to the current viewpoint to the display device.
4. The method of claim 3, wherein after the step of determining whether the current viewpoint is a virtual viewpoint, the method further comprises:
when the current viewpoint is a virtual viewpoint, sending, to the display device, the image in the main image frame sequence received over the main transmission path corresponding to the current viewpoint and the images of the slave viewpoints corresponding to the current viewpoint cropped from a slave image frame in the slave image frame sequence received over the slave transmission path; wherein the slave viewpoints corresponding to the current viewpoint comprise the slave viewpoints adjacent to the current viewpoint.
5. The method of claim 1, wherein the sending, to the display device, of the second video data required by the display device to display the first target viewpoint picture corresponding to the target viewpoint comprises:
determining whether the target viewpoint is a virtual viewpoint;
and when the target viewpoint is not a virtual viewpoint, sending, to the display device, the image of the slave viewpoint identical to the target viewpoint, cropped from a slave image frame in the slave image frame sequence.
6. The method of claim 5, wherein after the step of sending, to the display device, the image of the slave viewpoint identical to the target viewpoint when the target viewpoint is not a virtual viewpoint, the step of sending, to the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint when the instruction indicating that the picture switching condition is satisfied is received comprises:
when the instruction indicating that the picture switching condition is satisfied is received from the display device, sending, to the display device, the image in the main image frame sequence received over the main transmission path corresponding to the target viewpoint.
7. The method of claim 5, wherein after the step of determining whether the target viewpoint is a virtual viewpoint, the method further comprises:
when the target viewpoint is a virtual viewpoint, sending, to the display device, the images of the slave viewpoints adjacent to the target viewpoint, cropped from a slave image frame in the slave image frame sequence.
8. The method of claim 7, wherein after the step of sending, to the display device, the images of the slave viewpoints adjacent to the target viewpoint when the target viewpoint is a virtual viewpoint, the step of sending, to the display device, the third video data required by the display device to display the second target viewpoint picture corresponding to the target viewpoint when the instruction indicating that the picture switching condition is satisfied is received comprises:
when the instruction indicating that the picture switching condition is satisfied is received from the display device, sending, to the display device, the image in the main image frame sequence received over the main transmission path corresponding to the target viewpoint and the images of the slave viewpoints adjacent to the target viewpoint cropped from a slave image frame in the slave image frame sequence.
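Claims 3-8 branch on whether the requested viewpoint is virtual: a real (camera) viewpoint needs only its own image, while a virtual viewpoint has no camera of its own and therefore needs the images of the adjacent slave viewpoints so the display device can synthesize the picture. A sketch of that branching, assuming viewpoints are numbered along a line (an illustrative assumption; the patent does not fix a camera geometry):

```python
# Illustrative sketch of the virtual-viewpoint branch in claims 3-8.
# camera_views lists the viewpoints that have a physical camera; any
# other numbered viewpoint is treated as virtual.

def views_to_send(view, camera_views):
    """Real viewpoint: its own image; virtual viewpoint: its two neighbours."""
    if view in camera_views:
        return [view]
    left = max(v for v in camera_views if v < view)    # nearest camera below
    right = min(v for v in camera_views if v > view)   # nearest camera above
    return [left, right]
```

A virtual viewpoint outside the camera span has no two neighbours, so this sketch raises an exception there; the patent does not address that case in the claims.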
9. A multi-view video data processing method applied to an encoding device, the multi-view video data processing method comprising:
acquiring the images of the viewpoints captured by the cameras, wherein different cameras capture images corresponding to different viewpoints, and the images comprise at least one of a viewpoint image and a viewpoint depth map image;
stitching the images of the viewpoints, and encoding the stitched images according to capture time and a first resolution to generate a slave image frame sequence;
encoding the image of each viewpoint according to the capture time and a second resolution to generate a main image frame sequence of each viewpoint, wherein the resolution of the images in the main image frame sequence is greater than that of the images in the stitched images;
when a viewpoint selection instruction sent by a decoding device is received, acquiring the viewpoint selected by the decoding device according to the viewpoint selection instruction;
and sending the main image frame sequence of the viewpoint selected by the decoding device to the decoding device over the main transmission path of the viewpoint, while sending the slave image frame sequence to the decoding device over the slave transmission path.
10. The method of claim 9, wherein the stitching of the images of the viewpoints and the encoding of the stitched images according to the capture time and the first resolution to generate the slave image frame sequence comprises:
stitching the images of the viewpoints in a preset arrangement to generate the stitched images and arrangement information of the images of the viewpoints in the stitched images, wherein the arrangement information comprises at least the viewpoint identifications of the viewpoints and the position information of the images of the viewpoints in the stitched images;
ordering the stitched images according to the capture time to generate a stitched image sequence;
and encoding the stitched image sequence according to the first resolution, and labelling the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence.
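The preset arrangement of claim 10 must yield, besides the stitched image itself, arrangement information recording where each viewpoint landed. A sketch of a simple side-by-side arrangement that records an (x, y, width, height) rectangle per viewpoint identification (the rectangle format is an assumption; any layout from which positions are recoverable would satisfy the claim):

```python
# Illustrative sketch of claim 10's stitching step: place equal-height
# viewpoint images left to right and record each one's position, so the
# decoding side can later crop any single viewpoint back out.

def stitch_horizontal(images):
    """Stitch viewpoint images side by side; return (mosaic, arrangement info)."""
    order = sorted(images)                 # deterministic preset arrangement
    rows = len(images[order[0]])
    mosaic = [[] for _ in range(rows)]
    layout, x = {}, 0
    for v in order:
        img = images[v]
        w = len(img[0])
        layout[v] = (x, 0, w, rows)        # arrangement information entry
        for r in range(rows):
            mosaic[r].extend(img[r])
        x += w
    return mosaic, layout
```

The `layout` produced here is exactly the kind of arrangement information the decoding side of claim 2 consumes when cropping a viewpoint's image back out of the slave frame.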
11. The method of claim 10, wherein the step of labelling the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence comprises:
and inserting the arrangement information into the sequence header of the encoded stitched image sequence to obtain the slave image frame sequence.
12. The method of claim 10, wherein the step of labelling the encoded stitched image sequence with the arrangement information to obtain the slave image frame sequence further comprises:
and inserting the arrangement information into each stitched image in the encoded stitched image sequence to obtain the slave image frame sequence.
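Claims 11 and 12 offer two places to carry the arrangement information: once in the sequence header, or repeated with every stitched image (the latter costs bytes per frame but lets a decoder that joins mid-stream recover the layout). A sketch of both options, using JSON as a stand-in serialization (an assumption; the patent does not prescribe a byte format):

```python
# Illustrative sketch of the two labelling options in claims 11 and 12.
import json

def label_in_header(arrangement, encoded_frames):
    """Claim 11: arrangement information once, in the sequence header."""
    return {"header": json.dumps(arrangement, sort_keys=True),
            "frames": list(encoded_frames)}

def label_per_frame(arrangement, encoded_frames):
    """Claim 12: arrangement information repeated with every stitched image."""
    tag = json.dumps(arrangement, sort_keys=True)
    return [{"arrangement": tag, "frame": f} for f in encoded_frames]
```

In a real bitstream the natural carriers for such metadata would be sequence-level headers or per-frame metadata messages of whatever codec is used; the claims leave that choice open.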
13. A decoding device, characterized in that the decoding device comprises: a memory, a processor, and a multi-view video data processing program stored on the memory and executable on the processor, wherein the multi-view video data processing program, when executed by the processor, implements the steps of the multi-view video data processing method according to any one of claims 1-8.
14. An encoding device, characterized in that the encoding device comprises: a memory, a processor, and a multi-view video data processing program stored on the memory and executable on the processor, wherein the multi-view video data processing program, when executed by the processor, implements the steps of the multi-view video data processing method according to any one of claims 9-12.
15. A storage medium, having stored thereon a multi-view video data processing program, wherein the multi-view video data processing program, when executed by a processor, implements the steps of the multi-view video data processing method according to any one of claims 1-12.
CN202111035779.7A 2021-09-02 2021-09-02 Multi-view video data processing method, device and storage medium Pending CN113949884A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111035779.7A CN113949884A (en) 2021-09-02 2021-09-02 Multi-view video data processing method, device and storage medium
PCT/CN2021/134319 WO2023029252A1 (en) 2021-09-02 2021-11-30 Multi-viewpoint video data processing method, device, and storage medium

Publications (1)

Publication Number Publication Date
CN113949884A 2022-01-18

Family

ID=79328014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111035779.7A Pending CN113949884A (en) 2021-09-02 2021-09-02 Multi-view video data processing method, device and storage medium

Country Status (2)

Country Link
CN (1) CN113949884A (en)
WO (1) WO2023029252A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116612168A (en) * 2023-04-20 2023-08-18 北京百度网讯科技有限公司 Image processing method, device, electronic equipment, image processing system and medium

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2015012632A1 (en) * 2013-07-25 2015-01-29 주식회사 넥스트이온 Multi-view video streaming system and method for providing same
CN106919248A * 2015-12-26 2017-07-04 华为技术有限公司 Content transmission method and device applied to virtual reality
CN108810636A (en) * 2017-04-28 2018-11-13 华为技术有限公司 Video broadcasting method, equipment and system
CN109672897A (en) * 2018-12-26 2019-04-23 北京数码视讯软件技术发展有限公司 Panorama video code method and device
US20190268612A1 (en) * 2016-08-05 2019-08-29 Sony Corporation Image processing apparatus and image processing method
CN111447461A (en) * 2020-05-20 2020-07-24 上海科技大学 Synchronous switching method, device, equipment and medium for multi-view live video
CN113099245A (en) * 2021-03-04 2021-07-09 广州方硅信息技术有限公司 Panoramic video live broadcast method, system and computer readable storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP7130653B2 (en) * 2017-09-12 2022-09-05 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Image display method, image delivery method, image display device, and image delivery device
CN111866525A (en) * 2020-09-23 2020-10-30 腾讯科技(深圳)有限公司 Multi-view video playing control method and device, electronic equipment and storage medium
CN113259770B (en) * 2021-05-11 2022-11-18 北京奇艺世纪科技有限公司 Video playing method, device, electronic equipment, medium and product

Also Published As

Publication number Publication date
WO2023029252A1 (en) 2023-03-09

Similar Documents

Publication Publication Date Title
US10805593B2 (en) Methods and apparatus for receiving and/or using reduced resolution images
US10778951B2 (en) Camerawork generating method and video processing device
US11245939B2 (en) Generating and transmitting metadata for virtual reality
CN105915937B (en) Panoramic video playing method and device
US20120224025A1 (en) Transport stream structure including image data and apparatus and method for transmitting and receiving image data
CN113099245B (en) Panoramic video live broadcast method, system and computer readable storage medium
KR20130105637A (en) Content supplying device, content supplying method, content reproduction device, content reproduction method, program, and content viewing system
EP2462736A1 (en) Recommended depth value for overlaying a graphics object on three-dimensional video
US10958950B2 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
CN104335243A (en) Processing panoramic pictures
CN111225228B (en) Video live broadcast method, device, equipment and medium
CN110933461B (en) Image processing method, device, system, network equipment, terminal and storage medium
CN114449303A (en) Live broadcast picture generation method and device, storage medium and electronic device
CN110730340A (en) Lens transformation-based virtual auditorium display method, system and storage medium
CN111726598B (en) Image processing method and device
CN113949884A (en) Multi-view video data processing method, device and storage medium
CN112804471A (en) Video conference method, conference terminal, server and storage medium
CN115580738B (en) High-resolution video display method, device and system for on-demand transmission
CN114762353B (en) Device and method for playing virtual reality images input by multiple cameras in real time
CN111510678B (en) Unmanned aerial vehicle image transmission control method, device and system
CN113905186A (en) Free viewpoint video picture splicing method, terminal and readable storage medium
KR20220045038A (en) Code stream processing method, apparatus, first terminal, second terminal and storage medium
CN113900572A (en) Video data processing method, decoding apparatus, encoding apparatus, and storage medium
JP2003153213A (en) Video generating apparatus and reception apparatus
KR20240026942A (en) Method for constructing a depth image from multi-view video, method for decoding a data stream representing multi-view video, encoding method, devices, systems, terminal equipment, signals and corresponding computer programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220118)