CN111711859A - Video image processing method, system and terminal equipment

Info

Publication number
CN111711859A
CN111711859A (application CN202010600416.2A)
Authority
CN
China
Prior art keywords
sub
video
frame
terminal equipment
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010600416.2A
Other languages
Chinese (zh)
Inventor
张涛
何广
庹虎
孙鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202010600416.2A priority Critical patent/CN111711859A/en
Publication of CN111711859A publication Critical patent/CN111711859A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/236Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
    • H04N21/23608Remultiplexing multiplex streams, e.g. involving modifying time stamps or remapping the packet identifiers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the invention relate to a video image processing method, system and terminal device, wherein the method comprises the following steps: a first server obtains the video code stream to be played that a terminal device requests; a second server decodes the video code stream to obtain a video; each frame image in the video is divided into blocks to obtain a plurality of sub-pictures corresponding to the frame image and metadata corresponding to each sub-picture; the metadata is sent to the terminal device; and all the sub-pictures are synchronously transcoded and stored. This method avoids the problem that high-bit-rate online video cannot be transmitted under the limits of the user's network transmission conditions. Moreover, because only the part of the frame image that the terminal device needs to display at the next moment is transmitted on each request, the demands on the terminal device's power consumption, heat dissipation and computing capacity are reduced, and the problem that most terminal devices cannot decode 8K and higher-resolution video is avoided. The user can thus watch ultra-high-definition video smoothly through the terminal device.

Description

Video image processing method, system and terminal equipment
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to a video image processing method, a video image processing system, and a terminal device.
Background
VR video (virtual reality video) is video in which a picture shot over a panoramic 360 degrees is mapped onto a plane. Both its resolution and its code rate are high: the resolution may be 4K, 8K, 12K or more; at 4K the code rate may be in the range of 40 Mbps to 50 Mbps, and at 8K it may be in the range of 120 Mbps to 150 Mbps. Current user network transmission conditions cannot carry such high-code-rate online video. Meanwhile, limited by the power consumption, heat dissipation and computing capacity of the mobile terminal, most mobile phones cannot decode 8K and higher-resolution video. Therefore, owing to the limits of current network transmission conditions and the functional shortcomings of terminal devices, users cannot smoothly watch ultra-high-definition video on terminal devices such as mobile phones.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video image processing method, a video image processing system, and a terminal device, in order to solve the technical problem in the prior art that a user cannot smoothly watch ultra-high-definition video using a mobile phone or other terminal device, owing to the limits of current network transmission conditions and the functional shortcomings of terminal devices.
In a first aspect, an embodiment of the present invention provides a video image processing method, where the method is applied to a video image processing system that includes a main server and a plurality of slave servers, and the method includes:
the main server receives video streaming media data and deframes the video streaming media data to obtain multiple frame images;
according to the time sequence information of the frame images, each frame image is divided into blocks using a preset picture segmentation rule, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
and the plurality of sub-pictures corresponding to each frame image are respectively sent to a corresponding plurality of slave servers, so that each slave server synchronously encodes its corresponding sub-pictures and stores the one or more encoded sub-pictures in the form of slice data.
In one possible embodiment, supplemental enhancement information (SEI) is injected into the video streaming media data at preset time intervals;
each slave server stores the encoded one or more sub-pictures in the form of slice data, including:
in the synchronous encoding process, if the current frame is detected to carry SEI, a slice of data is generated after the current frame is encoded and stored at a preset address of the current slave server, and the next frame is then encoded as a key frame and serves as the starting frame of the next slice of data.
In a second aspect, an embodiment of the present invention provides a video image processing method, where the method is applied to a terminal device, and the method includes:
sending a video playing request to a main server in the video image processing system, wherein the video playing request carries video information to be played;
receiving metadata of a sub-picture corresponding to the video information and a download address of the sub-picture corresponding to the video information, which are fed back by the main server;
determining a target sub-picture from a plurality of sub-pictures of the live video based on the viewing state of the terminal device and the metadata, and determining a target download address corresponding to the target sub-picture from the download addresses;
acquiring the slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address;
and decoding the slice data, and displaying the corresponding target sub-picture in the live video on the terminal equipment based on the decoded slice data.
In one possible embodiment, determining a target sub-picture from a plurality of sub-pictures of a live video based on the viewing state of the terminal device and the metadata includes:
detecting the terminal device through a gyroscope to obtain the viewing state of the terminal device;
determining viewing angle information of the live video watched by the user based on the viewing state;
and determining the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
In one possible embodiment, the method further comprises:
detecting, through a gyroscope, whether the viewing state of the terminal device changes;
if the viewing state of the terminal device changes, determining spatial movement data of the terminal device;
and adjusting the target sub-picture displayed by the terminal device based on the spatial movement data.
In a third aspect, an embodiment of the present invention provides a video image processing system, including a main server and a plurality of slave servers;
the main server is configured to receive video streaming media data and deframe the video streaming media data to obtain multiple frame images;
divide each frame image into blocks using a preset picture segmentation rule according to the time sequence information of the frame images, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
and send the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers;
and the slave servers are configured to synchronously encode the corresponding sub-pictures and store the one or more encoded sub-pictures in the form of slice data.
In one possible embodiment, the main server is further configured to inject supplemental enhancement information (SEI) into the video streaming media data at preset time intervals;
and the slave server is specifically configured to, in the synchronous encoding process, if it detects that the current frame carries SEI, generate a slice of data after the current frame is encoded, store it at a preset address of the current slave server, and then encode the next frame as a key frame serving as the starting frame of the next slice of data.
In a fourth aspect, an embodiment of the present invention provides a terminal device, including: a processor, a transceiver, a decoder, and a display panel;
the video playing system comprises a transceiver and a video processing system, wherein the transceiver is used for sending a video playing request to a main server in the video image processing system, and the video playing request carries video information to be played; receiving metadata of a sub-picture corresponding to the video information and a download address of the sub-picture corresponding to the video information, which are fed back by the main server;
the processor is configured to determine a target sub-picture from a plurality of sub-pictures of the live video based on the viewing state of the terminal device and the metadata, to determine a target download address corresponding to the target sub-picture from the download addresses, and to acquire the slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address;
the decoder is configured to decode the slice data;
and the display panel is configured to display the corresponding target sub-picture of the live video on the terminal device based on the decoded slice data.
In a possible implementation, the processor is specifically configured to detect the terminal device through a gyroscope to obtain the viewing state of the terminal device; determine viewing angle information of the live video watched by the user based on the viewing state; and determine the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
In one possible embodiment, the processor is further configured to detect, through the gyroscope, whether the viewing state of the terminal device changes; if the viewing state of the terminal device changes, determine spatial movement data of the terminal device; and adjust the target sub-picture displayed by the terminal device based on the spatial movement data.
The video image processing method provided by the embodiment of the invention divides each frame image in the received video streaming media data into a plurality of sub-pictures and then stores the sub-pictures in slice form. When a terminal device subsequently sends a video playing request, only the partial sub-pictures that can be displayed on the terminal device's display interface are transmitted, rather than all the plane video data mapped from the whole spherical video. This avoids the problem that high-bit-rate online video cannot be transmitted under the limits of the user's network transmission conditions.
In addition, the ultra-high-definition video transcoding that previously required high-end computing power is carried out by a plurality of distributed machines (slave servers) of ordinary computing power. This reduces the transmission code rate and the decoding threshold of the terminal device, lessens the demands on the terminal device's power consumption, heat dissipation and computing capacity, avoids the problem that most terminal devices cannot decode 8K and higher-resolution videos, and allows ultra-high-definition live video to be watched smoothly on ordinary networks and ordinary mobile phones.
Drawings
FIG. 1 is a schematic diagram of a hardware architecture corresponding to the video image processing method provided by the present invention;
FIG. 2 is a schematic flow chart of a video image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a frame image divided into blocks according to a preset picture segmentation rule provided by the present invention;
FIG. 4 is a schematic flow chart of another video image processing method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of video image processing signaling flow interaction according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a video image processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another video image processing apparatus according to an embodiment of the present invention;
FIG. 8 is a block diagram of a video image processing system according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained with reference to specific embodiments, which are not to be construed as limiting the embodiments of the present invention.
Before introducing the video image processing method provided by the embodiment of the present invention, the application scenario corresponding to the method is introduced first. The application scenario of this embodiment may be, for example, a spherical live video scenario, where spherical video is video in which a picture shot over a panoramic 360 degrees is mapped onto a plane.
Specifically, referring to fig. 1, fig. 1 is a schematic diagram of the hardware architecture corresponding to the video image processing method provided by the present invention. In this application scenario, the execution subjects include the video image processing system 100 and the terminal device 200. The video image processing system 100 obtains video streaming media data in real time and deframes the video streaming media data to obtain multiple frame images. According to the time sequence information of the frame images, each frame image is divided into blocks using a preset picture segmentation rule, obtaining a plurality of sub-pictures corresponding to the frame image and metadata corresponding to each sub-picture. The divided sub-pictures are synchronously encoded and stored in slice form. The video image processing system 100 includes a main server and a plurality of slave servers. The terminal device 200 may request from the video image processing system 100 only the partial sub-pictures to be displayed, according to the situation of its own display interface. In this way, the user can smoothly watch ultra-high-definition video using the terminal device.
Specific embodiments, see below:
First, the method executed by the video image processing system is introduced. Referring to fig. 2, fig. 2 is a schematic flow chart of a video image processing method according to an embodiment of the present invention; the method is applied to a video image processing system that includes a main server and a plurality of slave servers.
Step 210, the main server receives the video streaming media data and deframes the video streaming media data to obtain multiple frame images.
Step 220, the main server divides each frame image into blocks using a preset picture segmentation rule according to the time sequence information of the frame images, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures.
Specifically, the video streaming media data consists of multiple frame images in time sequence order. When processing the frame images, a preset picture segmentation rule may be used to divide each frame image into blocks, as shown in fig. 3. Fig. 3 shows a reference frame image divided into 64 blocks in an 8 × 8 pattern, giving 64 sub-pictures; at the same time, the main server obtains the metadata corresponding to each sub-picture of that reference frame image. In specific execution, since every frame image has the same size, after segmentation according to the same segmentation rule, the metadata corresponding to the sub-pictures of each frame image in the video streaming media is actually the same as the metadata corresponding to the sub-picture at the same position in the reference frame image shown in fig. 3.
Optionally, the metadata may include spatial position data corresponding to each sub-picture, coordinate position data of each pixel within the whole frame image, and a scale factor. It should be noted that metadata is mainly information describing the properties of data, used to support functions such as indicating storage locations, historical data, resource searching and file recording. In this document, metadata is understood to include: the coordinate position relationships of different sub-pictures within the whole frame image, the coordinate position data of each pixel of the frame image within the whole frame image, and the scale factor.
Spatial position data in metadata describes where an object is located. Such a position may be defined according to a geodetic reference system, such as geodetic longitude and latitude coordinates, or as a relative positional relationship between features, such as spatial adjacency or containment. In this embodiment, the spatial position data mainly refers to the coordinate position relationships of different sub-pictures within the whole frame image. Specifically, a coordinate system is constructed with a certain point of the whole frame image as the reference point, and the coordinate positions of the different sub-pictures in that coordinate system are then determined. The role of these parameters included in the metadata is described in detail in the method steps performed on the terminal device side and is not elaborated here.
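As an illustration of this blocking-plus-metadata step, the following is a minimal sketch in Python. The `Tile` structure, its field names, and the use of NumPy are assumptions made for this example, not structures taken from the patent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Tile:
    index: int          # tile number, row-major
    Xc: int             # tile origin x in the whole frame image
    Yc: int             # tile origin y in the whole frame image
    scale: float        # scale factor relating tile-local to frame coordinates
    pixels: np.ndarray  # the sub-picture's pixel block


def split_frame(frame: np.ndarray, rows: int = 8, cols: int = 8) -> list:
    """Divide one frame into rows x cols sub-pictures and attach the
    per-sub-picture metadata described above."""
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [
        Tile(index=r * cols + c,
             Xc=c * tw,
             Yc=r * th,
             scale=1.0,  # identity when tiles keep full resolution
             pixels=frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw])
        for r in range(rows) for c in range(cols)
    ]
```

For an 8 × 8 split of a 10240 × 5120 frame, for instance, this yields 64 tiles of 1280 × 640 pixels each.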
Step 230, the main server sends the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers.
Specifically, the main server sends the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers, so that the slave servers can conveniently encode their corresponding sub-pictures synchronously, each slave server storing one or more encoded sub-pictures in the form of slice data.
By having multiple slave servers synchronously encode the corresponding sub-pictures, the real-time transcoding of ultra-high-definition video, which previously could only be handled by high-end computing power, is distributed across multiple machines (slave servers) of ordinary computing power, solving the problem that the computing power of a single server is insufficient. This operation also reduces the transmission code rate and the decoding threshold of the terminal device, lessens the demands on the terminal device's power consumption, heat dissipation and computing capacity, and avoids the problem that most terminal devices cannot decode 8K and higher-resolution videos. Ultra-high-definition live video can therefore be watched smoothly on an ordinary network and an ordinary mobile phone. The specific encoding process is the same as in the prior art and is not described in detail here.
Optionally, when the main server receives the video streaming media data and deframes it, the method may further include: injecting SEI into the video streaming media data at preset time intervals. The preset time may be a few seconds; in one specific example it is 1 s, i.e. one SEI is injected into the video streaming media data every 1 s. In a preferred example, the SEI may store a system timestamp. The terminal device side, described below, can combine the sub-pictures obtained after slice decoding according to this timestamp to form a target frame image, which is displayed on the display interface of the terminal device.
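As a sketch of this SEI injection, assuming a 1 s interval and using a plain dictionary to stand in for a real SEI NAL unit:

```python
import time

SEI_INTERVAL = 1.0  # seconds; the "preset time" of the example above


def annotate_frames(frames):
    """Yield (frame, sei) pairs, attaching an SEI carrying a system
    timestamp roughly once per SEI_INTERVAL and None otherwise."""
    last_sei = 0.0
    for frame in frames:
        now = time.time()
        if now - last_sei >= SEI_INTERVAL:
            last_sei = now
            yield frame, {"system_timestamp": now}
        else:
            yield frame, None
```

Note that the very first frame is always annotated here, matching the observation below that the first frame of a sequence is a key frame carrying SEI.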
Further optionally, each slave server storing the one or more encoded sub-pictures in the form of slice data includes:
in the synchronous encoding process, if the current frame is detected to carry SEI, a slice of data is generated after the current frame is encoded and stored at a preset address of the current slave server, and the next frame is then encoded as a key frame and serves as the starting frame of the next slice of data. Preferably, the key frame is an I-frame. An I-frame, also called an intra-coded frame, is an independent frame carrying all of its own information; it can be decoded independently without reference to other images and can be understood simply as a static picture. The first frame in a video sequence is always an I-frame, and SEI is injected at the I-frame for the purpose described above, which is not repeated here.
The slice data obtained after the sub-pictures are encoded is stored at a preset address on the slave server, where the preset address is an address configured manually in advance. In addition, the specific addresses at which the different slave servers store their slices are backed up on the main server side, so that when the terminal device subsequently sends a request, the main server can inform it of the download addresses of the corresponding sub-pictures.
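A minimal sketch of this slice boundary rule follows, consuming the (frame, sei) pairs from the sketch above; `encoder` and `store` are assumed stand-ins for a real codec and storage backend, and the patent does not prescribe this exact loop.

```python
def encode_to_slices(frames_with_sei, encoder, store):
    """A frame that carries SEI closes the current slice once encoded,
    and the following frame is encoded as a key frame (I-frame) that
    starts the next slice of data."""
    slice_frames, force_keyframe = [], True
    for frame, sei in frames_with_sei:
        slice_frames.append(encoder.encode(frame, keyframe=force_keyframe))
        force_keyframe = False
        if sei is not None:
            store.save(slice_frames)      # one slice of data per SEI boundary
            slice_frames, force_keyframe = [], True
    if slice_frames:                      # flush a trailing partial slice
        store.save(slice_frames)
```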
The video image processing method provided by the embodiment of the invention divides each frame image in the received video streaming media data into a plurality of sub-pictures and then stores the sub-pictures in slice form. When a terminal device subsequently sends a video playing request, only the partial sub-pictures that can be displayed on the terminal device's display interface are transmitted, rather than all the plane video data mapped from the whole spherical video. This avoids the problem that high-bit-rate online video cannot be transmitted under the limits of the user's network transmission conditions.
In addition, the ultra-high-definition video transcoding that previously required high-end computing power is carried out by a plurality of distributed machines (slave servers) of ordinary computing power. This reduces the transmission code rate and the decoding threshold of the terminal device, lessens the demands on the terminal device's power consumption, heat dissipation and computing capacity, avoids the problem that most terminal devices cannot decode 8K and higher-resolution videos, and allows ultra-high-definition live video to be watched smoothly on ordinary networks and ordinary mobile phones.
Corresponding to the above embodiment, the embodiment of the present invention further provides another video image processing method, which is applied to a terminal device. Specifically referring to fig. 4, fig. 4 is a schematic flow chart of another video image processing method according to an embodiment of the present invention, including:
Step 410, sending a video playing request to the main server in the video image processing system.
Specifically, the video playing request carries video information to be played.
Step 420, receiving the metadata of the sub-pictures corresponding to the video information and the download addresses of the sub-pictures corresponding to the video information, fed back by the main server.
Specifically, as described above, the main server knows the specific location information at which the different slave servers store their slices. Therefore, when the main server in the image processing system receives the video playing request sent by the terminal device, it can feed back to the terminal device both the metadata of the sub-pictures corresponding to the video information and the download addresses of the sub-pictures corresponding to the video information. The terminal device can then determine the target sub-picture to be played according to the metadata of the sub-pictures and match the download address corresponding to the target sub-picture from the download addresses of the sub-pictures corresponding to the video information.
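For illustration, the feedback might take a shape like the following; every field name and URL here is hypothetical, not taken from the patent:

```python
# Hypothetical shape of the main server's feedback message.
feedback = {
    "metadata": [
        # one entry per sub-picture: origin in the full frame plus scale
        {"tile": 0, "Xc": 0,    "Yc": 0, "scale": 1.0},
        {"tile": 1, "Xc": 1280, "Yc": 0, "scale": 1.0},
        # ... 62 more entries for an 8 x 8 split
    ],
    "download_addresses": {
        0: "http://slave-01.example.com/live/tile0/",
        1: "http://slave-01.example.com/live/tile1/",
        # ...
    },
}
```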
Of course, determining the target sub-picture to be played also requires considering the current viewing state of the terminal device; see step 430.
Step 430, determining a target sub-picture from the plurality of sub-pictures of the live video based on the viewing state of the terminal device and the metadata, and determining a target download address corresponding to the target sub-picture from the download addresses.
Optionally, in specific execution, determining the target sub-picture from the plurality of sub-pictures of the live video based on the viewing state of the terminal device and the metadata may be implemented as follows:
detecting the terminal device through a gyroscope to obtain the viewing state of the terminal device;
determining viewing angle information of the live video watched by the user based on the viewing state;
and determining the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
Specifically, the viewing state of the terminal device may be detected by a gyroscope, for example, whether the terminal device is in landscape display mode or in portrait display mode. In different display orientations, the areas of the corresponding display interfaces differ, and the range of the displayed picture naturally differs as well. That is, the viewing angle information with which the user watches the live video is determined based on the viewing state.
Finally, the target sub-picture is determined from the plurality of sub-pictures of the live video according to the viewing angle information and the metadata.
In a specific example, suppose the terminal device is currently viewing the video in portrait mode, and the area outlined by the white line in fig. 3 is the viewing angle currently seen by the terminal device. That is, of the whole frame image, only the area outlined by the white line can be displayed on the current terminal device side. The target sub-picture can then be determined from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata, as the sketch below shows. It should be noted that the target sub-picture may be a complete sub-picture as described in the previous embodiment, or a partial picture of a sub-picture; within the area outlined by the white line in fig. 3, for example, the target is composed of several complete sub-pictures and several partial sub-pictures.
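A minimal sketch of this selection step, assuming the tile metadata shape from the example above and a viewport rectangle in frame coordinates; any tile that overlaps the viewport, fully or partially, belongs to the target:

```python
def tiles_in_view(view, tiles, tile_w, tile_h):
    """Return the tiles whose rectangle in frame coordinates intersects
    the viewport `view` = (x, y, w, h); fully and partially visible
    tiles both count toward the target sub-picture."""
    vx, vy, vw, vh = view
    return [t for t in tiles
            if t["Xc"] < vx + vw and t["Xc"] + tile_w > vx
            and t["Yc"] < vy + vh and t["Yc"] + tile_h > vy]

# e.g. with 1280 x 640 tiles (an 8 x 8 split of a 10240 x 5120 frame),
# a 1080 x 1920 portrait viewport at (3000, 1000) overlaps
# 2 columns x 4 rows = 8 tiles.
```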
Specifically, as described above, the metadata may include the spatial position data corresponding to each sub-picture, the coordinate position data of each pixel within the whole frame image, and a scale factor.
After the current viewing angle information of the terminal device is obtained, when any pixel within the viewing angle is triggered, the coordinate data of that pixel in the whole frame image (denoted x, y) and its coordinate data within the sub-picture to which it belongs (denoted xc, yc) can be obtained. This is possible because, when the image processing system processed the frame image, the coordinate data of each pixel within the whole frame image and within the sub-picture to which it belongs were already determined. The specific determination method is prior art: for example, a frame image coordinate system is constructed with the first pixel at the upper left corner of the whole frame image as the origin, and the coordinate data of the other pixels in this coordinate system are determined from their spatial position relationships to that first pixel. Similarly, a coordinate system for a sub-picture is constructed with the first pixel at its upper left corner as the origin, and the coordinate data of each pixel within the sub-picture to which it belongs are determined.
The metadata also includes a scale factor. The scale factor establishes a mapping relationship between the coordinate data of a pixel within the sub-picture to which it belongs, the coordinate data of that sub-picture within the whole frame image, and the coordinate data of the pixel within the whole frame image. The specific mapping relationship is given by the following formulas:
x = xc × scale + Xc (formula 1)
y = yc × scale + Yc (formula 2)
where (x, y) is the coordinate data of a pixel within the picture of the complete frame image, (Xc, Yc) is the coordinate data of the sub-picture to which the pixel belongs within the picture of the complete frame image, (xc, yc) is the coordinate data of the pixel within the sub-picture to which it belongs, and scale is the scale factor.
It follows that if (x, y), scale, and (xc, yc) are known, the coordinate data (Xc, Yc) of the sub-picture to which the pixel belongs within the complete frame image can naturally be calculated from formula 1 and formula 2.
Matching (Xc, Yc) against the metadata then yields the sub-picture corresponding to (Xc, Yc), which is exactly the target sub-picture to be determined.
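A small worked example of this calculation, with illustrative numbers that are not taken from the patent:

```python
def subpicture_origin(x, y, xc, yc, scale):
    """Rearranging formulas 1 and 2:
       x = xc * scale + Xc  =>  Xc = x - xc * scale
       y = yc * scale + Yc  =>  Yc = y - yc * scale"""
    return x - xc * scale, y - yc * scale

# With an 8 x 8 split of a 10240 x 5120 frame: a pixel at frame position
# (1300, 700) that sits at (20, 60) inside its sub-picture, with
# scale = 1.0, belongs to the sub-picture whose origin is (1280, 640).
assert subpicture_origin(1300, 700, 20, 60, 1.0) == (1280, 640)
```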
After the target sub-pictures are determined, the target download addresses corresponding to the target sub-pictures can be looked up among the download addresses fed back by the main server.
Step 440, acquiring the slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address.
Specifically, the slice data corresponding to the target sub-picture is already stored at the target download address on the corresponding slave server, so the terminal device only needs to request the corresponding slice data from the target download address on that slave server. It should be noted that, because the application scenario of this embodiment is live video, the slice data the terminal device requests is the latest slice data generated at the current moment, not all the slice data.
Step 450, decoding the slice data, and displaying the corresponding target sub-picture of the live video on the terminal device based on the decoded slice data.
The specific decoding process is prior art and is not described in detail here. Displaying the target sub-picture of the live video on the terminal device based on the decoded slice data specifically includes:
combining, according to the system timestamp in the SEI, the target sub-pictures of the live video corresponding to the decoded slice data, and displaying them on the terminal device. Specifically, as described above, assuming SEI is inserted into the video streaming media data every 1 s, the system timestamps are spaced at 1 s granularity. The decoded target sub-pictures can then be displayed on the terminal device in time sequence order according to the system timestamp, as the sketch below illustrates.
Optionally, after performing the above operation, the method may further include:
detecting, through the gyroscope, whether the viewing state of the terminal device changes;
if the viewing state of the terminal device changes, determining spatial movement data of the terminal device;
and adjusting the target sub-picture displayed by the terminal device based on the spatial movement data.
Specifically, suppose the terminal device is currently playing video in landscape mode but the gyroscope detects that the terminal device is tilted backwards; that is, the viewing state of the terminal device has changed. The spatial movement data of the terminal device can then be obtained from the gyroscope, and based on this movement data it is determined that the viewing angle of the terminal device has changed; the target sub-picture is then re-determined according to the new viewing angle and the metadata, for instance as in the sketch below. For the specific determination process, see above; it is not repeated here.
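A sketch of one way such an adjustment could be computed, assuming an equirectangular panorama and a linear angle-to-pixel mapping; the patent does not prescribe this formula:

```python
def adjust_viewport(view, d_yaw, d_pitch, frame_w, frame_h, px_per_degree):
    """Shift the viewport over the panoramic frame according to the
    gyroscope's movement data: yaw wraps around horizontally, pitch is
    clamped vertically."""
    vx, vy, vw, vh = view
    vx = (vx + d_yaw * px_per_degree) % frame_w
    vy = min(max(vy + d_pitch * px_per_degree, 0), frame_h - vh)
    return (vx, vy, vw, vh)

# The new viewport is then fed back into the tile selection step
# (tiles_in_view above) to re-determine the target sub-picture.
```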
In the video image processing method provided by the embodiment of the invention, after the terminal device sends the video playing request to the main server in the video image processing system, the target sub-picture can be determined from the plurality of sub-pictures of the live video according to the viewing state of the terminal device and the metadata fed back by the main server, and the target download address corresponding to the target sub-picture is determined from the download addresses. The slice data corresponding to the target sub-picture is then acquired from the target download address. After the slice data is decoded, the target sub-picture of the live video can be displayed on the terminal device based on the slice data. In this way, the terminal device acquires only the partial target sub-pictures it needs at each request, rather than all pictures of the frame image, which solves the problem that high-bit-rate online video cannot be transmitted under the limits of users' network transmission conditions, and ensures that most terminal devices can smoothly watch ultra-high-definition live video on an ordinary network.
On the basis of the above embodiments, and to help the reader understand the technical solutions of the present application more clearly, an embodiment of the present invention provides a schematic diagram of the signaling flow interaction in video image processing; see fig. 5:
Step 510, the main server in the video image processing system receives video streaming media data and deframes it to obtain multiple frame images.
Step 520, the main server divides each frame image into blocks using a preset picture segmentation rule according to the time sequence information of the frame images, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures.
Step 530, the main server sends the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers.
Step 540, each of the plurality of slave servers synchronously encodes its corresponding sub-pictures and stores the one or more encoded sub-pictures in the form of slice data.
Step 550, the terminal device sends a video playing request to the main server in the video image processing system, where the video playing request carries the video information to be played.
Step 560, the main server determines the video information to be played according to the video playing request, and feeds back the metadata of the sub-pictures corresponding to the video information to be played and the download addresses of the sub-pictures.
Step 570, the terminal device receives the metadata of the sub-pictures corresponding to the video information and the download addresses of the sub-pictures corresponding to the video information, fed back by the main server.
Step 580, the terminal device determines the target sub-picture from the plurality of sub-pictures of the live video based on its viewing state and the metadata, and determines the target download address corresponding to the target sub-picture from the download addresses.
Step 590, the terminal device acquires the slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address.
Step 595, the terminal device decodes the slice data and displays the target sub-picture of the live video on the terminal device based on the decoded slice data.
The details of the method steps in the signaling flow have been described in detail in the foregoing embodiments, and are not described herein too much.
Corresponding to the embodiment shown in fig. 2, an embodiment of the present invention further provides a video image processing apparatus. Referring to fig. 6, the video image processing apparatus includes: a receiving unit 601, a blocking unit 602, a sending unit 603, and an encoding unit 604.
a receiving unit 601, configured to receive video streaming media data and deframe the video streaming media data to obtain multiple frame images;
a blocking unit 602, configured to divide each frame image into blocks using a preset picture segmentation rule according to the time sequence information of the frame images, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
a sending unit 603, configured to send the plurality of sub-pictures corresponding to each frame image to a corresponding plurality of slave servers, respectively;
an encoding unit 604, configured to perform synchronous encoding on the corresponding sub-pictures and store one or more encoded sub-pictures in the form of slice data.
Optionally, the apparatus further comprises: an information injection unit 605, configured to inject supplemental enhancement information (SEI) into the video streaming media data at preset time intervals;
a detecting unit 606, configured to detect whether the current frame carries SEI;
the encoding unit 604 is specifically configured to, in the synchronous encoding process, if the detection unit 606 detects that the current frame carries SEI, generate a slice of data after the current frame is encoded, store it at a preset address of the current slave server, and then encode the next frame as a key frame serving as the starting frame of the next slice of data.
In this virtual apparatus, the receiving unit 601, the blocking unit 602, and the sending unit 603 combined correspond to the physical main server in the image processing system, while the encoding unit 604, the detection unit 606, and the information injection unit 605 combined correspond to a slave server in the image processing system.
The functions executed by each functional unit in the video image processing apparatus according to the embodiment of the present invention have been described in detail in the embodiment of the video image processing method corresponding to fig. 2, and for simplicity and convenience of description, the description is not repeated here.
The video image processing apparatus provided by the embodiment of the invention divides each frame image in the received video streaming media data into a plurality of sub-pictures and then stores the sub-pictures in slice form. When a terminal device subsequently sends a video playing request, only the partial sub-pictures that can be displayed on the terminal device's display interface are transmitted, rather than all the plane video data mapped from the whole spherical video. This avoids the problem that high-bit-rate online video cannot be transmitted under the limits of the user's network transmission conditions.
In addition, the ultra-high-definition video transcoding that previously required high-end computing power is carried out by a plurality of distributed machines (slave servers) of ordinary computing power. This reduces the transmission code rate and the decoding threshold of the terminal device, lessens the demands on the terminal device's power consumption, heat dissipation and computing capacity, avoids the problem that most terminal devices cannot decode 8K and higher-resolution videos, and allows ultra-high-definition live video to be watched smoothly on ordinary networks and ordinary mobile phones.
Corresponding to the embodiment shown in fig. 4, another video image processing apparatus is further provided in an embodiment of the present invention. Referring to fig. 7, the video image processing apparatus includes: a sending unit 701, a receiving unit 702, a processing unit 703, an acquiring unit 704, and a decoding unit 705.
A sending unit 701, configured to send a video playing request to a main server in a video image processing system, where the video playing request carries video information to be played;
a receiving unit 702, configured to receive metadata of a sub-frame corresponding to the video information and a download address of the sub-frame corresponding to the video information, where the metadata is fed back by the main server;
a processing unit 703, configured to determine a target sub-frame from multiple sub-frames of a live video based on a viewing status and metadata of a terminal device, and determine a target download address corresponding to the target sub-frame from the download addresses;
an obtaining unit 704, configured to obtain slice data corresponding to the target split screen from the corresponding slave server based on the target download address;
the decoding unit 705 is configured to decode the slice data, and display a target split picture in the live video on the terminal device based on the decoded slice data.
Optionally, the processing unit 703 is specifically configured to detect the terminal device through a gyroscope to obtain the viewing state of the terminal device;
determine viewing angle information of the live video watched by the user based on the viewing state;
and determine the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
Optionally, the processing unit 703 is further configured to detect, through the gyroscope, whether the viewing state of the terminal device changes;
if the viewing state of the terminal device changes, determine spatial movement data of the terminal device;
and adjust the target sub-picture displayed by the terminal device based on the spatial movement data.
The functions executed by each functional unit in the video image processing apparatus according to the embodiment of the present invention have been described in detail in the embodiment of the video image processing method corresponding to fig. 4, and for simplicity and convenience of description, the description is not repeated here.
With the video image processing apparatus provided by the embodiment of the invention, after the video playing request is sent to the main server in the video image processing system, the target sub-picture can be determined from the plurality of sub-pictures of the live video according to the viewing state of the terminal device and the metadata fed back by the main server, and the target download address corresponding to the target sub-picture is determined from the download addresses. The slice data corresponding to the target sub-picture is then acquired from the target download address. After the slice data is decoded, the target sub-picture of the live video can be displayed on the terminal device based on the slice data. In this way, the terminal device acquires only the partial target sub-pictures it needs at each request, rather than all pictures of the frame image, which solves the problem that high-bit-rate online video cannot be transmitted under the limits of users' network transmission conditions, and ensures that most terminal devices can smoothly watch ultra-high-definition live video on an ordinary network.
Corresponding to the embodiment shown in fig. 2, an embodiment of the present invention further provides a video image processing system. As shown in fig. 8, the system includes a main server 801 and a plurality of slave servers 802.
The main server 801 is configured to receive video streaming media data and deframe the video streaming media data to obtain multiple frame images;
divide each frame image into blocks using a preset picture segmentation rule according to the time sequence information of the frame images, obtaining a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
and send the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers 802;
the slave servers 802 are configured to synchronously encode the corresponding sub-pictures and store the one or more encoded sub-pictures in the form of slice data.
Optionally, the main server is further configured to inject supplemental enhancement information (SEI) into the video streaming media data at preset time intervals;
and the slave server 802 is specifically configured to, in the synchronous encoding process, if it detects that the current frame carries SEI, generate a slice of data after the current frame is encoded, store it at a preset address of the current slave server 802, and then encode the next frame as a key frame serving as the starting frame of the next slice of data.
The functions executed by each functional component in the video image processing system according to the embodiment of the present invention have been described in detail in the embodiment of the video image processing method corresponding to fig. 2, and for simplicity and convenience of description, the description is not repeated here.
The video image processing system provided by the embodiment of the invention divides each frame image in the received video streaming media data into a plurality of sub-pictures and then stores the sub-pictures in slice form. When a terminal device subsequently sends a video playing request, only the partial sub-pictures that can be displayed on the terminal device's display interface are transmitted, rather than all the plane video data mapped from the whole spherical video. This avoids the problem that high-bit-rate online video cannot be transmitted under the limits of the user's network transmission conditions.
In addition, the ultra-high-definition video transcoding that previously required high-end computing power is carried out by a plurality of distributed machines (slave servers 802) of ordinary computing power. This reduces the transmission code rate and the decoding threshold of the terminal device, lessens the demands on the terminal device's power consumption, heat dissipation and computing capacity, avoids the problem that most terminal devices cannot decode 8K and higher-resolution videos, and allows ultra-high-definition live video to be watched smoothly on ordinary networks and ordinary mobile phones.
Corresponding to the embodiment shown in fig. 4, an embodiment of the present invention further provides a terminal device, specifically referring to fig. 9, where the terminal device includes: a transceiver 901, a processor 902, a decoder 903, and a display panel 904.
a transceiver 901, configured to send a video playing request to the main server in the video image processing system, where the video playing request carries video information to be played, and to receive metadata of the sub-pictures corresponding to the video information and download addresses of those sub-pictures fed back by the main server;
a processor 902, configured to determine target sub-pictures from the plurality of sub-pictures of the live video based on the viewing state of the terminal device and the metadata, determine target download addresses corresponding to the target sub-pictures from the download addresses, and acquire the slice data corresponding to the target sub-pictures from the corresponding slave servers based on the target download addresses;
a decoder 903, configured to decode the slice data;
and a display panel 904, configured to display the corresponding target sub-pictures of the live video on the terminal device based on the decoded slice data.
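For illustration only, the cooperation of these four components might be sketched as follows in Python. All collaborator names (request_play, select_tiles, fetch, show) are hypothetical; the embodiment fixes the order of operations, not these APIs. A possible select_tiles helper is sketched after the gyroscope description below.

    # Illustrative sketch (not from the disclosure) of the terminal-side
    # sequence. The collaborators are injected hypothetical stand-ins.
    def play_viewport(video_id, main_server, select_tiles, fetch,
                      decoder, panel, viewing_state):
        # 1. Transceiver: send the play request, receive per-sub-picture
        #    metadata and the download address of each sub-picture.
        metadata, addresses = main_server.request_play(video_id)
        # 2. Processor: pick the target sub-pictures for the current viewing
        #    state and resolve their target download addresses.
        targets = select_tiles(viewing_state, metadata)
        # 3. Fetch the slice data from the corresponding slave servers.
        slices = [fetch(addresses[t]) for t in targets]
        # 4. Decoder and display panel: decode and show the target
        #    sub-pictures.
        for s in slices:
            panel.show(decoder.decode(s))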
Optionally, the processor 902 is specifically configured to detect the terminal device through a gyroscope to obtain the viewing state of the terminal device; determine, based on the viewing state, viewing angle information of the live video watched by the user; and determine the target sub-pictures from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
Optionally, the processor 902 is further configured to detect, through the gyroscope, whether the viewing state of the terminal device changes; if the viewing state of the terminal device changes, determine spatial movement data of the terminal device; and adjust the target sub-pictures displayed by the terminal device based on the spatial movement data.
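As a hedged example of the viewing-state handling just described, the following Python sketch maps a gyroscope-derived (yaw, pitch) viewing state to the indices of the target sub-pictures, assuming an equirectangular rows x cols tile grid and fixed field-of-view values; none of these specifics come from the embodiment itself.

    # Illustrative sketch (not from the disclosure): map the viewing state to
    # the set of sub-picture indices whose tiles intersect the viewport.
    import math

    def select_tiles(yaw, pitch, rows=4, cols=8, fov_h=110.0, fov_v=90.0):
        """Return indices of the sub-pictures intersecting the viewport."""
        tile_w, tile_h = 360.0 / cols, 180.0 / rows
        # Latitude span of the viewport, clamped to the sphere.
        lat_min = max(-90.0, pitch - fov_v / 2.0)
        lat_max = min(90.0, pitch + fov_v / 2.0)
        r0 = int((lat_min + 90.0) // tile_h)
        r1 = min(rows - 1, int((lat_max + 90.0) // tile_h))
        # Longitude wraps around, so walk raw column indices modulo cols.
        c0 = math.floor((yaw - fov_h / 2.0) / tile_w)
        c1 = math.floor((yaw + fov_h / 2.0) / tile_w)
        return sorted({r * cols + (c % cols)
                       for c in range(c0, c1 + 1)
                       for r in range(r0, r1 + 1)})

When the gyroscope reports a changed (yaw, pitch), re-running select_tiles yields the adjusted set of target sub-pictures, so only the newly visible tiles need to be downloaded.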
The functions executed by each functional unit of the terminal device according to the embodiment of the present invention have been described in detail in the embodiment of the video image processing method corresponding to fig. 2, and are not repeated here for brevity.
According to the terminal device provided by the embodiment of the invention, after a video playing request is sent to the main server in the video image processing system, the target sub-pictures can be determined from the plurality of sub-pictures of the live video according to the viewing state of the terminal device and the metadata fed back by the main server, and the target download addresses corresponding to the target sub-pictures can be determined from the download addresses. The slice data corresponding to the target sub-pictures is then acquired from the target download addresses. After the slice data is decoded, the target sub-pictures of the live video can be displayed on the terminal device. In this way, the terminal device acquires only the required target sub-pictures at each request, instead of all the pictures of the frame image, which solves the problem that high-bit-rate online video cannot be transmitted under limited user network conditions and ensures that most terminal devices can watch ultra-high-definition live video smoothly over an ordinary network.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a server, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail. It should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention; any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A video image processing method, applied to a video image processing system comprising a main server and a plurality of slave servers, the method comprising:
the main server receives video streaming media data and deframes the video streaming media data to obtain a plurality of frame images;
according to timing information of the frame images, performs block processing on each frame image by using a preset picture segmentation rule to obtain a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
and respectively sends the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers, so that each slave server synchronously encodes the corresponding sub-pictures, and each slave server stores the encoded one or more sub-pictures in the form of slice data.
2. The method of claim 1, further comprising:
injecting supplemental enhancement information (SEI) into the video streaming media data at preset intervals;
wherein the storing, by each slave server, of the encoded one or more sub-pictures in the form of slice data comprises:
during the synchronous encoding process, if the current frame is detected to carry the SEI, generating one piece of slice data after the current frame is encoded and storing it at a preset address of the slave server, and then encoding the next frame as a key frame serving as the start frame of the next piece of slice data.
3. A video image processing method, applied to a terminal device, the method comprising:
sending a video playing request to a main server in a video image processing system, wherein the video playing request carries video information to be played;
receiving metadata of sub-pictures corresponding to the video information and download addresses of the sub-pictures corresponding to the video information, both fed back by the main server;
determining a target sub-picture from a plurality of sub-pictures of a live video based on the viewing state of the terminal device and the metadata, and determining a target download address corresponding to the target sub-picture from the download addresses;
acquiring slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address;
and decoding the slice data, and displaying the target sub-picture of the live video on the terminal device based on the decoded slice data.
4. The method of claim 3, wherein determining a target sub-picture from a plurality of sub-pictures of a live video based on the viewing state of the terminal device and the metadata comprises:
detecting the terminal device through a gyroscope to obtain the viewing state of the terminal device;
determining, based on the viewing state, viewing angle information of the live video watched by the user;
and determining the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
5. The method according to claim 3 or 4, characterized in that the method further comprises:
detecting, through the gyroscope, whether the viewing state of the terminal device changes;
if the viewing state of the terminal device changes, determining spatial movement data of the terminal device;
and adjusting the target sub-picture displayed by the terminal device based on the spatial movement data.
6. A video image processing system, comprising: a main server and a plurality of slave servers;
wherein the main server is configured to receive video streaming media data and deframe the video streaming media data to obtain a plurality of frame images;
according to timing information of the frame images, perform block processing on each frame image by using a preset picture segmentation rule to obtain a plurality of sub-pictures corresponding to each frame image and metadata corresponding to the sub-pictures;
and respectively send the plurality of sub-pictures corresponding to each frame image to the corresponding plurality of slave servers;
and the slave servers are configured to synchronously encode the corresponding sub-pictures, and to store the encoded one or more sub-pictures in the form of slice data.
7. The system of claim 6, wherein the main server is further configured to inject supplemental enhancement information (SEI) into the video streaming media data at preset intervals;
and the slave server is specifically configured to, during the synchronous encoding process, if the current frame is detected to carry the SEI, generate one piece of slice data after the current frame is encoded, store it at a preset address of the current slave server, and then encode the next frame as a key frame serving as the start frame of the next piece of slice data.
8. A terminal device, comprising: a processor, a transceiver, a decoder, and a display panel;
wherein the transceiver is configured to send a video playing request to a main server in a video image processing system, where the video playing request carries video information to be played, and to receive metadata of sub-pictures corresponding to the video information and download addresses of the sub-pictures, both fed back by the main server;
the processor is configured to determine a target sub-picture from a plurality of sub-pictures of a live video based on the viewing state of the terminal device and the metadata, determine a target download address corresponding to the target sub-picture from the download addresses, and acquire slice data corresponding to the target sub-picture from the corresponding slave server based on the target download address;
the decoder is configured to decode the slice data;
and the display panel is configured to display the corresponding target sub-picture of the live video on the terminal device based on the decoded slice data.
9. The terminal device according to claim 8, wherein the processor is specifically configured to detect the terminal device through a gyroscope to obtain the viewing state of the terminal device; determine, based on the viewing state, viewing angle information of the live video watched by the user; and determine the target sub-picture from the plurality of sub-pictures of the live video based on the viewing angle information and the metadata.
10. The terminal device according to claim 8 or 9, wherein the processor is further configured to detect, through the gyroscope, whether the viewing state of the terminal device changes; if the viewing state of the terminal device changes, determine spatial movement data of the terminal device; and adjust the target sub-picture displayed by the terminal device based on the spatial movement data.
CN202010600416.2A 2020-06-28 2020-06-28 Video image processing method, system and terminal equipment Pending CN111711859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010600416.2A CN111711859A (en) 2020-06-28 2020-06-28 Video image processing method, system and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010600416.2A CN111711859A (en) 2020-06-28 2020-06-28 Video image processing method, system and terminal equipment

Publications (1)

Publication Number Publication Date
CN111711859A 2020-09-25

Family

ID=72544516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010600416.2A Pending CN111711859A (en) 2020-06-28 2020-06-28 Video image processing method, system and terminal equipment

Country Status (1)

Country Link
CN (1) CN111711859A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103190156A (en) * 2010-09-24 2013-07-03 株式会社Gnzo Video bit stream transmission system
US20130223537A1 (en) * 2010-09-24 2013-08-29 Gnzo Inc. Video Bit Stream Transmission System
CN106937180A (en) * 2015-12-31 2017-07-07 幸福在线(北京)网络技术有限公司 A kind of player method and device of virtual reality video
CN105791882A (en) * 2016-03-22 2016-07-20 腾讯科技(深圳)有限公司 Video coding method and device
CN107690074A (en) * 2016-08-03 2018-02-13 中国电信股份有限公司 Video coding and restoring method, audio/video player system and relevant device
CN107888993A (en) * 2016-09-30 2018-04-06 华为技术有限公司 A kind of processing method and processing device of video data
CN106534716A (en) * 2016-11-17 2017-03-22 三星电子(中国)研发中心 Methods for transmitting and displaying panoramic videos
CN107484004A (en) * 2017-07-24 2017-12-15 北京奇艺世纪科技有限公司 A kind of method for processing video frequency and device
CN107529064A (en) * 2017-09-04 2017-12-29 北京理工大学 A kind of self-adaptive encoding method based on VR terminals feedback
CN108063976A (en) * 2017-11-20 2018-05-22 北京奇艺世纪科技有限公司 A kind of method for processing video frequency and device
CN108235131A (en) * 2018-01-30 2018-06-29 重庆邮电大学 A kind of panoramic video adaptive transmission method based on DASH
CN110519652A (en) * 2018-05-22 2019-11-29 华为软件技术有限公司 VR video broadcasting method, terminal and server

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112511844B (en) * 2020-11-10 2021-08-17 北京大学 Transmission method and system based on 360-degree video stream
CN112511844A (en) * 2020-11-10 2021-03-16 北京大学 Transmission method and system based on 360-degree video stream
CN112788024B (en) * 2020-12-31 2023-04-07 上海网达软件股份有限公司 Method and system for real-time coding of 8K ultra-high-definition video
CN112788024A (en) * 2020-12-31 2021-05-11 上海网达软件股份有限公司 Method and system for real-time coding of 8K ultra-high-definition video
CN113949891A (en) * 2021-10-13 2022-01-18 咪咕文化科技有限公司 Video processing method and device, server and client
CN113949891B (en) * 2021-10-13 2023-12-08 咪咕文化科技有限公司 Video processing method and device, server and client
CN113923530A (en) * 2021-10-18 2022-01-11 北京字节跳动网络技术有限公司 Interactive information display method and device, electronic equipment and storage medium
CN113923530B (en) * 2021-10-18 2023-12-22 北京字节跳动网络技术有限公司 Interactive information display method and device, electronic equipment and storage medium
CN114071183B (en) * 2022-01-17 2022-05-17 中央广播电视总台 Video program broadcasting method and device, computer equipment and readable storage medium
CN114071183A (en) * 2022-01-17 2022-02-18 中央广播电视总台 Video program broadcasting method and device, computer equipment and readable storage medium
CN115580738A (en) * 2022-02-23 2023-01-06 北京拙河科技有限公司 High-resolution video display method, device and system based on-demand transmission
CN115580738B (en) * 2022-02-23 2023-09-19 北京拙河科技有限公司 High-resolution video display method, device and system for on-demand transmission
CN114979777A (en) * 2022-06-14 2022-08-30 深圳创维-Rgb电子有限公司 Ultra-high-definition video signal processing device and method and ultra-high-definition video management system
WO2024082971A1 (en) * 2022-10-19 2024-04-25 腾讯科技(深圳)有限公司 Video processing method and related device

Similar Documents

Publication Publication Date Title
CN111711859A (en) Video image processing method, system and terminal equipment
US11653065B2 (en) Content based stream splitting of video data
US10469820B2 (en) Streaming volumetric video for six degrees of freedom virtual reality
CN109417624B (en) Apparatus and method for providing and displaying content
CN109565610B (en) Method, apparatus and storage medium for processing omnidirectional video
US10693938B2 (en) Method and system for interactive transmission of panoramic video
KR102357137B1 (en) Image processing method, terminal, and server
CN110149542B (en) Transmission control method
CN113170234B (en) Adaptive encoding and streaming method, system and storage medium for multi-directional video
US20160277772A1 (en) Reduced bit rate immersive video
US11539983B2 (en) Virtual reality video transmission method, client device and server
US10958950B2 (en) Method, apparatus and stream of formatting an immersive video for legacy and immersive rendering devices
US20190166391A1 (en) Method of providing streaming service based on image segmentation and electronic device supporting the same
CN112422868A (en) Data processing method, terminal device and server
US20140196102A1 (en) Method for transmitting video signals from an application on a server over an ip network to a client device
JP2017123503A (en) Video distribution apparatus, video distribution method and computer program
CN106919376B (en) Dynamic picture transmission method, server device and user device
CN113905186B (en) Free viewpoint video picture splicing method, terminal and readable storage medium
WO2018196530A1 (en) Video information processing method, terminal, and computer storage medium
CN108574881B (en) Projection type recommendation method, server and client
CN111885417B (en) VR video playing method, device, equipment and storage medium
CN115580738A (en) High-resolution video display method, device and system based on-demand transmission
KR101979432B1 (en) Apparatus and method for predicting user viewpoint using lication information of sound source in 360 vr contents
CN112541858A (en) Video image enhancement method, device, equipment, chip and storage medium
Miao et al. Low-delay cloud-based rendering of free viewpoint video for mobile devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200925