WO2018006825A1 - Video encoding method and apparatus - Google Patents

Video encoding method and apparatus

Info

Publication number
WO2018006825A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
video frame
video
area
moving target
Application number
PCT/CN2017/091846
Other languages
English (en)
French (fr)
Inventor
万千
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018006825A1 publication Critical patent/WO2018006825A1/zh

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/10: using adaptive coding
              • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
                • H04N19/117: Filters, e.g. for pre-processing or post-processing
              • H04N19/134: characterised by the element, parameter or criterion affecting or controlling the adaptive coding
                • H04N19/136: Incoming video signal characteristics or properties
                  • H04N19/137: Motion inside a coding unit, e.g. average field, frame or block difference
                • H04N19/167: Position within a video image, e.g. region of interest [ROI]
              • H04N19/169: characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
                • H04N19/17: the unit being an image region, e.g. an object
                  • H04N19/172: the region being a picture, frame or field

Definitions

  • The present application relates to the field of video processing technologies, and in particular to a video encoding method and apparatus.
  • A video is a form of data representing moving images. It usually consists of a series of video frames, and playing those frames in sequence displays the dynamic images in the video. Through video encoding, a specific compression technique can convert a video format file into a video code stream suitable for transmission.
  • The present application provides a video encoding method, including: acquiring a video frame; detecting a moving target in the video frame, and determining, in the video frame, the region where the moving target is located as a first region; performing smoothing filtering on a second region in the video frame, the video frame including the first region and the second region with no overlap between them; and encoding the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
  • The present application provides a video encoding apparatus, including:
  • one or more memories;
  • one or more processors; wherein
  • the one or more memories store one or more instruction modules configured to be executed by the one or more processors; and
  • the one or more instruction modules include:
  • a region-of-interest acquisition module configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as a first region;
  • a region filtering module configured to perform smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region; and
  • an encoding module configured to encode the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
  • The present application also provides a non-transitory computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the above method.
  • FIG. 1 is a diagram of the application environment of a video encoding system in an example;
  • FIG. 2A is a schematic diagram of the internal structure of a server in an example;
  • FIG. 2B is a schematic diagram of the internal structure of a terminal in an example;
  • FIG. 3A is a schematic flowchart of a video encoding method in an example;
  • FIG. 3B is a schematic flowchart of a video encoding method in an example;
  • FIG. 4 is a schematic flowchart of the step of performing global motion compensation on a video frame in an example;
  • FIG. 5 is a schematic flowchart of the steps of detecting a moving target in a video frame and determining, in the video frame, the region where the moving target is located as the region of interest in an example;
  • FIG. 6 is a schematic flowchart of the step of judging, according to extracted features, whether a feature point belongs to the region where the moving target is located in an example;
  • FIG. 7 is a schematic flowchart of the step of determining the region of interest according to the feature points belonging to the region where the moving target is located in an example;
  • FIG. 8 is a structural block diagram of a video encoding apparatus in an example;
  • FIG. 9 is a structural block diagram of a region-of-interest acquisition module in an example;
  • FIG. 10 is a structural block diagram of a video encoding apparatus in another example.
  • In implementing the examples of the present application, the inventor found that current video coding technology is suited to encoding video of normal scenes. For video of complex scenes, however, such as sports events or stage performances, intense motion, rich detail, uneven illumination, and similar factors often make the picture quality of the encoded video stream hard to control, or make the encoded stream consume too many network resources to be suitable for transmission if picture quality is to be guaranteed; current video coding methods therefore struggle to balance picture quality against the occupation of network resources.
  • For this technical problem, the present application provides a video encoding method.
  • FIG. 1 is a diagram of the application environment of a video encoding system in an example.
  • The video encoding system includes a server 110 and a terminal 120.
  • The server 110 may be configured to acquire video frames of a video; detect a moving target in a video frame and determine, in the video frame, the region where the moving target is located as a region of interest; perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest; and then encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • The server 110 can transmit the video code stream to the terminal 120 over a network.
  • The server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus.
  • The non-volatile storage medium of the server stores an operating system, a database, and a video encoding device; the database may store the parameters required for video encoding, and the video encoding device is used to implement a video encoding method.
  • The server's processor provides computing and control capabilities and supports the operation of the entire server.
  • The internal memory of the server provides an environment for the operation of the video encoding device in the non-volatile storage medium; the internal memory can store computer-readable instructions that, when executed by the processor, cause the processor to perform the video encoding method.
  • The network interface of the server is used to communicate with external terminals via a network connection, for example to send a video code stream to a terminal.
  • The server can be implemented as a stand-alone server or as a server cluster consisting of multiple servers.
  • A specific server may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
  • The terminal includes a processor, a non-volatile storage medium, an internal memory, a network interface, and a display screen connected by a system bus.
  • The non-volatile storage medium of the terminal stores an operating system and a video decoding device, the video decoding device being used to implement a video decoding method.
  • The processor provides computing and control capabilities and supports the operation of the entire terminal.
  • The internal memory in the terminal provides an environment for the operation of the video decoding device in the non-volatile storage medium; the internal memory can store computer-readable instructions that, when executed by the processor, cause the processor to perform a video decoding method.
  • The network interface is used for network communication with the server, for example to receive a video code stream sent by the server.
  • The display screen of the terminal may be a liquid crystal display or an electronic ink display.
  • The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse.
  • The terminal can be a mobile phone, a tablet computer, a personal digital assistant, or a VR (Virtual Reality) terminal.
  • FIG. 2B is only a block diagram of the part of the structure related to the solution of the present application and does not limit the terminals to which the solution applies; a specific terminal may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
  • FIG. 3A is a schematic flowchart of a video encoding method in an example. This example is illustrated with the method applied to the server 110 of FIG. 1. As shown in FIG. 3A, the method includes the following steps:
  • S302A: Acquire a video frame.
  • S304A: Detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the first region.
  • S306A: Perform smoothing filtering on a second region in the video frame, where the video frame includes the first region and the second region, and there is no overlap between the first region and the second region.
  • S308A: Encode the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
  • FIG. 3B is a schematic flowchart of a video encoding method in an example. This example is illustrated with the method applied to the server 110 of FIG. 1. As shown in FIG. 3B, the method specifically includes the following steps:
  • S302: Acquire a video frame.
  • The video frame is a constituent unit of the video to be encoded; displaying the video frames in order realizes video playback. The server may acquire the video frames sequentially in their order within the video to be encoded.
  • In an example, if the obtained video frame is a key frame, S304 is performed directly on it; if the obtained video frame is a transition frame, the complete video frame may first be computed from the key frame on which the transition frame depends, after which S304 is performed on the complete video frame.
  • A key frame is a video frame containing complete picture information; a transition frame is a video frame containing incomplete picture information that is computed based on a key frame.
  • S304: Detect the moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the region of interest.
  • The moving target is a moving element in the picture represented by the video frame and is the foreground of the video frame; elements in the video frame that are still or nearly still are the background. Examples of moving targets are a person whose position or posture changes, a moving vehicle, or moving lighting.
  • A Region of Interest (ROI) is, in image processing, a region to be processed that is outlined from the image being processed with a box, circle, ellipse, irregular polygon, or the like.
  • Specifically, the server may perform moving-target detection on the video frame and detect the region where the moving target is located, thereby determining that region as the region of interest. Since the region of interest is the region of the video frame where the moving target is located, it is also the region of the video frame that the video viewer pays attention to, relative to the non-interest region.
  • To detect moving targets in the video frame, the server may specifically use the inter-frame difference method, the background subtraction method, or an optical-flow-based moving-target detection algorithm.
  • The background subtraction method learns the pattern of background disturbance from statistics over the changes in a number of preceding video frames.
  • The main idea of the inter-frame difference method is to detect the region where motion occurs from the difference between two or three consecutive frames of the video image sequence. The inter-frame difference method is highly dynamic and can adapt to moving-target detection against a dynamic background.
  • The optical-flow-based moving-target detection algorithm uses the optical flow equation to compute the motion state vector of each pixel, thereby finding the moving pixels and detecting the region of the moving target.
  • S306: After performing smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest, encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • The non-interest region is the area of the video frame outside the region of interest. Smoothing filtering of the non-interest region is a process that smooths the transitions between the pixel values of the pixels in the non-interest region.
  • Fidelity is a quantized value measuring the similarity between the video frame decoded from the encoded video stream and the original video frame before encoding: the higher the fidelity, the higher the similarity and the smaller the picture-quality loss of the encoded video stream; the lower the fidelity, the lower the similarity and the greater the picture-quality loss.
  • The region of interest may also be referred to as the first region, and the non-interest region as the second region. The video frame includes the first region and the second region, with no overlap between them.
  • The smoothing filtering may use mean filtering, median filtering, Gaussian filtering, or the like. With mean filtering, the server replaces the pixel value of each pixel in the non-interest region with the mean of the pixel values in that pixel's neighborhood. With median filtering, the server replaces the pixel value of each pixel in the non-interest region with the median of the pixel values in the neighborhood, i.e. the value in the middle position when the neighborhood's pixel values are sorted by magnitude. With Gaussian filtering, the server replaces the pixel value of each pixel in the non-interest region with a weighted average of the pixel values within the pixel's neighborhood, where the weights used to compute the weighted average follow a normal distribution.
  • The server can adjust the quantization parameter (QP) of the region of interest and of the non-interest region to make the fidelity of the region of interest higher than the fidelity of the non-interest region. The quantization parameter is the parameter used when the video frame is quantized for encoding, and it is negatively correlated with fidelity: the minimum quantization parameter gives the finest quantization, and the maximum gives the coarsest.
  • Specifically, the server may encode with the quantization parameter of the region of interest lower than the quantization parameter of the non-interest region, thereby realizing a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region.
  • In an example, the server may instead adjust the resolutions of the region of interest and the non-interest region to make the fidelity of the region of interest higher, specifically by encoding the non-interest region at a lower resolution than the region of interest.
  • The resolution of a region of the video frame refers to the number of pixels contained per unit area in that region.
  • In the above video encoding method, a moving target is detected in the video frame and the region where it is located is determined as the region of interest, dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region the viewer pays attention to. Encoding the video frame with higher fidelity in the region of interest than in the non-interest region yields a video code stream in which, even for video of complex scenes, the region of the moving target maintains high picture quality.
  • Moreover, directly lowering the fidelity of the non-interest region would introduce compression distortion such as staircase ripples or ringing and degrade the picture quality. Smoothing the non-interest region before encoding reduces high-frequency information and the compression distortion caused by the lowered fidelity, so the non-interest region is perceived as blurred rather than noisy, improving the overall picture quality of the encoded video stream; at the same time, the lowered fidelity of the non-interest region reduces the network resources the encoded stream occupies.
  • In an example, before S304 the video encoding method further includes a step of global motion compensation for the video frame. Assuming the video frames are shot by a camera, camera motion causes overall motion of the video frame picture even though some static background in the frame is not itself moving. Global motion compensation is therefore performed on the video frame to repair the effect of camera motion on the overall picture, avoiding detection errors, or even detecting the entire frame as moving, when detecting moving targets.
  • FIG. 4 is a schematic flowchart of the global motion compensation step for a video frame in an example. As shown in FIG. 4, this step specifically includes:
  • S402: Acquire camera motion parameters. The apparent motion of objects in the video is the superposition of camera motion and object motion, and the subsequent processing of the video frame in this example needs only the pure object motion, so the camera motion parameters must first be estimated and then used to correct the video frame, achieving global motion compensation.
  • The camera motion parameters can be estimated by a variety of methods, such as an M-estimator, least squares, or an ant colony algorithm. The two axes can be orthogonal.
  • S404: Perform global motion compensation on the video frame according to the camera motion parameters.
  • Assuming that camera motion is the dominant component of the observed apparent motion, the camera motion parameters can be estimated and the original video frame corrected accordingly, giving a video frame containing only object motion. If the camera is modeled with a two-dimensional affine model, the server can compute the globally motion-compensated video frame according to formula (2), $\tilde{I}(s) = I(s + w_\theta(s))$, where $s = (x, y)$ is the position coordinate along the two axes, $I(s)$ is the original video frame sampled at $s$, and $w_\theta(s)$ is the camera motion vector at $s$.
  • Performing global motion compensation on the video frame with the estimated camera parameters removes the influence of camera motion from the compensated frame, so the region of the moving target in the video frame can be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • As shown in FIG. 5, in an example, S304 includes the following steps:
  • S502: Determine feature points among the pixels of the video frame.
  • In an example, S502 specifically includes: taking each pixel in the video frame as a feature point; or randomly selecting a preset number or a preset proportion of pixels in the video frame as feature points; or uniformly sampling the pixels in the video frame to obtain feature points.
  • Specifically, the server may either use all pixels of the video frame as feature points or select some pixels from the video frame as feature points according to a set rule. The preset proportion is the ratio of the number of feature points to the total number of pixels of the video frame. Uniformly sampling the pixels of the video frame specifically means selecting a pixel as a feature point every preset number of pixels along each of the two axes of the video frame.
  • When a preset number or preset proportion of pixels are randomly selected as feature points, or when the pixels of the video frame are uniformly sampled to obtain feature points, the number of feature points is less than the total number of pixels of the video frame.
  • S504: Extract features of the feature points.
  • In an example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
  • The motion feature characterizes the motion of the feature point. Let the feature point at time t be i_t(x, y), where x and y are the position coordinates of i_t along the two axes. The server may obtain the optical flow vector (dx, dy) of i_t(x, y) by an optical flow method and form the motion feature from its elements, for example x_m = {dx, dy}.
  • The spatial feature characterizes the spatial position of the feature point relative to the video frame; the position coordinates along the two axes can form it, for example x_s = {x, y}.
  • The color feature characterizes the color of the feature point, and the pixel value of the feature point can constitute the color feature. The server can also convert the video frame to the YUV color mode and let the component values y_t(x, y), u_t(x, y), and v_t(x, y) of i_t(x, y) form the color feature x_c = {y_t(x, y), u_t(x, y), v_t(x, y)}. The YUV color mode is more sensitive to color changes, which improves the extracted feature's ability to express the color characteristics of the feature point.
  • The temporal feature characterizes the temporal variation of the feature point; the color feature of i_t(x, y) at the next time t+1 can serve as the temporal feature at time t, for example x_t = {y_{t+1}(x', y'), u_{t+1}(x', y'), v_{t+1}(x', y')} with (x', y') = (x + dx, y + dy). The extracted feature can be written as X = {x_s, x_m, x_c, x_t}.
  • S506: Determine, according to the extracted features, whether the feature points belong to the region where the moving target is located.
  • Specifically, the server may input the extracted features into a trained classifier, which outputs the classification result of whether a feature point belongs to the region where the moving target is located. In an example, the server may instead cluster the feature points to obtain multiple regions in the video frame and then judge whether each of those regions is the region where the moving target is located.
  • S508: Determine the region of interest according to the feature points belonging to the region where the moving target is located.
  • Specifically, if every pixel of the video frame is used as a feature point, the server may take the area enclosed by the feature points judged to belong to the region where the moving target is located as the region of interest. If the number of feature points is less than the total number of pixels of the video frame, the server may estimate, from the judgment results for the feature points, whether the non-feature pixels of the video frame belong to the region where the moving target is located.
  • In this example, each feature point of the video frame is judged as to whether it belongs to the region where the moving target is located, and the region formed by the feature points that do is determined as the region of interest. The region of the moving target in the video frame can thus be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • As shown in FIG. 6, in an example, S506 includes the following steps:
  • S602: Cluster the feature points according to the extracted features to obtain multiple regions in the video frame.
  • In some examples, the feature points are divided into a plurality of categories according to the extracted features, obtaining a plurality of regions in the video frame that respectively correspond to the plurality of categories, where one region includes the one or more feature points belonging to the category corresponding to that region.
  • Specifically, the server may cluster the feature points into a plurality of categories according to the extracted features, with the feature points of each category forming a corresponding region, thereby obtaining multiple regions in the video frame. The server may cluster with the k-means clustering algorithm, a hierarchical clustering algorithm, the SOM (Self-Organizing feature Map) clustering algorithm, the Meanshift (mean shift) clustering algorithm, or the like. Through clustering, the extracted features converge in the high-dimensional space to several locally dense regions; each region obtained in this example is then a complete, contiguously distributed foreground or background object.
  • S604: Acquire the average optical-flow motion speed of each of the multiple regions.
  • The average optical-flow motion speed of a region is the average of the motion speeds, in the optical flow field, of that region. The optical flow field is a two-dimensional instantaneous velocity field formed by all the pixels in the video frame.
  • S606: Compare the average optical-flow motion speed of each of the multiple regions with a preset value.
  • The preset value is 0 or a value close to 0. The server compares the average optical-flow motion speed of each region with the preset value so that the region of the moving target can be determined from the comparison results.
  • S608: Determine a region whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
  • Specifically, the server may designate the regions whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located, and designate the regions whose average optical-flow motion speed is less than or equal to the preset value as the non-interest region.
  • In this example, the feature points are clustered according to the extracted features to obtain multiple regions in the video frame, and comparing the average optical-flow motion speed of each region with the preset value determines the region of interest in the video frame efficiently and accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • In an example, the number of feature points is less than the total number of pixels of the video frame. As shown in FIG. 7, S508 then specifically includes the following steps:
  • S702: Find, in the video frame, the feature point closest to each non-feature pixel.
  • Specifically, the server may traverse each non-feature pixel in the video frame and compute the distance between the traversed pixel and each feature point, finding the feature point closest to the traversed pixel from the computed distances, until all non-feature pixels in the video frame have been traversed.
  • S704: Determine, from the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature pixel belongs to the region where the moving target is located.
  • Specifically, if the found feature point belongs to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel also belongs to that region; if the found feature point does not belong to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel does not belong to it either.
  • S706: Determine the region of interest according to the pixels belonging to the region where the moving target is located.
  • Specifically, after traversing all non-feature pixels and determining whether each belongs to the region where the moving target is located, the server knows whether every pixel in the video frame belongs to the region of interest, and can then determine the region of interest from the pixels of the video frame belonging to the region where the moving target is located. Those pixels include both the feature points belonging to that region and the non-feature pixels belonging to it.
  • In this example, the judgment results for feature points, whose number is less than the total number of pixels of the video frame, are used to estimate whether the non-feature pixels belong to the region where the moving target is located, so the region of interest can be determined efficiently with a small amount of computation, improving video encoding efficiency.
  • In an example, the method further includes generating a mark template that marks whether each pixel in the video frame belongs to the region of interest. This step can be performed after step S304.
  • The mark template records, for each pixel in the video frame, whether it belongs to the region of interest. The mark template may specifically be a two-dimensional matrix of the same size as the video frame picture, with the elements of the matrix in one-to-one correspondence with the pixels of the video frame; each element is a mark of whether the corresponding pixel belongs to the region of interest. The marks take two values, indicating respectively that the corresponding pixel belongs or does not belong to the region of interest; for example, "1" and "0" can denote belonging and not belonging.
  • In an example, S306 includes: after performing smoothing filtering on the non-interest region composed of the pixels marked by the mark template as not belonging to the region of interest, encoding the video frame in a coding manner in which the fidelity of the region of interest formed by the marking of the mark template is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • In an example, the video frame includes a left-eye video frame and a right-eye video frame, and the video code stream includes a left-eye video code stream and a right-eye video code stream. The video encoding method then further includes: sending the left-eye video code stream and the right-eye video code stream to a VR terminal, so that the VR terminal decodes the left-eye and right-eye video code streams separately and plays them synchronously.
  • Specifically, the server may acquire a left-eye video frame and a right-eye video frame, detect the moving target in each, determine the detected moving-target region in each frame as the region of interest, perform smoothing filtering on the non-interest regions of the left-eye and right-eye video frames that do not belong to the region of interest, and encode the frames in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, obtaining a left-eye video code stream and a right-eye video code stream respectively. The left-eye and right-eye video frames are used to generate a visual three-dimensional picture and can be obtained from a panoramic video.
  • After encoding, the server pushes the left-eye and right-eye video code streams to the VR terminal, so that the VR terminal decodes them into left-eye and right-eye video frames respectively and plays them synchronously. Through the VR terminal's built-in or attached left and right eyeglass lenses, the displayed left-eye and right-eye video frames form a visual three-dimensional picture through the user's eyes.
  • The VR terminal may be a dedicated VR terminal with its own left eyeglass lens, right eyeglass lens, and display screen, or a mobile terminal such as a mobile phone or tablet computer, in which case the left and right eyeglass lenses attached to the mobile terminal form the visual three-dimensional picture through the user's eyes.
  • In this example, the video is encoded into a left-eye video code stream and a right-eye video code stream and sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, letting the user of the VR terminal view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
  • As shown in FIG. 8, in an example a video encoding apparatus 800 is provided, including a region-of-interest acquisition module 810, a region filtering module 820, and an encoding module 830.
  • The region-of-interest acquisition module 810 is configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the region of interest.
  • The region filtering module 820 is configured to perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest.
  • The encoding module 830 is configured to encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • The video encoding apparatus 800 detects the moving target in the video frame and determines the region where it is located as the region of interest, dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region the viewer pays attention to. Encoding the video frame with higher fidelity in the region of interest than in the non-interest region yields a video code stream in which, even for video of complex scenes, the region of the moving target maintains high picture quality. Moreover, directly lowering the fidelity of the non-interest region would introduce compression distortion such as staircase ripples or ringing and degrade the picture quality; smoothing the non-interest region before encoding avoids this, and the lowered fidelity reduces the network resources the encoded stream occupies.
  • As shown in FIG. 9, in an example the region-of-interest acquisition module 810 includes a global motion compensation module 811 configured to acquire camera motion parameters and perform global motion compensation processing on the video frame according to the camera motion parameters.
  • Performing global motion compensation on the video frame with the estimated camera parameters removes the influence of camera motion from the compensated frame, so the region of the moving target in the video frame can be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • In an example, the region-of-interest acquisition module 810 includes a feature extraction module 812 and a region-of-interest judgment module 813.
  • The feature extraction module 812 is configured to determine feature points among the pixels of the video frame and extract features of the feature points.
  • The region-of-interest judgment module 813 is configured to determine, according to the extracted features, whether the feature points belong to the region where the moving target is located, and to determine the region of interest according to the feature points belonging to that region.
  • In this example, each feature point of the video frame is judged as to whether it belongs to the region where the moving target is located, and the region formed by the feature points that do is determined as the region of interest; the region of the moving target in the video frame can thus be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • In an example, the feature extraction module 812 is further configured to take each pixel in the video frame as a feature point; or randomly select a preset number or a preset proportion of pixels in the video frame as feature points; or uniformly sample the pixels in the video frame to obtain feature points.
  • In an example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
  • In an example, the region-of-interest judgment module 813 is further configured to cluster the feature points according to the extracted features to obtain multiple regions in the video frame; acquire the average optical-flow motion speed of each of the multiple regions; compare the average optical-flow motion speed of each region with a preset value; and determine a region whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
  • In this example, the feature points are clustered according to the extracted features to obtain multiple regions in the video frame, and comparing the average optical-flow motion speed of each region with the preset value determines the region of interest in the video frame efficiently and accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
  • In an example, the number of feature points is less than the total number of pixels of the video frame, and the region-of-interest judgment module 813 is further configured to find, in the video frame, the feature point closest to each non-feature pixel; determine, from the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature pixel belongs to that region; and determine the region of interest according to the pixels belonging to the region where the moving target is located.
  • In this example, the judgment results for feature points, whose number is less than the total number of pixels of the video frame, are used to estimate whether the non-feature pixels belong to the region where the moving target is located, so the region of interest can be determined efficiently with a small amount of computation, improving video encoding efficiency.
  • In an example, the region-of-interest acquisition module 810 is further configured to generate a mark template that marks whether each pixel in the video frame belongs to the region of interest;
  • the region filtering module 820 is further configured to perform smoothing filtering on the non-interest region composed of the pixels marked by the mark template as not belonging to the region of interest; and
  • the encoding module 830 is further configured to encode the video frame in a coding manner in which the fidelity of the region of interest formed by the marking of the mark template is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • In this example, the mark template expresses simply and efficiently whether each pixel of the video frame belongs to the region of interest, so that when processing each pixel of the video frame, the template serves as a reference for differentiated encoding of the pixels in the region of interest and the non-interest region, further improving video encoding efficiency.
  • In an example, the video frame includes a left-eye video frame and a right-eye video frame, and the video code stream includes a left-eye video code stream and a right-eye video code stream. As shown in FIG. 10, the video encoding apparatus further includes a video code stream sending module 840 configured to send the left-eye video code stream and the right-eye video code stream to a VR terminal, so that the VR terminal decodes the left-eye and right-eye video code streams separately and plays them synchronously.
  • In this example, the video is encoded into a left-eye video code stream and a right-eye video code stream and sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, letting the user of the VR terminal view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
  • A person of ordinary skill in the art can understand that all or part of the processes of the above example methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the examples of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
  • Accordingly, the present application also provides a storage medium storing a data processing program for performing any one of the above example methods of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application relates to a video encoding method and apparatus. The method includes: acquiring a video frame; detecting a moving target in the video frame, and determining, in the video frame, the region where the moving target is located as a first region; performing smoothing filtering on a second region in the video frame; and encoding the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.

Description

Video encoding method and apparatus
This application claims priority to Chinese Patent Application No. 201610541399.3, entitled "Video encoding method and apparatus" and filed with the Chinese Patent Office on July 8, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of video processing technologies, and in particular to a video encoding method and apparatus.
Background
A video is a form of data involving moving images and usually includes a series of video frames; playing the video frames continuously displays the moving images in the video. Through video encoding, a specific compression technique can be used to convert a video format file into a video code stream suitable for transmission.
Summary
The present application provides a video encoding method, including:
acquiring a video frame;
detecting a moving target in the video frame, and determining, in the video frame, the region where the moving target is located as a first region;
performing smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region; and
encoding the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
The present application provides a video encoding apparatus, including:
one or more memories; and
one or more processors; wherein
the one or more memories store one or more instruction modules configured to be executed by the one or more processors; and
the one or more instruction modules include:
a region-of-interest acquisition module configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as a first region;
a region filtering module configured to perform smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region; and
an encoding module configured to encode the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
The present application further provides a non-volatile computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the above method.
Brief Description of the Drawings
To describe the technical solutions in the examples of the present application or in the prior art more clearly, the accompanying drawings required for the description of the examples or the prior art are briefly introduced below. Evidently, the drawings described below are merely some examples of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a video encoding system in an example;
FIG. 2A is a schematic diagram of the internal structure of a server in an example;
FIG. 2B is a schematic diagram of the internal structure of a terminal in an example;
FIG. 3A is a schematic flowchart of a video encoding method in an example;
FIG. 3B is a schematic flowchart of a video encoding method in an example;
FIG. 4 is a schematic flowchart of the step of performing global motion compensation on a video frame in an example;
FIG. 5 is a schematic flowchart of the steps of detecting a moving target in a video frame and determining, in the video frame, the region where the moving target is located as the region of interest in an example;
FIG. 6 is a schematic flowchart of the step of judging, according to extracted features, whether a feature point belongs to the region where the moving target is located in an example;
FIG. 7 is a schematic flowchart of the step of determining the region of interest according to the feature points belonging to the region where the moving target is located in an example;
FIG. 8 is a structural block diagram of a video encoding apparatus in an example;
FIG. 9 is a structural block diagram of a region-of-interest acquisition module in an example;
FIG. 10 is a structural block diagram of a video encoding apparatus in another example.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific examples described here are merely intended to explain the present application and not to limit it.
In implementing the examples of the present application, the inventor found that current video coding technology is suited to encoding video of normal scenes. For video of complex scenes, however, such as sports events or stage performances, intense motion, rich detail, uneven illumination, and other factors often make the picture quality of the encoded video code stream hard to control, or make the encoded stream occupy too many network resources to be suitable for transmission if picture quality is to be guaranteed; current video coding methods therefore struggle to balance picture quality against the occupation of network resources.
On this basis, the present application provides a video encoding method for the technical problem that video code streams produced by current video encoding struggle to balance picture quality and the occupation of network resources.
FIG. 1 is a diagram of the application environment of a video encoding system in an example. As shown in FIG. 1, the video encoding system includes a server 110 and a terminal 120. The server 110 may be configured to acquire video frames of a video; detect a moving target in a video frame and determine, in the video frame, the region where the moving target is located as the region of interest; perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest; and then encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream. The server 110 can transmit the video code stream to the terminal 120 over a network.
FIG. 2A is a schematic diagram of the internal structure of the server 110 in an example. As shown in FIG. 2A, the server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected through a system bus. The non-volatile storage medium of the server stores an operating system, a database, and a video encoding device; the database may store the parameters required for video encoding, and the video encoding device is used to implement a video encoding method. The processor of the server provides computing and control capabilities and supports the operation of the entire server. The internal memory of the server provides an environment for the operation of the video encoding device in the non-volatile storage medium and can store computer-readable instructions that, when executed by the processor, cause the processor to perform the video encoding method. The network interface of the server is used to communicate with external terminals through a network connection, for example to send a video code stream to a terminal. The server can be implemented as a stand-alone server or as a server cluster composed of multiple servers. A person skilled in the art can understand that the structure shown in FIG. 2A is only a block diagram of the part of the structure related to the solution of the present application and does not limit the servers to which the solution applies; a specific server may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
FIG. 2B is a schematic diagram of the internal structure of the terminal in an example. As shown in FIG. 2B, the terminal includes a processor, a non-volatile storage medium, an internal memory, a network interface, and a display screen connected through a system bus. The non-volatile storage medium of the terminal stores an operating system and a video decoding device used to implement a video decoding method. The processor provides computing and control capabilities and supports the operation of the entire terminal. The internal memory in the terminal provides an environment for the operation of the video decoding device in the non-volatile storage medium and can store computer-readable instructions that, when executed by the processor, cause the processor to perform a video decoding method. The network interface is used for network communication with the server, for example to receive a video code stream sent by the server. The display screen of the terminal may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a button, trackball, or touchpad on the terminal housing, or an external keyboard, touchpad, or mouse. The terminal can be a mobile phone, a tablet computer, a personal digital assistant, or a VR (Virtual Reality) terminal. A person skilled in the art can understand that the structure shown in FIG. 2B is only a block diagram of the part of the structure related to the solution of the present application and does not limit the terminals to which the solution applies; a specific terminal may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
FIG. 3A is a schematic flowchart of a video encoding method in an example. This example is illustrated with the method applied to the server 110 of FIG. 1. As shown in FIG. 3A, the method includes the following steps:
S302A: Acquire a video frame.
S304A: Detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as a first region.
S306A: Perform smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region.
S308A: Encode the video frame in a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
FIG. 3B is a schematic flowchart of a video encoding method in an example. This example is illustrated with the method applied to the server 110 of FIG. 1. As shown in FIG. 3B, the method specifically includes the following steps:
S302: Acquire a video frame.
The video frame is a constituent unit of the video to be encoded; displaying the video frames in order realizes video playback. The server may acquire the video frames sequentially in their order within the video to be encoded.
In an example, if the acquired video frame is a key frame, S304 is performed directly on it; if the acquired video frame is a transition frame, the complete video frame may first be computed from the key frame on which the transition frame depends, after which S304 is performed on the complete video frame. A key frame is a video frame containing complete picture information; a transition frame is a video frame containing incomplete picture information that is computed based on a key frame.
S304: Detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the region of interest.
The moving target is a moving element in the picture represented by the video frame and is the foreground of the video frame; elements that are still or nearly still are the background of the video frame. Examples of moving targets are a person whose position or posture changes, a moving vehicle, or moving lighting. A Region of Interest (ROI) is, in image processing, a region to be processed that is outlined from the image being processed with a box, circle, ellipse, irregular polygon, or the like.
Specifically, the server may perform moving-target detection on the video frame and detect the region where the moving target is located, determining that region as the region of interest. Since the region of interest is the region of the video frame where the moving target is located, it is also the region of the video frame that the viewer pays attention to, relative to the non-interest region.
To detect the moving target in the video frame, the server may specifically use the inter-frame difference method, the background subtraction method, or an optical-flow-based moving-target detection algorithm. The background subtraction method learns the pattern of background disturbance from statistics over the changes in a number of preceding video frames. The main idea of the inter-frame difference method is to detect the region where motion occurs from the difference between two or three consecutive frames of the video image sequence; it is highly dynamic and can adapt to moving-target detection against a dynamic background. The optical-flow-based moving-target detection algorithm uses the optical flow equation to compute the motion state vector of each pixel, finding the moving pixels and thereby detecting the region where the moving target is located.
S306: After performing smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest, encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
The non-interest region is the area of the video frame outside the region of interest. Smoothing filtering of the non-interest region is a process that smooths the transitions between the pixel values of the pixels in the non-interest region. Fidelity is a quantized value measuring the similarity between the video frame decoded from the encoded video code stream and the original video frame before encoding: the higher the fidelity, the higher the similarity and the smaller the picture-quality loss of the encoded stream; the lower the fidelity, the lower the similarity and the greater the picture-quality loss.
It should be noted that the region of interest may also be called the first region and the non-interest region the second region; the video frame includes the first region and the second region, with no overlap between them.
Specifically, the smoothing filtering may use mean filtering, median filtering, Gaussian filtering, or the like. With mean filtering, the server replaces the pixel value of each pixel in the non-interest region with the mean of the pixel values in that pixel's neighborhood. With median filtering, the server replaces it with the median of the neighborhood's pixel values, i.e. the value in the middle position when the neighborhood's pixel values are sorted by magnitude. With Gaussian filtering, the server replaces it with a weighted average of the pixel values within the pixel's neighborhood, where the weights used for the weighted average follow a normal distribution.
The server can realize the coding manner in which the fidelity of the region of interest is higher than that of the non-interest region by adjusting the quantization parameter (QP) of the two regions. The quantization parameter is the parameter used when the video frame is quantized for encoding and is negatively correlated with fidelity: the minimum quantization parameter gives the finest quantization and the maximum gives the coarsest. Specifically, the server may encode with the quantization parameter of the region of interest lower than that of the non-interest region, thereby realizing the coding manner in which the fidelity of the region of interest is higher.
In an example, the server may instead adjust the resolutions of the region of interest and the non-interest region, specifically encoding the non-interest region at a lower resolution than the region of interest. The resolution of a region of the video frame refers to the number of pixels contained per unit area in that region.
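Conceptually, the QP adjustment amounts to a per-block offset map that the encoder consumes; the sketch below builds such a map (the block size, base QP, and offset are assumptions, and the actual per-block QP interface differs from encoder to encoder):

```python
import numpy as np

def build_qp_map(roi_mask, base_qp=32, roi_qp_delta=-6, block=16):
    """Per-macroblock QP map: a lower QP (finer quantization) inside the ROI."""
    h, w = roi_mask.shape
    bh, bw = (h + block - 1) // block, (w + block - 1) // block
    qp_map = np.full((bh, bw), base_qp, dtype=np.int32)
    for by in range(bh):
        for bx in range(bw):
            tile = roi_mask[by * block:(by + 1) * block,
                            bx * block:(bx + 1) * block]
            if tile.any():                    # the block overlaps the ROI
                qp_map[by, bx] = base_qp + roi_qp_delta
    return qp_map
```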
With the above video encoding method, a moving target is detected in the video frame and the region where it is located is determined as the region of interest, dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region the viewer pays attention to. Encoding the video frame with higher fidelity in the region of interest than in the non-interest region yields a video code stream in which, even for video of complex scenes, the region of the moving target maintains high picture quality. Moreover, directly lowering the fidelity of the non-interest region would introduce visible compression distortion such as staircase ripples or ringing and degrade the picture quality; performing smoothing filtering on the non-interest region before encoding reduces high-frequency information and the compression distortion caused by the lowered fidelity, so the non-interest region is perceived as blurred rather than full of noise, improving the overall picture quality of the encoded video code stream. Furthermore, lowering the fidelity of the non-interest region reduces the network resources the encoded video code stream occupies.
In an example, before S304 the video encoding method further includes a step of performing global motion compensation on the video frame. Assuming the video frames are shot by a camera, the camera's motion causes overall motion of the video frame picture even though some static background in the frame is not moving. Global motion compensation is therefore performed on the video frame to repair the effect of camera motion on the overall picture, avoiding detection errors, or even detecting the entire frame as moving, when detecting the moving target.
FIG. 4 is a schematic flowchart of the step of performing global motion compensation on a video frame in an example. As shown in FIG. 4, the step specifically includes the following steps:
S402: Acquire camera motion parameters.
Specifically, since the apparent motion of objects in the video is the superposition of camera motion and object motion, and the subsequent processing of the video frame in this example needs only pure object motion, the camera motion parameters must first be estimated and then used to repair the video frame, realizing global motion compensation for the video frame.
In an example, the server may model the camera with a two-dimensional affine model, and the motion vector of the camera at position s = (x, y) is expressed as formula (1):

$$w_\theta(s) = \begin{pmatrix} a_1 \\ a_4 \end{pmatrix} + \begin{pmatrix} a_2 & a_3 \\ a_5 & a_6 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \tag{1}$$

where s = (x, y) is the position coordinate of a point along the two axes, w_θ(s) denotes the camera motion vector at position s = (x, y), and θ = (a1, a2, a3, a4, a5, a6) are the camera motion parameters, representing the camera's scaling, rotation, and displacement along the two axes. The camera motion parameters can be estimated by a variety of methods, such as an M-estimator, the least squares method, or an ant colony algorithm. The two axes can be orthogonal.
S404: Perform global motion compensation on the video frame according to the camera motion parameters.
Specifically, assuming that camera motion is the dominant component of the observed apparent motion, the camera motion parameters can be estimated and the original video frame corrected accordingly, giving a video frame that contains only object motion. If the camera is modeled with the two-dimensional affine model, the server can compute the globally motion-compensated video frame according to formula (2):

$$\tilde{I}(s) = I\big(s + w_\theta(s)\big) \tag{2}$$

where $\tilde{I}(s)$ denotes the video frame after global motion compensation, I(s) is the original video frame sampled at the position coordinate s = (x, y) along the two axes, and w_θ(s) is the camera motion vector at position s = (x, y).
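Putting formulas (1) and (2) together, the sketch below estimates θ by least squares (one of the estimation methods the text mentions) from tracked point correspondences, then resamples the frame at s + w_θ(s); the point-tracking input and all names are illustrative assumptions:

```python
import cv2
import numpy as np

def estimate_affine_theta(pts_prev, pts_curr):
    """Least-squares fit of w_theta(s) = (a1 + a2*x + a3*y, a4 + a5*x + a6*y)."""
    x, y = pts_prev[:, 0], pts_prev[:, 1]
    dx, dy = (pts_curr - pts_prev).T          # observed motion per tracked point
    A = np.stack([np.ones_like(x), x, y], axis=1)
    a1_a2_a3, *_ = np.linalg.lstsq(A, dx, rcond=None)
    a4_a5_a6, *_ = np.linalg.lstsq(A, dy, rcond=None)
    return np.concatenate([a1_a2_a3, a4_a5_a6])   # theta = (a1, ..., a6)

def compensate_global_motion(frame, theta):
    """Formula (2): sample I at s + w_theta(s) to cancel camera motion."""
    h, w = frame.shape[:2]
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    a1, a2, a3, a4, a5, a6 = theta
    map_x = (xs + a1 + a2 * xs + a3 * ys).astype(np.float32)
    map_y = (ys + a4 + a5 * xs + a6 * ys).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```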
In this example, global motion compensation is performed on the video frame using the estimated camera parameters, so the compensated video frame is free of the influence of camera motion. The region where the moving target is located can then be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
As shown in FIG. 5, in an example, S304 includes the following steps:
S502: Determine feature points among the pixels of the video frame.
In an example, S502 specifically includes: taking each pixel in the video frame as a feature point; or randomly selecting a preset number or a preset proportion of pixels in the video frame as feature points; or uniformly sampling the pixels in the video frame to obtain feature points.
Specifically, the server may either take all pixels of the video frame as feature points or select some pixels from the video frame as feature points according to a set rule. The preset proportion is the ratio of the number of feature points to the total number of pixels of the video frame. Uniformly sampling the pixels of the video frame specifically means selecting a pixel as a feature point every preset number of pixels along each of the two axes of the video frame. When a preset number or preset proportion of pixels are randomly selected as feature points, or when the pixels are uniformly sampled to obtain feature points, the number of feature points is less than the total number of pixels of the video frame.
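A small sketch of the uniform-sampling option (the stride stands in for the unspecified preset interval):

```python
import numpy as np

def uniform_feature_points(height, width, stride=8):
    """Select one pixel every `stride` pixels along both axes as feature points."""
    ys, xs = np.mgrid[0:height:stride, 0:width:stride]
    return np.stack([xs.ravel(), ys.ravel()], axis=1)   # (N, 2) array of (x, y)
```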
S504: Extract features of the feature points.
In an example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
Specifically, the motion feature characterizes the motion of the feature point. Let the feature point at time t be i_t(x, y), where x and y are the position coordinates of i_t along the two axes. The server may obtain the optical flow vector (dx, dy) of the feature point i_t(x, y) by an optical flow method and form the motion feature from the elements of the optical flow vector, for example x_m = {dx, dy}.
The spatial feature characterizes the spatial position of the feature point relative to the video frame; the server may form it from the position coordinates of i_t(x, y) along the two axes, for example x_s = {x, y}.
The color feature characterizes the color of the feature point, and the pixel value of the feature point can constitute the color feature. The server can also convert the video frame to the YUV color mode and form the color feature x_c = {y_t(x, y), u_t(x, y), v_t(x, y)} from the component values y_t(x, y), u_t(x, y), and v_t(x, y) of the feature point i_t(x, y) in the YUV color mode. The YUV color mode is more sensitive to color changes, which improves the extracted feature's ability to express the color characteristics of the feature point.
The temporal feature characterizes the temporal variation of the feature point; the color feature of the feature point i_t(x, y) at the next time t+1 can be used as the temporal feature at the present time t. For example, the temporal feature can be defined as x_t = {y_{t+1}(x', y'), u_{t+1}(x', y'), v_{t+1}(x', y')}, where (x', y') = (x + dx, y + dy). The extracted feature can be expressed as X = {x_s, x_m, x_c, x_t}.
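Assembled per point, the feature vector X = {x_s, x_m, x_c, x_t} might be computed as in the sketch below (dense Farnebäck flow stands in for the unspecified optical flow method; all names are illustrative assumptions):

```python
import cv2
import numpy as np

def extract_features(frame_t, frame_t1, points):
    """Per feature point: spatial {x, y}, motion {dx, dy}, colour {y, u, v} at t,
    and temporal {y, u, v} at t+1 sampled at (x + dx, y + dy)."""
    gray_t = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    gray_t1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_t, gray_t1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    yuv_t = cv2.cvtColor(frame_t, cv2.COLOR_BGR2YUV)
    yuv_t1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2YUV)
    h, w = gray_t.shape
    feats = []
    for x, y in points:
        dx, dy = flow[y, x]
        x1 = int(np.clip(round(x + dx), 0, w - 1))   # clamp (x', y') to the frame
        y1 = int(np.clip(round(y + dy), 0, h - 1))
        feats.append([x, y, dx, dy, *yuv_t[y, x], *yuv_t1[y1, x1]])
    return np.asarray(feats, dtype=np.float32)       # X = {x_s, x_m, x_c, x_t}
```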
S506: Judge, according to the extracted features, whether the feature points belong to the region where the moving target is located.
Specifically, the server may input the extracted features into a trained classifier, which outputs the classification result of whether a feature point belongs to the region where the moving target is located. In an example, the server may instead cluster the feature points to obtain multiple regions in the video frame and then judge whether each of the multiple regions is the region where the moving target is located.
S508: Determine the region of interest according to the feature points belonging to the region where the moving target is located.
Specifically, if every pixel of the video frame is taken as a feature point, the server may take the area enclosed by the feature points judged to belong to the region where the moving target is located as the region of interest. If the number of feature points is less than the total number of pixels of the video frame, the server may estimate, from the judgment results of whether the feature points belong to the region where the moving target is located, whether the non-feature pixels of the video frame belong to that region.
In this example, each feature point of the video frame is judged as to whether it belongs to the region where the moving target is located, and the region formed by the feature points belonging to that region is determined as the region of interest. The region of the moving target in the video frame can thus be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
As shown in FIG. 6, in an example, S506 includes the following steps:
S602: Cluster the feature points according to the extracted features to obtain multiple regions in the video frame.
In some examples, the feature points are divided into a plurality of categories according to the extracted features, obtaining a plurality of regions in the video frame that respectively correspond to the plurality of categories, where one region includes the one or more feature points belonging to the category corresponding to that region.
Specifically, the server may cluster the feature points into a plurality of categories according to the extracted features, with the feature points of each category forming a corresponding region, thereby obtaining multiple regions in the video frame. The server may specifically cluster with the k-means clustering algorithm, a hierarchical clustering algorithm, the SOM (Self-Organizing feature Map) clustering algorithm, the Meanshift (mean shift) clustering algorithm, or the like. Through the clustering algorithm, the extracted features converge in the high-dimensional space to several locally dense regions; each region obtained in this example is then a complete, contiguously distributed foreground or background object.
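A sketch of the clustering step using k-means (the cluster count and the normalization are assumptions; any of the algorithms named above would fit here):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_feature_points(features, n_clusters=8):
    """Group feature points into candidate regions by their feature vectors."""
    # Normalising keeps position, motion, and colour on comparable scales.
    X = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-6)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
```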
S604: Acquire the average optical-flow motion speed of each of the multiple regions.
The average optical-flow motion speed of each region is the average of the motion speeds of that region in the optical flow field. The optical flow field is a two-dimensional instantaneous velocity field formed by all the pixels in the video frame.
S606: Compare the average optical-flow motion speed of each of the multiple regions with a preset value.
The preset value is 0 or a value close to 0. The server compares the average optical-flow motion speed of each of the multiple regions with the preset value, so that the region where the moving target is located can be determined from the comparison results.
S608: Determine a region, among the multiple regions, whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
Specifically, the server may designate the regions whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located, and designate the regions whose average optical-flow motion speed is less than or equal to the preset value as the non-interest region.
In this example, the feature points are clustered according to the extracted features to obtain multiple regions in the video frame, and comparing the average optical-flow motion speed of each region with the preset value determines the region of interest in the video frame efficiently and accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
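Steps S604 through S608 reduce to a mean-speed threshold per cluster, as in this sketch (speed_eps plays the role of the preset value close to 0; names are illustrative):

```python
import numpy as np

def moving_cluster_labels(labels, flows, speed_eps=0.5):
    """Return the labels of clusters whose mean optical-flow speed exceeds
    the preset value; flows[i] is the (dx, dy) of feature point i."""
    speeds = np.linalg.norm(flows, axis=1)        # |(dx, dy)| per feature point
    return {int(lbl) for lbl in np.unique(labels)
            if speeds[labels == lbl].mean() > speed_eps}
```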
In an example, the number of feature points is less than the total number of pixels of the video frame. As shown in FIG. 7, S508 then specifically includes the following steps:
S702: Find, in the video frame, the feature point closest to each non-feature pixel.
Specifically, the server may traverse each non-feature pixel in the video frame and compute the distance between the traversed pixel and each feature point, thereby finding, from the computed distances, the feature point closest to the traversed pixel, until all non-feature pixels in the video frame have been traversed.
S704: Determine, according to the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature pixel belongs to the region where the moving target is located.
Specifically, if the found feature point belongs to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel also belongs to that region; if the found feature point does not belong to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel does not belong to it either.
S706: Determine the region of interest according to the pixels belonging to the region where the moving target is located.
Specifically, after traversing all non-feature pixels and determining whether they belong to the region where the moving target is located, the server knows whether every pixel in the video frame belongs to the region of interest, and can then determine the region of interest from the pixels of the video frame that belong to the region where the moving target is located; those pixels include the feature points belonging to that region and the non-feature pixels belonging to it.
In this example, the judgment results for the feature points, whose number is less than the total number of pixels of the video frame, are used to estimate whether the non-feature pixels of the video frame belong to the region where the moving target is located, so the region of interest can be determined efficiently with a small amount of computation, improving video encoding efficiency.
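The nearest-feature-point propagation of S702 through S706 can be done with a k-d tree instead of brute-force distance computation, as in this sketch (the data structure is an implementation choice, not prescribed by the text):

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_labels(points, point_is_moving, height, width):
    """Give every pixel the moving/non-moving label of its nearest feature point;
    points is an (N, 2) array of (x, y), point_is_moving a boolean array of N."""
    tree = cKDTree(points)
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1)
    _, nearest = tree.query(pixels)           # index of the closest feature point
    return point_is_moving[nearest].reshape(height, width)
```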
In an example, the method further includes generating a mark template that marks whether each pixel in the video frame belongs to the region of interest. This step can be performed after step S304. The mark template records, for each pixel in the video frame, whether it belongs to the region of interest. The mark template may specifically be a two-dimensional matrix of the same size as the video frame picture, with the elements of the matrix in one-to-one correspondence with the pixels of the video frame; each element is a mark of whether the corresponding pixel belongs to the region of interest. The marks in the template take two values, indicating respectively that the corresponding pixel of the video frame belongs or does not belong to the region of interest; for example, "1" and "0" can denote belonging and not belonging to the region of interest.
In an example, S306 includes: after performing smoothing filtering on the non-interest region composed of the pixels marked by the mark template as not belonging to the region of interest, encoding the video frame in a coding manner in which the fidelity of the region of interest formed by the marking of the mark template is higher than the fidelity of the non-interest region, to obtain a video code stream.
In this example, the mark template expresses simply and efficiently whether each pixel of the video frame belongs to the region of interest, so that when processing each pixel of the video frame, the template serves as a reference for differentiated encoding of the pixels in the region of interest and the non-interest region, further improving video encoding efficiency.
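The mark template is exactly the 0/1 matrix described above; a sketch of building it and feeding it to the earlier steps (the helper names refer to the illustrative sketches above, not to anything in the patent):

```python
import numpy as np

def make_mark_template(roi_pixel_mask):
    """Two-dimensional 0/1 matrix, same size as the frame: 1 = region of interest."""
    return (roi_pixel_mask > 0).astype(np.uint8)

# The template then drives the differentiated treatment, e.g.:
#   template = make_mark_template(moving_pixel_mask)
#   frame    = smooth_non_roi(frame, template)    # blur outside the ROI
#   qp_map   = build_qp_map(template)             # finer quantization inside it
```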
In an example, the video frame includes a left-eye video frame and a right-eye video frame, and the video code stream includes a left-eye video code stream and a right-eye video code stream. The video encoding method then further includes: sending the left-eye video code stream and the right-eye video code stream to a VR terminal, so that the VR terminal decodes the left-eye and right-eye video code streams separately and plays them synchronously.
Specifically, the server may acquire a left-eye video frame and a right-eye video frame, detect the moving target in each of them, determine the detected moving-target region in each frame as the region of interest, perform smoothing filtering on the non-interest regions of the left-eye and right-eye video frames that do not belong to the region of interest, and encode the frames in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, obtaining a left-eye video code stream and a right-eye video code stream respectively. The left-eye and right-eye video frames are used to generate a visual three-dimensional picture and can be obtained from a panoramic video.
After encoding the left-eye and right-eye video code streams, the server pushes them to the VR terminal, so that the VR terminal decodes them into left-eye and right-eye video frames respectively and plays them synchronously. Through the VR terminal's built-in or attached left and right eyeglass lenses, the displayed left-eye and right-eye video frames form a visual three-dimensional picture through the user's eyes. The VR terminal may be a dedicated VR terminal with its own left eyeglass lens, right eyeglass lens, and display screen, or a mobile terminal such as a mobile phone or tablet computer, in which case the left and right eyeglass lenses attached to the mobile terminal form the visual three-dimensional picture through the user's eyes.
In this example, the video is encoded into a left-eye video code stream and a right-eye video code stream and sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, and the user of the VR terminal can view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
As shown in FIG. 8, in an example, a video encoding apparatus 800 is provided, including a region-of-interest acquisition module 810, a region filtering module 820, and an encoding module 830.
The region-of-interest acquisition module 810 is configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the region of interest.
The region filtering module 820 is configured to perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest.
The encoding module 830 is configured to encode the video frame in a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
The video encoding apparatus 800 detects the moving target in the video frame and determines the region where it is located as the region of interest, dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region the viewer pays attention to. Encoding the video frame with higher fidelity in the region of interest than in the non-interest region yields a video code stream in which, even for video of complex scenes, the region of the moving target maintains high picture quality. Moreover, directly lowering the fidelity of the non-interest region would introduce visible compression distortion such as staircase ripples or ringing and degrade the picture quality; performing smoothing filtering on the non-interest region before encoding reduces high-frequency information and the compression distortion caused by the lowered fidelity, so the non-interest region is perceived as blurred rather than full of noise, improving the overall picture quality of the encoded video code stream. Furthermore, lowering the fidelity of the non-interest region reduces the network resources the encoded video code stream occupies.
As shown in FIG. 9, in an example, the region-of-interest acquisition module 810 includes a global motion compensation module 811 configured to acquire camera motion parameters and perform global motion compensation processing on the video frame according to the camera motion parameters.
In this example, global motion compensation is performed on the video frame using the estimated camera parameters, so the compensated video frame is free of the influence of camera motion. The region where the moving target is located can then be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
In an example, the region-of-interest acquisition module 810 includes a feature extraction module 812 and a region-of-interest judgment module 813.
The feature extraction module 812 is configured to determine feature points among the pixels of the video frame and extract features of the feature points.
The region-of-interest judgment module 813 is configured to judge, according to the extracted features, whether the feature points belong to the region where the moving target is located, and to determine the region of interest according to the feature points belonging to that region.
In this example, each feature point of the video frame is judged as to whether it belongs to the region where the moving target is located, and the region formed by the feature points belonging to that region is determined as the region of interest; the region of the moving target in the video frame can thus be detected accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
In an example, the feature extraction module 812 is further configured to take each pixel in the video frame as a feature point; or randomly select a preset number or a preset proportion of pixels in the video frame as feature points; or uniformly sample the pixels in the video frame to obtain feature points.
In an example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
In an example, the region-of-interest judgment module 813 is further configured to cluster the feature points according to the extracted features to obtain multiple regions in the video frame; acquire the average optical-flow motion speed of each of the multiple regions; compare the average optical-flow motion speed of each region with a preset value; and determine a region whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
In this example, the feature points are clustered according to the extracted features to obtain multiple regions in the video frame, and comparing the average optical-flow motion speed of each region with the preset value determines the region of interest in the video frame efficiently and accurately, ensuring that the region of interest is the true moving-target region and that the video code stream effectively balances picture quality and the occupation of network resources.
In an example, the number of feature points is less than the total number of pixels of the video frame, and the region-of-interest judgment module 813 is further configured to find, in the video frame, the feature point closest to each non-feature pixel; determine, according to the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature pixel belongs to that region; and determine the region of interest according to the pixels belonging to the region where the moving target is located.
In this example, the judgment results for the feature points, whose number is less than the total number of pixels of the video frame, are used to estimate whether the non-feature pixels of the video frame belong to the region where the moving target is located, so the region of interest can be determined efficiently with a small amount of computation, improving video encoding efficiency.
In an example, the region-of-interest acquisition module 810 is further configured to generate a mark template that marks whether each pixel in the video frame belongs to the region of interest;
the region filtering module 820 is further configured to perform smoothing filtering on the non-interest region composed of the pixels marked by the mark template as not belonging to the region of interest; and
the encoding module 830 is further configured to encode the video frame in a coding manner in which the fidelity of the region of interest formed by the marking of the mark template is higher than the fidelity of the non-interest region, to obtain a video code stream.
In this example, the mark template expresses simply and efficiently whether each pixel of the video frame belongs to the region of interest, so that when processing each pixel of the video frame, the template serves as a reference for differentiated encoding of the pixels in the region of interest and the non-interest region, further improving video encoding efficiency.
In an example, the video frame includes a left-eye video frame and a right-eye video frame, and the video code stream includes a left-eye video code stream and a right-eye video code stream. As shown in FIG. 10, the video encoding apparatus further includes a video code stream sending module 840 configured to send the left-eye video code stream and the right-eye video code stream to a VR terminal, so that the VR terminal decodes the left-eye and right-eye video code streams separately and plays them synchronously.
In this example, the video is encoded into a left-eye video code stream and a right-eye video code stream and sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, and the user of the VR terminal can view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
A person of ordinary skill in the art can understand that all or part of the processes of the above example methods can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the examples of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
Accordingly, the present application also provides a storage medium storing a data processing program for performing any one of the examples of the above methods of the present application.
The technical features of the above examples can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above examples are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above examples express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A video encoding method, comprising:
    obtaining a video frame;
    detecting a moving target in the video frame, and determining a region where the moving target is located in the video frame as a first region;
    applying smoothing filtering to a second region of the video frame, the video frame comprising the first region and the second region, with no overlap between the first region and the second region; and
    encoding the video frame in a coding scheme in which a fidelity of the first region is higher than a fidelity of the second region, to obtain a video bitstream.
  2. The method according to claim 1, wherein, before the step of detecting a moving target in the video frame and determining a region where the moving target is located in the video frame as a first region, the method further comprises:
    obtaining camera motion parameters; and
    performing global motion compensation on the video frame according to the camera motion parameters.
  3. The method according to claim 1, wherein the detecting a moving target in the video frame and determining a region where the moving target is located in the video frame as a first region comprises:
    determining feature points among the pixels of the video frame;
    extracting features of the feature points; and
    when the extracted features of the feature points belong to a region where the moving target is located, determining the first region according to the feature points belonging to the region where the moving target is located.
  4. The method according to claim 3, wherein the determining feature points among the pixels of the video frame comprises:
    taking every pixel of the video frame as a feature point; or
    randomly selecting a preset number or a preset ratio of pixels of the video frame as feature points; or
    uniformly sampling the pixels of the video frame to obtain the feature points.
  5. The method according to claim 3, wherein the extracted features comprise motion features, and further comprise at least one of spatial features, color features, and temporal features.
  6. The method according to claim 3, wherein judging, according to the extracted features, whether the feature points belong to the region where the moving target is located comprises:
    classifying the feature points into a plurality of categories according to the extracted features, to obtain a plurality of regions of the video frame respectively corresponding to the plurality of categories, wherein a region comprises one or more feature points belonging to the category corresponding to that region;
    obtaining an average optical-flow motion speed of each of the plurality of regions;
    comparing the average optical-flow motion speed of each of the plurality of regions with a preset value; and
    determining a region, among the plurality of regions, whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
  7. The method according to claim 3, wherein the number of the feature points is smaller than the total number of pixels of the video frame, and the determining the first region according to the feature points belonging to the region where the moving target is located comprises:
    finding, in the video frame, the feature point nearest to a pixel that is not a feature point;
    determining, according to the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature-point pixel belongs to the region where the moving target is located; and
    determining the first region according to the pixels belonging to the region where the moving target is located.
  8. The method according to claim 1, wherein the method further comprises:
    generating a mark template marking whether each pixel of the video frame belongs to the first region; and
    the applying smoothing filtering to a second region of the video frame comprises:
    applying smoothing filtering to the second region of the video frame, formed by the pixels that the mark template marks as not belonging to the first region.
  9. The method according to claim 8, wherein the encoding the video frame in a coding scheme in which a fidelity of the first region is higher than a fidelity of the second region, to obtain a video bitstream, comprises:
    encoding the video frame in a coding scheme in which the fidelity of the first region defined by the mark template is higher than the fidelity of the second region, to obtain the video bitstream.
  10. The method according to claim 1, wherein the video frame comprises a left-eye video frame and a right-eye video frame; the video bitstream comprises a left-eye video bitstream and a right-eye video bitstream; and the method further comprises:
    sending the left-eye video bitstream and the right-eye video bitstream to a VR terminal, so that the VR terminal decodes the left-eye video bitstream and the right-eye video bitstream separately and plays them back synchronously.
  11. A video encoding device, comprising:
    one or more memories; and
    one or more processors; wherein
    the one or more memories store one or more instruction modules configured to be executed by the one or more processors; and wherein
    the one or more instruction modules comprise:
    a first region acquisition module, configured to obtain a video frame, detect a moving target in the video frame, and determine a region where the moving target is located in the video frame as a first region;
    a region filtering module, configured to apply smoothing filtering to a second region of the video frame, the video frame comprising the first region and the second region, with no overlap between the first region and the second region; and
    an encoding module, configured to encode the video frame in a coding scheme in which a fidelity of the first region is higher than a fidelity of the second region, to obtain a video bitstream.
  12. The device according to claim 11, wherein the first region acquisition module comprises a global motion compensation module, configured to obtain camera motion parameters and perform global motion compensation on the video frame according to the camera motion parameters.
  13. The device according to claim 11, wherein the first region acquisition module comprises:
    a feature extraction module, configured to determine feature points among the pixels of the video frame, and extract features of the feature points; and
    a first region judgment module, configured to, when the extracted features of the feature points belong to a region where the moving target is located, determine the first region according to the feature points belonging to the region where the moving target is located.
  14. The device according to claim 13, wherein the feature extraction module is further configured to take every pixel of the video frame as a feature point; or randomly select a preset number or a preset ratio of pixels of the video frame as feature points; or uniformly sample the pixels of the video frame to obtain the feature points.
  15. The device according to claim 13, wherein the first region judgment module is further configured to classify the feature points into a plurality of categories according to the extracted features, to obtain a plurality of regions of the video frame respectively corresponding to the plurality of categories, wherein a region comprises one or more feature points belonging to the category corresponding to that region; obtain an average optical-flow motion speed of each of the plurality of regions; compare the average optical-flow motion speed of each of the plurality of regions with a preset value; and determine a region, among the plurality of regions, whose average optical-flow motion speed is greater than the preset value as the region where the moving target is located.
  16. The device according to claim 13, wherein the number of the feature points is smaller than the total number of pixels of the video frame; and the first region judgment module is further configured to find, in the video frame, the feature point nearest to a pixel that is not a feature point; determine, according to the judgment result of whether the found feature point belongs to the region where the moving target is located, whether the non-feature-point pixel belongs to the region where the moving target is located; and determine the first region according to the pixels belonging to the region where the moving target is located.
  17. The device according to claim 11, wherein the first region acquisition module is further configured to generate a mark template marking whether each pixel of the video frame belongs to the first region; and
    the region filtering module is further configured to apply smoothing filtering to the second region of the video frame, formed by the pixels that the mark template marks as not belonging to the first region.
  18. The device according to claim 17, wherein the encoding module is further configured to encode the video frame in a coding scheme in which the fidelity of the first region defined by the mark template is higher than the fidelity of the second region, to obtain the video bitstream.
  19. The device according to claim 11, wherein the video frame comprises a left-eye video frame and a right-eye video frame; the video bitstream comprises a left-eye video bitstream and a right-eye video bitstream; and the device further comprises: a video bitstream sending module, configured to send the left-eye video bitstream and the right-eye video bitstream to a VR terminal, so that the VR terminal decodes the left-eye video bitstream and the right-eye video bitstream separately and plays them back synchronously.
  20. A non-volatile computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the method according to any one of claims 1-10.
PCT/CN2017/091846 2016-07-08 2017-07-05 视频编码方法和装置 WO2018006825A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610541399.3A CN106162177B (zh) 2016-07-08 2016-07-08 视频编码方法和装置
CN201610541399.3 2016-07-08

Publications (1)

Publication Number Publication Date
WO2018006825A1 true WO2018006825A1 (zh) 2018-01-11

Family

ID=58062467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091846 WO2018006825A1 (zh) 2016-07-08 2017-07-05 视频编码方法和装置

Country Status (2)

Country Link
CN (1) CN106162177B (zh)
WO (1) WO2018006825A1 (zh)


Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106162177B (zh) * 2016-07-08 2018-11-09 腾讯科技(深圳)有限公司 视频编码方法和装置
CN108156459A (zh) * 2016-12-02 2018-06-12 北京中科晶上科技股份有限公司 可伸缩视频传输方法及***
US10742999B2 (en) * 2017-01-06 2020-08-11 Mediatek Inc. Methods and apparatus for signaling viewports and regions of interest
CN108965929B (zh) * 2017-05-23 2021-10-15 华为技术有限公司 一种视频信息的呈现方法、呈现视频信息的客户端和装置
CN109743892B (zh) 2017-07-04 2020-10-13 腾讯科技(深圳)有限公司 虚拟现实内容的显示方法和装置
CN107454395A (zh) * 2017-08-23 2017-12-08 上海安威士科技股份有限公司 一种高清网络摄像机及智能码流控制方法
CN109698957B (zh) * 2017-10-24 2022-03-29 腾讯科技(深圳)有限公司 图像编码方法、装置、计算设备及存储介质
CN108063946B (zh) * 2017-11-16 2021-09-24 腾讯科技(成都)有限公司 图像编码方法和装置、存储介质及电子装置
CN108492322B (zh) * 2018-04-04 2022-04-22 南京大学 一种基于深度学习预测用户视场的方法
CN110536138B (zh) * 2018-05-25 2021-11-09 杭州海康威视数字技术股份有限公司 一种有损压缩编码方法、装置和***级芯片
CN108848389B (zh) * 2018-07-27 2021-03-30 恒信东方文化股份有限公司 一种全景视频处理方法及播放***
CN108924629B (zh) * 2018-08-28 2021-01-05 恒信东方文化股份有限公司 一种vr图像处理方法
US11212537B2 (en) * 2019-03-28 2021-12-28 Advanced Micro Devices, Inc. Side information for video data transmission
CN110213587A (zh) * 2019-07-08 2019-09-06 北京达佳互联信息技术有限公司 视频编码方法、装置、电子设备及存储介质
CN110728173A (zh) * 2019-08-26 2020-01-24 华北石油通信有限公司 基于感兴趣目标显著性检测的视频传输方法和装置
CN112261408B (zh) * 2020-09-16 2023-04-25 青岛小鸟看看科技有限公司 用于头戴显示设备的图像处理方法、装置及电子设备
CN112954398B (zh) * 2021-02-07 2023-03-24 杭州网易智企科技有限公司 编码方法、解码方法、装置、存储介质及电子设备
JP2024513036A (ja) * 2021-03-31 2024-03-21 浙江吉利控股集団有限公司 ビデオ画像処理方法、装置、機器及び記憶媒体
CN114339222A (zh) * 2021-12-20 2022-04-12 杭州当虹科技股份有限公司 视频编码方法
CN115297289B (zh) * 2022-10-08 2022-12-23 南通第二世界网络科技有限公司 一种监控视频高效存储方法


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4375452B2 (ja) * 2007-07-18 2009-12-02 ソニー株式会社 画像処理装置、画像処理方法、及びプログラム、並びに表示装置
CN101102495B (zh) * 2007-07-26 2010-04-07 武汉大学 一种基于区域的视频图像编解码方法和装置
CN101882316A (zh) * 2010-06-07 2010-11-10 深圳市融创天下科技发展有限公司 一种图像区域划分/编码方法、装置及***
CN104125470B (zh) * 2014-08-07 2017-06-06 成都瑞博慧窗信息技术有限公司 一种视频数据传输方法
CN105100771A (zh) * 2015-07-14 2015-11-25 山东大学 一种基于场景分类和几何标注的单视点视频深度获取方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160021372A1 (en) * 2002-01-05 2016-01-21 Samsung Electronics Co., Ltd. Image coding and decoding method and apparatus considering human visual characteristics
CN101164341A (zh) * 2005-03-01 2008-04-16 高通股份有限公司 用于视频电话的质量度量偏移的关注区编码
CN101341494A (zh) * 2005-10-05 2009-01-07 高通股份有限公司 基于视频帧运动的自动关注区检测
CN101339602A (zh) * 2008-07-15 2009-01-07 中国科学技术大学 一种基于光流法的视频火灾烟雾图像识别方法
CN104160703A (zh) * 2012-01-26 2014-11-19 苹果公司 经对象检测所通知的编码
CN106162177A (zh) * 2016-07-08 2016-11-23 腾讯科技(深圳)有限公司 视频编码方法和装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO, YAXIANG ET AL.: "Global motion estimation method with motion vectors and pixel recursion", JOURNAL OF IMAGE AND GRAPHICS, vol. 17, no. 2, 29 February 2012 (2012-02-29) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360436A (zh) * 2018-11-02 2019-02-19 Oppo广东移动通信有限公司 一种视频生成方法、终端及存储介质
CN110807407A (zh) * 2019-10-30 2020-02-18 东北大学 一种用于视频中高度近似动态目标的特征提取方法
CN110807407B (zh) * 2019-10-30 2023-04-18 东北大学 一种用于视频中高度近似动态目标的特征提取方法
CN111885332A (zh) * 2020-07-31 2020-11-03 歌尔科技有限公司 一种视频存储方法、装置、摄像头及可读存储介质
CN112532917B (zh) * 2020-10-21 2023-04-14 深圳供电局有限公司 一种基于流媒体的一体化智能监控平台
CN112532917A (zh) * 2020-10-21 2021-03-19 深圳供电局有限公司 一种基于流媒体的一体化智能监控平台
CN112672151A (zh) * 2020-12-09 2021-04-16 北京达佳互联信息技术有限公司 视频处理方法、装置、服务器及存储介质
CN112672151B (zh) * 2020-12-09 2023-06-20 北京达佳互联信息技术有限公司 视频处理方法、装置、服务器及存储介质
CN113891019A (zh) * 2021-09-24 2022-01-04 深圳Tcl新技术有限公司 视频编码方法、装置、拍摄设备和存储介质
CN116389761A (zh) * 2023-05-15 2023-07-04 南京邮电大学 一种护理学临床仿真教学数据管理***
CN116389761B (zh) * 2023-05-15 2023-08-08 南京邮电大学 一种护理学临床仿真教学数据管理***
CN116684687A (zh) * 2023-08-01 2023-09-01 蓝舰信息科技南京有限公司 基于数字孪生技术的增强可视化教学方法
CN116684687B (zh) * 2023-08-01 2023-10-24 蓝舰信息科技南京有限公司 基于数字孪生技术的增强可视化教学方法
CN117880520A (zh) * 2024-03-11 2024-04-12 山东交通学院 一种用于机车乘务员值乘标准化监控的数据管理方法
CN117880520B (zh) * 2024-03-11 2024-05-10 山东交通学院 一种用于机车乘务员值乘标准化监控的数据管理方法

Also Published As

Publication number Publication date
CN106162177B (zh) 2018-11-09
CN106162177A (zh) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2018006825A1 (zh) 视频编码方法和装置
US11501507B2 (en) Motion compensation of geometry information
US7203356B2 (en) Subject segmentation and tracking using 3D sensing technology for video compression in multimedia applications
US20210279971A1 (en) Method, storage medium and apparatus for converting 2d picture set to 3d model
CN103002289B (zh) 面向监控应用的视频恒定质量编码装置及其编码方法
WO2018010653A1 (zh) 全景媒体文件推送方法及装置
Yang et al. An objective assessment method based on multi-level factors for panoramic videos
CN110381268B (zh) 生成视频的方法,装置,存储介质及电子设备
KR20130115332A (ko) 증강 현실 표현을 위한 2차원 이미지 캡쳐
WO2018040982A1 (zh) 一种用于增强现实的实时图像叠加方法及装置
JP2008547097A (ja) イメージセグメンテーション
Sharma et al. A flexible architecture for multi-view 3DTV based on uncalibrated cameras
CN111476710A (zh) 基于移动平台的视频换脸方法及***
CN109698957A (zh) 图像编码方法、装置、计算设备及存储介质
US20170116741A1 (en) Apparatus and Methods for Video Foreground-Background Segmentation with Multi-View Spatial Temporal Graph Cuts
Wang et al. Deep unsupervised 3d sfm face reconstruction based on massive landmark bundle adjustment
JP2009212605A (ja) 情報処理方法、情報処理装置及びプログラム
Zhang et al. A real-time time-consistent 2D-to-3D video conversion system using color histogram
Jacobson et al. Scale-aware saliency for application to frame rate upconversion
US20230281921A1 (en) Methods of 3d clothed human reconstruction and animation from monocular image
Chittapur et al. Video forgery detection using motion extractor by referring block matching algorithm
CN114677423A (zh) 一种室内空间全景深度确定方法及相关设备
CN113657190A (zh) 人脸图片的驱动方法及相关模型的训练方法、相关装置
Tsai et al. A novel method for 2D-to-3D video conversion based on boundary information
CN108108794B (zh) 一种基于二维码图像隐藏的可视化信息增强方法和***

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17823637

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17823637

Country of ref document: EP

Kind code of ref document: A1