WO2018006825A1 - Video coding method and apparatus - Google Patents

Video coding method and apparatus

Info

Publication number
WO2018006825A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
video frame
video
area
moving target
Prior art date
Application number
PCT/CN2017/091846
Other languages
French (fr)
Chinese (zh)
Inventor
万千
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Publication of WO2018006825A1 publication Critical patent/WO2018006825A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/117 Filters, e.g. for pre-processing or post-processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/136 Incoming video signal characteristics or properties
    • H04N 19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field

Definitions

  • the present application relates to the field of video processing technologies, and in particular, to a video encoding method and apparatus.
  • A video is a form of data that carries moving images. It usually consists of a series of video frames, and playing the video frames continuously displays the dynamic images in the video. Through video coding, a video format file can be converted, using a specific compression technique, into a video code stream suitable for transmission.
  • the application provides a video encoding method, including:
  • the application provides a video encoding apparatus, including:
  • one or more memories;
  • one or more processors; wherein
  • the one or more memories store one or more instruction modules configured to be executed by the one or more processors;
  • the one or more instruction modules include:
  • a region of interest acquisition module configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, a region where the moving target is located as a first region;
  • a region filtering module configured to perform smooth filtering on a second region in the video frame;
  • the video frame includes the first region and the second region, and there is no overlap between the first region and the second region;
  • an encoding module configured to encode the video frame according to an encoding manner that the fidelity of the first region is higher than the fidelity of the second region, to obtain a video bitstream.
  • the present application also proposes a non-transitory computer readable storage medium storing computer readable instructions that enable at least one processor to perform the above method.
  • FIG. 1 is an application environment diagram of a video encoding system in an example;
  • FIG. 2A is a schematic diagram of the internal structure of a server in an example;
  • FIG. 2B is a schematic diagram of the internal structure of a terminal in an example;
  • FIG. 3A is a schematic flowchart of a video encoding method in an example;
  • FIG. 3B is a schematic flowchart of a video encoding method in an example;
  • FIG. 4 is a schematic flowchart of the steps of global motion compensation for a video frame in an example;
  • FIG. 5 is a schematic flowchart of the steps of detecting a moving target in a video frame and determining the region where the moving target is located as a region of interest in the video frame, in an example;
  • FIG. 6 is a schematic flowchart of the steps of determining whether a feature point belongs to the region where a moving target is located according to the extracted features, in an example;
  • FIG. 7 is a schematic flowchart of the steps of determining a region of interest according to the feature points belonging to the region where a moving target is located, in an example;
  • FIG. 8 is a structural block diagram of a video encoding apparatus in an example;
  • FIG. 9 is a structural block diagram of a region of interest acquisition module in an example;
  • FIG. 10 is a structural block diagram of a video encoding apparatus in another example.
  • In implementing the examples of the present application, the inventor found that current video coding technology is suited to encoding video of normal scenes. However, for video of complex scenes, such as sports events or stage performances, intense motion, rich detail, uneven illumination, and other factors often make the picture quality of the encoded video stream difficult to control, or force the encoding to produce a video stream that occupies too many network resources to be suitable for transmission. Current video coding methods therefore struggle to balance picture quality against the occupation of network resources.
  • Based on this, the present application provides a video coding method directed at the technical problem that encoded video streams have difficulty balancing picture quality and the occupation of network resources.
  • FIG. 1 is an application environment diagram of a video encoding system in an example.
  • the video encoding system includes a server 110 and a terminal 120.
  • The server 110 may be configured to acquire a video frame of a video; detect a moving target in the video frame and determine the region where the moving target is located as a region of interest in the video frame; perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest; and then encode the video frame according to a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • the server 110 can transmit the video code stream to the terminal 120 over the network.
  • the server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus.
  • the non-volatile storage medium of the server stores an operating system, a database, and a video encoding device.
  • the database may store parameters required for video encoding, and the video encoding device is used to implement a video encoding method.
  • the server's processor is used to provide computing and control capabilities that support the operation of the entire server.
  • The internal memory of the server provides an environment for the operation of the video encoding device in the non-volatile storage medium. The internal memory can store computer readable instructions that, when executed by the processor, cause the processor to execute the video encoding method.
  • the network interface of the server is used to communicate with an external terminal via a network connection, to send a video stream to the terminal, and the like.
  • the server can be implemented with a stand-alone server or a server cluster consisting of multiple servers.
  • The structure shown in FIG. 2A is only a block diagram of the parts of the structure relevant to the solution of the present application and does not limit the servers to which the solution applies; a specific server may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the terminal includes a processor connected through a system bus, a non-volatile storage medium, an internal memory, a network interface, and a display screen.
  • the non-volatile storage medium of the terminal stores an operating system, and further stores a video decoding device, and the video decoding device is used to implement a video decoding method.
  • the processor is used to provide computing and control capabilities to support the operation of the entire terminal.
  • The internal memory in the terminal provides an environment for the operation of the video decoding device in the non-volatile storage medium. The internal memory can store computer readable instructions that, when executed by the processor, cause the processor to execute a video decoding method.
  • the network interface is used for network communication with the server, such as receiving a video stream sent by the server.
  • the display screen of the terminal may be a liquid crystal display or an electronic ink display screen.
  • The input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse.
  • the terminal can be a mobile phone, a tablet computer, a personal digital assistant, or a VR (Virtual Reality) terminal.
  • The structure shown in FIG. 2B is only a block diagram of the parts of the structure relevant to the solution of the present application and does not limit the terminals to which the solution applies; a specific terminal may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • FIG. 3A is a schematic flow chart of a video encoding method in an example. This example is illustrated by the method applied to the server 110 of FIG. 1 described above. As shown in FIG. 3A, the method includes the following steps:
  • S302A: Acquire a video frame.
  • S304A: Detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as the first region.
  • S306A: Perform smoothing filtering on a second region in the video frame, where the video frame includes the first region and the second region, and there is no overlap between the first region and the second region.
  • S308A: Encode the video frame according to a coding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
  • FIG. 3B is a schematic flow chart of a video encoding method in an example. This example is illustrated by the method applied to the server 110 of FIG. 1 described above. As shown in FIG. 3B, the method specifically includes the following steps:
  • S302: Acquire a video frame.
  • A video frame is a constituent unit of the video to be encoded; displaying the video frames in order realizes video playback. The server may acquire the video frames sequentially, in the order of the video frames in the video to be encoded.
  • In an example, if the acquired video frame is a key frame, S304 is performed directly on the acquired video frame; if the acquired video frame is a transition frame, the complete video frame may first be computed from the key frame on which the transition frame depends, and S304 is then performed on the complete video frame. Here, a key frame is a video frame containing complete picture information, and a transition frame is a video frame containing incomplete picture information that is computed based on a key frame.
  • S304: Detect a moving target in the video frame, and determine the region where the moving target is located as the region of interest in the video frame.
  • The moving target is the moving element in the picture represented by the video frame, i.e., the foreground of the video frame; the elements of the video frame that are still or nearly still are its background. Moving targets include, for example, people whose position or posture changes, moving vehicles, or moving lights.
  • A region of interest (ROI) is, in image processing, a region to be processed that is outlined from the image being processed in the form of a box, circle, ellipse, irregular polygon, or the like.
  • the server may perform motion target detection on the video frame, and detect a region where the moving target is located in the video frame, thereby determining the region as the region of interest. Since the region of interest is the region in which the moving object is located in the video frame, the region of interest is also the region of the video frame that is of interest to the video viewer relative to the non-interest region.
  • In an example, to detect moving targets in the video frame, the server may specifically adopt an interframe difference method, a background subtraction method, or an optical-flow-based moving target detection algorithm.
  • The background subtraction method learns the pattern of background disturbance by analyzing the changes across several preceding video frames.
  • The main idea of the interframe difference method is to detect the regions where motion occurs by using the difference between two or three consecutive frames in the video image sequence. The interframe difference method is highly adaptive and suits moving target detection against a dynamic background.
  • The optical-flow-based moving target detection algorithm calculates a motion state vector for each pixel by solving the optical flow equation, thereby finding the moving pixels and then detecting the moving target region. A sketch of the interframe difference method follows.
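Below is a minimal sketch of the three-frame interframe difference method in Python with OpenCV. The function name `detect_motion_mask` and the threshold value are illustrative assumptions, not taken from the patent.

```python
import cv2
import numpy as np

def detect_motion_mask(prev_frame, curr_frame, next_frame, thresh=25):
    """Three-frame difference: a pixel counts as moving only if it
    changes both from prev->curr and from curr->next."""
    g0 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)
    d01 = cv2.absdiff(g1, g0)
    d12 = cv2.absdiff(g2, g1)
    _, m01 = cv2.threshold(d01, thresh, 255, cv2.THRESH_BINARY)
    _, m12 = cv2.threshold(d12, thresh, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(m01, m12)  # binary mask of moving regions
```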
  • the non-interest area refers to an area outside the region of interest in the video frame.
  • Smoothing filtering of the non-interest region is a process of smoothly transitioning the pixel values of the pixels in the non-interest region.
  • Fidelity is a quantized value measuring the degree of similarity between the decoded video code stream and the original video frame before encoding. The higher the fidelity, the higher the similarity and the smaller the picture quality loss of the encoded video stream; the lower the fidelity, the lower the similarity and the greater the picture quality loss of the encoded video stream.
  • The above region of interest may also be referred to as a first region, and the non-interest region may be referred to as a second region.
  • the video frame includes the first area and the second area, and there is no overlap between the first area and the second area.
  • the smoothing filtering may adopt a method such as mean filtering, median filtering, or Gaussian filtering. If the mean filtering is used, the server can replace the pixel value of each pixel in the non-interest area with the pixel value mean in the neighborhood of the pixel. If median filtering is used, the server may replace the pixel value of each pixel in the non-interest area with the intermediate value of the pixel value in the neighborhood, and the intermediate value is the pixel value of the neighborhood sorted by the pixel value. The pixel value in the middle position.
  • If Gaussian filtering is used, the server can replace the pixel value of each pixel in the non-interest region with a weighted average of the pixel values within the pixel's neighborhood, where the weights used to calculate the weighted average follow a normal distribution.
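As an illustration, here is a minimal sketch of smoothing only the non-interest region, assuming a binary ROI mask (1 marks region-of-interest pixels). The function and parameter names are illustrative, not from the patent.

```python
import cv2
import numpy as np

def smooth_non_roi(frame, roi_mask, ksize=9, mode="gaussian"):
    """Smooth the non-interest region, leaving ROI pixels untouched."""
    if mode == "mean":
        blurred = cv2.blur(frame, (ksize, ksize))
    elif mode == "median":
        blurred = cv2.medianBlur(frame, ksize)
    else:  # Gaussian: neighborhood weights follow a normal distribution
        blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
    mask3 = np.repeat(roi_mask[:, :, None], 3, axis=2).astype(bool)
    return np.where(mask3, frame, blurred)  # keep ROI, smooth the rest
```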
  • In an example, the server can adjust the quantization parameters (QP) of the region of interest and the non-interest region so that the fidelity of the region of interest is higher than the fidelity of the non-interest region.
  • the quantization parameter is a parameter used when the video frame is quantized and encoded.
  • the quantization parameter is negatively correlated with the fidelity.
  • When the quantization parameter takes its minimum value, quantization is finest; when the quantization parameter takes its maximum value, quantization is coarsest.
  • Specifically, the server may encode with a lower quantization parameter for the region of interest than for the non-interest region (sketched below), thereby realizing a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region.
  • In an example, the server may instead adjust the resolutions of the region of interest and the non-interest region to achieve a higher fidelity in the region of interest than in the non-interest region, specifically by encoding the non-interest region at a lower resolution than the region of interest.
  • the resolution of a certain area of the video frame refers to the number of pixels included in the unit area in the area.
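The following sketch shows one way to realize ROI-based quantization: a per-block QP map with a lower QP (finer quantization, higher fidelity) inside the region of interest. Real encoders expose this differently (for example, as per-macroblock QP offsets); the block size, QP values, and function name here are assumptions for illustration.

```python
import numpy as np

def build_qp_map(roi_mask, qp_roi=22, qp_bg=34, block=16):
    """Per-block QP map: lower QP where the block overlaps the ROI."""
    h, w = roi_mask.shape
    bh, bw = (h + block - 1) // block, (w + block - 1) // block
    qp_map = np.full((bh, bw), qp_bg, dtype=np.int32)
    for by in range(bh):
        for bx in range(bw):
            tile = roi_mask[by * block:(by + 1) * block,
                            bx * block:(bx + 1) * block]
            if tile.any():          # block touches the region of interest
                qp_map[by, bx] = qp_roi
    return qp_map
```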
  • With the above video coding method, a moving target is detected in the video frame and the region where it is located is determined as the region of interest, dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region that the viewer attends to.
  • The video frame is then encoded with higher fidelity in the region of interest than in the non-interest region to obtain the corresponding video code stream, so even for video of complex scenes, high picture quality can be maintained in the region of the moving target.
  • Moreover, performing smoothing filtering on the non-interest region before encoding reduces the amount of data to encode while avoiding the compression distortion, such as staircase ripple or ringing effects, and the reduced picture quality that would result from directly lowering the fidelity of the non-interest region; the resulting video code stream thus effectively balances picture quality and the occupation of network resources.
  • In an example, the video encoding method further includes a step of global motion compensation for the video frame. When the video frames are shot by a camera, the motion of the camera causes overall motion of the video frame picture even though some static background in the frame is not actually moving. Global motion compensation is therefore performed on the video frame to repair the effect of camera motion on the overall picture, avoiding detection errors, or even detecting the entire video frame as moving, when detecting moving targets.
  • FIG. 4 is a flow chart showing the steps of global motion compensation for a video frame in an example. As shown in FIG. 4, the step specifically includes the following steps:
  • S402: Acquire camera motion parameters. Subsequent processing of the video frame in this example requires pure object motion only, so the camera motion parameters are first estimated and then used to correct the video frame, achieving global motion compensation of the video frame.
  • The camera motion parameters can be estimated using a variety of methods, such as M-estimation (M-estimator), least squares, or ant colony algorithms. The parameters are expressed along the two axial directions of the frame, and the two axes can be orthogonal.
  • S404: Perform global motion compensation processing on the video frame according to the camera motion parameters.
  • In general, the motion of the camera is the dominant component of the observed motion, so the camera motion parameters can be estimated and the original video frame corrected according to them, which yields a video frame containing only object motion. If the camera is modeled using a two-dimensional affine model, the server can calculate the globally motion-compensated video frame according to formula (2), a two-dimensional affine correction that can be written as I'(s) = I(As + t), where s is the position coordinate along the two axial directions, I(s) is the pixel value at position s, A is a 2x2 matrix, and t is a translation vector derived from the camera motion parameters.
  • By performing global motion compensation on the video frame with the estimated camera parameters, the influence of camera motion is eliminated from the processed video frame, so the region where the moving target is located can be detected accurately in the video frame. This ensures that the region of interest is the real moving target region, and that the video code stream can effectively balance picture quality and the occupation of network resources. A sketch follows.
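A minimal sketch of global motion compensation under a two-dimensional affine camera model, using OpenCV: sparse corners are tracked with Lucas-Kanade optical flow and fit robustly to an affine transform (RANSAC here stands in for the M-estimator or least-squares options named above). All names and parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def compensate_global_motion(prev_gray, curr_gray):
    pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=400,
                                   qualityLevel=0.01, minDistance=8)
    pts1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                  pts0, None)
    ok = status.ravel() == 1
    # Estimate the 2x3 affine matrix [A | t] describing camera motion.
    M, _inliers = cv2.estimateAffine2D(pts0[ok], pts1[ok],
                                       method=cv2.RANSAC)
    h, w = curr_gray.shape
    # Warp the current frame back so that only object motion remains.
    return cv2.warpAffine(curr_gray, M, (w, h),
                          flags=cv2.WARP_INVERSE_MAP)
```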
  • In an example, as shown in FIG. 5, S304 includes the following steps:
  • S502: Determine feature points among the pixels of the video frame. In an example, S502 specifically includes: using each pixel in the video frame as a feature point; or randomly selecting a preset number or a preset ratio of pixels in the video frame as feature points; or uniformly sampling the pixels in the video frame to obtain the feature points.
  • Specifically, the server may use all the pixels in the video frame as feature points, or select some of the pixels from the video frame as feature points according to a set rule. The preset ratio refers to the ratio of the number of feature points to the total number of pixels of the video frame.
  • Uniformly sampling the pixels in the video frame specifically means selecting a pixel as a feature point at every preset interval of pixels along the two axial directions of the video frame.
  • When a preset number or a preset ratio of pixels is randomly selected as feature points in the video frame, or when the pixels in the video frame are uniformly sampled to obtain the feature points, the number of feature points is less than the total number of pixels of the video frame. The three strategies are sketched below.
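Here is a minimal sketch of the three feature-point selection strategies (all pixels, a random preset ratio, and uniform grid sampling). The function name, sampling stride, and default ratio are illustrative assumptions.

```python
import numpy as np

def select_feature_points(h, w, mode="uniform", ratio=0.01, stride=8):
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1)  # (y, x) pairs
    if mode == "all":
        return coords
    if mode == "random":  # preset ratio of the total pixel count
        rng = np.random.default_rng(0)
        n = max(1, int(ratio * h * w))
        return coords[rng.choice(len(coords), size=n, replace=False)]
    # uniform: one feature point every `stride` pixels along both axes
    gy, gx = np.mgrid[0:h:stride, 0:w:stride]
    return np.stack([gy.ravel(), gx.ravel()], axis=1)
```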
  • S504: Extract the features of the feature points. In an example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
  • The motion feature is a feature that characterizes the motion characteristics of a feature point. Denote the feature point at time t as i_t(x, y), where x and y are the position coordinates of i_t along the two axial directions.
  • The spatial feature is a feature that characterizes the spatial position of the feature point relative to the video frame.
  • The color feature is a feature that characterizes the color characteristics of the feature point; the pixel value of the feature point can constitute the color feature. The server can also convert the video frame into the YUV color mode and then take the component pixel values y_t(x, y), u_t(x, y), and v_t(x, y) of the feature point i_t(x, y) in the YUV color mode as the color feature. The YUV color mode is more sensitive to color changes, which improves the ability of the extracted features to express the color characteristics of the feature points.
  • The temporal feature is a feature that characterizes the temporal variation characteristics of the feature point; the color feature of feature point i_t(x, y) at the next time t+1 can be used as the temporal feature at the current time t.
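A minimal sketch of assembling a per-feature-point descriptor from the motion, spatial, color (YUV), and temporal features described above; a dense optical flow field supplies the motion feature. All names are illustrative assumptions.

```python
import cv2
import numpy as np

def point_features(yuv_t, yuv_t1, flow, pts):
    """pts: (N, 2) array of integer (y, x) feature-point coordinates."""
    ys, xs = pts[:, 0], pts[:, 1]
    motion = flow[ys, xs]                    # motion feature: (dx, dy)
    spatial = pts.astype(np.float32)         # position within the frame
    color = yuv_t[ys, xs]                    # Y, U, V at time t
    temporal = yuv_t1[ys, xs]                # Y, U, V at time t+1
    return np.hstack([motion, spatial, color, temporal])

# The flow field can come from, e.g., Farneback dense optical flow:
# flow = cv2.calcOpticalFlowFarneback(gray_t, gray_t1, None,
#                                     0.5, 3, 15, 3, 5, 1.2, 0)
```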
  • S506. Determine, according to the extracted feature, whether the feature point belongs to an area where the moving target is located.
  • In an example, the server may input the extracted features to a trained classifier, which outputs a classification result indicating whether each feature point belongs to the region where the moving target is located. In another example, the server may cluster the feature points to obtain multiple regions in the video frame, and then determine whether each of the multiple regions is the region where the moving target is located.
  • S508: Determine the region of interest according to the feature points belonging to the region where the moving target is located. In an example, the server may take the area enclosed by the feature points determined to belong to the region where the moving target is located as the region of interest. If the number of feature points is less than the total number of pixels of the video frame, the server may estimate whether the non-feature pixels of the video frame belong to the region where the moving target is located according to the determination results for the feature points.
  • By extracting the features of the feature points and judging, based on those features, whether each feature point belongs to the region where the moving target is located, the region formed by the feature points belonging to that region is determined as the region of interest; the region where the moving target is located can thus be detected accurately in the video frame, ensuring that the region of interest is the real moving target region and that the video code stream can effectively balance picture quality and the occupation of network resources.
  • S506 includes the following steps:
  • S602: Divide the feature points into a plurality of categories according to the extracted features, obtaining a plurality of regions in the video frame that correspond respectively to the categories, where each region contains the one or more feature points of the category corresponding to that region.
  • Specifically, the server may cluster the feature points into a plurality of categories according to the extracted features, with the feature points of each category forming a corresponding region, thereby obtaining a plurality of regions in the video frame. The server may cluster using the k-means clustering algorithm, a hierarchical clustering algorithm, the SOM (self-organizing feature map) clustering algorithm, or the Meanshift (mean shift) clustering algorithm.
  • S604: Acquire the average optical flow moving speed of each of the plurality of regions, that is, the average of the moving speeds, in the optical flow field, of the pixels of each region. The optical flow field is a two-dimensional instantaneous velocity field formed by all the pixels in the video frame.
  • S606: Compare the average optical flow moving speed of each of the plurality of regions with a preset value.
  • the preset value is 0 or a value close to 0.
  • The server compares the average optical flow moving speed of each of the plurality of regions with the preset value, so that the region of the moving target can be determined according to the comparison results.
  • S608: Determine the regions, among the plurality of regions, whose average optical flow moving speed is greater than the preset value as the region where the moving target is located.
  • Specifically, the server may mark the regions whose average optical flow moving speed is greater than the preset value as the region where the moving target is located, and mark the regions whose average optical flow moving speed is less than or equal to the preset value as the non-interest region.
  • By clustering the feature points according to the extracted features to obtain a plurality of regions in the video frame, and comparing the average optical flow moving speed of each region with the preset value, the region of interest in the video frame can be determined efficiently and accurately, ensuring that the region of interest is the real moving target region and that the video code stream can effectively balance picture quality and the occupation of network resources. A sketch of these steps follows.
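A minimal sketch of S602 through S608: cluster the feature points on their descriptors, then keep the clusters whose mean optical flow speed exceeds a small preset value. The cluster count k, the preset value eps, and the names are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def moving_target_points(features, flow_at_pts, k=4, eps=0.5):
    """features: (N, D) descriptors; flow_at_pts: (N, 2) flow vectors."""
    _, labels = kmeans2(features.astype(np.float64), k, minit="points")
    speed = np.linalg.norm(flow_at_pts, axis=1)  # per-point flow speed
    moving = np.zeros(len(features), dtype=bool)
    for c in range(k):
        members = labels == c
        if members.any() and speed[members].mean() > eps:
            moving[members] = True  # whole cluster belongs to the target
    return moving
```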
  • the number of feature points is less than the total number of pixels of the video frame.
  • S508 specifically includes the following steps:
  • The server may traverse each non-feature pixel in the video frame and calculate the distance between the traversed pixel and each feature point, so as to find the feature point closest to the traversed pixel according to the calculated distances, until all non-feature pixels in the video frame have been traversed.
  • If the found feature point belongs to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel belongs to the region where the moving target is located; if the found feature point does not belong to the region where the moving target is located, the server may directly determine that the corresponding traversed pixel does not belong to the region where the moving target is located.
  • In this way, the server can know whether each pixel in the video frame belongs to the region of interest, and then determine the region of interest according to the pixels of the video frame belonging to the region where the moving target is located. Those pixels include both the feature points belonging to the region where the moving target is located and the non-feature pixels belonging to that region.
  • Whether the non-feature pixels of the video frame belong to the region where the moving target is located is determined according to the determination results of the feature points, whose number is less than the total number of pixels of the video frame; the region of interest is thus determined efficiently, improving video coding efficiency. A sketch follows.
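A minimal sketch of propagating the per-feature-point decisions to every non-feature pixel through the nearest feature point; a KD-tree replaces the brute-force distance traversal for efficiency. Names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def roi_mask_from_points(shape, pts, moving):
    """shape: (h, w); pts: (N, 2) (y, x) feature points;
    moving: (N,) bool decisions for the feature points."""
    h, w = shape
    tree = cKDTree(pts)
    ys, xs = np.mgrid[0:h, 0:w]
    all_px = np.stack([ys.ravel(), xs.ravel()], axis=1)
    _dist, nearest = tree.query(all_px)  # closest feature point per pixel
    return moving[nearest].reshape(h, w)  # True inside region of interest
```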
  • In an example, the method further includes: generating a mark template that marks, for each pixel in the video frame, whether the pixel belongs to the region of interest.
  • This step can be performed after performing step S304.
  • The mark template records, for each pixel in the video frame, information about whether the pixel belongs to the region of interest.
  • The mark template may specifically be a two-dimensional matrix with the same size as the video frame picture; the elements of the two-dimensional matrix correspond one-to-one to the pixels of the video frame, and each element is a mark of whether the corresponding pixel in the video frame belongs to the region of interest.
  • The marks in the mark template take two values, indicating respectively that the corresponding pixel in the video frame belongs or does not belong to the region of interest; for example, "1" and "0" can be used to indicate belonging and not belonging to the region of interest.
  • In this case, S306 includes: performing smoothing filtering on the non-interest region of the video frame composed of the pixels not marked by the mark template as belonging to the region of interest; and encoding the video frame according to a coding manner in which the fidelity of the region of interest formed by the pixels marked by the mark template is higher than the fidelity of the non-interest region, to obtain the video code stream. A sketch of the mark template follows.
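A minimal sketch of the mark template: a two-dimensional matrix the size of the frame whose entries are 1 for region-of-interest pixels and 0 otherwise. The rectangular ROI used here is purely illustrative.

```python
import numpy as np

def make_mark_template(h, w, roi_boxes):
    """roi_boxes: list of (y0, x0, y1, x1) rectangles covering the ROI."""
    template = np.zeros((h, w), dtype=np.uint8)
    for y0, x0, y1, x1 in roi_boxes:
        template[y0:y1, x0:x1] = 1  # "1" marks region-of-interest pixels
    return template

# Example: one 300x440-pixel ROI inside a 720p frame.
template = make_mark_template(720, 1280, [(100, 200, 400, 640)])
```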
  • In an example, the video frame includes a left-eye video frame and a right-eye video frame, and the video code stream includes a left-eye video code stream and a right-eye video code stream. The video encoding method further includes: sending the left-eye video code stream and the right-eye video code stream to a VR terminal, so that the VR terminal decodes the left-eye video code stream and the right-eye video code stream separately and plays them synchronously.
  • Specifically, the server may acquire a left-eye video frame and a right-eye video frame, detect moving targets in the left-eye video frame and the right-eye video frame respectively, determine the regions where the detected moving targets are located as the respective regions of interest, perform smoothing filtering on the non-interest regions of the left-eye and right-eye video frames that do not belong to the regions of interest, and encode each video frame according to a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, obtaining a left-eye video code stream and a right-eye video code stream respectively. The left-eye video frame and the right-eye video frame are used to generate a visual three-dimensional picture.
  • In an example, the left-eye video frame and the right-eye video frame can be obtained from a panoramic video.
  • The server pushes the left-eye video code stream and the right-eye video code stream to the VR terminal, and the VR terminal decodes them into a left-eye video frame and a right-eye video frame and plays them synchronously. The displayed left-eye and right-eye video frames pass through the left and right lenses of the VR terminal to the user's eyes, forming a visual three-dimensional picture.
  • The VR terminal may be a dedicated VR terminal with a left lens, a right lens, and a display screen, or it may be a mobile terminal, tablet computer, or similar device whose display passes through left and right lenses attached to it, forming a visual three-dimensional picture through the user's eyes.
  • In this example, the video is encoded into a left-eye video code stream and a right-eye video code stream and then sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, and the user of the VR terminal can view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
  • As shown in FIG. 8, in an example, a video encoding apparatus 800 is provided, including a region of interest acquisition module 810, a region filtering module 820, and an encoding module 830.
  • the region of interest acquisition module 810 is configured to acquire a video frame, detect a moving target in the video frame, and determine a region where the moving target is located as a region of interest in the video frame.
  • the region filtering module 820 is configured to perform smooth filtering on non-regions of interest in the video frame that do not belong to the region of interest.
  • the encoding module 830 is configured to encode the video frame according to a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • The video encoding apparatus 800 detects a moving target in the video frame and determines the region where the moving target is located as the region of interest, thereby dividing the video frame into a region of interest and a non-interest region; the region of interest is also the region that the viewer attends to.
  • The video frame is encoded with higher fidelity in the region of interest than in the non-interest region to obtain the corresponding video code stream, so even for video of complex scenes, high picture quality can be maintained in the region of the moving target. Performing smoothing filtering on the non-interest region before encoding avoids the compression distortion, such as staircase ripple or ringing effects, and the reduced picture quality that would result from directly lowering the fidelity of the non-interest region, so the video code stream effectively balances picture quality and the occupation of network resources.
  • In an example, the region of interest acquisition module 810 includes: a global motion compensation module 811, configured to acquire camera motion parameters and perform global motion compensation processing on the video frame according to the camera motion parameters.
  • By performing global motion compensation on the video frame with the estimated camera parameters, the influence of camera motion is eliminated from the processed video frame, so the region where the moving target is located can be detected accurately, ensuring that the region of interest is the real moving target region and that the video code stream can effectively balance picture quality and the occupation of network resources.
  • the region of interest acquisition module 810 includes a feature extraction module 812 and a region of interest determination module 813.
  • The feature extraction module 812 is configured to determine feature points among the pixels of the video frame, and to extract the features of the feature points.
  • the region of interest judging module 813 is configured to determine, according to the extracted feature, whether the feature point belongs to the region where the moving target is located; and determine the region of interest according to the feature point belonging to the region where the moving target is located.
  • By judging whether each feature point belongs to the region where the moving target is located and determining the region formed by the feature points belonging to that region as the region of interest, the region where the moving target is located can be detected accurately in the video frame, ensuring that the region of interest is the real moving target region and that the video code stream can effectively balance picture quality and the occupation of network resources.
  • the feature extraction module 812 is further configured to use each pixel point in the video frame as a feature point; or randomly select a preset number or a preset ratio of pixel points as feature points in the video frame; or, The pixels in the video frame are uniformly sampled to obtain feature points.
  • the extracted features include motion features, and further include at least one of spatial features, color features, and temporal features.
  • In an example, the region of interest determination module 813 is further configured to: cluster the feature points according to the extracted features to obtain multiple regions in the video frame; acquire the average optical flow moving speed of each of the multiple regions; compare the average optical flow moving speed of each of the multiple regions with a preset value; and determine the regions whose average optical flow moving speed is greater than the preset value as the region where the moving target is located.
  • By clustering the feature points according to the extracted features to obtain a plurality of regions in the video frame, and comparing the average optical flow moving speed of each region with the preset value, the region of interest in the video frame can be determined efficiently and accurately, ensuring that the region of interest is the real moving target region and that the video code stream can effectively balance picture quality and the occupation of network resources.
  • In an example, the number of feature points is less than the total number of pixels of the video frame; the region of interest determination module 813 is further configured to: search, for each non-feature pixel in the video frame, for the feature point closest to it; determine, according to whether the found feature point belongs to the region where the moving target is located, whether the non-feature pixel belongs to the region where the moving target is located; and determine the region of interest according to the pixels belonging to the region where the moving target is located.
  • Whether the non-feature pixels of the video frame belong to the region where the moving target is located is determined according to the determination results of the feature points, whose number is less than the total number of pixels of the video frame; the region of interest is thus determined efficiently, improving video coding efficiency.
  • In an example, the region of interest acquisition module 810 is further configured to generate a mark template that marks whether each pixel in the video frame belongs to the region of interest;
  • the region filtering module 820 is further configured to perform smoothing filtering on the non-interest region composed of the pixels of the video frame not marked by the mark template as belonging to the region of interest;
  • the encoding module 830 is further configured to encode the video frame according to an encoding manner in which the fidelity of the region of interest formed by the pixels marked by the mark template is higher than the fidelity of the non-interest region, to obtain a video code stream.
  • By means of the mark template, whether each pixel in the video frame belongs to the region of interest can be expressed simply and efficiently, so that when processing each pixel of the video frame, the mark template serves as a reference for differentiated coding of the pixels in the region of interest and the non-interest region, which can further improve video coding efficiency.
  • the video frame includes a left eye video frame and a right eye video frame;
  • the video code stream includes a left eye video stream and a right eye video stream.
  • In an example, the video encoding apparatus further includes: a video code stream sending module 840, configured to send the left-eye video code stream and the right-eye video code stream to the VR terminal, so that the VR terminal decodes the left-eye and right-eye video code streams separately and plays them synchronously.
  • The video is encoded into a left-eye video code stream and a right-eye video code stream and then sent to the VR terminal, so the VR terminal can restore the left-eye and right-eye video frames and play them synchronously, and the user of the VR terminal can view high-quality three-dimensional pictures. Moreover, sending the left-eye and right-eye video code streams to the VR terminal occupies few network resources, which can prevent the VR terminal from stuttering during playback.
  • A person of ordinary skill in the art can understand that all or part of the flows of the above methods can be implemented by a computer program stored in a computer readable storage medium; when executed, the program may include the flows of the examples of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
  • The present application also provides a storage medium in which a data processing program is stored, the data processing program being used to perform any of the above methods of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application relates to a video coding method and apparatus. The method comprises: acquiring a video frame; detecting a moving target in the video frame, and determining the region in which the moving target is located as a first region in the video frame; performing smoothing filtering on a second region in the video frame; and encoding the video frame in a coding mode in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.

Description

Video coding method and apparatus
This application claims priority to Chinese Patent Application No. 201610541399.3, filed with the Chinese Patent Office on July 8, 2016 and entitled "Video Coding Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video encoding method and apparatus.
背景background
A video is a form of data that carries moving images. It usually consists of a series of video frames, and playing the video frames continuously displays the dynamic images in the video. Through video coding, a video format file can be converted, using a specific compression technique, into a video code stream suitable for transmission.
Technical Content
The present application provides a video encoding method, including:
acquiring a video frame;
detecting a moving target in the video frame, and determining, in the video frame, the region where the moving target is located as a first region;
performing smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region;
encoding the video frame according to an encoding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
The present application further provides a video encoding apparatus, including:
one or more memories;
one or more processors; wherein
the one or more memories store one or more instruction modules configured to be executed by the one or more processors; wherein
the one or more instruction modules include:
a region of interest acquisition module, configured to acquire a video frame, detect a moving target in the video frame, and determine, in the video frame, the region where the moving target is located as a first region;
a region filtering module, configured to perform smoothing filtering on a second region in the video frame, the video frame including the first region and the second region, with no overlap between the first region and the second region;
an encoding module, configured to encode the video frame according to an encoding manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video code stream.
The present application also provides a non-transitory computer readable storage medium storing computer readable instructions that enable at least one processor to perform the above method.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly illustrate the examples of the present application or the technical solutions in the prior art, the drawings used in the examples or the prior art description will be briefly described below. Obviously, the drawings in the following description are only some examples of the present application; for those of ordinary skill in the art, other drawings may be obtained from these drawings without inventive effort.
FIG. 1 is an application environment diagram of a video encoding system in an example;
FIG. 2A is a schematic diagram of the internal structure of a server in an example;
FIG. 2B is a schematic diagram of the internal structure of a terminal in an example;
FIG. 3A is a schematic flowchart of a video encoding method in an example;
FIG. 3B is a schematic flowchart of a video encoding method in an example;
FIG. 4 is a schematic flowchart of the steps of global motion compensation for a video frame in an example;
FIG. 5 is a schematic flowchart of the steps of detecting a moving target in a video frame and determining the region where the moving target is located as a region of interest in the video frame, in an example;
FIG. 6 is a schematic flowchart of the steps of determining whether a feature point belongs to the region where a moving target is located according to the extracted features, in an example;
FIG. 7 is a schematic flowchart of the steps of determining a region of interest according to the feature points belonging to the region where a moving target is located, in an example;
FIG. 8 is a structural block diagram of a video encoding apparatus in an example;
FIG. 9 is a structural block diagram of a region of interest acquisition module in an example;
FIG. 10 is a structural block diagram of a video encoding apparatus in another example.
DETAILED DESCRIPTION
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific examples described herein are merely illustrative of the application and are not intended to limit the application.
In implementing the examples of the present application, the inventor found that current video coding technology is suited to encoding video of normal scenes. However, for video of complex scenes, such as sports events or stage performances, intense motion, rich detail, uneven illumination, and other factors often make the picture quality of the encoded video stream difficult to control, or force the encoding to produce a video stream that occupies too many network resources to be suitable for transmission. Current video coding methods therefore struggle to balance picture quality against the occupation of network resources.
Based on this, the present application provides a video coding method directed at the technical problem that encoded video streams have difficulty balancing picture quality and the occupation of network resources.
FIG. 1 is an application environment diagram of a video encoding system in an example. As shown in FIG. 1, the video encoding system includes a server 110 and a terminal 120. The server 110 may be configured to acquire a video frame of a video; detect a moving target in the video frame and determine the region where the moving target is located as a region of interest in the video frame; perform smoothing filtering on the non-interest region of the video frame that does not belong to the region of interest; and then encode the video frame according to a coding manner in which the fidelity of the region of interest is higher than the fidelity of the non-interest region, to obtain a video code stream. The server 110 can transmit the video code stream to the terminal 120 over a network.
FIG. 2A is a schematic diagram of the internal structure of the server 110 in an example. As shown in FIG. 2A, the server includes a processor, a non-volatile storage medium, an internal memory, and a network interface connected by a system bus. The non-volatile storage medium of the server stores an operating system, a database, and a video encoding device; the database may store the parameters required for video encoding, and the video encoding device is used to implement a video encoding method. The processor of the server provides computing and control capabilities supporting the operation of the entire server. The internal memory of the server provides an environment for the operation of the video encoding device in the non-volatile storage medium and can store computer readable instructions that, when executed by the processor, cause the processor to execute the video encoding method. The network interface of the server is used to communicate with external terminals via a network connection, for example to send a video code stream to a terminal. The server can be implemented as a stand-alone server or as a server cluster consisting of multiple servers. Those skilled in the art can understand that the structure shown in FIG. 2A is only a block diagram of the parts of the structure relevant to the solution of the present application and does not limit the servers to which the solution applies; a specific server may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
FIG. 2B is a schematic diagram of the internal structure of a terminal in an example. As shown in FIG. 2B, the terminal includes a processor, a non-volatile storage medium, an internal memory, a network interface, and a display screen connected by a system bus. The non-volatile storage medium of the terminal stores an operating system and a video decoding device used to implement a video decoding method. The processor provides computing and control capabilities supporting the operation of the entire terminal. The internal memory in the terminal provides an environment for the operation of the video decoding device in the non-volatile storage medium and can store computer readable instructions that, when executed by the processor, cause the processor to execute a video decoding method. The network interface is used for network communication with the server, such as receiving a video code stream sent by the server. The display screen of the terminal may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, a button, trackball, or touchpad provided on the terminal housing, or an external keyboard, touchpad, or mouse. The terminal can be a mobile phone, a tablet computer, a personal digital assistant, or a VR (Virtual Reality) terminal. Those skilled in the art can understand that the structure shown in FIG. 2B is only a block diagram of the parts of the structure relevant to the solution of the present application and does not limit the terminals to which the solution applies; a specific terminal may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
FIG. 3A is a schematic flowchart of a video encoding method in one example. The method is illustrated as applied to the server 110 in FIG. 1 above. As shown in FIG. 3A, the method includes the following steps:
S302A: Acquire a video frame.
S304A: Detect a moving target in the video frame, and determine the region in which the moving target is located as a first region of the video frame.
S306A: Perform smoothing filtering on a second region of the video frame, where the video frame consists of the first region and the second region, with no overlap between the first region and the second region.
S308A: Encode the video frame in a manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video bitstream.
FIG. 3B is a schematic flowchart of a video encoding method in one example. The method is illustrated as applied to the server 110 in FIG. 1 above. As shown in FIG. 3B, the method specifically includes the following steps:
S302: Acquire a video frame.
A video frame is the constituent unit of the video to be encoded; displaying the video frames in order plays back the video. The server may acquire the video frames one by one in their order within the video to be encoded.
In one example, if the acquired video frame is a key frame, S304 is performed directly on it; if the acquired video frame is a transition frame, a complete video frame is first computed from the key frame on which the transition frame depends, and S304 is then performed on the complete video frame. A key frame is a video frame that carries complete picture information; a transition frame carries incomplete picture information and is reconstructed from a key frame.
S304: Detect a moving target in the video frame, and determine the region in which the moving target is located as the region of interest.
The moving target is the moving element of the picture represented by the video frame, i.e. its foreground; the elements of the video frame that are static or nearly static form its background. Examples of moving targets are a person whose position or posture changes, a moving vehicle, or moving illumination. A region of interest (ROI) is, in image processing, a region of the processed image that is outlined for processing by a rectangle, circle, ellipse, irregular polygon, or the like.
Specifically, the server may perform moving-target detection on the video frame to detect the region in which the moving target is located, and determine that region as the region of interest. Because this region contains the moving target, it is also the part of the video frame that a viewer pays attention to, relative to the non-region of interest.
To detect the moving target in the video frame, the server may use the inter-frame difference method, background subtraction, or an optical-flow-based moving-target detection algorithm. Background subtraction learns the pattern of background perturbation by accumulating statistics over the preceding video frames. The main idea of the inter-frame difference method is to detect the regions where motion occurs from the difference between two or three consecutive frames of the video image sequence; it is highly dynamic and adapts well to moving-target detection against a dynamic background. An optical-flow-based algorithm uses the optical flow equation to compute a motion state vector for every pixel, finds the moving pixels, and thereby detects the region in which the moving target is located.
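The inter-frame difference method can be illustrated with a short, self-contained sketch. The following Python/OpenCV snippet is only a minimal example of the technique, not the implementation of this application; the threshold value of 25 is an arbitrary illustrative choice.

```python
import cv2

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Minimal inter-frame difference: mark pixels whose intensity
    changes between two consecutive frames as moving."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    # Pixels whose absolute difference exceeds the threshold are
    # treated as belonging to a moving region.
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    return mask
```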
S306: Perform smoothing filtering on the non-region of interest of the video frame, i.e. the part that does not belong to the region of interest, and then encode the video frame in a manner in which the fidelity of the region of interest is higher than the fidelity of the non-region of interest, to obtain a video bitstream.
The non-region of interest is the part of the video frame outside the region of interest. Smoothing filtering of the non-region of interest is a process that makes the pixel values of the pixels in that region transition smoothly. Fidelity is a quantitative measure of the similarity between a video frame decoded from the encoded bitstream and the original video frame before encoding: the higher the fidelity, the greater the similarity and the smaller the quality loss of the encoded bitstream; the lower the fidelity, the smaller the similarity and the greater the quality loss.
It should be noted that the region of interest may also be called the first region, and the non-region of interest may also be called the second region. The video frame consists of the first region and the second region, and the first region and the second region do not overlap.
Specifically, the smoothing filtering may be mean filtering, median filtering, Gaussian filtering, or the like. With mean filtering, the server replaces the pixel value of each pixel in the non-region of interest with the mean of the pixel values in that pixel's neighborhood. With median filtering, the server replaces the pixel value of each pixel in the non-region of interest with the median of the pixel values in its neighborhood, i.e. the value in the middle position when the neighborhood's pixel values are sorted by magnitude. With Gaussian filtering, the server replaces the pixel value of each pixel in the non-region of interest with a weighted average of the pixel values in its neighborhood, where the weights follow a normal distribution.
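The three smoothing variants can be sketched as follows. This is a minimal illustration assuming a binary ROI mask (nonzero inside the region of interest) and an arbitrary 5x5 neighborhood, not the application's implementation.

```python
import cv2

def smooth_non_roi(frame, roi_mask, method="gaussian"):
    """Smooth only the non-ROI pixels (where roi_mask == 0).
    roi_mask is a uint8 array with the same height/width as frame."""
    if method == "mean":
        blurred = cv2.blur(frame, (5, 5))             # neighborhood mean
    elif method == "median":
        blurred = cv2.medianBlur(frame, 5)            # neighborhood median
    else:
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)  # normally weighted mean
    out = frame.copy()
    out[roi_mask == 0] = blurred[roi_mask == 0]       # keep ROI pixels intact
    return out
```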
The server may achieve an encoding in which the fidelity of the region of interest is higher than that of the non-region of interest by adjusting the quantization parameters (QP) of the two regions. The quantization parameter is the parameter used when the video frame is quantized during encoding; it is negatively correlated with fidelity, with the minimum value giving the finest quantization and the maximum value the coarsest. Concretely, the server may encode with the quantization parameter of the region of interest lower than that of the non-region of interest, so that the fidelity of the region of interest is higher than the fidelity of the non-region of interest.
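A per-block quantization-parameter map of the kind described can be sketched as below. The 16x16 block size and the QP values 24 and 38 are illustrative assumptions, and the map is deliberately encoder-agnostic rather than tied to any particular codec API.

```python
import numpy as np

def build_qp_map(roi_mask, block=16, qp_roi=24, qp_bg=38):
    """Assign one quantization parameter per block: a lower QP (finer
    quantization, higher fidelity) where the block overlaps the ROI,
    and a higher QP elsewhere. QP values are illustrative."""
    h, w = roi_mask.shape
    rows, cols = (h + block - 1) // block, (w + block - 1) // block
    qp_map = np.full((rows, cols), qp_bg, dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            tile = roi_mask[r*block:(r+1)*block, c*block:(c+1)*block]
            if tile.any():            # block touches the region of interest
                qp_map[r, c] = qp_roi
    return qp_map
```

Such a map would then be handed to whatever rate-control interface the chosen encoder exposes; the construction above only shows the region-dependent QP assignment itself.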
In one example, the server may instead achieve the higher fidelity of the region of interest by adjusting the resolutions of the two regions, specifically by encoding the non-region of interest at a lower resolution than the region of interest. The resolution of a region of a video frame is the number of pixels per unit area in that region.
In the above video encoding method, detecting the moving target in the video frame and determining the region in which it is located as the region of interest divides the video frame into a region of interest and a non-region of interest, the former being the region a viewer attends to. Encoding the video frame with a higher fidelity in the region of interest than in the non-region of interest keeps the region of the moving target at high picture quality even for video of complex scenes. Moreover, simply lowering the fidelity of the non-region of interest would introduce visible compression artifacts such as staircase ripple or ringing and degrade the picture. Smoothing the non-region of interest before encoding removes high-frequency information and reduces the compression artifacts caused by the fidelity reduction, so that the non-region of interest is perceived as blurred rather than noisy, which improves the overall picture quality of the encoded bitstream. Furthermore, lowering the fidelity of the non-region of interest reduces the network resources occupied by the encoded bitstream.
In one example, before S304, the video encoding method further includes a step of performing global motion compensation on the video frame. When the video frames are captured by a camera, the camera's motion moves the whole picture of the video frame even though static background elements are not themselves moving. Global motion compensation is therefore applied to the video frame to remove the effect of camera motion on the picture as a whole, so that moving-target detection neither errs nor concludes that the entire picture is moving.
FIG. 4 is a schematic flowchart of the step of performing global motion compensation on a video frame in one example. As shown in FIG. 4, the step specifically includes the following steps:
S402: Acquire camera motion parameters.
Specifically, the apparent motion of an object in the video is the superposition of camera motion and object motion, while the subsequent processing of the video frame in this example requires pure object motion. The camera motion parameters are therefore estimated first, and the video frame is then corrected with them, achieving global motion compensation of the video frame.
In one example, the server may model the camera with a two-dimensional affine model, in which the motion vector of the camera at position $s=(x,y)$ is given by formula (1):

$$w_\theta(s) = \begin{pmatrix} a_1 + a_2 x + a_3 y \\ a_4 + a_5 x + a_6 y \end{pmatrix} \qquad (1)$$

where $s=(x,y)$ are the position coordinates of a point of the camera view along the two axes, $w_\theta(s)$ is the motion vector of the camera at position $s=(x,y)$, and $\theta=(a_1,a_2,a_3,a_4,a_5,a_6)$ are the camera motion parameters, representing the scaling, rotation, and translation of the camera along the two axes. The two axes may be orthogonal. The camera motion parameters can be estimated in several ways, for example with an M-estimator, least squares, or an ant colony algorithm.
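The estimation step can be sketched with OpenCV as follows. Here cv2.estimateAffine2D, a RANSAC-based robust fit, stands in for the M-estimator or least-squares fits named above, so this is an illustrative substitute rather than the method of the application; the tracking parameters are arbitrary choices.

```python
import cv2

def estimate_camera_affine(prev_gray, curr_gray):
    """Estimate a 2-D affine camera model from point correspondences
    between two grayscale frames."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                  qualityLevel=0.01, minDistance=8)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.flatten() == 1]
    good_next = nxt[status.flatten() == 1]
    # matrix is 2x3 with [x', y']^T = matrix @ [x, y, 1]^T; the model's
    # motion vector w_theta(s) then corresponds to [x' - x, y' - y].
    matrix, inliers = cv2.estimateAffine2D(good_prev, good_next)
    return matrix
```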
S404: Perform global motion compensation on the video frame according to the camera motion parameters.
Specifically, the camera motion is assumed to dominate the observed apparent motion, which allows the camera motion parameters to be estimated; the original video frame is then corrected with these parameters to obtain a video frame containing only object motion. If the camera is modeled with the two-dimensional affine model, the server may compute the globally motion-compensated video frame according to formula (2):

$$\tilde{I}(s) = I\big(s + w_\theta(s)\big) \qquad (2)$$

where $\tilde{I}(s)$ denotes the globally motion-compensated video frame, $I(s)$ is the original video frame sampled at position $s=(x,y)$, and $w_\theta(s)$ is the motion vector of the camera at position $s=(x,y)$.
In this example, global motion compensation with the estimated camera parameters removes the effect of camera motion from the video frame, so that the region of the moving target can be detected accurately, the region of interest is guaranteed to be the region of a genuinely moving target, and the video bitstream effectively balances picture quality against the network resources it occupies.
As shown in FIG. 5, in one example, S304 includes the following steps:
S502: Determine feature points among the pixels of the video frame.
In one example, S502 specifically includes: taking every pixel of the video frame as a feature point; or randomly selecting a preset number or a preset proportion of the pixels as feature points; or uniformly sampling the pixels of the video frame to obtain the feature points.
Specifically, the server may either take all pixels of the video frame as feature points or select a subset of them according to a set rule. The preset proportion is the ratio of the number of feature points to the total number of pixels of the video frame. Uniform sampling means selecting a pixel as a feature point every preset number of pixels along each of the two axes of the video frame. When a preset number or proportion of pixels is selected at random, or when the pixels are uniformly sampled, the number of feature points is smaller than the total number of pixels of the video frame.
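The three selection strategies can be sketched as follows; the step of 8 pixels and the ratio of 5% are illustrative assumptions.

```python
import numpy as np

def sample_feature_points(height, width, mode="uniform", step=8, ratio=0.05):
    """Return (x, y) feature-point coordinates using one of the three
    strategies described above."""
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
    if mode == "all":                    # every pixel is a feature point
        return coords
    if mode == "random":                 # preset proportion of the pixels
        n = int(len(coords) * ratio)
        idx = np.random.choice(len(coords), size=n, replace=False)
        return coords[idx]
    # Uniform sampling: keep one pixel every `step` pixels on both axes.
    return coords[(coords[:, 0] % step == 0) & (coords[:, 1] % step == 0)]
```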
S504: Extract features of the feature points.
In one example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
Specifically, the motion feature characterizes the motion of a feature point. Let the feature point at time $t$ be $i_t(x,y)$. The server may obtain the optical flow vector $(dx,dy)$ of the feature point $i_t(x,y)$ with an optical flow method and form the motion feature from its elements, for example $x_m=\{dx,dy\}$, where $x$ and $y$ are the position coordinates of the feature point $i_t$ along the two axes.
The spatial feature characterizes the spatial position of a feature point within the video frame; the server may form it from the two position coordinates of the feature point $i_t(x,y)$, for example $x_s=\{x,y\}$.
The color feature characterizes the color of a feature point and may be formed from its pixel values. The server may also convert the video frame to the YUV color mode and form the color feature from the component values of the feature point $i_t(x,y)$ in that mode: $x_c=\{y_t(x,y),u_t(x,y),v_t(x,y)\}$. The YUV color mode is more sensitive to color changes, improving how well the extracted feature expresses the color characteristics of the feature point.
The temporal feature characterizes how a feature point changes over time; the color feature of the feature point $i_t(x,y)$ at the next time $t+1$ may be used as the temporal feature for the current time $t$, for example $x_t=\{y_{t+1}(x',y'),u_{t+1}(x',y'),v_{t+1}(x',y')\}$ with $(x',y')=(x+dx,y+dy)$. The extracted feature can then be written $X=\{x_s,x_m,x_c,x_t\}$.
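Extraction of the combined feature X = {x_s, x_m, x_c, x_t} can be sketched as follows, using Farneback dense optical flow as one possible optical flow method; the function and its parameter values are illustrative choices, not the application's implementation, and `points` is assumed to hold integer (x, y) pixel coordinates.

```python
import cv2
import numpy as np

def extract_features(frame_t, frame_t1, points):
    """Build X = {x_s, x_m, x_c, x_t} for each feature point: spatial
    position, optical-flow motion vector, YUV color at time t, and YUV
    color at the displaced position at time t+1."""
    gray_t = cv2.cvtColor(frame_t, cv2.COLOR_BGR2GRAY)
    gray_t1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_t, gray_t1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    yuv_t = cv2.cvtColor(frame_t, cv2.COLOR_BGR2YUV)
    yuv_t1 = cv2.cvtColor(frame_t1, cv2.COLOR_BGR2YUV)
    h, w = gray_t.shape
    feats = []
    for x, y in points:
        dx, dy = flow[y, x]                      # motion feature x_m
        x2 = int(np.clip(x + dx, 0, w - 1))      # displaced position (x', y')
        y2 = int(np.clip(y + dy, 0, h - 1))
        feats.append(np.concatenate([[x, y],                # spatial x_s
                                     [dx, dy],              # motion x_m
                                     yuv_t[y, x],           # color x_c
                                     yuv_t1[y2, x2]]))      # temporal x_t
    return np.asarray(feats, dtype=np.float32)
```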
S506: Judge, from the extracted features, whether each feature point belongs to the region in which the moving target is located.
Specifically, the server may feed the extracted features to a trained classifier, which outputs for each feature point the classification result of whether it belongs to the region of the moving target, thereby judging whether the feature point belongs to that region. In one example, the server may instead cluster the feature points into multiple regions of the video frame and then judge, for each region, whether it is the region of the moving target.
S508: Determine the region of interest from the feature points that belong to the region of the moving target.
Specifically, if every pixel of the video frame was taken as a feature point, the server may take the area enclosed by the feature points judged to belong to the region of the moving target as the region of interest. If the number of feature points is smaller than the total number of pixels of the video frame, the server may estimate, from the judgment results for the feature points, whether each non-feature pixel belongs to the region of the moving target.
In this example, judging from the feature points of the video frame whether each feature point belongs to the region of the moving target, and determining the area formed by those that do as the region of interest, detects the region of the moving target accurately, guarantees that the region of interest is the region of a genuinely moving target, and ensures that the video bitstream effectively balances picture quality against the network resources it occupies.
As shown in FIG. 6, in one example, S506 includes the following steps:
S602: Cluster the feature points according to the extracted features to obtain multiple regions of the video frame.
In some examples, the feature points are divided into multiple categories according to the extracted features, yielding multiple regions of the video frame that respectively correspond to the categories, where each region contains the one or more feature points of its category.
Specifically, the server may cluster the feature points into multiple categories according to the extracted features, the feature points of each category forming a corresponding region, so that multiple regions of the video frame are obtained. The server may cluster with the k-means algorithm, a hierarchical clustering algorithm, a SOM (self-organizing feature map) clustering algorithm, a mean-shift clustering algorithm, or the like. Through clustering, the extracted features converge to a number of locally dense areas of the high-dimensional feature space, and each region obtained in this example is a complete, contiguous foreground or background object.
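Clustering the extracted features can be sketched with OpenCV's k-means, one of the algorithms listed above; the choice of k = 5 and the termination criteria are illustrative assumptions.

```python
import cv2
import numpy as np

def cluster_feature_points(features, k=5):
    """Group feature points into k regions with k-means. `features` is
    the per-point feature matrix built in S504."""
    data = np.float32(features)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    # Returns one region label per feature point plus the k centroids.
    _, labels, centers = cv2.kmeans(data, k, None, criteria, 5,
                                    cv2.KMEANS_PP_CENTERS)
    return labels.ravel(), centers
```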
S604: Obtain the average optical-flow motion speed of each of the multiple regions.
The average optical-flow motion speed of a region is the average of the motion speeds, in the optical flow field, of the points of that region. The optical flow field is a two-dimensional instantaneous velocity field formed by all pixels of the video frame.
S606: Compare the average optical-flow motion speed of each region with a preset value.
The preset value is zero or a value close to zero. The server compares the average optical-flow motion speed of each region numerically with the preset value and determines the region of the moving target from the comparison results.
S608: Determine the regions whose average optical-flow motion speed is greater than the preset value as the region of the moving target.
Specifically, the server may mark the regions whose average optical-flow motion speed is greater than the preset value as the region of the moving target, and mark the regions whose average optical-flow motion speed is less than or equal to the preset value as the non-region of interest.
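Steps S604 to S608 can be sketched as follows, assuming the per-point optical flow vectors and the cluster labels from the previous step; the threshold is an illustrative near-zero preset value.

```python
import numpy as np

def moving_regions(labels, flow_vectors, speed_threshold=0.5):
    """Mark a region as the moving-target region when the mean optical
    flow speed of its feature points exceeds the preset value."""
    speeds = np.linalg.norm(flow_vectors, axis=1)   # per-point |(dx, dy)|
    moving = set()
    for region in np.unique(labels):
        if speeds[labels == region].mean() > speed_threshold:
            moving.add(int(region))                 # moving-target region
    return moving
```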
In this example, clustering the feature points according to the extracted features yields multiple regions of the video frame, and comparing each region's average optical-flow motion speed with the preset value determines the region of interest of the video frame efficiently and accurately, guaranteeing that the region of interest is the region of a genuinely moving target and that the video bitstream effectively balances picture quality against the network resources it occupies.
In one example, the number of feature points is smaller than the total number of pixels of the video frame. As shown in FIG. 7, S508 specifically includes the following steps:
S702: Find, for each non-feature pixel of the video frame, the nearest feature point.
Specifically, the server may traverse the non-feature pixels of the video frame and compute the distance between the traversed pixel and each feature point, finding from the computed distances the feature point nearest to the traversed pixel, until all non-feature pixels of the video frame have been traversed.
S704: Determine, from the judgment result of whether the found feature point belongs to the region of the moving target, whether the non-feature pixel belongs to the region of the moving target.
Specifically, if the found feature point belongs to the region of the moving target, the server may directly judge that the corresponding traversed pixel also belongs to that region; if the found feature point does not belong to the region of the moving target, the server may directly judge that the corresponding traversed pixel does not belong to it either.
S706: Determine the region of interest from the pixels that belong to the region of the moving target.
Specifically, once the server has traversed all non-feature pixels and determined whether each belongs to the region of the moving target, it knows for every pixel of the video frame whether it belongs to the region of interest, and can determine the region of interest from the pixels of the video frame that belong to the region of the moving target, namely the feature points belonging to that region together with the non-feature pixels belonging to it.
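The lookup of S702 to S706 can be sketched with a k-d tree, which replaces the exhaustive per-pixel distance scan described above with an equivalent but faster nearest-neighbor query; a minimal illustration, not the application's implementation.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_labels(feature_points, feature_is_moving, other_points):
    """Give every non-feature pixel the moving/non-moving decision of
    its nearest feature point. `feature_is_moving` is a boolean array
    aligned with `feature_points`."""
    tree = cKDTree(feature_points)          # index the feature points
    _, nearest = tree.query(other_points)   # nearest feature point per pixel
    return feature_is_moving[nearest]       # inherit its decision
```

The k-d tree is only an implementation convenience here; it returns exactly the nearest feature point that the exhaustive scan would find.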
In this example, whether the non-feature pixels of the video frame belong to the region of the moving target is estimated from the judgment results for feature points whose number is smaller than the total number of pixels of the video frame, so the region of interest is determined efficiently with little computation, improving video encoding efficiency.
In one example, the method further includes: generating a mark template that marks whether each pixel of the video frame belongs to the region of interest. This step may be performed after S304. The mark template records, for every pixel of the video frame, whether it belongs to the region of interest. The mark template may specifically be a two-dimensional matrix of the same picture size as the video frame, whose elements correspond one-to-one to the pixels of the video frame, each element being the mark of whether the corresponding pixel belongs to the region of interest. The marks take two values, indicating that the corresponding pixel does or does not belong to the region of interest, for example "1" for belonging and "0" for not belonging.
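A mark template of the kind described, i.e. a frame-sized 0/1 matrix, can be sketched as follows; `roi_pixels` is assumed to be the set of (x, y) coordinates judged to belong to the region of interest.

```python
import numpy as np

def build_mark_template(height, width, roi_pixels):
    """Mark template: a 2-D matrix the size of the frame in which 1
    marks a pixel of the region of interest and 0 marks the rest."""
    template = np.zeros((height, width), dtype=np.uint8)
    for x, y in roi_pixels:
        template[y, x] = 1
    return template
```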
In one example, S306 includes: performing smoothing filtering on the non-region of interest formed by the pixels that the mark template marks as not belonging to the region of interest, and then encoding the video frame, in a manner in which the fidelity of the region of interest formed by the marks of the template is higher than the fidelity of the non-region of interest, to obtain the video bitstream.
In this example, the mark template expresses simply and efficiently whether each pixel of the video frame belongs to the region of interest, so that when each pixel of the video frame is processed, the template serves as the reference for encoding the pixels of the region of interest and of the non-region of interest differently, further improving video encoding efficiency.
In one example, the video frame includes a left-eye video frame and a right-eye video frame, and the video bitstream includes a left-eye bitstream and a right-eye bitstream; the video encoding method further includes: sending the left-eye bitstream and the right-eye bitstream to a VR terminal, so that the VR terminal decodes them separately and plays them back synchronously.
Specifically, the server may acquire a left-eye video frame and a right-eye video frame, detect the moving target in each, determine the region of the detected moving target in each as the region of interest, perform smoothing filtering on the non-region of interest of each frame, and encode each frame with the fidelity of the region of interest higher than that of the non-region of interest, obtaining a left-eye bitstream and a right-eye bitstream. The left-eye and right-eye video frames are used to generate a stereoscopic picture and may be obtained from a panoramic video.
After encoding, the server pushes the left-eye and right-eye bitstreams to the VR terminal, which decodes them into left-eye and right-eye video frames and plays them back synchronously. Through the VR terminal's built-in or attached left and right eyepieces, the displayed left-eye and right-eye video frames form a stereoscopic picture in the user's eyes. The VR terminal may be a dedicated VR terminal with its own eyepieces and display screen, or a mobile terminal such as a mobile phone or tablet computer that forms the stereoscopic picture through eyepieces attached to it.
In this example, encoding the video into left-eye and right-eye bitstreams and sending them to the VR terminal lets the terminal reconstruct the left-eye and right-eye video frames and play them back synchronously, so the user of the VR terminal can watch a high-quality three-dimensional picture. Moreover, sending the left-eye and right-eye bitstreams to the VR terminal occupies few network resources, which avoids stuttering during playback on the VR terminal.
As shown in FIG. 8, in one example, a video encoding apparatus 800 is provided, including a region-of-interest acquisition module 810, a region filtering module 820, and an encoding module 830.
The region-of-interest acquisition module 810 is configured to acquire a video frame, detect the moving target in the video frame, and determine the region of the moving target in the video frame as the region of interest.
The region filtering module 820 is configured to perform smoothing filtering on the non-region of interest of the video frame, i.e. the part that does not belong to the region of interest.
The encoding module 830 is configured to encode the video frame in a manner in which the fidelity of the region of interest is higher than the fidelity of the non-region of interest, to obtain a video bitstream.
The above video encoding apparatus 800 detects the moving target in the video frame and determines its region as the region of interest, dividing the video frame into a region of interest, which is the region a viewer attends to, and a non-region of interest. Encoding the video frame with a higher fidelity in the region of interest than in the non-region of interest keeps the region of the moving target at high picture quality even for video of complex scenes. Moreover, simply lowering the fidelity of the non-region of interest would introduce visible compression artifacts such as staircase ripple or ringing and degrade the picture; smoothing the non-region of interest before encoding removes high-frequency information and reduces the compression artifacts caused by the fidelity reduction, so that the non-region of interest is perceived as blurred rather than noisy, improving the overall picture quality of the encoded bitstream. Furthermore, lowering the fidelity of the non-region of interest reduces the network resources occupied by the encoded bitstream.
As shown in FIG. 9, in one example, the region-of-interest acquisition module 810 includes a global motion compensation module 811, configured to acquire camera motion parameters and to perform global motion compensation on the video frame according to the camera motion parameters.
In this example, global motion compensation with the estimated camera parameters removes the effect of camera motion from the video frame, so that the region of the moving target can be detected accurately, the region of interest is guaranteed to be the region of a genuinely moving target, and the video bitstream effectively balances picture quality against the network resources it occupies.
In one example, the region-of-interest acquisition module 810 includes a feature extraction module 812 and a region-of-interest judgment module 813.
The feature extraction module 812 is configured to determine feature points among the pixels of the video frame and to extract features of the feature points.
The region-of-interest judgment module 813 is configured to judge from the extracted features whether each feature point belongs to the region of the moving target, and to determine the region of interest from the feature points that belong to that region.
In this example, judging from the feature points of the video frame whether each feature point belongs to the region of the moving target, and determining the area formed by those that do as the region of interest, detects the region of the moving target accurately, guarantees that the region of interest is the region of a genuinely moving target, and ensures that the video bitstream effectively balances picture quality against the network resources it occupies.
In one example, the feature extraction module 812 is further configured to take every pixel of the video frame as a feature point; or to randomly select a preset number or a preset proportion of the pixels as feature points; or to uniformly sample the pixels of the video frame to obtain the feature points.
In one example, the extracted features include a motion feature, and further include at least one of a spatial feature, a color feature, and a temporal feature.
In one example, the region-of-interest judgment module 813 is further configured to cluster the feature points according to the extracted features to obtain multiple regions of the video frame; to obtain the average optical-flow motion speed of each of the multiple regions; to compare each region's average optical-flow motion speed with a preset value; and to determine the regions whose average optical-flow motion speed is greater than the preset value as the region of the moving target.
In this example, clustering the feature points according to the extracted features yields multiple regions of the video frame, and comparing each region's average optical-flow motion speed with the preset value determines the region of interest of the video frame efficiently and accurately, guaranteeing that the region of interest is the region of a genuinely moving target and that the video bitstream effectively balances picture quality against the network resources it occupies.
In one example, the number of feature points is smaller than the total number of pixels of the video frame; the region-of-interest judgment module 813 is further configured to find, for each non-feature pixel of the video frame, the nearest feature point; to determine, from the judgment result of whether the found feature point belongs to the region of the moving target, whether the non-feature pixel belongs to that region; and to determine the region of interest from the pixels that belong to the region of the moving target.
In this example, whether the non-feature pixels of the video frame belong to the region of the moving target is estimated from the judgment results for feature points whose number is smaller than the total number of pixels of the video frame, so the region of interest is determined efficiently with little computation, improving video encoding efficiency.
In one example, the region-of-interest acquisition module 810 is further configured to generate a mark template that marks whether each pixel of the video frame belongs to the region of interest.
The region filtering module 820 is further configured to perform smoothing filtering on the non-region of interest formed by the pixels that the mark template marks as not belonging to the region of interest.
The encoding module 830 is further configured to encode the video frame, in a manner in which the fidelity of the region of interest formed by the marks of the template is higher than the fidelity of the non-region of interest, to obtain the video bitstream.
In this example, the mark template expresses simply and efficiently whether each pixel of the video frame belongs to the region of interest, so that when each pixel of the video frame is processed, the template serves as the reference for encoding the pixels of the region of interest and of the non-region of interest differently, further improving video encoding efficiency.
In one example, the video frame includes a left-eye video frame and a right-eye video frame, and the video bitstream includes a left-eye bitstream and a right-eye bitstream. As shown in FIG. 10, the video encoding apparatus further includes a bitstream sending module 840, configured to send the left-eye bitstream and the right-eye bitstream to a VR terminal, so that the VR terminal decodes them separately and plays them back synchronously.
In this example, encoding the video into left-eye and right-eye bitstreams and sending them to the VR terminal lets the terminal reconstruct the left-eye and right-eye video frames and play them back synchronously, so the user of the VR terminal can watch a high-quality three-dimensional picture. Moreover, sending the left-eye and right-eye bitstreams to the VR terminal occupies few network resources, which avoids stuttering during playback on the VR terminal.
A person of ordinary skill in the art will understand that all or part of the processes of the above example methods can be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the examples of the methods above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
Accordingly, the present application also provides a storage medium storing a data processing program for performing any one of the examples of the above methods of the present application.
The technical features of the above examples may be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above examples are described; nevertheless, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above examples express only several embodiments of the present application, and their description is specific and detailed, but they should not therefore be understood as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

  1. A video encoding method, comprising:
    acquiring a video frame;
    detecting a moving target in the video frame, and determining a region in which the moving target is located as a first region of the video frame;
    performing smoothing filtering on a second region of the video frame, the video frame comprising the first region and the second region, with no overlap between the first region and the second region; and
    encoding the video frame in a manner in which a fidelity of the first region is higher than a fidelity of the second region, to obtain a video bitstream.
  2. The method according to claim 1, wherein, before the step of detecting a moving target in the video frame and determining the region in which the moving target is located as the first region, the method further comprises:
    acquiring camera motion parameters; and
    performing global motion compensation on the video frame according to the camera motion parameters.
  3. The method according to claim 1, wherein the detecting a moving target in the video frame and determining the region in which the moving target is located as the first region comprises:
    determining feature points among pixels of the video frame;
    extracting features of the feature points; and
    when the extracted features of a feature point indicate that the feature point belongs to a region in which the moving target is located, determining the first region according to the feature points belonging to the region in which the moving target is located.
  4. The method according to claim 3, wherein the determining feature points among pixels of the video frame comprises:
    taking each pixel of the video frame as a feature point; or
    randomly selecting a preset number or a preset proportion of the pixels of the video frame as feature points; or
    uniformly sampling the pixels of the video frame to obtain feature points.
  5. The method according to claim 3, wherein the extracted features comprise a motion feature, and further comprise at least one of a spatial feature, a color feature, and a temporal feature.
  6. The method according to claim 3, wherein judging from the extracted features whether the feature points belong to the region in which the moving target is located comprises:
    dividing the feature points into a plurality of categories according to the extracted features, to obtain a plurality of regions of the video frame respectively corresponding to the plurality of categories, wherein a region comprises the one or more feature points belonging to the category corresponding to that region;
    obtaining an average optical-flow motion speed of each of the plurality of regions;
    comparing the average optical-flow motion speed of each of the plurality of regions with a preset value; and
    determining a region, among the plurality of regions, whose average optical-flow motion speed is greater than the preset value as the region in which the moving target is located.
  7. The method according to claim 3, wherein the number of feature points is smaller than the total number of pixels of the video frame, and the determining the first region according to the feature points belonging to the region in which the moving target is located comprises:
    finding, in the video frame, the feature point nearest to a pixel that is not a feature point;
    determining, according to a judgment result of whether the found feature point belongs to the region in which the moving target is located, whether the pixel that is not a feature point belongs to the region in which the moving target is located; and
    determining the first region according to the pixels belonging to the region in which the moving target is located.
  8. The method according to claim 1, further comprising:
    generating a mark template marking whether each pixel of the video frame belongs to the first region;
    wherein the performing smoothing filtering on the second region of the video frame comprises:
    performing smoothing filtering on the second region, formed by the pixels of the video frame that the mark template marks as not belonging to the first region.
  9. The method according to claim 8, wherein the encoding the video frame in a manner in which the fidelity of the first region is higher than the fidelity of the second region, to obtain a video bitstream, comprises:
    encoding the video frame in a manner in which the fidelity of the first region formed by the marks of the mark template is higher than the fidelity of the second region, to obtain the video bitstream.
  10. The method according to claim 1, wherein the video frame comprises a left-eye video frame and a right-eye video frame, and the video bitstream comprises a left-eye video bitstream and a right-eye video bitstream, the method further comprising:
    sending the left-eye video bitstream and the right-eye video bitstream to a VR terminal, so that the VR terminal decodes the left-eye video bitstream and the right-eye video bitstream separately and plays them back synchronously.
  11. A video encoding apparatus, comprising:
    one or more memories; and
    one or more processors; wherein
    the one or more memories store one or more instruction modules configured to be executed by the one or more processors; and
    the one or more instruction modules comprise:
    a first-region acquisition module, configured to acquire a video frame, detect a moving target in the video frame, and determine a region in which the moving target is located as a first region of the video frame;
    a region filtering module, configured to perform smoothing filtering on a second region of the video frame, the video frame comprising the first region and the second region, with no overlap between the first region and the second region; and
    an encoding module, configured to encode the video frame in a manner in which a fidelity of the first region is higher than a fidelity of the second region, to obtain a video bitstream.
  12. The apparatus according to claim 11, wherein the first-region acquisition module comprises a global motion compensation module, configured to acquire camera motion parameters and to perform global motion compensation on the video frame according to the camera motion parameters.
  13. The apparatus according to claim 11, wherein the first-region acquisition module comprises:
    a feature extraction module, configured to determine feature points among pixels of the video frame and to extract features of the feature points; and
    a first-region judgment module, configured to determine, when the extracted features of a feature point indicate that the feature point belongs to a region in which the moving target is located, the first region according to the feature points belonging to the region in which the moving target is located.
  14. The apparatus according to claim 13, wherein the feature extraction module is further configured to take each pixel of the video frame as a feature point; or to randomly select a preset number or a preset proportion of the pixels of the video frame as feature points; or to uniformly sample the pixels of the video frame to obtain feature points.
  15. The apparatus according to claim 13, wherein the first region judging module is further configured to classify the feature points into a plurality of categories according to the extracted features, to obtain, in the video frame, a plurality of regions respectively corresponding to the plurality of categories, wherein one region comprises the one or more feature points belonging to the category corresponding to that region; to acquire the average optical-flow motion speed of each of the plurality of regions; to compare the average optical-flow motion speed of each of the plurality of regions with a preset value; and to determine, among the plurality of regions, a region whose average optical-flow motion speed is greater than the preset value as the region in which the moving target is located.
  16. The apparatus according to claim 13, wherein the number of feature points is less than the total number of pixels of the video frame, and the first region judging module is further configured to find, in the video frame, the feature point nearest to each pixel that is not a feature point; to determine, according to the judgment result of whether the found feature point belongs to the region in which the moving target is located, whether the pixel that is not a feature point belongs to the region in which the moving target is located; and to determine the first region according to the pixels belonging to the region in which the moving target is located.
  17. The apparatus according to claim 11, wherein the first region acquiring module is further configured to generate a marking template that marks whether each pixel of the video frame belongs to the first region; and
    the region filtering module is further configured to perform smoothing filtering on the second region, which consists of the pixels of the video frame marked by the marking template as not belonging to the first region.
  18. The apparatus according to claim 17, wherein the encoding module is further configured to encode the video frame in an encoding manner in which the fidelity of the first region formed by the marking of the marking template is higher than the fidelity of the second region, to obtain a video bitstream.
  19. The apparatus according to claim 11, wherein the video frame comprises a left-eye video frame and a right-eye video frame, and the video bitstream comprises a left-eye video bitstream and a right-eye video bitstream; the apparatus further comprising a video bitstream sending module, configured to send the left-eye video bitstream and the right-eye video bitstream to a VR terminal, so that the VR terminal decodes the left-eye video bitstream and the right-eye video bitstream separately and plays them back synchronously.
  20. A non-volatile computer-readable storage medium, storing computer-readable instructions that cause at least one processor to perform the method according to any one of claims 1 to 10.
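
The global motion compensation of claim 12 can be pictured as estimating the dominant camera motion between two frames and cancelling it, so that only genuine object motion remains. The following is a minimal sketch assuming OpenCV; the function name, the use of sparse Lucas-Kanade tracking, and the RANSAC partial-affine model standing in for the "camera motion parameters" are illustrative assumptions, not part of the patent:

import cv2
import numpy as np

def global_motion_compensate(prev_gray, curr_gray):
    # Track sparse corners from the previous frame into the current one.
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                                       qualityLevel=0.01, minDistance=8)
    pts_curr, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                      pts_prev, None)
    ok = status.flatten() == 1
    # Estimate the dominant (camera) motion as a partial affine transform,
    # using RANSAC so that independently moving objects are rejected as outliers.
    M, _inliers = cv2.estimateAffinePartial2D(pts_prev[ok], pts_curr[ok],
                                              method=cv2.RANSAC)
    h, w = curr_gray.shape
    # Warp the previous frame so that camera motion is cancelled; the residual
    # difference to curr_gray is then attributable to moving targets.
    compensated = cv2.warpAffine(prev_gray, M, (w, h))
    return compensated, M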
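
Claims 13 to 16 together describe a sparse detection pipeline: sample feature points, classify them by optical-flow speed against a preset value, and propagate each decision to the non-feature pixels via the nearest feature point. A minimal sketch assuming OpenCV follows; the uniform grid corresponds to one sampling option of claim 14, the per-cell average-speed threshold to claim 15, and the nearest-neighbour upsampling to the nearest-feature-point rule of claim 16. The Farneback flow method, the threshold, the cell size, and all names are illustrative assumptions:

import cv2
import numpy as np

def moving_target_mask(prev_gray, curr_gray, cell=16, speed_thresh=2.0):
    # Dense optical flow (Farneback); the patent does not mandate a method.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    speed = np.linalg.norm(flow, axis=2).astype(np.float32)
    # Average flow speed over each cell x cell neighbourhood ...
    avg_speed = cv2.blur(speed, (cell, cell))
    # ... sampled at uniformly spaced feature points (one per cell).
    grid = avg_speed[cell // 2::cell, cell // 2::cell]
    moving = (grid > speed_thresh).astype(np.uint8)  # per-region decision
    # Non-feature pixels inherit the label of the nearest feature point.
    h, w = speed.shape
    mask = cv2.resize(moving, (w, h), interpolation=cv2.INTER_NEAREST)
    return mask  # 1 = first region (moving target), 0 = second region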
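
Claims 17 and 18 use the marking template as a per-pixel mask: pixels marked as not belonging to the first region form the second region, which is smoothed before encoding, and the encoder is then steered to spend fewer bits there. The sketch below, again assuming OpenCV, performs the smoothing and builds a per-block quantization-offset map as a purely conceptual stand-in for "higher fidelity in the first region"; how such a map is passed to a real encoder is encoder-specific and not shown, and all names and values are assumptions:

import cv2
import numpy as np

def preprocess_for_encoding(frame_bgr, mask, ksize=9, qp_delta=6, block=16):
    # Smooth a copy of the frame, then keep original pixels only where the
    # marking template says "first region" (mask > 0).
    blurred = cv2.GaussianBlur(frame_bgr, (ksize, ksize), 0)
    in_first = mask[:, :, None] > 0
    out = np.where(in_first, frame_bgr, blurred)
    # Conceptual QP-offset map: negative offsets (finer quantization, higher
    # fidelity) for blocks touching the first region, positive elsewhere.
    hb = (mask.shape[0] + block - 1) // block
    wb = (mask.shape[1] + block - 1) // block
    qp_map = np.full((hb, wb), qp_delta, dtype=np.int8)
    for by in range(hb):
        for bx in range(wb):
            if mask[by * block:(by + 1) * block,
                    bx * block:(bx + 1) * block].any():
                qp_map[by, bx] = -qp_delta
    return out, qp_map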
PCT/CN2017/091846 2016-07-08 2017-07-05 Video coding method and apparatus WO2018006825A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610541399.3 2016-07-08
CN201610541399.3A CN106162177B (en) 2016-07-08 2016-07-08 Method for video coding and device

Publications (1)

Publication Number Publication Date
WO2018006825A1 true WO2018006825A1 (en) 2018-01-11

Family ID=58062467

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091846 WO2018006825A1 (en) 2016-07-08 2017-07-05 Video coding method and apparatus

Country Status (2)

Country Link
CN (1) CN106162177B (en)
WO (1) WO2018006825A1 (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106162177B (en) * 2016-07-08 2018-11-09 腾讯科技(深圳)有限公司 Method for video coding and device
CN108156459A (en) * 2016-12-02 2018-06-12 北京中科晶上科技股份有限公司 Telescopic video transmission method and system
US10742999B2 (en) * 2017-01-06 2020-08-11 Mediatek Inc. Methods and apparatus for signaling viewports and regions of interest
CN108965929B (en) * 2017-05-23 2021-10-15 华为技术有限公司 Video information presentation method, video information presentation client and video information presentation device
WO2019006650A1 (en) * 2017-07-04 2019-01-10 腾讯科技(深圳)有限公司 Method and device for displaying virtual reality content
CN107454395A (en) * 2017-08-23 2017-12-08 上海安威士科技股份有限公司 A kind of high-definition network camera and intelligent code stream control method
CN109698957B (en) * 2017-10-24 2022-03-29 腾讯科技(深圳)有限公司 Image coding method and device, computing equipment and storage medium
CN108063946B (en) * 2017-11-16 2021-09-24 腾讯科技(成都)有限公司 Image encoding method and apparatus, storage medium, and electronic apparatus
CN108492322B (en) * 2018-04-04 2022-04-22 南京大学 Method for predicting user view field based on deep learning
CN110536138B (en) * 2018-05-25 2021-11-09 杭州海康威视数字技术股份有限公司 Lossy compression coding method and device and system-on-chip
CN108848389B (en) * 2018-07-27 2021-03-30 恒信东方文化股份有限公司 Panoramic video processing method and playing system
CN108924629B (en) * 2018-08-28 2021-01-05 恒信东方文化股份有限公司 VR image processing method
US11212537B2 (en) * 2019-03-28 2021-12-28 Advanced Micro Devices, Inc. Side information for video data transmission
CN110213587A (en) * 2019-07-08 2019-09-06 北京达佳互联信息技术有限公司 Method for video coding, device, electronic equipment and storage medium
CN110728173A (en) * 2019-08-26 2020-01-24 华北石油通信有限公司 Video transmission method and device based on target of interest significance detection
CN112261408B (en) * 2020-09-16 2023-04-25 青岛小鸟看看科技有限公司 Image processing method and device for head-mounted display equipment and electronic equipment
CN112954398B (en) * 2021-02-07 2023-03-24 杭州网易智企科技有限公司 Encoding method, decoding method, device, storage medium and electronic equipment
JP2024513036A (en) * 2021-03-31 2024-03-21 浙江吉利控股集団有限公司 Video image processing methods, devices, equipment and storage media
CN114339222A (en) * 2021-12-20 2022-04-12 杭州当虹科技股份有限公司 Video coding method
CN115297289B (en) * 2022-10-08 2022-12-23 南通第二世界网络科技有限公司 Efficient storage method for monitoring video

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4375452B2 (en) * 2007-07-18 2009-12-02 ソニー株式会社 Image processing apparatus, image processing method, program, and display apparatus
CN101102495B (en) * 2007-07-26 2010-04-07 武汉大学 A video image decoding and encoding method and device based on area
CN101882316A (en) * 2010-06-07 2010-11-10 深圳市融创天下科技发展有限公司 Method, device and system for regional division/coding of image
CN104125470B (en) * 2014-08-07 2017-06-06 成都瑞博慧窗信息技术有限公司 A kind of method of transmitting video data
CN105100771A (en) * 2015-07-14 2015-11-25 山东大学 Single-viewpoint video depth obtaining method based on scene classification and geometric dimension

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160021372A1 (en) * 2002-01-05 2016-01-21 Samsung Electronics Co., Ltd. Image coding and decoding method and apparatus considering human visual characteristics
CN101164341A (en) * 2005-03-01 2008-04-16 高通股份有限公司 Quality metric-biased region-of-interest coding for video telephony
CN101341494A (en) * 2005-10-05 2009-01-07 高通股份有限公司 Video frame motion-based automatic region-of-interest detection
CN101339602A (en) * 2008-07-15 2009-01-07 中国科学技术大学 Video frequency fire hazard aerosol fog image recognition method based on light stream method
CN104160703A (en) * 2012-01-26 2014-11-19 苹果公司 Object detection informed encoding
CN106162177A (en) * 2016-07-08 2016-11-23 腾讯科技(深圳)有限公司 Method for video coding and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO, YAXIANG ET AL.: "Global motion estimation method with motion vectors and pixel recursion", JOURNAL OF IMAGE AND GRAPHICS, vol. 17, no. 2, 29 February 2012 (2012-02-29) *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360436A (en) * 2018-11-02 2019-02-19 Oppo广东移动通信有限公司 A kind of video generation method, terminal and storage medium
CN110807407A (en) * 2019-10-30 2020-02-18 东北大学 Feature extraction method for highly approximate dynamic target in video
CN110807407B (en) * 2019-10-30 2023-04-18 东北大学 Feature extraction method for highly approximate dynamic target in video
CN111885332A (en) * 2020-07-31 2020-11-03 歌尔科技有限公司 Video storage method and device, camera and readable storage medium
CN112532917B (en) * 2020-10-21 2023-04-14 深圳供电局有限公司 Integrated intelligent monitoring platform based on streaming media
CN112532917A (en) * 2020-10-21 2021-03-19 深圳供电局有限公司 Integrated intelligent monitoring platform based on streaming media
CN112672151A (en) * 2020-12-09 2021-04-16 北京达佳互联信息技术有限公司 Video processing method, device, server and storage medium
CN112672151B (en) * 2020-12-09 2023-06-20 北京达佳互联信息技术有限公司 Video processing method, device, server and storage medium
CN113891019A (en) * 2021-09-24 2022-01-04 深圳Tcl新技术有限公司 Video encoding method, video encoding device, shooting equipment and storage medium
CN116389761A (en) * 2023-05-15 2023-07-04 南京邮电大学 Clinical simulation teaching data management system of nursing
CN116389761B (en) * 2023-05-15 2023-08-08 南京邮电大学 Clinical simulation teaching data management system of nursing
CN116684687A (en) * 2023-08-01 2023-09-01 蓝舰信息科技南京有限公司 Enhanced visual teaching method based on digital twin technology
CN116684687B (en) * 2023-08-01 2023-10-24 蓝舰信息科技南京有限公司 Enhanced visual teaching method based on digital twin technology
CN117880520A (en) * 2024-03-11 2024-04-12 山东交通学院 Data management method for locomotive crewmember value multiplication standardized monitoring
CN117880520B (en) * 2024-03-11 2024-05-10 山东交通学院 Data management method for locomotive crewmember value multiplication standardized monitoring

Also Published As

Publication number Publication date
CN106162177B (en) 2018-11-09
CN106162177A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
WO2018006825A1 (en) Video coding method and apparatus
US11501507B2 (en) Motion compensation of geometry information
US7203356B2 (en) Subject segmentation and tracking using 3D sensing technology for video compression in multimedia applications
US20210279971A1 (en) Method, storage medium and apparatus for converting 2d picture set to 3d model
CN103002289B (en) Video constant quality coding device for monitoring application and coding method thereof
WO2018010653A1 (en) Panoramic media file push method and device
Yang et al. An objective assessment method based on multi-level factors for panoramic videos
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
KR20130115332A (en) Two-dimensional image capture for an augmented reality representation
WO2018040982A1 (en) Real time image superposition method and device for enhancing reality
JP2008547097A (en) Image segmentation
Sharma et al. A flexible architecture for multi-view 3DTV based on uncalibrated cameras
CN111476710A (en) Video face changing method and system based on mobile platform
US20170116741A1 (en) Apparatus and Methods for Video Foreground-Background Segmentation with Multi-View Spatial Temporal Graph Cuts
CN109698957A (en) Image encoding method, calculates equipment and storage medium at device
Wang et al. Deep unsupervised 3d sfm face reconstruction based on massive landmark bundle adjustment
JP2009212605A (en) Information processing method, information processor, and program
Zhang et al. A real-time time-consistent 2D-to-3D video conversion system using color histogram
CN110570441B (en) Ultra-high definition low-delay video control method and system
Jacobson et al. Scale-aware saliency for application to frame rate upconversion
US20230281921A1 (en) Methods of 3d clothed human reconstruction and animation from monocular image
Chittapur et al. Video forgery detection using motion extractor by referring block matching algorithm
Pan et al. An automatic 2D to 3D video conversion approach based on RGB-D images
CN108108794B (en) Two-dimensional code image hiding-based visual information enhancement method and system
GB2566478B (en) Probability based 360 degree video stabilisation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17823637

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17823637

Country of ref document: EP

Kind code of ref document: A1