WO2019020062A1 - Video object segmentation method and apparatus, electronic device, storage medium and program - Google Patents

Video object segmentation method and apparatus, electronic device, storage medium and program

Info

Publication number
WO2019020062A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
segmentation result
object segmentation
block
probability map
Prior art date
Application number
PCT/CN2018/097106
Other languages
English (en)
French (fr)
Inventor
李晓潇
齐元凯
王哲
陈恺
刘子纬
石建萍
罗平
吕健勤
汤晓鸥
Original Assignee
Beijing SenseTime Technology Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SenseTime Technology Development Co., Ltd.
Priority to US application 16/236,482 (published as US11222211B2)
Publication of WO2019020062A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • The present application relates to computer vision technology, and more particularly to a video object segmentation method and apparatus, an electronic device, a storage medium, and a program.
  • Object segmentation in video refers to the process of grouping/segmenting the pixels in a video frame according to the different objects they belong to, thereby subdividing the frame into multiple image sub-regions (collections of pixels).
  • Object segmentation in video has important applications in many areas such as intelligent video analysis, security monitoring, and automated driving.
  • The embodiments of the present application provide a technical solution for video object segmentation.
  • A video object segmentation method includes:
  • sequentially performing, starting from a reference frame among at least part of the frames of a video, inter-frame propagation of the object segmentation result of the reference frame, to obtain an object segmentation result of at least one other frame among the at least part of the frames;
  • determining, among the at least part of the frames, other frames that lost an object relative to the object segmentation result of the reference frame, taking a determined other frame as a target frame, and segmenting the lost object to update the object segmentation result of the target frame;
  • sequentially propagating the updated object segmentation result of the target frame to at least one other frame of the video.
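The propagate-and-re-identify loop described by these steps can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: `propagate`, `find_lost`, and `resegment` are hypothetical callables standing in for the delivery network, the object re-identification network's lost-object detection, and its re-segmentation step, and only forward re-propagation from the target frame is shown.

```python
def segment_video(frames, ref_idx, ref_mask, propagate, find_lost, resegment):
    """Sketch of the propagate-then-re-identify loop (hypothetical API).

    propagate(prev_frame, prev_mask, cur_frame) -> mask for cur_frame
    find_lost(frames, masks, ref_idx)           -> indices of lost-object frames
    resegment(frame, mask)                      -> updated mask for a target frame
    """
    masks = {ref_idx: ref_mask}
    # Step 1: propagate the reference segmentation result frame by frame,
    # forward and backward in time from the reference frame.
    for i in range(ref_idx + 1, len(frames)):
        masks[i] = propagate(frames[i - 1], masks[i - 1], frames[i])
    for i in range(ref_idx - 1, -1, -1):
        masks[i] = propagate(frames[i + 1], masks[i + 1], frames[i])
    # Step 2: find frames that lost an object, re-segment one target frame,
    # and re-propagate its updated result (forward only, for brevity).
    while True:
        lost = find_lost(frames, masks, ref_idx)
        if not lost:
            break
        target = lost[0]                 # pick one lost-object frame as target
        masks[target] = resegment(frames[target], masks[target])
        for i in range(target + 1, len(frames)):
            masks[i] = propagate(frames[i - 1], masks[i - 1], frames[i])
    return masks
```

The loop terminates once the re-identification step reports no remaining lost-object frames.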
  • The reference frame includes the first frame of the at least part of the frames, and the inter-frame propagation of its object segmentation result is performed in the forward temporal direction within the at least part of the frames until the last frame of the at least part of the frames; or
  • the reference frame includes the last frame of the at least part of the frames, and the inter-frame propagation of its object segmentation result is performed in the reverse temporal direction within the at least part of the frames until the first frame of the at least part of the frames; or
  • the reference frame includes an intermediate frame between the first frame and the last frame of the at least part of the frames, and the object segmentation result of the intermediate frame is propagated inter-frame in the forward temporal direction until the last frame, and/or propagated inter-frame in the reverse temporal direction until the first frame.
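Under the three cases above, the frames that receive the propagated result are fully determined by where the reference frame sits; a first frame yields forward-only propagation, a last frame reverse-only, and an intermediate frame both. A small sketch (hypothetical helper, not from the patent):

```python
def propagation_order(n_frames, ref_idx):
    """Indices that receive the propagated segmentation result, given the
    reference frame's position (illustration only)."""
    forward = list(range(ref_idx + 1, n_frames))   # forward temporal direction
    backward = list(range(ref_idx - 1, -1, -1))    # reverse temporal direction
    return forward, backward
```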
  • Sequentially performing inter-frame propagation of the object segmentation result of the reference frame starting from the reference frame, and obtaining the object segmentation result of at least one other frame among the at least part of the frames, includes: determining the object segmentation result of a subsequent frame along the propagation direction according to the object segmentation result of its preceding frame.
  • The preceding frame includes: the frame or key frame adjacent to the subsequent frame in the forward or reverse temporal direction within the at least part of the frames.
  • Determining the object segmentation result of the subsequent frame along the propagation direction of the object segmentation result of the reference frame according to the object segmentation result of the preceding frame includes:
  • acquiring an image block including at least one object from the subsequent frame, and acquiring, from the object class probability map of the preceding frame, a probability map block corresponding to the object category of the at least one object; and determining the object segmentation result of the at least one object in the subsequent frame according to at least the image block and the probability map block.
  • In an embodiment, the method further includes: acquiring an optical flow block corresponding to the at least one object according to the optical flow map between the preceding frame and the subsequent frame;
  • determining the object segmentation result of the at least one object in the subsequent frame according to the image block and the probability map block then includes: determining the object segmentation result of the at least one object in the subsequent frame according to the image block, the probability map block, and the optical flow block.
  • In an embodiment, the image block, the probability map block, and the optical flow block are respectively enlarged to a preset size; the segmentation result of the at least one object in the subsequent frame at the preset size is acquired according to the separately enlarged blocks; and the result is restored to the original size according to the enlargement ratio.
  • The image block is larger than the object candidate box of the object and smaller than the image size of the subsequent frame.
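The image block described here, larger than the object's candidate box but never larger than the frame, can be cropped as in the following sketch; the `margin` parameter and the clipping policy are illustrative assumptions, not values from the patent.

```python
import numpy as np

def crop_block(image, box, margin=0.5):
    """Crop an image block around an object candidate box (x0, y0, x1, y1):
    the box is enlarged by `margin` times its width/height on each side, then
    clipped to the image bounds, so the block is larger than the candidate
    box yet never larger than the frame itself."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx = int((x1 - x0) * margin)
    my = int((y1 - y0) * margin)
    cx0, cy0 = max(0, x0 - mx), max(0, y0 - my)
    cx1, cy1 = min(w, x1 + mx), min(h, y1 + my)
    return image[cy0:cy1, cx0:cx1], (cx0, cy0, cx1, cy1)
```

The same crop window would be applied to the preceding frame's probability map (and optical flow map) so that all blocks stay aligned.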
  • Determining, among the at least part of the frames, another frame that lost an object relative to the object segmentation result of the reference frame includes:
  • matching at least one object candidate box included in an object detection box set of a current frame against the object candidate box corresponding to the object segmentation result of the reference frame, by performing feature extraction on the at least one object candidate box included in the object detection box set and matching its features against the features of the object candidate box corresponding to the object segmentation result in the reference frame;
  • determining, according to the matching result, whether the current frame is another frame that lost an object relative to the object segmentation result of the reference frame: determining whether there is, among the at least one object candidate box included in the object detection box set, an object candidate box whose feature similarity to an object candidate box corresponding to the object segmentation result in the reference frame is higher than a preset threshold while its object class according to the object segmentation result is inconsistent; if such an object candidate box exists, determining that the current frame is another frame that lost an object relative to the object segmentation result of the reference frame; otherwise, determining that it is not.
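The threshold test above can be sketched with cosine similarity over candidate-box features. Everything here is an illustrative assumption: the feature extractor is abstracted away, `det_labels` stands for the class each detection gets from the current frame's segmentation result, `ref_labels` for the reference objects' classes, and the patent does not specify the similarity measure or threshold.

```python
import numpy as np

def is_lost_object_frame(det_feats, det_labels, ref_feats, ref_labels, thr=0.8):
    """Return True if some detected candidate box looks like a reference
    object (cosine similarity above `thr`) while its class according to the
    frame's current segmentation result is inconsistent with that object,
    i.e. the object appears to have been lost by the propagated result."""
    for f, lbl in zip(det_feats, det_labels):
        for rf, rlbl in zip(ref_feats, ref_labels):
            sim = float(np.dot(f, rf) / (np.linalg.norm(f) * np.linalg.norm(rf)))
            if sim > thr and lbl != rlbl:
                return True   # similar appearance, inconsistent class: lost
    return False
```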
  • In another embodiment, determining, among the at least part of the frames, other frames that lost an object relative to the object segmentation result of the reference frame includes:
  • matching at least one object candidate box included in the object detection box set against the object candidate box corresponding to the object segmentation result of the reference frame, by performing feature extraction on the at least one object candidate box included in the object detection box set and matching its features against the features of the object candidate box corresponding to the object segmentation result in the reference frame;
  • determining, according to the matching result, the other frames among the at least part of the frames that lost an object relative to the object segmentation result of the reference frame: acquiring the object candidate boxes in the object detection box set whose feature similarity to an object candidate box corresponding to the object segmentation result in the reference frame is higher than the preset threshold while the object class according to the object segmentation result is inconsistent, and taking the frames corresponding to the acquired object candidate boxes as the other frames that lost an object relative to the object segmentation result of the reference frame.
  • Determining the other frame as the target frame includes:
  • if there are multiple other frames that lost an object relative to the object segmentation result of the reference frame, selecting one of them as the target frame.
  • Sequentially propagating the updated object segmentation result of the target frame to at least one other frame of the video includes propagating it within consecutive frames of the video, where:
  • the at least one other frame includes the first frame of the consecutive frames, and the updated object segmentation result of the target frame is sequentially propagated within the consecutive frames in the forward temporal direction until the last frame of the consecutive frames; or
  • the at least one other frame includes the last frame of the consecutive frames, and the updated object segmentation result of the target frame is sequentially propagated within the consecutive frames in the reverse temporal direction until the first frame of the consecutive frames; or
  • the at least one other frame includes an intermediate frame between the first frame and the last frame of the consecutive frames, and the updated object segmentation result of the target frame is sequentially propagated in the forward temporal direction until the last frame of the consecutive frames, and/or sequentially propagated in the reverse temporal direction until the first frame of the consecutive frames.
  • The other frames to which the updated object segmentation result of the target frame is propagated do not overlap with the frames whose object segmentation results have already been updated.
  • Another video object segmentation method includes:
  • acquiring an image block including at least one object from a current frame of the video, and acquiring, from the object class probability map of an adjacent frame of the current frame, a probability map block corresponding to the object category of the at least one object; and determining the object segmentation result of the at least one object in the current frame according to at least the image block and the probability map block.
  • In an embodiment, the method further includes: acquiring an optical flow block corresponding to the at least one object according to the optical flow map between the adjacent frame and the current frame;
  • determining the object segmentation result of the at least one object in the current frame according to the image block and the probability map block then includes: determining the object segmentation result of the at least one object in the current frame according to the image block, the probability map block, and the optical flow block.
  • In an embodiment, the image block, the probability map block, and the optical flow block are respectively enlarged to a preset size; the object segmentation result of the at least one object in the current frame at the preset size is acquired according to the separately enlarged blocks; and the result is restored to the original size according to the enlargement ratio.
  • The adjacent frame of the current frame includes: the frame or key frame adjacent to the current frame in the forward or reverse temporal direction in the video.
  • The image block is larger than the object candidate box of the object and smaller than the image size of the adjacent frame.
  • A video object segmentation apparatus includes:
  • a delivery network configured to sequentially perform, starting from a reference frame among at least part of the frames of a video, inter-frame propagation of the object segmentation result of the reference frame, to obtain an object segmentation result of at least one other frame among the at least part of the frames, and to sequentially propagate the updated object segmentation result obtained by the object re-identification network to at least one other frame of the video;
  • an object re-identification network configured to determine, among the at least part of the frames, other frames that lost an object relative to the object segmentation result of the reference frame, and to take a determined other frame as a target frame and segment the lost object so as to update the object segmentation result of the target frame.
  • The reference frame includes the first frame of the at least part of the frames, and the delivery network is configured to propagate the object segmentation result of the first frame inter-frame in the forward temporal direction within the at least part of the frames until the last frame of the at least part of the frames; or
  • the reference frame includes the last frame of the at least part of the frames, and the delivery network is configured to propagate the object segmentation result of the last frame inter-frame in the reverse temporal direction within the at least part of the frames until the first frame of the at least part of the frames; or
  • the reference frame includes an intermediate frame between the first frame and the last frame of the at least part of the frames, and the delivery network is configured to propagate the object segmentation result of the intermediate frame inter-frame in the forward temporal direction until the last frame, and/or in the reverse temporal direction until the first frame.
  • When sequentially performing inter-frame propagation of the object segmentation result of the reference frame starting from the reference frame and obtaining the object segmentation result of at least one other frame among the at least part of the frames, the delivery network is configured to determine the object segmentation result of a subsequent frame along the propagation direction according to the object segmentation result of its preceding frame.
  • The preceding frame includes: the frame or key frame adjacent to the subsequent frame in the forward or reverse temporal direction within the at least part of the frames.
  • the delivery network includes:
  • a first acquiring module configured to acquire an image block including at least one object from the subsequent frame; and acquire, from the object class probability map of the previous frame, a probability map block corresponding to the object category of the at least one object;
  • a determining module configured to determine an object segmentation result of the at least one object in the subsequent frame according to at least the image block and the probability map block.
  • the determining module includes:
  • a first scaling unit configured to respectively enlarge the image block and the probability map block to a preset size;
  • a first neural network configured to acquire, according to the separately enlarged image block and probability map block, an object segmentation result of the at least one object in the subsequent frame at the preset size;
  • a second scaling unit configured to restore the object segmentation result of the at least one object at the preset size to the object segmentation result at the original size according to the enlargement ratio of the image block and the probability map block.
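The scale-up / segment / scale-down round trip performed by the two scaling units and the first neural network can be sketched as below; `net` is a stand-in for the neural network, and nearest-neighbour integer indexing is an illustrative resize, not the claimed method.

```python
import numpy as np

def run_at_preset_size(block, prob_block, net, preset=(224, 224)):
    """Enlarge both blocks to a preset size, run the segmentation network on
    them, then restore the preset-size result to the block's original size
    (nearest-neighbour integer indexing stands in for a proper resize)."""
    h, w = block.shape[:2]
    ys = np.arange(preset[0]) * h // preset[0]   # enlarge: target -> source rows
    xs = np.arange(preset[1]) * w // preset[1]   # enlarge: target -> source cols
    big = block[ys][:, xs]
    big_prob = prob_block[ys][:, xs]
    mask = net(big, big_prob)                    # preset-size segmentation result
    ys_back = np.arange(h) * preset[0] // h      # restore using the ratio
    xs_back = np.arange(w) * preset[1] // w
    return mask[ys_back][:, xs_back]
```

Feeding the network a fixed preset size lets objects of very different scales share one set of network weights, which is presumably why the enlargement ratio must be recorded and inverted afterwards.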
  • In an embodiment, the first acquiring module is further configured to acquire an optical flow block corresponding to the at least one object according to the optical flow map between the preceding frame and the subsequent frame;
  • the determining module is configured to determine the object segmentation result of the at least one object in the subsequent frame according to the image block, the probability map block, and the optical flow block.
  • the determining module includes:
  • a first neural network configured to acquire, according to the image block and the probability map block, a first object segmentation result of the at least one object in the subsequent frame;
  • a second neural network configured to acquire, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the subsequent frame;
  • a calculating unit configured to acquire the object segmentation result of the at least one object in the subsequent frame according to the first object segmentation result and the second object segmentation result.
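The calculating unit's fusion of the two streams' outputs is not pinned down by the claims; averaging the two per-pixel probability maps and thresholding is one simple, hypothetical choice:

```python
import numpy as np

def fuse_streams(prob_rgb, prob_flow):
    """Combine the first (image + probability map) and second (probability
    map + optical flow) streams' per-pixel probabilities; averaging is an
    illustrative choice, as the claims only require computing the final
    result from both partial results."""
    fused = (prob_rgb + prob_flow) / 2.0
    return (fused > 0.5).astype(np.uint8)   # binarize into a segmentation mask
```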
  • the determining module includes:
  • a first scaling unit configured to respectively enlarge the image block, the probability map block, and the optical flow block to a preset size;
  • an acquiring unit configured to acquire, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the subsequent frame at the preset size;
  • a second scaling unit configured to restore the object segmentation result of the at least one object at the preset size to the object segmentation result at the original size according to the enlargement ratio of the image block, the probability map block, and the optical flow block.
  • the acquiring unit includes:
  • a first neural network configured to acquire, according to the separately enlarged image block and the probability map block, a third object segmentation result of the at least one object in the subsequent frame;
  • a second neural network configured to acquire, according to the separately enlarged probability map block and optical flow block, a fourth object segmentation result of the at least one object in the subsequent frame;
  • a calculating unit configured to determine, according to the third object segmentation result and the fourth object segmentation result, an object segmentation result of the at least one object in the subsequent frame in the preset size.
  • The image block is larger than the object candidate box of the object and smaller than the image size of the subsequent frame.
  • When determining, among the at least part of the frames, other frames that lost an object relative to the object segmentation result of the reference frame, the object re-identification network is configured to:
  • match at least one object candidate box included in the object detection box set against the object candidate box corresponding to the object segmentation result of the reference frame, by performing feature extraction on the at least one object candidate box included in the object detection box set and matching its features against the features of the object candidate box corresponding to the object segmentation result in the reference frame;
  • determine, according to the matching result, whether the current frame is another frame that lost an object relative to the object segmentation result of the reference frame, by determining whether there is, among the at least one object candidate box included in the object detection box set, an object candidate box whose feature similarity to an object candidate box corresponding to the object segmentation result in the reference frame is higher than a preset threshold while its object class according to the object segmentation result is inconsistent; if such an object candidate box exists, the current frame is determined to be another frame that lost an object relative to the object segmentation result of the reference frame; otherwise, the current frame is determined not to be.
  • In another embodiment, when determining, among the at least part of the frames, other frames that lost an object relative to the object segmentation result of the reference frame, the object re-identification network is configured to:
  • match at least one object candidate box included in the object detection box set against the object candidate box corresponding to the object segmentation result of the reference frame, by performing feature extraction on the at least one object candidate box included in the object detection box set and matching its features against the features of the object candidate box corresponding to the object segmentation result in the reference frame;
  • determine, according to the matching result, the other frames among the at least part of the frames that lost an object relative to the object segmentation result of the reference frame, by acquiring the object candidate boxes in the object detection box set whose feature similarity to an object candidate box corresponding to the object segmentation result in the reference frame is higher than the preset threshold while the object class according to the object segmentation result is inconsistent, and taking the frames corresponding to the acquired object candidate boxes as the other frames that lost an object relative to the object segmentation result of the reference frame.
  • When taking the determined other frame as the target frame, the object re-identification network is configured to: if the other frames that lost an object relative to the object segmentation result of the reference frame include a plurality of frames, select one of them as the target frame.
  • When sequentially propagating the updated object segmentation result to the at least one other frame of the video, the delivery network is configured to propagate it within consecutive frames of the video, where:
  • the at least one other frame includes the first frame of the consecutive frames, and the delivery network is configured to sequentially propagate the updated object segmentation result of the target frame within the consecutive frames in the forward temporal direction until the last frame of the consecutive frames; or
  • the at least one other frame includes the last frame of the consecutive frames, and the delivery network is configured to sequentially propagate the updated object segmentation result of the target frame within the consecutive frames in the reverse temporal direction until the first frame of the consecutive frames; or
  • the at least one other frame includes an intermediate frame between the first frame and the last frame of the consecutive frames, and the delivery network is configured to sequentially propagate the updated object segmentation result of the target frame in the forward temporal direction until the last frame of the consecutive frames, and/or in the reverse temporal direction until the first frame of the consecutive frames.
  • Another video object segmentation apparatus includes:
  • a first acquiring module configured to acquire an image block including at least one object from a current frame of the video, and to acquire, from the object class probability map of an adjacent frame of the current frame, a probability map block corresponding to the object category of the at least one object;
  • a determining module configured to determine an object segmentation result of the at least one object in the current frame according to at least the image block and the probability map block.
  • the determining module includes:
  • a first scaling unit configured to respectively enlarge the image block and the probability map block to a preset size;
  • a first neural network configured to acquire, according to the separately enlarged image block and probability map block, an object segmentation result of the at least one object in the current frame at the preset size;
  • a second scaling unit configured to restore an object segmentation result of the at least one object in the preset size to an object segmentation result in an original size according to an enlargement ratio of the image block and the probability map block.
  • In an embodiment, the first acquiring module is further configured to acquire an optical flow block corresponding to the at least one object according to the optical flow map between the adjacent frame and the current frame;
  • the determining module is configured to determine the object segmentation result of the at least one object in the current frame according to the image block, the probability map block, and the optical flow block.
  • the determining module includes:
  • a first neural network configured to acquire, according to the image block and the probability map block, a first object segmentation result of the at least one object in the current frame;
  • a second neural network configured to acquire, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the current frame;
  • a calculating unit configured to acquire the object segmentation result of the at least one object in the current frame according to the first object segmentation result and the second object segmentation result.
  • the determining module includes:
  • a first scaling unit configured to respectively enlarge the image block, the probability map block, and the optical flow block to a preset size
  • an acquiring unit configured to acquire, according to the separately enlarged image block, probability map block, and optical flow block, an object segmentation result of the at least one object in the current frame at the preset size;
  • a second scaling unit configured to restore the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size according to an enlargement ratio of the image block, the probability map block, and the optical flow block.
  • the acquiring unit includes:
  • a first neural network configured to acquire, according to the separately enlarged image block and the probability map block, a third object segmentation result of the at least one object in the current frame
  • a second neural network configured to acquire, according to the separately enlarged probability map block and the optical flow block, a fourth object segmentation result of the at least one object in the current frame
  • a calculating unit configured to determine, according to the third object segmentation result and the fourth object segmentation result, an object segmentation result of the at least one object in the current frame at the preset size.
  • the adjacent frame of the current frame includes: an adjacent frame or an adjacent key frame of the current frame in the video, in the forward or reverse direction of the frame sequence.
  • the image block is larger than an object candidate frame of the object and smaller than an image size of the adjacent frame.
  • an electronic device includes the video object segmentation apparatus of any of the above embodiments.
  • another electronic device including:
  • a memory for storing executable instructions
  • a processor for communicating with the memory to execute the executable instructions to perform the operations of the method of any of the above-described embodiments of the present application.
  • a computer storage medium for storing computer readable instructions that, when executed, implement the operations of the method of any of the above-described embodiments of the present application.
  • a computer program comprising computer readable instructions, wherein when the computer readable instructions are run in a device, a processor in the device executes executable instructions implementing the steps of the method in any of the above-described embodiments of the present application.
  • a video object segmentation method and apparatus, an electronic device, a storage medium, and a program are provided. According to an embodiment of the present application, inter-frame transfer of the object segmentation result of a reference frame is sequentially performed from the reference frame within at least part of the frames of a video, obtaining an object segmentation result of at least one other frame in the at least part of the frames; among the at least part of the frames, other frames in which an object is lost relative to the object segmentation result of the reference frame are determined, such a frame is taken as a target frame and segmentation of the lost object is performed to update the object segmentation result of the target frame; the updated object segmentation result of the target frame is then sequentially transmitted to at least one other frame in the video to correct the object segmentation result of the at least one other frame.
  • in this way, the object segmentation result of the reference frame may be transmitted to other frames in at least part of the frames of the video, so that the video object segmentation result is more consistent in time sequence; segmentation of the lost object is performed on target frames in which an object is lost during transfer, the object segmentation result of the target frame is updated, and the updated object segmentation result is sequentially transmitted to at least one other frame in the video to correct the transmitted object segmentation results. This improves cases in which transfer of the object segmentation result fails due to occlusion or a large change in object pose, or in which multiple moving objects overlap and then separate so that the object segmentation result confuses or loses part of the objects, thereby improving the accuracy of the video object segmentation result.
  • in another embodiment, an image block including at least one object is acquired from the current frame of the video, and a probability map block corresponding to the object category of the object is acquired from the object category probability map of an adjacent frame of the current frame; an object segmentation result of the object in the current frame is then determined according to at least the image block and the probability map block.
  • since the object segmentation result of the object in the current frame is determined based on the image block including the object in the current frame and the probability map block of the corresponding object category in the object category probability map of the adjacent frame, small objects and detail information in the image can be captured effectively and the interference of background noise in the image is reduced, thereby mitigating transfer failures caused by small object size or large changes in object size between frames and improving the accuracy of the video object segmentation result.
  • FIG. 1 is a flow chart of an embodiment of a video object segmentation method of the present application.
  • FIG. 2 is a flow chart of another embodiment of a video object segmentation method of the present application.
  • FIG. 3 is a schematic diagram of a process for segmenting an object in a video by applying an embodiment of a video object segmentation method of the present application.
  • FIG. 4 is a flow chart of still another embodiment of a video object segmentation method of the present application.
  • FIG. 5 is a flowchart of still another embodiment of a video object segmentation method of the present application.
  • FIG. 6 is a diagram showing an example of inter-frame transfer of an object segmentation result in the embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a video object segmentation apparatus according to the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a delivery network in an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of still another embodiment of a delivery network according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an application embodiment of an electronic device according to the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, servers, and the like include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, distributed cloud computing technology environments including any of the above, and the like.
  • Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • FIG. 1 is a flow chart of an embodiment of a video object segmentation method of the present application. As shown in FIG. 1, the video object segmentation method of this embodiment includes:
  • the frame in any embodiment of the present application is a frame image.
  • the at least part of the frames may be the frames of the entire video, or the frames included in one segment of the video, or a set of frames extracted from at least some frames of the video according to application requirements, on which video object segmentation is performed.
  • the reference frame may be the first frame of the at least partial frame.
  • the object segmentation result of the first frame is inter-frame-transferred in the timing positive direction in the at least part of the frame until the last frame in the at least part of the frame.
  • the reference frame may be the last frame of the at least partial frames.
  • the object segmentation result of the last frame is inter-frame-transferred in the reverse direction of the at least part of the frame until the first frame in the at least part of the frame.
  • the reference frame may be an intermediate frame between the first frame and the last frame in the at least part of the frames.
  • the object segmentation result of the middle frame is inter-frame-transferred in the positive timing direction in the at least part of the frame until the last frame in the at least part of the frame; and/or, the middle The object segmentation result of one frame is inter-frame-transferred in the reverse direction of the timing in the at least part of the frame until the first frame in the at least part of the frame.
  • the object segmentation result may be represented as a probability map of the object.
  • the object segmentation result of each frame can be represented as a probability map, and the value of each pixel in the probability map represents the object class of the object in the frame corresponding to the pixel.
  • the object segmentation result of each frame can also be represented as multiple probability maps, and each probability map respectively represents a probability map of an object category in the frame.
  • if the object class of the object at the position in the frame corresponding to a pixel is the object class represented by the probability map, the value of that pixel may be 1; otherwise, the value of the pixel may be 0.
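The two representations above (one per-pixel class map versus one binary map per class) can be sketched with a small helper; the function names are illustrative, not from the source.

```python
import numpy as np

def split_into_class_maps(label_map, num_classes):
    # One binary probability map per object class: a pixel is 1 in the
    # map of its own class and 0 in every other map, as described above.
    return [(label_map == c).astype(np.float32) for c in range(num_classes)]

def merge_class_maps(class_maps):
    # Inverse view: a single map whose pixel value is the object class.
    return np.argmax(np.stack(class_maps), axis=0)
```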
  • the operation 102 may be performed by a processor invoking a corresponding instruction stored in a memory, or by a delivery network 502 being executed by the processor.
  • the operation 104 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an object recognition network 504 being executed by the processor.
  • the determined other frame is used as the target frame to perform segmentation of the lost object to update the object segmentation result of the target frame.
  • the target frame may be one or more of the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame.
  • the operation 106 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an object recognition network 504 being executed by the processor.
  • the object segmentation result after the target frame is updated is sequentially transmitted to at least one other frame in the video.
  • the operation 108 may be performed by a processor invoking a corresponding instruction stored in a memory or by a delivery network 502 being executed by the processor.
  • the operations 104-108 may be performed once, or may be performed in a loop until there is no other frame in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame.
  • operations 102 and 108 can be regarded as a propagation process of an object segmentation result, respectively, and operations 104 and 106 can be regarded as an object re-identification process. That is, in the embodiment of the present application, the operations 104-108 can be regarded as a cyclic process in which the propagation process of the object segmentation result and the object re-identification process are alternately performed.
  • the target frame in operation 108 may be used as a reference frame, and the object segmentation result after the target frame is updated is used as the object segmentation result of the reference frame, and inter-frame transfer is performed in the video or at least part of the frame.
  • the object segmentation result of the reference frame may be transmitted to other frames in at least part of the frames of the video, so that the video object segmentation result is more consistent in time sequence; segmentation of the lost object is performed on target frames in which an object is lost during transfer, the object segmentation result of the target frame is updated, and the updated object segmentation result is sequentially transmitted to at least one other frame in the video to correct the transmitted object segmentation results. This improves cases in which transfer of the object segmentation result fails due to occlusion or a large change in object pose, or in which multiple moving objects overlap and then separate so that the object segmentation result confuses or loses part of the objects, thereby improving the accuracy of the video object segmentation result.
  • the inter-frame transfer of the object segmentation result of the reference frame is sequentially performed from the reference frame, obtaining the object segmentation result of at least one other frame in the at least part of the frames; this may be implemented as follows:
  • the object segmentation result of the subsequent frame in the propagation direction is determined according to the object segmentation result of the previous frame in the propagation direction of the object segmentation result of the reference frame, wherein the propagation direction includes the forward and/or reverse timing direction of the video.
  • the previous frame and the subsequent frame are defined relative to the propagation direction.
  • the propagation direction may be the forward or reverse timing direction of the video.
  • the frame in the forward direction in the propagation direction is the previous frame, and the frame in the backward direction in the propagation direction is the subsequent frame.
  • the previous frame may be: an adjacent frame or an adjacent key frame of the subsequent frame in the forward or reverse timing direction within the at least part of the frames, wherein a key frame may be a frame separated from the subsequent frame by a predetermined number of frames in the forward or reverse timing direction of the at least part of the frames.
  • the object segmentation result of the subsequent frame in the propagation direction is determined according to the object segmentation result of the previous frame in the propagation direction of the object segmentation result of the reference frame, and the following operation can be implemented by using a delivery network:
  • acquiring an image block including at least one object from the subsequent frame; acquiring a probability map block of the object category corresponding to the at least one object from the object category probability map of the previous frame;
  • the extracted image block including the object may be larger than the object candidate frame of the object and smaller than the image size of the subsequent frame, so that when features are subsequently extracted from the image block, more context information can be extracted, which helps obtain a more accurate object segmentation result for the object.
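A crop that is larger than the object candidate frame but clamped to the image bounds can be sketched as below; the relative margin value is an assumption, since the text only constrains the block to lie between the candidate-frame size and the full-frame size.

```python
import numpy as np

def crop_with_context(frame, box, margin=0.5):
    # Crop an image block around an object candidate box, enlarged by a
    # relative margin so the block is larger than the box but never
    # exceeds the frame (margin=0.5 is a hypothetical choice).
    x0, y0, x1, y1 = box
    h, w = frame.shape[0], frame.shape[1]
    mx = int((x1 - x0) * margin)
    my = int((y1 - y0) * margin)
    cx0, cy0 = max(0, x0 - mx), max(0, y0 - my)
    cx1, cy1 = min(w, x1 + mx), min(h, y1 + my)
    return frame[cy0:cy1, cx0:cx1]
```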
  • determining the object segmentation result of the object in the subsequent frame according to at least the image block and the probability map block may be implemented by the delivery network as follows:
  • the object segmentation result of the at least one object at the preset size is restored to the object segmentation result at the original size, that is, the object segmentation result of the at least one object at the preset size is reduced in proportion to the above enlargement ratio to obtain the object segmentation result of the at least one object.
  • determining the object segmentation result of the subsequent frame in the propagation direction according to the object segmentation result of the previous frame in the propagation direction of the object segmentation result of the reference frame may further include: acquiring an optical flow block corresponding to the at least one object according to an optical flow map between the previous frame and the subsequent frame.
  • the optical flow graph between the previous frame and the subsequent frame can be obtained through an optical flow network.
  • determining an object segmentation result of at least one object in the subsequent frame according to the image block, the probability map block, and the optical flow block may be exemplarily implemented as follows:
  • acquiring, according to the image block and the probability map block, a first object segmentation result of the at least one object in the subsequent frame, which may be implemented by a first neural network in the delivery network; acquiring, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the subsequent frame, which may be implemented by a second neural network in the delivery network;
  • acquiring the object segmentation result of the at least one object in the subsequent frame according to the first object segmentation result and the second object segmentation result, which may be implemented by a calculating module in the delivery network. For example, the sum of the first object segmentation result and the second object segmentation result may be acquired as the object segmentation result of the at least one object in the subsequent frame; or the average of the first object segmentation result and the second object segmentation result may be acquired as the object segmentation result of the at least one object in the subsequent frame.
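The calculating unit described above admits either combination; a minimal sketch of the two options named in the passage:

```python
import numpy as np

def fuse_segmentation_results(first, second, mode="mean"):
    # Combine the appearance-branch and flow-branch outputs either as
    # their sum or as their average, as the passage allows.
    if mode == "sum":
        return first + second
    return (first + second) / 2.0
```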
  • determining an object segmentation result of at least one object in the subsequent frame may also be implemented by the delivery network by performing the following operations:
  • enlarging the image block, the probability map block, and the optical flow block to a preset size, respectively; this operation may be implemented by a first scaling unit in the delivery network;
  • the object segmentation result of the at least one object at the preset size is restored to the object segmentation result at the original size, that is, the object segmentation result of the at least one object at the preset size is reduced in proportion to the above enlargement ratio to obtain the object segmentation result of the at least one object; this operation may be implemented by a second scaling unit in the delivery network.
  • acquiring the object segmentation result of the at least one object in the subsequent frame at the preset size according to the separately enlarged image block, probability map block, and optical flow block may be exemplarily implemented as follows:
  • acquiring a third object segmentation result of the at least one object in the subsequent frame according to the separately enlarged image block and probability map block, for example by a first neural network in the delivery network; acquiring a fourth object segmentation result of the at least one object in the subsequent frame according to the separately enlarged probability map block and optical flow block, for example by a second neural network in the delivery network;
  • determining, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the subsequent frame at the preset size.
  • a deep residual network has a strong ability to extract discriminative features.
  • the first neural network and the second neural network may be implemented by using a deep residual network.
  • the deep residual network usually has 101 network layers, which can be called a 101-layer deep residual network.
  • the deep residual network can also have more network layers.
  • the more network layers the deep residual network has, the higher the accuracy of the output result, but the more computing time and memory resources are required; a 101-layer deep residual network achieves a good balance among output accuracy, time complexity, and space complexity.
  • the output of a commonly used 101-layer deep residual network has 2048 channels, and its feature map size is 1/224 of the original input size, that is, for a 224×224 input the feature map size is 1×1; a 101-layer deep residual network may therefore be used as the first neural network and the second neural network.
  • the 101-layer deep residual network can be improved as follows: the convolution stride of certain convolution layers in the 101-layer deep residual network is reduced, and the convolution kernels are dilated to enlarge their effective size, preserving spatial resolution while keeping the receptive field.
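The stride-to-dilation trade described above can be illustrated in one dimension: reducing the stride to 1 keeps more output positions, while dilating the kernel preserves the span of input each output sees. This toy convolution is an illustration of the principle, not the patented network.

```python
import numpy as np

def conv1d(x, kernel, stride=1, dilation=1):
    # Minimal valid-mode 1-D convolution. A dilated kernel of size k
    # covers a span of (k-1)*dilation + 1 input samples.
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out = []
    for start in range(0, len(x) - span + 1, stride):
        taps = x[start:start + span:dilation]
        out.append(float(np.dot(taps, kernel)))
    return np.array(out)
```

With `x = 0..7` and an all-ones kernel of size 3, a stride-2 layer yields 3 outputs, while the stride-1 dilation-2 replacement yields 4 outputs at the same receptive span, which is the resolution-preserving effect the modification aims for.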
  • the operation 104 can be implemented as follows:
  • each frame corresponds to a candidate set for storing all object candidate frames in the frame
  • matching at least one object candidate frame (for example, each object candidate frame) included in the object detection frame set of the current frame with the object candidate frame corresponding to the object segmentation result of the reference frame;
  • matching the object candidate frames included in the object detection frame set of the current frame with the object candidate frame corresponding to the object segmentation result of the reference frame may include: performing feature extraction on each object candidate frame included in the object detection frame set of the current frame; and matching the feature of each object candidate frame included in the object detection frame set with the feature of the object candidate frame corresponding to the object segmentation result of the reference frame.
  • determining, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame may include: determining, according to the matching result, whether there is, among the at least one object candidate frame included in the object detection frame set of the current frame, an object candidate frame whose feature similarity with the object candidate frame corresponding to the object segmentation result of the reference frame is higher than a preset threshold but whose object class is inconsistent with the object class corresponding to the object segmentation result; if such an object candidate frame exists, the current frame is determined to be another frame in which an object is lost relative to the object segmentation result of the reference frame; otherwise, the current frame is determined not to be such a frame.
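The similarity-plus-class-mismatch test above can be sketched as follows. The cosine similarity measure and the threshold value are assumptions; the text only requires some feature similarity exceeding a preset threshold together with an inconsistent object class.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def find_lost_objects(cand_feats, cand_seg_classes, ref_feats, ref_classes,
                      threshold=0.8):
    # A reference object counts as lost in the current frame when some
    # candidate box matches its feature above the threshold, yet the
    # transmitted segmentation result assigns that box a different class
    # (threshold=0.8 is a hypothetical value).
    lost = []
    for j, (rf, rc) in enumerate(zip(ref_feats, ref_classes)):
        for cf, cc in zip(cand_feats, cand_seg_classes):
            if cosine(cf, rf) > threshold and cc != rc:
                lost.append(j)
                break
    return lost
```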
  • the operation 104 may also be implemented as follows:
  • the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame are determined according to the matching result.
  • matching the at least one object candidate frame included in the object detection frame set with the object candidate frame corresponding to the object segmentation result of the reference frame may include: performing feature extraction on the at least one object candidate frame included in the object detection frame set; and matching the feature of the at least one object candidate frame included in the object detection frame set with the feature of the object candidate frame corresponding to the object segmentation result of the reference frame.
  • determining, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result may include: acquiring, from the object detection frame set, object candidate frames whose feature similarity is higher than a preset threshold but whose object class is inconsistent with the object class corresponding to the object segmentation result; the frames in which such object candidate frames are located are the other frames in which an object is lost relative to the object segmentation result of the reference frame.
  • the method may further include: when the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame include a plurality of frames, selecting one frame from them as the target frame according to a preset strategy.
  • the operation 108 may include:
  • the object segmentation result after the target frame update is sequentially transmitted to at least one other frame in the consecutive frames.
  • the at least one other frame may be the first frame in consecutive frames.
  • when the updated object segmentation result of the target frame is sequentially transmitted to at least one other frame in the consecutive frames, it may be sequentially transmitted in the forward timing direction to the last frame of the consecutive frames.
  • the at least one other frame may also be the last frame in consecutive frames.
  • the updated object segmentation result of the target frame may be sequentially transmitted in the reverse timing direction to the first frame of the consecutive frames.
  • the at least one other frame may be an intermediate frame between the first frame and the last frame in consecutive frames.
  • the updated object segmentation result of the target frame may be sequentially transmitted in the forward timing direction to the last frame of the consecutive frames; and/or, the updated object segmentation result of the target frame may be sequentially transmitted in the reverse timing direction to the first frame of the consecutive frames.
  • the ranges of other frames to which the updated object segmentation result of the target frame is transmitted in different rounds do not overlap.
  • the object information therein may be an object feature or an object class.
  • the object information of the lost object may be queried in a correction information table; if the object segmentation result of the lost object has been corrected previously, the frame numbers corresponding to the object information of the lost object are queried in the correction information table to obtain the other frames in the video to which the updated object segmentation result of the target frame was previously transmitted; accordingly, the other frames in the video to which the updated object segmentation result of the target frame is sequentially transmitted this time are determined so that they do not overlap with the other frames corresponding to the queried frame numbers.
  • for example, suppose that in a previous round the updated object segmentation result of the target frame was sequentially transmitted to the 21st to 23rd frames of the video, and the object segmentation result is to be transmitted again for the same lost object. Even if the frames in which the lost object appears are the 20th to 27th frames, since the object segmentation results of the 21st to 23rd frames were already corrected for this lost object last time, the updated object segmentation result of the target frame may this time be transmitted to the 24th to 27th frames of the video.
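The frame-range bookkeeping in this example reduces to excluding already-corrected frames from the next round of transmission; a minimal sketch (the function name and list representation are illustrative):

```python
def frames_to_correct(lost_frames, corrected_frames):
    # Given the frames in which the lost object appears and the frames
    # already corrected for it in earlier rounds (as looked up in the
    # correction information table), return only the frames not yet
    # corrected, so successive transmission ranges do not overlap.
    done = set(corrected_frames)
    return [f for f in lost_frames if f not in done]
```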
  • in this way, it is possible to avoid re-correcting, for the same lost object, object segmentation results already corrected in a previous round during a subsequent round of transmission, which would cause the flow of the embodiment of the present application to loop indefinitely; moreover, since transmission over a long distance may degrade the accuracy of the object segmentation result, this also prevents a less accurate object segmentation result from overwriting a more accurate one on a given frame, thereby effectively ensuring the accuracy of the object segmentation results.
  • FIG. 2 is a flow chart of another embodiment of a video object segmentation method of the present application. As shown in FIG. 2, the video object segmentation method of this embodiment includes:
  • the operation 202 may be: receiving an object segmentation result of the reference frame, and the object segmentation result of the reference frame may be obtained in advance.
  • the operation 202 may also be implemented by the image object segmentation method: performing object segmentation on the reference frame to obtain an object segmentation result of the reference frame.
  • the object segmentation of the reference frame may be performed by obtaining an object segmentation result of the reference frame as follows:
  • Feature extraction is performed on the reference frame to obtain features of the reference frame.
  • the features of the reference frame may be represented, for example, in the form of a feature vector or a feature map;
  • An object segmentation result of the reference frame is obtained by predicting an object class of each pixel in the reference frame according to the feature.
  • the operation 202 may be performed by a processor invoking a corresponding instruction stored in a memory or by a delivery network 502 being executed by the processor.
  • inter-frame transfer of the object segmentation result of the reference frame is sequentially performed from the reference frame, and an object segmentation result of each other frame in the at least part of the frame is obtained.
  • an object segmentation result of the subsequent frame in the propagation direction may be determined according to the object segmentation result of the previous frame in the propagation direction of the object segmentation result of the reference frame, wherein the propagation direction includes the forward and/or reverse timing direction of the video.
  • The previous frame and the subsequent frame are defined relative to the propagation direction.
  • The propagation direction may be the timing positive direction or the timing reverse direction of the video.
  • The frame that comes earlier along the propagation direction is the previous frame, and the frame that comes later along the propagation direction is the subsequent frame.
  • The previous frame may be: the adjacent frame or the adjacent key frame of the subsequent frame in the timing positive direction or the timing reverse direction in the at least part of the frames, wherein the key frame may be a frame separated from the subsequent frame by a preset number of frames in the timing positive direction or the timing reverse direction in the at least part of the frames.
  • According to the requirements of the application, the at least part of the frames on which the embodiments of the present application perform video object segmentation may be all frames of the video, the frames included in one segment of the video, or a set of frames extracted from the video.
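  • The frame-by-frame transfer described above can be sketched as a simple loop; the `propagate_one` callable stands in for the delivery network and is purely hypothetical:

```python
from typing import Callable, Dict, List

def propagate_from_reference(
    frame_ids: List[int],
    ref_id: int,
    ref_seg,
    propagate_one: Callable,  # hypothetical: (prev_seg, prev_id, cur_id) -> cur_seg
) -> Dict[int, object]:
    """Transfer the reference frame's object segmentation result frame by
    frame: each subsequent frame's result is determined from the previous
    frame's result along the propagation direction (timing positive
    direction and/or timing reverse direction)."""
    segs = {ref_id: ref_seg}
    i = frame_ids.index(ref_id)
    # Timing positive direction: reference frame -> last frame.
    for prev, cur in zip(frame_ids[i:], frame_ids[i + 1:]):
        segs[cur] = propagate_one(segs[prev], prev, cur)
    # Timing reverse direction: reference frame -> first frame.
    if i > 0:
        for prev, cur in zip(frame_ids[i::-1], frame_ids[i - 1::-1]):
            segs[cur] = propagate_one(segs[prev], prev, cur)
    return segs

# Toy usage: a "segmentation" here is just the list of frames it passed through.
segs = propagate_from_reference([1, 2, 3, 4, 5], 3, [3],
                                lambda prev, p, c: prev + [c])
```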
  • the operation 204 may be referred to as a propagation process of the object segmentation result.
  • the operation 204 may be performed by a processor invoking a corresponding instruction stored in a memory, or by a delivery network 502 being executed by the processor.
  • operations 206-208 may be referred to as an object re-recognition process.
  • the operations 206-208 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by an object recognition network 504 being executed by the processor.
  • The updated object segmentation result of the target frame is transmitted to at least one other frame in the video along the timing positive direction and/or the timing reverse direction of the video; that is, with the target frame used as the reference frame, the updated object segmentation result of the target frame is transferred frame by frame, starting from the frames subsequent to the target frame in the propagation direction, so as to update the object segmentation results of the at least one other frame.
  • the operation 210 may be referred to as a propagation process of the object segmentation result.
  • the operation 210 may be performed by a processor invoking a corresponding instruction stored in a memory or by a delivery network 502 being executed by the processor.
  • FIG. 3 is a schematic diagram of a process of segmenting an object in a video by applying an embodiment of the video object segmentation method of the present application.
  • The pictures shown in the first row are at least part of the frames in a video, which include 82 frames of images; the first row in FIG. 3 exemplarily indicates the frame numbers of the 1st, 8th, 20th, 30th, 50th, 64th, and 82nd frames.
  • the object segmentation result can be obtained in advance, for example, by manual acquisition or by an image object segmentation method.
  • In step 1, the object segmentation result of the first frame is transmitted frame by frame in the timing positive direction of the video until the last frame, that is, the 82nd frame; see the pictures in the second row;
  • In step 2, after the object segmentation result of the first frame has been transmitted to the 82nd frame, the other frames in which an object is lost relative to the object segmentation result of the first frame are determined; it is assumed here that frames 16-36 are included;
  • In step 3, the 21st frame is selected as the target frame, segmentation of the lost object is performed on it, and the object segmentation result of the target frame is updated according to the segmentation result of the lost object; see the pictures in the third row;
  • In step 4, with the 21st frame used as the reference frame, the updated object segmentation result of the 21st frame is sequentially transmitted in the timing positive direction and the timing reverse direction of the video, so that the object segmentation results of the frames adjacent to the 21st frame in both directions are updated and the segmentation result of the lost object is retrieved in these adjacent frames; see the pictures in the fourth row;
  • Next, the 80th frame is selected as the target frame, segmentation of the lost object is performed on it, and the object segmentation result of the target frame is updated according to the segmentation result of the lost object; see the pictures in the fifth row;
  • With the 80th frame used as the reference frame, the updated object segmentation result of the 80th frame is sequentially transmitted in the timing positive direction and the timing reverse direction of the video, so that the object segmentation results of the frames adjacent to the 80th frame in both directions are updated and the segmentation result of the lost object is retrieved in these adjacent frames; see the pictures in the sixth row;
  • After that, steps two to four are repeated until there is no other frame in the at least part of the frames in which an object is lost relative to the object segmentation result of the first frame.
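  • The iteration over steps one to four can be sketched as follows; the helper callables are hypothetical stand-ins for the delivery network and the object re-recognition network, and the toy occlusion model is invented purely for illustration:

```python
def video_object_segmentation(frame_ids, ref_seg, propagate, find_lost, recover):
    """The loop of FIG. 3, with hypothetical helper callables:
      propagate(segs, src)  -- frame-by-frame transfer from frame `src`
      find_lost(segs)       -- frames whose result lost an object vs. frame 1
      recover(frame, seg)   -- segment the lost object in the target frame
    """
    segs = {frame_ids[0]: ref_seg}
    propagate(segs, frame_ids[0])                 # step 1: transfer from frame 1
    lost = find_lost(segs)                        # step 2: frames losing an object
    while lost:                                   # repeat steps 2-4
        target = lost[0]                          # step 3: pick a target frame and
        segs[target] = recover(target, segs[target])  # re-segment the lost object
        propagate(segs, target)                   # step 4: re-transfer both ways
        lost = find_lost(segs)
    return segs

# Toy simulation: segmentations are sets of object ids; frames 3 and 4
# occlude object 'b' until it has been explicitly recovered there.
ids, ref = [1, 2, 3, 4, 5], {'a', 'b'}
occluded = {3, 4}

def propagate(segs, src):
    for f in ids:
        if f != src:
            segs[f] = set(segs[src]) - ({'b'} if f in occluded else set())

def find_lost(segs):
    return [f for f in ids if ref - segs[f]]

def recover(frame, seg):
    occluded.discard(frame)       # the object is re-identified in this frame
    return set(seg) | {'b'}

result = video_object_segmentation(ids, ref, propagate, find_lost, recover)
```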
  • FIG. 4 is a flow chart of still another embodiment of a video object segmentation method of the present application.
  • This embodiment can be exemplarily implemented by a delivery network.
  • the video object segmentation method of this embodiment includes:
  • the operation 302 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition module 602 being executed by the processor.
  • the operation 304 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a determination module 604 executed by the processor.
  • The object segmentation result of an object in the current frame is determined based on the image block including the object in the current frame and the probability map block of the corresponding object category in the object class probability map of the adjacent frame; in this way, small-sized objects and detail information in the image can be effectively captured and the interference of background noise in the image is reduced, which alleviates the failure of object segmentation result transfer caused by objects in the frame that are small in size or change greatly in size, and improves the accuracy of the video object segmentation results.
  • operation 304 can include:
  • the object segmentation result of the at least one object at the preset size is restored to the object segmentation result under the original size.
  • Optionally, operation 302 may further include: acquiring an optical flow block corresponding to the object according to the optical flow map between the adjacent frame and the current frame. Accordingly, operation 304 may include: determining the object segmentation result of the object in the current frame according to the image block, the probability map block, and the optical flow block.
  • determining an object segmentation result of at least one object in the current frame according to the image block, the probability map block, and the optical flow block may be implemented as follows:
  • For example, a first object segmentation result of the at least one object in the current frame is acquired by the first neural network in the delivery network according to the image block and the probability map block; a second object segmentation result of the at least one object in the current frame is acquired by the second neural network in the delivery network according to the probability map block and the optical flow block; and the object segmentation result of the at least one object in the current frame is obtained by the calculation module in the delivery network according to the first object segmentation result and the second object segmentation result.
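  • As a minimal sketch of how a calculation module might combine the first and second object segmentation results (the text does not fix the exact operation; a weighted average of the two probability maps is assumed here for illustration):

```python
import numpy as np

def fuse_results(first_probs, second_probs, w=0.5):
    """Hypothetical calculation module: combine the first neural network's
    probability map (from image block + probability map block) with the
    second's (from probability map block + optical flow block) by a
    weighted average, then threshold to a binary mask."""
    fused = w * np.asarray(first_probs) + (1.0 - w) * np.asarray(second_probs)
    return fused, (fused > 0.5).astype(np.uint8)   # probability map + binary mask

probs, mask = fuse_results([[0.9, 0.2]], [[0.7, 0.6]])
```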
  • determining an object segmentation result of the at least one object in the current frame according to the image block, the probability map block, and the optical flow block may be implemented as follows:
  • For example, the image block, the probability map block, and the optical flow block are respectively enlarged to a preset size; a third object segmentation result of the at least one object in the current frame is acquired by the first neural network in the delivery network according to the separately enlarged image block and probability map block; a fourth object segmentation result of the at least one object in the current frame is acquired by the second neural network in the delivery network according to the separately enlarged probability map block and optical flow block; the object segmentation result of the at least one object in the current frame at the preset size is determined by the calculation module in the delivery network according to the third object segmentation result and the fourth object segmentation result; and the object segmentation result of the at least one object at the preset size is restored to the object segmentation result at the original size according to the enlargement ratios of the image block, the probability map block, and the optical flow block.
  • The current frame and the adjacent frame are defined relative to the propagation direction.
  • The propagation direction may be the timing positive direction or the timing reverse direction of the video.
  • The frame that comes earlier along the propagation direction is the adjacent frame, and the frame that comes later along the propagation direction is the current frame.
  • The adjacent frame may be: the adjacent frame or the adjacent key frame of the current frame in the timing positive direction or the timing reverse direction of the video, wherein the key frame may be a frame separated from the current frame by a preset number of frames in the timing positive direction or the timing reverse direction of the video.
  • As the current frame changes, the adjacent frame changes correspondingly.
  • The size of the image block of the object acquired from the current frame may be larger than the object candidate frame of the at least one object, so that when features are subsequently extracted from the image block, more context information can be extracted, which helps obtain a more accurate object segmentation result for the at least one object.
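  • Enlarging the crop beyond the object candidate frame, as described above, might be sketched as follows; the margin value is an illustrative assumption, not one fixed by the embodiment:

```python
def expand_candidate_box(box, img_w, img_h, margin=0.5):
    """Enlarge an object candidate box (x1, y1, x2, y2) by a relative margin
    on each side, clamped to the frame bounds, so that the image block
    cropped from the frame carries extra context around the object."""
    x1, y1, x2, y2 = box
    mw, mh = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0, int(x1 - mw)), max(0, int(y1 - mh)),
            min(img_w, int(x2 + mw)), min(img_h, int(y2 + mh)))
```

For example, a 20x20 box centered in a 100x100 frame grows to 40x40, while a box touching the frame border is clipped to the frame.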
  • FIG. 5 is a flowchart of still another embodiment of a video object segmentation method of the present application. As shown in FIG. 5, the video object segmentation method of this embodiment includes:
  • An image block including at least one object is acquired from a current frame in the video; a probability map block of the corresponding object category of the at least one object is acquired from the object class probability map of the adjacent frame of the current frame; and the optical flow block corresponding to the at least one object is acquired according to the optical flow map between the adjacent frame and the current frame.
  • the operation 402 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first acquisition module 602 being executed by the processor.
  • the operation 404 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first scaling unit 702 that is executed by the processor.
  • the operation 406 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first neural network 704 and a second neural network 708, respectively, that are executed by the processor.
  • the operation 408 may be performed by a processor invoking a corresponding instruction stored in a memory or by a computing unit 710 being executed by the processor.
  • Step 410: Restore, by the second scaling unit in the delivery network, the object segmentation result of the at least one object at the preset size to the object segmentation result at the original size according to the enlargement ratios of the image block, the probability map block, and the optical flow block.
  • the operation 410 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second scaling unit 706 executed by the processor.
  • In this embodiment, the image block of an object extracted from the current frame and the corresponding optical flow block are scaled to a preset size, and the probability map block of the object extracted from the adjacent frame is enlarged to the preset size, so as to obtain the object segmentation result of the current frame. Small-sized objects and detail information in the image can thereby be effectively captured, and the object segmentation result of the current frame can be obtained more accurately, thereby realizing accurate inter-frame transfer of the object segmentation result, alleviating the failure of object segmentation result transfer caused by objects in the frame that are small in size or change greatly in size, and improving the accuracy of the video object segmentation results.
  • FIG. 6 is a diagram showing an example of inter-frame transfer of an object segmentation result in an embodiment of the present application. As shown in FIG. 6, in an embodiment of the video object segmentation method of the present application, the object segmentation result of an adjacent frame (previous frame) is transmitted to the current frame (subsequent frame) through the delivery network.
  • the method further includes:
  • the above-mentioned delivery network is trained based on the sample video, and each frame in the sample video is marked with a labeled probability map.
  • For example, an iterative training method or a gradient update method may be adopted to train the above delivery network based on the labeled probability maps of the sample video and the probability maps output by the delivery network, so as to adjust the parameter values of the network parameters in the delivery network.
  • For example, when the iterative training method is used to train the delivery network based on the labeled probability maps of the sample video and the probability maps output by the delivery network, the training is completed when a preset condition is met; the preset condition may be, for example, that the number of training iterations reaches a preset threshold, or that the difference between the probability maps output by the delivery network for the sample video and the labeled probability maps of the sample video satisfies a preset difference.
  • For example, when the gradient update method is used to train the delivery network based on the labeled probability maps of the sample video and the probability maps output by the delivery network, the difference between the probability maps output by the delivery network for the sample video and the labeled probability maps of the sample video can be obtained, and the gradient update method is used to adjust the parameter values of the network parameters in the delivery network, so that this difference is minimized.
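  • A gradient update that reduces the difference between a predicted probability map and the labeled probability map can be illustrated with a toy one-parameter model; this is a minimal stand-in, not the embodiment's delivery network, and all quantities here are invented for illustration:

```python
import numpy as np

def train_step(w, x, labeled_probs, lr=0.5):
    """One gradient update for a toy model: pred = sigmoid(w * x).
    For a sigmoid output, the gradient of the binary cross-entropy between
    the predicted and labeled probability maps w.r.t. w is (pred - label) * x,
    so each step reduces the difference between the two maps."""
    pred = 1.0 / (1.0 + np.exp(-w * x))
    grad = float(np.mean((pred - labeled_probs) * x))
    diff = float(np.mean((pred - labeled_probs) ** 2))
    return w - lr * grad, diff

x = np.array([1.0, -1.0, 1.0, -1.0])   # toy "frame" features
y = np.array([1.0, 0.0, 1.0, 0.0])     # labeled probability map (flattened)
w, diffs = 0.0, []
for _ in range(100):
    w, d = train_step(w, x, y)
    diffs.append(d)
```

After repeated updates, the difference between the output and the labels shrinks, which is the stopping criterion described above.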
  • the operation of training the delivery network based on the sample video may include:
  • the delivery network is trained based on the sample video in response to the completion of the first neural network and the second neural network training.
  • For example, an iterative training method or a gradient update method may be employed to train each network to be trained (the first neural network, the second neural network, and/or the delivery network) based on the labeled probability maps of the sample video and the probability maps output by the network to be trained, so as to adjust the parameter values of the network parameters in each network to be trained; details are not described herein again.
  • the methods of training the first neural network, the second neural network, and the delivery network may be the same or different.
  • the first neural network and the second neural network may be trained using an iterative training method, and the delivery network is trained using a gradient update method.
  • The first neural network and the second neural network are first trained independently of each other; after the training of the first neural network and the second neural network is completed, the entire delivery network including the first neural network and the second neural network is trained. This can help improve the network training effect of the delivery network and improve the efficiency of network training.
  • any of the video object segmentation methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • Any video object segmentation method provided by the embodiments of the present application may be executed by a processor; for example, the processor performs any of the video object segmentation methods mentioned in the embodiments of the present application by executing a corresponding instruction stored in the memory. This will not be repeated below.
  • The foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed; and the foregoing storage medium includes: a medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 7 is a schematic structural diagram of an embodiment of a video object segmentation apparatus according to the present application.
  • the video object segmentation apparatus of this embodiment can be used to implement any of the video object segmentation method embodiments shown in the above FIGS. 1-3 of the present application.
  • The video object segmentation apparatus of this embodiment includes a delivery network 502 and an object re-recognition network 504, wherein:
  • The delivery network 502 is configured to perform inter-frame transfer of the object segmentation result of the reference frame sequentially from the reference frame in at least part of the frames of the video, to obtain the object segmentation result of at least one other frame in the at least part of the frames; and to sequentially transmit the updated object segmentation result of the target frame obtained by the object re-recognition network 504 to at least one other frame in the video.
  • the object segmentation result of the reference frame can be obtained and input to the delivery network 502 in advance, for example, by a manual segmentation or an object segmentation network.
  • the object segmentation result may be represented as a probability map of the object.
  • the object segmentation result of each frame may be represented as a probability map, and the value of each pixel in the probability map indicates the object class of the object in the frame corresponding to the pixel.
  • Alternatively, each frame may be represented as a plurality of probability maps, each probability map representing one object category in the frame; in each probability map, if the object category of the object in the frame corresponding to a pixel is the object category represented by the probability map, the value of the pixel may be 1; otherwise, if the object category of the object in the frame corresponding to the pixel is not the object category represented by the probability map, the value of the pixel may be 0.
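  • The per-category probability map representation described above can be sketched as follows (an illustrative helper, with hypothetical names):

```python
import numpy as np

def to_category_maps(label_map, num_classes):
    """Turn a frame's label map into one probability map per object
    category: a pixel is 1 in the map of its own object's category and
    0 in every other category's map."""
    maps = np.zeros((num_classes,) + label_map.shape, dtype=np.uint8)
    for c in range(num_classes):
        maps[c] = (label_map == c)
    return maps

maps = to_category_maps(np.array([[0, 1], [2, 1]]), 3)
```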
  • the reference frame may be the first frame of the at least partial frame.
  • the delivery network 502 is configured to transfer the object segmentation result of the first frame frame by frame in the timing positive direction in the at least part of the frames until the last frame in the at least part of the frames.
  • the reference frame may be the last frame of the at least partial frames.
  • the delivery network 502 is configured to transfer the object segmentation result of the last frame frame by frame in the timing reverse direction in the at least part of the frames until the first frame in the at least part of the frames.
  • the reference frame may be an intermediate frame between the first frame and the last frame in the at least part of the frames.
  • the delivery network 502 is configured to transfer the object segmentation result of the intermediate frame frame by frame in the timing positive direction in the at least part of the frames until the last frame in the at least part of the frames; and/or transfer the object segmentation result of the intermediate frame frame by frame in the timing reverse direction in the at least part of the frames until the first frame in the at least part of the frames.
  • The object re-recognition network 504 is configured to determine another frame in the at least part of the frames in which an object is lost in the object segmentation result, determine this other frame as the target frame, and perform segmentation of the lost object to update the object segmentation result of the target frame.
  • In this embodiment, the object segmentation result of the reference frame may be transmitted to the other frames in at least part of the frames of the video, so that the video object segmentation results are more consistent in time sequence; segmentation of the lost object is performed on a target frame in which an object was lost during the transfer, the object segmentation result of the target frame is updated, and the updated object segmentation result is sequentially transmitted to at least one other frame in the video to correct the object segmentation results of the other frames. This alleviates the cases where object segmentation result transfer fails due to occlusion or large changes in object attitude, or where the object segmentation result confuses or loses part of the objects after multiple moving objects overlap and then separate, and improves the accuracy of the video object segmentation results.
  • When the delivery network 502 performs inter-frame transfer of the object segmentation result of the reference frame sequentially from the reference frame to obtain the object segmentation result of at least one other frame in the at least part of the frames, the delivery network 502 is configured to determine the object segmentation result of a subsequent frame in the propagation direction according to the object segmentation result of a previous frame along the propagation direction of the object segmentation result of the reference frame, wherein the propagation direction includes the timing positive direction and/or the timing reverse direction of the video.
  • The previous frame includes: the adjacent frame or the adjacent key frame of the subsequent frame in the timing positive direction or the timing reverse direction in the at least part of the frames.
  • FIG. 8 is a schematic structural diagram of an embodiment of a delivery network in an embodiment of the present application. As shown in FIG. 8, compared with the embodiment shown in FIG. 7, in this embodiment, the delivery network 502 includes:
  • the first obtaining module 602 is configured to acquire an image block including at least one object from a subsequent frame, and acquire a probability map block of the object corresponding object category from the object class probability map of the previous frame.
  • The image block may be larger than the object candidate frame of the at least one object and smaller than the image size of the subsequent frame, so that more context information can be extracted from the image block and the probability map of the object can be obtained more accurately.
  • the determining module 604 is configured to determine an object segmentation result of the at least one object in the subsequent frame according to at least the image block and the probability map block.
  • FIG. 9 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application.
  • the determining module 604 can include:
  • the first scaling unit 702 is configured to respectively enlarge the image block and the probability map block acquired by the first acquiring module 602 to a preset size.
  • the first neural network 704 is configured to determine, according to the separately enlarged image block and the probability map block, an object segmentation result of the at least one object in a subsequent frame in a preset size.
  • the second scaling unit 706 is configured to restore the object segmentation result of the at least one object at the preset size to the object segmentation result in the original size according to the enlargement ratio of the image block and the probability map block by the first scaling unit 702.
  • the first obtaining module 602 is further configured to acquire an optical flow block corresponding to the at least one object according to the optical flow map between the previous frame and the subsequent frame.
  • the determining module 604 is configured to: obtain an object segmentation result of at least one object in the subsequent frame according to the image block, the probability map block, and the optical flow block acquired by the first acquiring module 602.
  • FIG. 10 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application.
  • the determining module 604 includes:
  • a first neural network 704 configured to acquire, according to the image block and the probability map block acquired by the first obtaining module 602, a first object segmentation result of the at least one object in the subsequent frame;
  • a second neural network 708, configured to acquire, according to the probability map block and the optical flow block acquired by the first obtaining module 602, a second object segmentation result of the at least one object in the subsequent frame;
  • the calculating unit 710 is configured to obtain an object segmentation result of the at least one object in the subsequent frame according to the first object segmentation result and the second object segmentation result.
  • FIG. 11 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application. As shown in FIG. 11, in comparison with the embodiment shown in FIG. 8, in the delivery network 502 of this embodiment, the determining module 604 includes:
  • the first scaling unit 702 is configured to respectively enlarge the image block, the probability map block, and the optical flow block acquired by the first acquiring module 602 to a preset size;
  • the obtaining unit 712 is configured to obtain, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the subsequent frame at the preset size;
  • the second scaling unit 706 is configured to restore the object segmentation result of the at least one object in the preset size to the object in the original size according to the enlargement ratio of the image block, the probability map block, and the optical flow block by the first scaling unit 702. Segment the result.
  • the obtaining unit 712 may include:
  • a first neural network 704 configured to acquire, according to the separately enlarged image block and the probability map block, a third object segmentation result of the at least one object in the subsequent frame;
  • a second neural network 708, configured to acquire, according to the separately amplified probability map block and the optical flow block, a fourth object segmentation result of the at least one object in the subsequent frame;
  • the calculating unit 710 is configured to determine, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the subsequent frame at the preset size.
  • When the object re-recognition network 504 determines the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-recognition network 504 is configured to:
  • perform feature extraction on at least one object candidate frame in the object detection frame set, respectively; and match the features of the at least one object candidate frame included in the object detection frame set with the features of the object candidate frame corresponding to the object segmentation result in the reference frame.
  • When the object re-recognition network 504 determines, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame, the object re-recognition network 504 is configured to: determine, according to the matching result, whether there is, among the at least one object candidate frame included in the object detection frame set, an object candidate frame whose feature similarity with the features of the object candidate frame corresponding to the object segmentation result in the reference frame is higher than a preset threshold and whose object category is inconsistent with the object segmentation result; if such an object candidate frame exists, determine that the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame; otherwise, determine that the current frame is not another frame in which an object is lost relative to the object segmentation result of the reference frame.
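  • The matching criterion described above (feature similarity above a preset threshold combined with an inconsistent object category) can be sketched as follows, assuming cosine similarity as the feature-similarity measure (the embodiment does not specify one) and hypothetical data structures:

```python
import numpy as np

def frame_lost_object(candidates, ref_feats, threshold=0.8):
    """candidates: (feature, class-in-current-segmentation) pairs for the
    object candidate frames of the current frame.  ref_feats: {object
    class: feature} for the object candidate frames corresponding to the
    reference frame's segmentation result.  The frame is judged to have
    lost an object when a candidate's feature matches a reference object
    (similarity above the threshold) while its class in the current
    segmentation result disagrees."""
    def cos(a, b):
        a, b = np.asarray(a, float), np.asarray(b, float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return any(cos(f, rf) > threshold and c != rc
               for f, c in candidates
               for rc, rf in ref_feats.items())

ref = {1: [1.0, 0.0, 0.0]}    # object class 1 as segmented in the reference frame
```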
  • When the object re-recognition network 504 determines the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-recognition network 504 is configured to:
  • determine, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame.
  • perform feature extraction on at least one object candidate frame in the object detection frame set, respectively; and match the features of the at least one object candidate frame included in the object detection frame set with the features of the object candidate frame corresponding to the object segmentation result in the reference frame.
  • When the object re-recognition network 504 determines, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-recognition network 504 is configured to: acquire, according to the matching result, the object candidate frames in the object detection frame set whose feature similarity with the features of the object candidate frame corresponding to the object segmentation result in the reference frame is higher than the preset threshold and whose object category is inconsistent with the object segmentation result; and determine the frames corresponding to these object candidate frames as the other frames in which an object is lost relative to the object segmentation result of the reference frame.
  • For example, if there is more than one other frame in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, one of these other frames may be selected as the target frame.
• the delivery network 502, when sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video, is configured to:
• acquire the consecutive frames, in the at least part of the frames, in which the lost object is missing; and sequentially transfer the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames.
• the at least one other frame includes: the first frame in the consecutive frames; and the delivery network 502 is configured to sequentially transfer the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; or
• the at least one other frame includes: the last frame in the consecutive frames; and the delivery network 502 is configured to sequentially transfer the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames; or
• the at least one other frame includes: an intermediate frame located between the first frame and the last frame in the consecutive frames; and correspondingly, the delivery network 502 is configured to: sequentially transfer the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; and/or sequentially transfer the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames.
• Another embodiment of the present application provides another video object segmentation apparatus. As an embodiment of this video object segmentation apparatus, reference may be made to the structure shown in FIG. 8, which includes a first acquisition module 602 and a determination module 604. Among them:
• the first acquisition module 602 is configured to: acquire an image block including at least one object from a current frame in the video; and acquire, from an object class probability map of an adjacent frame of the current frame, a probability map block of the object class corresponding to the at least one object.
• the adjacent frame of the current frame includes: an adjacent frame or an adjacent key frame of the current frame in the video in the positive timing direction or the reverse timing direction.
• the image block may be larger than the object candidate box of the at least one object and smaller than the image size of the adjacent frame, so that more context information may be extracted from the image block, which may help acquire the probability map of the at least one object more accurately.
• determining the object segmentation result of the object in the current frame based on the image block including the object in the current frame and the probability map block of the object class corresponding to the object in the object class probability map of the adjacent frame can effectively capture small-sized objects and detail information in the image and reduce the interference of background noise in the image, thereby mitigating failures of object segmentation result transfer caused by small object sizes or large size changes of any object in a frame, and improving the accuracy of the video object segmentation result.
• the determining module 604 is configured to determine the object segmentation result of the at least one object in the current frame according to at least the image block and the probability map block.
  • the determining module 604 can include:
  • the first scaling unit 702 is configured to respectively enlarge the image block and the probability map block acquired by the first acquiring module 602 to a preset size.
  • the first neural network 704 is configured to obtain, according to the separately enlarged image block and the probability map block, an object segmentation result of the at least one object in the current frame at a preset size.
  • the second scaling unit 706 is configured to restore the object segmentation result of the at least one object at the preset size to the object segmentation result in the original size according to the enlargement ratio of the image block and the probability map block by the first scaling unit 702.
• the image block of the at least one object extracted from the current frame and the optical flow map is enlarged to the preset size, and the probability map block of the at least one object extracted from the adjacent frame is enlarged to the preset size, to obtain the object segmentation result of the current frame. This can effectively capture small-sized objects and detail information in the image and obtain the object segmentation result of the current frame more accurately, thereby realizing accurate inter-frame transfer of the object segmentation result, reducing the possibility of transfer failures caused by small object sizes and large size changes in a frame, and improving the accuracy of the video object segmentation result.
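The zoom-in/segment/zoom-out bookkeeping performed by the scaling units and the first neural network can be sketched as follows. This is a hedged illustration, not the patent's implementation: `net` stands in for the segmentation network, nearest-neighbor resizing and the `preset` size are assumptions, and a real implementation would operate on multi-channel image tensors rather than 2-D grids.

```python
def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbor resize of a 2-D grid (list of lists) to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def segment_patch(image_patch, prob_patch, net, preset=(8, 8)):
    """Enlarge both the image block and the probability map block to a preset
    size, run the (stand-in) segmentation network there, and restore the mask
    to the block's original size -- the first/second scaling unit pipeline."""
    h, w = len(image_patch), len(image_patch[0])
    big_img = resize_nearest(image_patch, *preset)
    big_prob = resize_nearest(prob_patch, *preset)
    big_mask = net(big_img, big_prob)      # segmentation result at preset size
    return resize_nearest(big_mask, h, w)  # restore to the original size
```

Enlarging a small object's crop to a fixed working size is what lets the network see detail that would be lost at the full-frame scale.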
• the first acquisition module 602 is further configured to acquire an optical flow block corresponding to the at least one object according to the optical flow map between the adjacent frame and the current frame.
  • the determining module 604 is configured to: obtain an object segmentation result of the at least one object in the current frame according to the image block, the probability map block, and the optical flow block acquired by the first acquiring module 602.
• the image block of the at least one object extracted from the current frame and the optical flow map is enlarged to the preset size, and the probability map block of the at least one object extracted from the adjacent frame is enlarged to the preset size, to obtain the object segmentation result of the current frame. This can effectively capture small-sized objects and detail information in the image and obtain the object segmentation result of the current frame more accurately, thereby realizing accurate inter-frame transfer of the object segmentation result, reducing the possibility of transfer failures caused by small object sizes and large size changes in a frame, and improving the accuracy of the video object segmentation result.
  • the determining module 604 can include:
  • a first neural network 704 configured to acquire, according to the image block and the probability map block acquired by the first acquiring module 602, a first object segmentation result of the object in the current frame;
  • a second neural network 708, configured to acquire a second object segmentation result of the object in the current frame according to the probability map block and the optical flow block acquired by the first acquiring module 602;
  • the calculating unit 710 is configured to obtain an object segmentation result of the object in the current frame according to the first object segmentation result and the second object segmentation result.
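The fusion performed by the calculating unit 710 can be sketched as follows, under the assumption (not fixed by the text) that the two branch outputs are per-pixel foreground probabilities and that the fusion is a simple element-wise average followed by thresholding; the function names are illustrative.

```python
def fuse_segmentations(first, second):
    """Element-wise average of two per-pixel probability grids: the first
    branch (image block + probability map block) and the second branch
    (probability map block + optical flow block)."""
    return [[(a + b) / 2.0 for a, b in zip(ra, rb)]
            for ra, rb in zip(first, second)]

def to_mask(fused, threshold=0.5):
    """Binarize the fused probabilities into the final segmentation mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in fused]
```

Averaging lets the appearance branch and the motion branch compensate for each other where one of them is uncertain.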
  • the determining module 604 can include:
  • the first scaling unit 702 is configured to respectively enlarge the image block, the probability map block, and the optical flow block acquired by the first acquiring module 602 to a preset size.
  • the obtaining unit 712 is configured to obtain, according to the separately enlarged image block, the probability map block, and the optical flow block, an object segmentation result of the at least one object in the current frame at a preset size.
  • the second scaling unit 706 is configured to restore the object segmentation result of the at least one object in the preset size to the object in the original size according to the enlargement ratio of the image block, the probability map block, and the optical flow block by the first scaling unit 702. Segment the result.
  • the obtaining unit 712 can include:
  • the first neural network 704 is configured to obtain a third object segmentation result of the at least one object in the current frame according to the separately enlarged image block and the probability map block.
• the second neural network 708 is configured to obtain a fourth object segmentation result of the at least one object in the current frame according to the separately enlarged probability map block and optical flow block.
  • the calculating unit 710 is configured to determine, according to the third object segmentation result and the fourth object segmentation result, an object segmentation result of at least one object in the current frame at a preset size.
  • the embodiment of the present application further provides an electronic device, including the video object segmentation apparatus of any of the above embodiments of the present application.
• The embodiment of the present application further provides another electronic device, including:
• a memory for storing executable instructions; and
• a processor for communicating with the memory to execute the executable instructions so as to perform the operations of the video object segmentation method of any of the above embodiments of the present application.
• The embodiment of the present application further provides a computer storage medium for storing computer readable instructions, where, when the instructions are executed, the operations of the video object segmentation method of any of the above embodiments of the present application are implemented.
• The embodiment of the present application further provides a computer program, including computer readable instructions, where, when the computer readable instructions are run in a device, a processor in the device executes instructions for implementing the video object segmentation method of any of the foregoing embodiments of the present application.
  • FIG. 12 is a schematic structural diagram of an application embodiment of an electronic device according to the present application.
• the electronic device includes one or more processors, a communication unit, and the like, for example one or more central processing units (CPUs) 901 and/or one or more graphics processors (GPUs) 913; the processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 902 or executable instructions loaded from a storage portion 908 into a random access memory (RAM) 903.
• The communication portion 912 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card. The processor may communicate with the read-only memory 902 and/or the random access memory 903 to execute the executable instructions, is connected to the communication portion 912 through the bus 904, and communicates with other target devices via the communication portion 912, thereby performing operations corresponding to any method provided by the embodiments of the present application, for example: in at least part of the frames of a video, sequentially performing inter-frame transfer of the object segmentation result of a reference frame starting from the reference frame, to obtain an object segmentation result of at least one other frame in the at least part of the frames; determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame; performing segmentation of the lost object with the determined other frame as a target frame, so as to update the object segmentation result of the target frame; and sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video.
• Or, for example: acquiring an image block including at least one object from a current frame in the video; acquiring, from an object class probability map of an adjacent frame of the current frame, a probability map block of the object class corresponding to the at least one object; and determining, according to at least the image block and the probability map block, an object segmentation result of the at least one object in the current frame.
• In the RAM 903, at least one program and data required for the operation of the device may also be stored.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904.
  • ROM 902 is an optional module.
• The RAM 903 stores executable instructions, or executable instructions are written into the ROM 902 at runtime; the executable instructions cause the processor 901 to perform operations corresponding to any of the methods described above.
  • An input/output (I/O) interface 905 is also coupled to bus 904.
• The communication portion 912 may be integrated, or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) linked on the bus.
• The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 908 including a hard disk and the like; and a communication portion 909 including a network interface card such as a LAN card or a modem. The communication portion 909 performs communication processing via a network such as the Internet.
• A drive 99 is also connected to the I/O interface 905 as needed. A removable medium 99, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 99 as needed, so that a computer program read therefrom can be installed into the storage portion 908 as needed.
• The architecture shown in FIG. 12 is only an optional implementation manner. The number and types of the components in FIG. 12 may be selected, deleted, added, or replaced according to actual needs; functional components may also be implemented separately or in an integrated manner, for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication portion may be provided separately, or may be integrated on the CPU or the GPU, and so on.
• An embodiment of the present application includes a computer program product, including a computer program tangibly embodied on a machine readable medium, where the computer program includes program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: an instruction for, in at least part of the frames of a video, sequentially performing inter-frame transfer of the object segmentation result of a reference frame starting from the reference frame, to obtain an object segmentation result of at least one other frame in the at least part of the frames; an instruction for determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame; an instruction for performing segmentation of the lost object with the determined other frame as a target frame, so as to update the object segmentation result of the target frame; and an instruction for sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video.
• Or, for example: an instruction for acquiring an image block including at least one object from a current frame in the video; an instruction for acquiring, from an object class probability map of an adjacent frame of the current frame, a probability map block of the object class corresponding to the at least one object; and an instruction for determining, according to at least the image block and the probability map block, an object segmentation result of the at least one object in the current frame.
• The foregoing program may be stored in a computer readable storage medium; when the program is executed, the steps including those of the foregoing method embodiments are performed; and the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
• The embodiments in the present specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts between the embodiments, reference may be made to each other. For the apparatus embodiments, the description is relatively simple, and for relevant parts, reference may be made to the description of the method embodiments.
  • the methods and apparatus of the present application may be implemented in a number of ways.
  • the methods and apparatus of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • the above-described sequence of steps for the method is for illustrative purposes only, and the steps of the method of the present application are not limited to the order described above unless otherwise specifically stated.
• The present application can also be implemented as programs recorded in a recording medium, these programs including machine readable instructions for implementing the methods according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose video object segmentation methods and apparatuses, an electronic device, a storage medium and a program. One of the methods includes: in at least part of the frames of a video, sequentially performing inter-frame transfer of an object segmentation result of a reference frame starting from the reference frame, to obtain an object segmentation result of at least one other frame in the at least part of the frames; determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame; performing segmentation of the lost object with the determined other frame as a target frame, so as to update the object segmentation result of the target frame; and sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video. The embodiments of the present application improve the accuracy of video object segmentation results.

Description

Video object segmentation methods and apparatuses, electronic device, storage medium and program
This application claims priority to Chinese Patent Application No. CN 201710619408.0, entitled "Video object segmentation methods and apparatuses, electronic device, storage medium and program" and filed with the Chinese Patent Office on July 26, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technologies, and in particular to video object segmentation methods and apparatuses, an electronic device, a storage medium and a program.
Background
Owing to their strong learning capability and the large amount of annotated data available for them to learn from, deep convolutional neural networks have achieved great success in many computer vision tasks in recent years.
In the field of computer vision, object segmentation in a video refers to the process of grouping/segmenting the pixels in the frames of the video according to different objects, thereby subdividing a frame into multiple image sub-regions (sets of pixels). Object segmentation in videos has important applications in many fields such as intelligent video analysis, security surveillance, and autonomous driving.
Summary
Embodiments of the present application provide technical solutions for performing video object segmentation.
According to one aspect of the embodiments of the present application, a video object segmentation method is provided, including:
in at least part of the frames of a video, sequentially performing inter-frame transfer of an object segmentation result of a reference frame starting from the reference frame, to obtain an object segmentation result of at least one other frame in the at least part of the frames;
determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame;
performing segmentation of the lost object with the determined other frame as a target frame, so as to update the object segmentation result of the target frame; and
sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video.
Optionally, in any method embodiment of the present application, the reference frame includes: the first frame in the at least part of the frames; and the sequentially performing inter-frame transfer of the object segmentation result of the reference frame starting from the reference frame includes: performing inter-frame transfer of the object segmentation result of the first frame in the positive timing direction in the at least part of the frames, until the last frame in the at least part of the frames; or,
the reference frame includes: the last frame in the at least part of the frames; and the sequentially performing inter-frame transfer of the object segmentation result of the reference frame starting from the reference frame includes: performing inter-frame transfer of the object segmentation result of the last frame in the reverse timing direction in the at least part of the frames, until the first frame in the at least part of the frames; or,
the reference frame includes: an intermediate frame located between the first frame and the last frame in the at least part of the frames; and the sequentially performing inter-frame transfer of the object segmentation result of the reference frame starting from the reference frame includes: performing inter-frame transfer of the object segmentation result of the intermediate frame in the positive timing direction in the at least part of the frames, until the last frame in the at least part of the frames; and/or, performing inter-frame transfer of the object segmentation result of the intermediate frame in the reverse timing direction in the at least part of the frames, until the first frame in the at least part of the frames.
Optionally, in any of the foregoing method embodiments of the present application, the sequentially performing inter-frame transfer of the object segmentation result of the reference frame starting from the reference frame to obtain the object segmentation result of at least one other frame in the at least part of the frames includes:
determining, according to the object segmentation result of a preceding frame along a propagation direction of the object segmentation result of the reference frame, the object segmentation result of a subsequent frame in the propagation direction, where the propagation direction includes the positive timing direction and/or the reverse timing direction of the video.
Optionally, in any of the foregoing method embodiments of the present application, the preceding frame includes: an adjacent frame or an adjacent key frame of the subsequent frame in the positive or reverse timing direction in the at least part of the frames.
Optionally, in any of the foregoing method embodiments of the present application, the determining, according to the object segmentation result of the preceding frame along the propagation direction of the object segmentation result of the reference frame, the object segmentation result of the subsequent frame in the propagation direction includes:
acquiring, from the subsequent frame, an image block including at least one object; acquiring, from an object class probability map of the preceding frame, probability map blocks of the object classes respectively corresponding to the at least one object; and
determining, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing method embodiments of the present application, the determining, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the subsequent frame includes:
respectively enlarging the image block and the probability map block to a preset size;
acquiring, according to the separately enlarged image block and probability map block, the object segmentation result of the at least one object in the subsequent frame at the preset size; and
restoring, according to the enlargement ratio of the image block and the probability map block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any of the foregoing method embodiments of the present application, the method further includes: acquiring an optical flow block corresponding to the at least one object according to an optical flow map between the preceding frame and the subsequent frame; and
the determining, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the subsequent frame includes: determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing method embodiments of the present application, the determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the subsequent frame includes:
acquiring, according to the image block and the probability map block, a first object segmentation result of the at least one object in the subsequent frame; and acquiring, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the subsequent frame; and
acquiring, according to the first object segmentation result and the second object segmentation result, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing method embodiments of the present application, the determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the subsequent frame includes:
respectively enlarging the image block, the probability map block, and the optical flow block to a preset size;
acquiring, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the subsequent frame at the preset size; and
restoring, according to the enlargement ratio of the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any of the foregoing method embodiments of the present application, the acquiring, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the subsequent frame at the preset size includes:
acquiring, according to the separately enlarged image block and probability map block, a third object segmentation result of the at least one object in the subsequent frame; and acquiring, according to the separately enlarged probability map block and optical flow block, a fourth object segmentation result of the at least one object in the subsequent frame; and
determining, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the subsequent frame at the preset size.
Optionally, in any of the foregoing method embodiments of the present application, the image block is larger than the object candidate box of the object and smaller than the image size of the subsequent frame.
Optionally, in any of the foregoing method embodiments of the present application, the determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame includes:
performing object detection on a current frame, with any other frame in the at least part of the frames as the current frame, to obtain an object candidate box set of the current frame;
matching the at least one object candidate box included in the object detection box set of the current frame respectively with the object candidate boxes corresponding to the object segmentation result of the reference frame; and
determining, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing method embodiments of the present application, the matching the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame includes: performing feature extraction on the at least one object candidate box included in the object detection box set respectively; and matching the features of the at least one object candidate box included in the object detection box set with the features of the object candidate boxes corresponding to the object segmentation result in the reference frame; and
the determining, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame includes: determining, according to the matching result, whether, among the at least one object candidate box included in the object detection box set and the object candidate boxes corresponding to the object segmentation result in the reference frame, there is an object candidate box whose feature similarity is higher than a preset threshold and whose object class according to the object segmentation result is inconsistent; if such an object candidate box exists, determining that the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame; otherwise, determining that the current frame is not another frame in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing method embodiments of the present application, the determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame includes:
performing object detection on at least one other frame in the at least part of the frames respectively, to obtain an object candidate box set;
matching the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame; and
determining, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing method embodiments of the present application, the matching the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame includes: performing feature extraction on the at least one object candidate box included in the object detection box set respectively; and matching the features of the at least one object candidate box included in the object detection box set with the features of the object candidate boxes corresponding to the object segmentation result in the reference frame; and
the determining, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame includes: acquiring, according to the matching result, from the at least one object candidate box included in the object detection box set and the object candidate boxes corresponding to the object segmentation result in the reference frame, the object candidate boxes whose feature similarity is higher than the preset threshold and whose object class according to the object segmentation result is inconsistent; and taking the frames corresponding to the acquired object candidate boxes in the object detection box set as the other frames in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing method embodiments of the present application, the taking the determined other frame as the target frame includes:
if the at least part of the frames include multiple other frames in which an object is lost relative to the object segmentation result of the reference frame, selecting one of the other frames in which an object is lost relative to the object segmentation result of the reference frame as the target frame.
Optionally, in any of the foregoing method embodiments of the present application, the sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video includes:
acquiring the consecutive frames, in the at least part of the frames, in which the lost object is missing; and
sequentially transferring the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames.
Optionally, in any of the foregoing method embodiments of the present application, the at least one other frame includes: the first frame in the consecutive frames; and the sequentially transferring the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames includes: sequentially transferring the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; or,
the at least one other frame includes: the last frame in the consecutive frames; and the sequentially transferring the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames includes: sequentially transferring the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames; or,
the at least one other frame includes: an intermediate frame located between the first frame and the last frame in the consecutive frames; and the sequentially transferring the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames includes: sequentially transferring the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; and/or, sequentially transferring the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames.
Optionally, in any of the foregoing method embodiments of the present application, for the same lost object, the range of the other frames to which the updated object segmentation result of the target frame is transferred this time does not overlap with the range of the other frames to which the updated object segmentation result of a target frame was transferred previously.
According to another aspect of the embodiments of the present application, another video object segmentation method is provided, including:
acquiring an image block including at least one object from a current frame in a video; acquiring, from an object class probability map of an adjacent frame of the current frame, a probability map block of the object class corresponding to the at least one object; and
determining, according to at least the image block and the probability map block, an object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the determining, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the current frame includes:
respectively enlarging the image block and the probability map block to a preset size;
acquiring, according to the separately enlarged image block and probability map block, the object segmentation result of the at least one object in the current frame at the preset size; and
restoring, according to the enlargement ratio of the image block and the probability map block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the method further includes: acquiring an optical flow block corresponding to the at least one object according to an optical flow map between the adjacent frame and the current frame; and
the determining, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the current frame includes: determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the current frame includes:
acquiring, according to the image block and the probability map block, a first object segmentation result of the at least one object in the current frame; and acquiring, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the current frame; and
acquiring, according to the first object segmentation result and the second object segmentation result, the object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the determining, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the current frame includes:
respectively enlarging the image block, the probability map block, and the optical flow block to a preset size;
acquiring, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the current frame at the preset size; and
restoring, according to the enlargement ratio of the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the acquiring, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the current frame at the preset size includes:
acquiring, according to the separately enlarged image block and probability map block, a third object segmentation result of the object in the current frame; and acquiring, according to the separately enlarged probability map block and optical flow block, a fourth object segmentation result of the at least one object in the current frame; and
determining, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the current frame at the preset size.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the adjacent frame of the current frame includes: an adjacent frame or an adjacent key frame of the current frame in the video in the positive or reverse timing direction.
Optionally, in any embodiment of the foregoing other video object segmentation method of the present application, the image block is larger than the object candidate box of the object and smaller than the image size of the adjacent frame.
According to yet another aspect of the embodiments of the present application, a video object segmentation apparatus is provided, including a delivery network and an object re-identification network, where:
the delivery network is configured to: in at least part of the frames of a video, sequentially perform inter-frame transfer of an object segmentation result of a reference frame starting from the reference frame, to obtain an object segmentation result of at least one other frame in the at least part of the frames; and sequentially transfer the updated object segmentation result of a target frame obtained by the object re-identification network to at least one other frame in the video; and
the object re-identification network is configured to: determine other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame; and perform segmentation of the lost object with the determined other frame as the target frame, so as to update the object segmentation result of the target frame.
Optionally, in any apparatus embodiment of the present application, the reference frame includes: the first frame in the at least part of the frames; and the delivery network is configured to perform inter-frame transfer of the object segmentation result of the first frame in the positive timing direction in the at least part of the frames, until the last frame in the at least part of the frames; or,
the reference frame includes: the last frame in the at least part of the frames; and the delivery network is configured to perform inter-frame transfer of the object segmentation result of the last frame in the reverse timing direction in the at least part of the frames, until the first frame in the at least part of the frames; or,
the reference frame includes: an intermediate frame located between the first frame and the last frame in the at least part of the frames; and the delivery network is configured to: perform inter-frame transfer of the object segmentation result of the intermediate frame in the positive timing direction in the at least part of the frames, until the last frame in the at least part of the frames; and/or, perform inter-frame transfer of the object segmentation result of the intermediate frame in the reverse timing direction in the at least part of the frames, until the first frame in the at least part of the frames.
Optionally, in any of the foregoing apparatus embodiments of the present application, when sequentially performing inter-frame transfer of the object segmentation result of the reference frame starting from the reference frame to obtain the object segmentation result of at least one other frame in the at least part of the frames, the delivery network is configured to:
determine, according to the object segmentation result of a preceding frame along a propagation direction of the object segmentation result of the reference frame, the object segmentation result of a subsequent frame in the propagation direction, where the propagation direction includes the positive timing direction and/or the reverse timing direction of the video; and
the preceding frame includes: an adjacent frame or an adjacent key frame of the subsequent frame in the positive or reverse timing direction in the at least part of the frames.
Optionally, in any of the foregoing apparatus embodiments of the present application, the delivery network includes:
a first acquisition module configured to acquire, from the subsequent frame, an image block including at least one object, and acquire, from an object class probability map of the preceding frame, probability map blocks of the object classes respectively corresponding to the at least one object; and
a determination module configured to determine, according to at least the image block and the probability map block, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, the determination module includes:
a first scaling unit configured to respectively enlarge the image block and the probability map block to a preset size;
a first neural network configured to acquire, according to the separately enlarged image block and probability map block, the object segmentation result of the at least one object in the subsequent frame at the preset size; and
a second scaling unit configured to restore, according to the enlargement ratio of the image block and the probability map block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any of the foregoing apparatus embodiments of the present application, the first acquisition module is further configured to acquire an optical flow block corresponding to the at least one object according to an optical flow map between the preceding frame and the subsequent frame; and
the determination module is configured to determine, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, the determination module includes:
a first neural network configured to acquire, according to the image block and the probability map block, a first object segmentation result of the at least one object in the subsequent frame;
a second neural network configured to acquire, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the subsequent frame; and
a calculation unit configured to acquire, according to the first object segmentation result and the second object segmentation result, the object segmentation result of the at least one object in the subsequent frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, the determination module includes:
a first scaling unit configured to respectively enlarge the image block, the probability map block, and the optical flow block to a preset size;
an acquisition unit configured to acquire, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the subsequent frame at the preset size; and
a second scaling unit configured to restore, according to the enlargement ratio of the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any of the foregoing apparatus embodiments of the present application, the acquisition unit includes:
a first neural network configured to acquire, according to the separately enlarged image block and probability map block, a third object segmentation result of the at least one object in the subsequent frame;
a second neural network configured to acquire, according to the separately enlarged probability map block and optical flow block, a fourth object segmentation result of the at least one object in the subsequent frame; and
a calculation unit configured to determine, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the subsequent frame at the preset size.
Optionally, in any of the foregoing apparatus embodiments of the present application, the image block is larger than the object candidate box of the object and smaller than the image size of the subsequent frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-identification network is configured to:
perform object detection on a current frame, with any other frame in the at least part of the frames as the current frame, to obtain an object candidate box set of the current frame;
match the at least one object candidate box included in the object detection box set of the current frame respectively with the object candidate boxes corresponding to the object segmentation result of the reference frame; and
determine, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when matching the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame, the object re-identification network is configured to: perform feature extraction on the at least one object candidate box included in the object detection box set respectively; and match the features of the at least one object candidate box included in the object detection box set with the features of the object candidate boxes corresponding to the object segmentation result in the reference frame; and
when determining, according to the matching result, whether the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame, the object re-identification network is configured to: determine, according to the matching result, whether, among the at least one object candidate box included in the object detection box set and the object candidate boxes corresponding to the object segmentation result in the reference frame, there is an object candidate box whose feature similarity is higher than a preset threshold and whose object class according to the object segmentation result is inconsistent; if such an object candidate box exists, determine that the current frame is another frame in which an object is lost relative to the object segmentation result of the reference frame; otherwise, determine that the current frame is not another frame in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when determining other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-identification network is configured to:
perform object detection on at least one other frame in the at least part of the frames respectively, to obtain an object candidate box set;
match the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame; and
determine, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when matching the at least one object candidate box included in the object detection box set with the object candidate boxes corresponding to the object segmentation result of the reference frame, the object re-identification network is configured to: perform feature extraction on the at least one object candidate box included in the object detection box set respectively; and match the features of the at least one object candidate box included in the object detection box set with the features of the object candidate boxes corresponding to the object segmentation result in the reference frame; and
when determining, according to the matching result, the other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame, the object re-identification network is configured to: acquire, according to the matching result, from the at least one object candidate box included in the object detection box set and the object candidate boxes corresponding to the object segmentation result in the reference frame, the object candidate boxes whose feature similarity is higher than the preset threshold and whose object class according to the object segmentation result is inconsistent; and take the frames corresponding to the acquired object candidate boxes in the object detection box set as the other frames in which an object is lost relative to the object segmentation result of the reference frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when taking the determined other frame as the target frame, the object re-identification network is configured to: if the at least part of the frames include multiple other frames in which an object is lost relative to the object segmentation result of the reference frame, select one of the other frames in which an object is lost relative to the object segmentation result of the reference frame as the target frame.
Optionally, in any of the foregoing apparatus embodiments of the present application, when sequentially transferring the updated object segmentation result of the target frame to at least one other frame in the video, the delivery network is configured to:
acquire the consecutive frames, in the at least part of the frames, in which the lost object is missing; and
sequentially transfer the updated object segmentation result of the target frame to the at least one other frame in the consecutive frames.
Optionally, in any of the foregoing apparatus embodiments of the present application, the at least one other frame includes: the first frame in the consecutive frames; and the delivery network is configured to sequentially transfer the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; or,
the at least one other frame includes: the last frame in the consecutive frames; and the delivery network is configured to sequentially transfer the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames; or,
the at least one other frame includes: an intermediate frame located between the first frame and the last frame in the consecutive frames; and the delivery network is configured to: sequentially transfer the updated object segmentation result of the target frame in the positive timing direction in the consecutive frames to the last frame in the consecutive frames; and/or, sequentially transfer the updated object segmentation result of the target frame in the reverse timing direction in the consecutive frames to the first frame in the consecutive frames.
According to still another aspect of the embodiments of the present application, another video object segmentation apparatus is provided, including:
a first acquisition module configured to acquire an image block including at least one object from a current frame in a video, and acquire, from an object class probability map of an adjacent frame of the current frame, a probability map block of the object class corresponding to the at least one object; and
a determination module configured to determine, according to at least the image block and the probability map block, an object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the determination module includes:
a first scaling unit configured to respectively enlarge the image block and the probability map block to a preset size;
a first neural network configured to acquire, according to the separately enlarged image block and probability map block, the object segmentation result of the at least one object in the current frame at the preset size; and
a second scaling unit configured to restore, according to the enlargement ratio of the image block and the probability map block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the first acquisition module is further configured to acquire an optical flow block corresponding to the at least one object according to an optical flow map between the adjacent frame and the current frame; and
the determination module is configured to determine, according to the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the determination module includes:
a first neural network configured to acquire, according to the image block and the probability map block, a first object segmentation result of the at least one object in the current frame;
a second neural network configured to acquire, according to the probability map block and the optical flow block, a second object segmentation result of the at least one object in the current frame; and
a calculation unit configured to acquire, according to the first object segmentation result and the second object segmentation result, the object segmentation result of the at least one object in the current frame.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the determination module includes:
a first scaling unit configured to respectively enlarge the image block, the probability map block, and the optical flow block to a preset size;
an acquisition unit configured to acquire, according to the separately enlarged image block, probability map block, and optical flow block, the object segmentation result of the at least one object in the current frame at the preset size; and
a second scaling unit configured to restore, according to the enlargement ratio of the image block, the probability map block, and the optical flow block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the acquisition unit includes:
a first neural network configured to acquire, according to the separately enlarged image block and probability map block, a third object segmentation result of the at least one object in the current frame;
a second neural network configured to acquire, according to the separately enlarged probability map block and optical flow block, a fourth object segmentation result of the at least one object in the current frame; and
a calculation unit configured to determine, according to the third object segmentation result and the fourth object segmentation result, the object segmentation result of the at least one object in the current frame at the preset size.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the adjacent frame of the current frame includes: an adjacent frame or an adjacent key frame of the current frame in the video in the positive or reverse timing direction.
Optionally, in any embodiment of the foregoing other video object segmentation apparatus of the present application, the image block is larger than the object candidate box of the object and smaller than the image size of the adjacent frame.
According to still another aspect of the embodiments of the present application, an electronic device is provided, including the video object segmentation apparatus of any of the above embodiments.
According to still another aspect of the embodiments of the present application, another electronic device is provided, including:
a memory for storing executable instructions; and
a processor for communicating with the memory to execute the executable instructions so as to complete the operations of the method of any of the above embodiments of the present application.
According to still another aspect of the embodiments of the present application, a computer storage medium for storing computer readable instructions is provided, where, when the instructions are executed, the operations of the method of any of the above embodiments of the present application are implemented.
According to still another aspect of the embodiments of the present application, a computer program is provided, including computer readable instructions, where, when the computer readable instructions are run in a device, a processor in the device executes executable instructions for implementing the steps of the method of any of the above embodiments of the present application.
Based on a video object segmentation method and apparatus, electronic device, storage medium and program provided by the embodiments of the present application, in at least part of the frames of a video, inter-frame transfer of the object segmentation result of a reference frame is performed sequentially starting from the reference frame, to obtain the object segmentation result of at least one other frame in the at least part of the frames; other frames in the at least part of the frames in which an object is lost relative to the object segmentation result of the reference frame are determined; segmentation of the lost object is performed with the determined other frame as a target frame, so as to update the object segmentation result of the target frame; and the updated object segmentation result of the target frame is sequentially transferred to at least one other frame in the video, so as to correct the object segmentation result of the at least one other frame. Based on this embodiment, the object segmentation result of the reference frame can be transferred to other frames in the at least part of the frames of the video, making the video object segmentation results more continuous in time; segmentation of the lost object is performed on a target frame that lost the object during transfer so as to update the object segmentation result of that target frame, and the updated object segmentation result of the target frame is sequentially transferred to at least one other frame in the video to correct the object segmentation results of the frames it is transferred to. This can remedy cases in which the transfer of an object segmentation result fails due to occlusion or a large change in object pose, as well as cases in which some objects are confused or lost in the object segmentation result after multiple moving objects overlap and then separate, thereby improving the accuracy of video object segmentation results.
Based on another video object segmentation method and apparatus, electronic device, storage medium and program provided by the embodiments of the present application, an image block including at least one object is acquired from a current frame in a video, a probability map block of the object class corresponding to the object is acquired from an object class probability map of an adjacent frame of the current frame, and the object segmentation result of the object in the current frame is determined according to at least the image block and the probability map block. In this embodiment of the present application, determining the object segmentation result of the object in the current frame based on the image block including the object in the current frame and the probability map block of the object class corresponding to the object in the object class probability map of the adjacent frame can effectively capture small-sized objects and detail information in the image and reduce the interference of background noise in the image, thereby mitigating failures of object segmentation result transfer caused by small object sizes or large size changes of any object in a frame, and improving the accuracy of video object segmentation results.
The technical solutions of the present application are further described in detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings, which constitute a part of the specification, describe the embodiments of the present application and, together with the description, serve to explain the principles of the present application.
The present application can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
FIG. 1 is a flowchart of one embodiment of a video object segmentation method of the present application.
FIG. 2 is a flowchart of another embodiment of a video object segmentation method of the present application.
FIG. 3 is a schematic diagram of one process of segmenting objects in a video by applying an embodiment of a video object segmentation method of the present application.
FIG. 4 is a flowchart of yet another embodiment of a video object segmentation method of the present application.
FIG. 5 is a flowchart of still another embodiment of a video object segmentation method of the present application.
FIG. 6 is an example diagram of performing inter-frame transfer of object segmentation results in an embodiment of the present application.
FIG. 7 is a schematic structural diagram of one embodiment of a video object segmentation apparatus of the present application.
FIG. 8 is a schematic structural diagram of one embodiment of a delivery network in an embodiment of the present application.
FIG. 9 is a schematic structural diagram of another embodiment of a delivery network in an embodiment of the present application.
FIG. 10 is a schematic structural diagram of yet another embodiment of a delivery network in an embodiment of the present application.
FIG. 11 is a schematic structural diagram of still another embodiment of a delivery network in an embodiment of the present application.
FIG. 12 is a schematic structural diagram of an application embodiment of an electronic device of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
Meanwhile, it should be understood that, for ease of description, the sizes of the various parts shown in the accompanying drawings are not drawn according to actual proportional relations.
The following description of at least one exemplary embodiment is in fact merely illustrative and in no way serves as any limitation on the present application or its application or use.
Technologies, methods, and devices known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such technologies, methods, and devices should be regarded as part of the specification.
A person skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present application are only used to distinguish different steps, devices, modules, or the like, and represent neither any particular technical meaning nor a necessary logical order between them.
It should be noted that similar reference numerals and letters denote similar items in the following accompanying drawings; therefore, once an item is defined in one accompanying drawing, it does not need to be discussed further in subsequent accompanying drawings.
The embodiments of the present application may be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, and servers include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, large computer systems, distributed cloud computing technology environments including any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network. In the distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
图1为本申请视频物体分割方法一个实施例的流程图。如图1所示,该实施例的视频物体分割方法包括:
102,在视频的至少部分帧中,自参考帧开始顺序进行参考帧的物体分割结果的帧间传递,获得该至少部分帧中该参考帧外至少一其他帧的物体分割结果,例如获得该至少部分帧中该参考帧外各其他帧的物体分割结果。
其中,本申请任一实施例中的帧即帧图像。根据应用需求,上述至少部分帧可以是整个视频中的帧,也可以是视频中其中一段视频包括的帧,或者从视频中每隔至少一帧提取出来的帧的集合,均可应用本申请实施例进行视频物体分割。
在其中一个可选示例中,上述参考帧可以是上述至少部分帧中的第一帧。相应地,该操作102中,将该第一帧的物体分割结果在上述至少部分帧中沿时序正方向进行帧间传递,直至该至少部分帧中的最后一帧。
在另一个可选示例中,上述参考帧可以是上述至少部分帧中的最后一帧。相应地,该操作102中,将该最后一帧的物体分割结果在该至少部分帧中沿时序反方向进行帧间传递,直至该至少部分帧中的第一帧。
在又一个可选示例中,上述参考帧可以是上述至少部分帧中位于第一帧与最后一帧之间的中间一帧。相应地,该操作102中,将该中间一帧的物体分割结果在该至少部分帧中沿时序正方向进行帧间传递,直至该至少部分帧中的最后一帧;和/或,将该中间一帧的物体分割结果在该至少部分帧中沿时序反方向进行帧间传递,直至该至少部分帧中的第一帧。
在本申请任一实施例的一个可选示例中,物体分割结果可以表示为物体的概率图谱。在该示例中,每帧的物体分割结果可以表示为一个概率图谱,该概率图谱中各像素的取值表示该像素对应的帧中物体的物体类别。另外,每帧的物体分割结果也可以表示为多个概率图谱,每个概率图谱分别表示帧中一个物体类别的概率图谱,在每个概率图谱中,各像素对应的帧中物体的物体类别为该概率图谱表示的物体类别的,该像素点的取值可以为1;否则,各像素对应的帧中物体的物体类别不是该概率图谱表示的物体类别的,该像素点的取值可以为0。
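上述第二种表示方式(每个物体类别各对应一张0/1概率图谱)可以用如下Python代码示意(仅为说明性草图,函数名与数组约定均为本文假设,并非本申请的实际实现):

```python
import numpy as np

def label_map_to_class_maps(label_map, num_classes):
    """将单张标签图谱拆分为每个物体类别一张的0/1概率图谱。

    label_map: (H, W) 整数数组, 每个像素的取值为其物体类别编号。
    返回: (num_classes, H, W) 数组, 第 c 张图谱中属于类别 c 的像素取值为1, 否则为0。
    """
    h, w = label_map.shape
    maps = np.zeros((num_classes, h, w), dtype=np.uint8)
    for c in range(num_classes):
        maps[c][label_map == c] = 1
    return maps

label_map = np.array([[0, 1],
                      [1, 2]])
maps = label_map_to_class_maps(label_map, num_classes=3)
```

该表示方式下,各类别图谱互不重叠,逐像素求和恒为1,便于按物体类别分别进行帧间传递。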
在一个可选示例中,该操作102可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的传递网络502执行。
104,确定上述至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧。
在一个可选示例中,该操作104可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的物体再识别网络504执行。
106,以确定的其他帧作为目标帧进行丢失物体的分割,以更新该目标帧的物体分割结果。
其中,目标帧可以是上述至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧中的一帧或多帧。
在一个可选示例中,该操作106可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的物体再识别网络504执行。
108,将该目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧。
在一个可选示例中,该操作108可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的传递网络502执行。
在本申请实施例中,操作104-108可以执行一次,也可以作为循环操作执行多次,直至上述至少部分帧中相对参考帧的物体分割结果不存在丢失物体的其他帧。其中,操作102和108可以分别看作一个物体分割结果的传播过程,操作104和106可以看作一个物体再识别过程。即:本申请实施例中,操作104-108可以看作是物体分割结果的传播过程与物体再识别过程交替执行的循环过程。在循环过程中,可以将操作108中的目标帧作为参考帧,将目标帧更新后的物体分割结果作为参考帧的物体分割结果,在视频或其至少部分帧中进行帧间传递。
基于本实施例,可以将参考帧的物体分割结果传递到视频的至少部分帧中的其他帧上,使得视频物体分割结果在时序上更加连续;对传递中丢失物体的目标帧进行丢失物体的分割、以更新该目标帧的物体分割结果,并将该目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧,对传递到的其他帧的物体分割结果进行修正,可以改善因为遮挡和物体姿态大幅度变化造成的该物体分割结果传递失败的情况、以及多个物体运动重叠再分开后,物体分割结果中会混淆或丢失部分物体的情况,提高了视频物体分割结果的准确率。
在图1所示实施例的再一个可选示例中,操作102中,自参考帧顺序进行参考帧的物体分割结果的帧间传递,获得至少部分帧中至少一其他帧的物体分割结果,可以通过如下方式实现:
根据沿参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定传播方向上在后帧的物体分割结果,其中的传播方向包括视频的时序正方向和/或时序反方向。
本申请实施例中,在先帧、在后帧是相对于传播方向的顺序而言的,具有相对性。其中的传播方向可以是视频的时序正方向或时序反方向。在传播方向上顺序靠前的帧为在先帧,在传播方向上顺序靠后的帧为在后帧。例如,在先帧可以是:在后帧在上述至少部分帧中沿时序正方向或时序反方向上的相邻帧或相邻关键帧,其中的关键帧可以是在上述至少部分帧中沿时序正方向或时序反方向上,与在后帧之间间隔在预设帧数范围内的帧。当传播方向变化时,在先帧与在后帧相应变化。
进一步示例性地,根据沿参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定传播方向上在后帧的物体分割结果,可以利用一个传递网络,执行如下操作实现:
从在后帧获取包括至少一物体的图像块;从在先帧的物体类别概率图谱获取该至少一物体分别对应的物体类别的概率图谱块;
至少根据上述图像块和概率图谱块,确定在后帧中该至少一物体的物体分割结果。
在本申请实施例的进一步可选示例中,提取的包括物体的图像块的大小可以大于该物体的物体候选框且小于该在后帧的图像大小,以便后续从图像块中提取特征时可以提取到更多的上下文信息,有助于更准确地获取该物体的物体分割结果。
在其中一个可选示例中,至少根据图像块和概率图谱块,确定在后帧中物体的物体分割结果,可以利用一个传递网络,执行如下操作实现:
分别将上述图像块和概率图谱块放大至预设尺寸;
根据分别放大后的图像块和概率图谱块,获取在后帧中该至少一物体在预设尺寸下的物体分割结果;
根据图像块和概率图谱块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果,即:将该至少一物体在预设尺寸下的物体分割结果进行与上述放大比例相对比例的缩小,获得该至少一物体的物体分割结果。
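上述"放大至预设尺寸、获取分割结果、再按放大比例恢复原始尺寸"的流程可示意如下(放大采用最近邻、恢复采用块平均仅为本文假设,实际可采用任意插值方式):

```python
import numpy as np

def upscale(block, scale):
    """以最近邻方式将2D块按整数倍放大(示意)。"""
    return np.repeat(np.repeat(block, scale, axis=0), scale, axis=1)

def downscale(block, scale):
    """按同一倍率将结果缩小回原始尺寸(此处取块平均, 示意)。"""
    h, w = block.shape
    return block.reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

patch = np.array([[0., 1.],
                  [1., 0.]])
big = upscale(patch, 2)        # 放大至"预设尺寸"
restored = downscale(big, 2)   # 按放大比例恢复原始尺寸
```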
在另一个可选示例中,根据沿参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定传播方向上在后帧的物体分割结果,还可以包括:根据在先帧与在后帧之间的光流图获取该至少一物体对应的光流图块。其中,可以通过一个光流网络获取在先帧与在后帧之间的光流图。
相应地,该可选示例中,至少根据图像块和概率图谱块,确定在后帧中至少一物体的物体分割结果,可以通过一个传递网络,执行如下操作实现:根据上述图像块、概率图谱块和光流图块,获取在后帧中该至少一物体的物体分割结果。
其中,根据上述图像块、概率图谱块和光流图块,确定在后帧中至少一物体的物体分割结果,可以示例性地通过如下方式实现:
根据上述图像块和概率图谱块,获取在后帧中该至少一物体的第一物体分割结果,该操作可以通过传递网络中的第一神经网络实现;以及根据上述概率图谱块和光流图块,获取在后帧中该至少一物体的第二物体分割结果,该操作可以通过传递网络中的第二神经网络实现;
根据上述第一物体分割结果和第二物体分割结果,获取在后帧中该至少一物体的物体分割结果,该操作可以通过传递网络中的计算模块实现。例如,获取上述第一物体分割结果和第二物体分割结果之和,作为在后帧中该至少一物体的物体分割结果;或者,获取上述第一物体分割结果和第二物体分割结果的平均值,作为在后帧中该至少一物体的物体分割结果。
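上述计算模块对第一物体分割结果与第二物体分割结果取和或取平均的融合方式,可用如下示意代码表达(函数名为本文假设):

```python
import numpy as np

def fuse_segmentations(first, second, mode="mean"):
    """融合两路物体分割结果(概率图谱): 取和或取平均。"""
    if mode == "sum":
        return first + second
    return (first + second) / 2.0

a = np.array([[0.2, 0.8]])   # 例如来自第一神经网络的概率图谱
b = np.array([[0.4, 0.6]])   # 例如来自第二神经网络的概率图谱
fused = fuse_segmentations(a, b, mode="mean")
```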
另外,根据上述图像块、概率图谱块和光流图块,确定在后帧中至少一物体的物体分割结果,也可以通过传递网络执行如下操作实现:
分别将上述图像块、概率图谱块和光流图块放大至预设尺寸,该操作可以通过传递网络中的第一缩放单元实现;
根据分别放大后的图像块、概率图谱块和光流图块,获取在后帧中该至少一物体在预设尺寸下的物体分割结果;
根据上述图像块、概率图谱块和光流图块的放大比例,将至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果,即:将该至少一物体在预设尺寸下的物体分割结果进行与上述放大比例相对比例的缩小,获得该至少一物体的物体分割结果。该操作可以通过传递网络中的第二缩放单元实现。
其中,根据分别放大后的图像块、概率图谱块和光流图块,获取在后帧中该至少一物体在预设尺寸下的物体分割结果,可以示例性地通过如下方式实现:
例如利用传递网络中的第一神经网络,根据分别放大后的图像块和概率图谱块,获取在后帧中至少一物体的第三物体分割结果;以及例如利用传递网络中的第二神经网络,根据分别放大后的概率图谱块和光流图块,获取在后帧中至少一物体的第四物体分割结果;
例如利用传递网络中的计算模块,根据第三物体分割结果和第四物体分割结果,确定在后帧中至少一物体在预设尺寸下的物体分割结果。
深度残差网络具有提取较强的判别性特征的作用,在本申请任一方法实施例的其中一个示例中,上述第一神经网络和第二神经网络可以采用深度残差网络实现。
本发明人通过调查研究发现,深度残差网络通常有101个网络层,可以称为101层深度残差网络。另外,深度残差网络也可以有更多网络层,深度残差网络的网络层越多,输出结果的精度越高,但是需要的计算时间、占用的显存资源也越多,101层深度残差网络在输出结果精度和时间复杂度、空间复杂度上能达到一个较好的平衡点。常用的101层深度残差网络输出的概率图谱块为2048个通道,概率图谱块的尺寸为原图像大小的1/224,即:概率图谱块的尺寸为1*1。为提高概率图谱块的精度,本申请实施例中,可以采用更多网络层的深度残差网络作为第一神经网络和第二神经网络。另外,为了增大输出的概率图谱块的尺寸,更好地捕捉图像中的细节信息,可以对101层深度残差网络做如下改进:降低101层深度残差网络中卷积层的卷积步长,并对卷积核进行膨胀操作以增大卷积核尺寸。
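上述"降低卷积步长、并对卷积核做膨胀"之所以能增大输出尺寸,可由标准卷积输出尺寸公式验证。以下为一个示意性计算(公式为通用卷积尺寸公式,具体数值仅为示例):

```python
def conv_output_size(in_size, kernel, stride, dilation=1, padding=0):
    """标准卷积输出尺寸: floor((in + 2p - d*(k-1) - 1) / s) + 1。"""
    effective_kernel = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - effective_kernel) // stride + 1

# 步长为2的3x3卷积: 输出尺寸减半
s2 = conv_output_size(224, kernel=3, stride=2, padding=1)
# 步长降为1并将卷积核膨胀为2: 输出尺寸保持不变, 感受野不缩小
s1 = conv_output_size(224, kernel=3, stride=1, dilation=2, padding=2)
```

可见降低步长使输出特征图更大,而膨胀操作在不增加参数量的情况下维持了等效的卷积核覆盖范围。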
另外,在本申请上述各视频物体分割方法实施例中,操作104可以通过如下方式实现:
以上述至少部分帧中的任一其他帧作为当前帧,对该当前帧进行物体检测,获得当前帧的物体候选框集。其中,每个帧对应一个候选集,用于存放该帧中的所有物体候选框;
将当前帧的物体检测框集包括的至少一个物体候选框(例如各物体候选框)分别与参考帧的物体分割结果对应的物体候选框进行匹配;
根据匹配结果确定当前帧是否是相对参考帧的物体分割结果丢失物体的其他帧。
在其中一个可选示例中,将当前帧的物体检测框集包括的各物体候选框与参考帧的物体分割结果对应的物体候选框进行匹配,可以包括:分别对当前帧的物体检测框集包括的各物体候选框进行特征提取;将物体检测框集包括的各物体候选框的特征,与参考帧中的物体分割结果对应的物体候选框的特征进行匹配。
相应地,根据匹配结果确定当前帧是否是相对参考帧的物体分割结果丢失物体的其他帧,可以包括:根据匹配结果,确定当前帧的物体检测框集包括的至少一个物体候选框与参考帧中的物体分割结果对应的物体候选框中,是否存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;若存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框,确定当前帧是相对参考帧的物体分割结果丢失物体的其他帧;否则,确定当前帧不是相对参考帧的物体分割结果丢失物体的其他帧。
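上述"特征相似度高于预设阈值、且物体类别不一致"的判断逻辑可示意如下(余弦相似度与阈值取值均为本文假设):

```python
import numpy as np

def cosine(a, b):
    """两特征向量的余弦相似度。"""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def frame_loses_object(frame_boxes, ref_boxes, threshold=0.9):
    """frame_boxes / ref_boxes: [(特征向量, 物体类别)] 列表。
    若当前帧存在某候选框与参考帧候选框特征相似度高于阈值、
    但按物体分割结果对应的物体类别不一致, 则判定当前帧丢失物体。"""
    for feat, cls in frame_boxes:
        for ref_feat, ref_cls in ref_boxes:
            if cosine(feat, ref_feat) > threshold and cls != ref_cls:
                return True
    return False

ref = [(np.array([1.0, 0.0]), "person")]
cur = [(np.array([0.99, 0.05]), "background")]  # 特征几乎相同但类别不一致
lost = frame_loses_object(cur, ref)
```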
或者,在本申请上述各视频物体分割方法实施例中,操作104也可以通过如下方式实现:
分别对至少部分帧中的至少一个其他帧进行物体检测,得到物体候选框集;
将该物体检测框集包括的至少一个物体候选框(例如各物体候选框)与参考帧的物体分割结果对应的物体候选框进行匹配;
根据匹配结果确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧。
在其中一个可选示例中,将物体检测框集包括的至少一个物体候选框与参考帧的物体分割结果对应的物体候选框进行匹配,可以包括:分别对物体检测框集包括的至少一个物体候选框进行特征提取;将该物体检测框集包括的至少一个物体候选框的特征,与参考帧中的物体分割结果对应的物体候选框的特征进行匹配。
相应地,根据匹配结果确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧,可以包括:
根据匹配结果,获取该物体检测框集包括的至少一个物体候选框(例如各物体候选框)与参考帧中的物体分割结果对应的物体候选框中,特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;获取特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体检测框集中的物体候选框对应的帧为相对参考帧的物体分割结果丢失物体的其他帧。
则相应地,在操作106中,以确定的其他帧作为目标帧时,可以包括:
若该至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧包括多个,从相对参考帧的物体分割结果丢失物体的其他帧中,按照预设策略选取一个其他帧作为目标帧。
例如,从相对参考帧的物体分割结果丢失物体的其他帧中,随机选取一个其他帧作为目标帧;或者,从该物体检测框集包括的各物体候选框中,选取一个与参考帧中的物体分割结果对应的物体候选框的特征之间的相似度最高、且根据物体分割结果对应的物体类别不一致的物体候选框所在的其他帧为目标帧。
另外,在本申请上述各视频物体分割方法实施例中,操作108可以包括:
获取至少部分帧中丢失上述丢失物体的连续帧;
将目标帧更新后的物体分割结果顺序传递到该连续帧中的至少一其他帧。
示例性地,上述至少一其他帧可以是连续帧中的第一帧。相应地,将目标帧更新后的物体分割结果顺序传递到连续帧中的至少一其他帧时,可以将目标帧更新后的物体分割结果在连续帧中沿时序正方向顺序传递到连续帧中的最后一帧。
或者,上述至少一其他帧也可以是连续帧中的最后一帧。相应地,将目标帧更新后的物体分割结果顺序传递到连续帧中的至少一其他帧时,可以将目标帧更新后的物体分割结果在连续帧中沿时序反方向顺序传递到连续帧中的第一帧。
或者,上述至少一其他帧还可以是连续帧中位于第一帧和最后一帧之间的中间帧。相应地,将目标帧更新后的物体分割结果顺序传递到连续帧中的至少一其他帧时,可以将目标帧更新后的物体分割结果在连续帧中沿时序正方向顺序传递到连续帧中的最后一帧;和/或,将目标帧更新后的物体分割结果在连续帧中沿时序反方向顺序传递到连续帧中的第一帧。
另外,在本申请上述任一视频物体分割方法实施例的一个可选示例中,针对同一丢失物体,本次将目标帧更新后的物体分割结果传递到的其他帧,与之前将目标帧更新后的物体分割结果传递到的其他帧的范围不重叠。
在其中一个可选示例中,每次将该目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧时,可以在修正信息表中记录该丢失物体的物体信息与目标帧更新后的物体分割结果传递到的其他帧的帧序号。其中的物体信息可以是物体特征或者物体类别等。
在每次将该目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧时,可以查询修正信息表中是否包括丢失物体的物体信息;
若修正信息表中包括该丢失物体的物体信息,说明之前已经对该丢失物体的物体分割结果进行过修正,查询修正信息表中该丢失物体的物体信息对应的帧序号,获得之前基于该丢失物体的物体分割结果将目标帧更新后的物体分割结果传递到的视频中的其他帧,据此来确定本次将该目标帧更新后的物体分割结果顺序传递到的视频中的其他帧,以保障本次确定的其他帧与查询到的帧序号对应的其他帧不重复。
例如,之前对该丢失物体,将目标帧更新后的物体分割结果顺序传递到视频中的第21帧至第23帧,本次针对该丢失物体继续进行物体分割结果传递时,即使获取到视频中丢失该丢失物体的帧为第20帧至第27帧,由于上次基于该丢失物体已经对第21帧至第23帧的物体分割结果进行过修正,本次进行目标帧更新后的物体分割结果传递时,可以传递到视频中的第24帧至第27帧。
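上文示例中根据修正信息表确定本次传递范围的逻辑,可以用如下示意代码表达(接口与数据结构均为本文假设):

```python
def frames_to_propagate(lost_frames, corrected, forward=True):
    """根据修正信息表, 返回本次可传递(尚未修正)的帧序号。
    沿时序正方向传递时, 只传递位于已修正帧之后且尚未修正的丢失帧,
    以避免对前一轮修正过的物体分割结果再次修正。"""
    if corrected:
        if forward:
            last = max(corrected)
            lost_frames = [f for f in lost_frames if f > last]
        else:
            first = min(corrected)
            lost_frames = [f for f in lost_frames if f < first]
    return [f for f in lost_frames if f not in corrected]

# 文中示例: 丢失帧为第20-27帧, 之前已修正第21-23帧
result = frames_to_propagate(list(range(20, 28)), corrected={21, 22, 23})
```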
基于本实施例,可以避免针对同一丢失物体,在后一轮物体分割结果传递过程中对前一轮修正后的物体分割结果进行再次修正,从而导致本申请实施例的流程无限循环;并且,长距离传递物体分割结果可能导致物体分割结果准确性变差,基于本实施例可以避免以长距离传递导致准确性较差的物体分割结果修正某一帧上较为准确的物体分割结果,有效确保物体分割结果的准确性。
图2为本申请视频物体分割方法另一个实施例的流程图。如图2所示,该实施例的视频物体分割方法包括:
202,获取视频的至少部分帧中参考帧的物体分割结果。
作为本申请实施例的一种可选实现方式,该操作202可以是:接收参考帧的物体分割结果,该参考帧的物体分割结果可以预先获得。
另外,作为本申请实施例的另一种可选实现方式,该操作202也可以通过如下图像物体分割方法实现:对参考帧进行物体分割,获得该参考帧的物体分割结果。
例如,可以通过如下方式对参考帧进行物体分割,获得该参考帧的物体分割结果:
对参考帧进行特征提取,获得该参考帧的特征。示例性地,该参考帧的特征例如可以表示为一个特征向量或者特征图的形式;
根据该特征预测参考帧中各像素的物体类别,获得参考帧的物体分割结果。
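上述"根据特征预测各像素的物体类别"的一种常见做法是对逐像素的类别得分取最大值,示意如下(仅为说明性草图,并非本申请的实际网络实现):

```python
import numpy as np

def predict_object_classes(class_scores):
    """class_scores: (C, H, W) 的逐像素类别得分;
    对每个像素取得分最大的类别, 得到 (H, W) 的物体分割结果。"""
    return np.argmax(class_scores, axis=0)

scores = np.array([[[0.9, 0.1]],    # 类别0的逐像素得分
                   [[0.1, 0.8]]])   # 类别1的逐像素得分
seg = predict_object_classes(scores)
```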
在一个可选示例中,该操作202可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的传递网络502执行。
204,在视频的至少部分帧中,自参考帧开始顺序进行参考帧的物体分割结果的帧间传递,获得该至少部分帧中各其他帧的物体分割结果。
可选地,可以针对该至少部分帧中,根据沿参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定传播方向上在后帧的物体分割结果,其中的传播方向包括视频的时序正方向和/或时序反方向。
本申请实施例中,在先帧、在后帧是相对于传播方向的顺序而言的,具有相对性。其中的传播方向可以是视频的时序正方向或时序反方向。在传播方向上顺序靠前的帧为在先帧,在传播方向上顺序靠后的帧为在后帧。例如,在先帧可以是:在后帧在上述至少部分帧中沿时序正方向或时序反方向上的相邻帧或相邻关键帧,其中的关键帧可以是在上述至少部分帧中沿时序正方向或时序反方向上,与在后帧之间间隔在预设帧数范围内的帧。当传播方向变化时,在先帧与在后帧相应变化。
其中,根据应用需求,上述至少部分帧可以是整个视频中的帧,也可以是视频中其中一段视频包括的帧,或者从视频中每隔至少一帧提取出来的帧的集合,均可应用本申请实施例进行视频物体分割。
其中,操作204可以称为物体分割结果的传播过程。
在一个可选示例中,该操作204可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的传递网络502执行。
206,确定上述至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧。
208,以确定的其他帧中的一帧作为目标帧进行丢失物体的分割,并更新该目标帧的物体分割结果。
其中,操作206-208可以称为物体再识别过程。
在一个可选示例中,该操作206-208可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的物体再识别网络504执行。
210,将该目标帧更新后的物体分割结果沿视频的时序正方向和/或时序反方向传递到视频中的至少一其他帧,以该目标帧作为参考帧,以该目标帧更新后的物体分割结果更新至物体分割结果传播方向上该目标帧的在后帧至上述至少一其他帧中各帧的物体分割结果。
其中,操作210可以称为物体分割结果的传播过程。
在一个可选示例中,该操作210可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的传递网络502执行。
之后,再返回执行操作206,直至上述至少部分帧中不存在相对参考帧的物体分割结果丢失物体的其他帧。
如图3所示,为应用本申请视频物体分割方法实施例对视频中物体进行分割的一个过程示意图。如图3所示,第1行所示图片为一个视频中的至少部分帧,其包括82帧图像,图3中第1行示例性地标出了其中第1、8、20、37、52、64和82帧的帧序号。假设第1帧为参考帧,其物体分割结果可以预先获得,例如通过人工获取或者通过图像物体分割方法获得。
在步骤一中,从第1帧开始,将第1帧的物体的分割结果沿视频的时序正方向帧间传递,传递到最后1帧,即:第82帧,参见第2行图片;
在步骤二中,确定将第1帧的物体的分割结果传递至第82帧的过程中,相对第1帧的物体分割结果丢失物体的其他帧,假设包括第16-36帧;
在步骤三中,选取第21帧作为目标帧进行丢失物体的分割,并根据该丢失物体的分割结果更新该目标帧的物体分割结果,参见第3行图片;
在步骤四中,以第21帧作为参考帧,将该第21帧更新后的物体分割结果分别沿视频的时序正方向和时序反方向顺序进行帧间传递,以对第21帧沿视频的时序正方向和时序反方向上的邻近帧的物体分割结果进行更新,找回这些邻近帧中丢失该丢失物体的分割结果,参见第4行图片;
之后,返回重新执行步骤二至步骤四:
确定将第1帧的物体的分割结果传递至第82帧的过程中,相对第1帧的物体分割结果丢失物体的其他帧,假设包括第60-82帧;
选取第80帧作为目标帧进行丢失物体的分割,并根据该丢失物体的分割结果更新该目标帧的物体分割结果,参见第5行图片;
以第80帧作为参考帧,将该第80帧更新后的物体分割结果分别沿视频的时序正方向和时序反方向顺序进行帧间传递,以对第80帧沿视频的时序正方向和时序反方向上的邻近帧的物体分割结果进行更新,找回这些邻近帧中丢失该丢失物体的分割结果,参见第6行图片;
之后,返回重新执行步骤二至步骤四,直至上述至少部分帧中相对第1帧的物体分割结果不存在丢失物体的其他帧。
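上述步骤一至步骤四的交替循环控制流程,可以抽象为如下示意代码(其中 propagate、find_lost、resegment 等接口均为本文假设的占位实现,仅用于演示控制流程):

```python
def video_object_segmentation(frames, ref_seg, propagate, find_lost, resegment):
    """"传播—再识别"交替循环的控制流程示意(接口均为本文假设)。"""
    segs = propagate(frames, 0, ref_seg, base={})       # 步骤一: 自第1帧(参考帧)传播
    while True:
        lost = find_lost(frames, segs)                  # 步骤二: 找出丢失物体的帧
        if not lost:
            break
        target = lost[len(lost) // 2]                   # 步骤三: 选取一帧作为目标帧
        segs[target] = resegment(frames[target])        # 对目标帧进行丢失物体的分割
        segs = propagate(frames, target, segs[target], base=segs)  # 步骤四: 双向传播
    return segs

# ---- 以下为占位的示意实现, 仅用于演示控制流程 ----
def propagate(frames, start, seg, base):
    segs = dict(base)
    for i in range(len(frames)):
        segs.setdefault(i, seg)
    segs[start] = seg
    return segs

_state = {"round": 0}
def find_lost(frames, segs):
    _state["round"] += 1
    return [2, 3] if _state["round"] == 1 else []       # 第一轮发现丢失帧, 之后不再有

def resegment(frame):
    return "fixed"                                      # 以占位结果表示重新分割

out = video_object_segmentation(list(range(5)), "ref", propagate, find_lost, resegment)
```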
图4为本申请视频物体分割方法又一个实施例的流程图。该实施例可以示例性地通过一个传递网络实现。如图4所示,该实施例的视频物体分割方法包括:
302,从视频中的当前帧获取包括至少一物体(即一个物体或多个物体)的图像块;从当前帧的邻近帧的物体类别概率图谱获取该至少一物体对应物体类别的概率图谱块。
在一个可选示例中,该操作302可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一获取模块602执行。
304,至少根据上述图像块和概率图谱块,确定当前帧中该至少一物体的物体分割结果。
在一个可选示例中,该操作304可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的确定模块604执行。
本申请本实施例,基于当前帧中包括物体的图像块和邻近帧的物体类别概率图谱中该物体对应物体类别的概率图谱块,来确定当前帧中物体的物体分割结果,可以有效捕获图像中的小尺寸物体和细节信息、降低图像中背景噪声的干扰,从而改善帧中任一物体尺寸较小、尺寸变化较大等原因导致的物体分割结果传递失败的情况,提高了视频物体分割结果的准确率。
在其中一个可选示例中,操作304可以包括:
分别将图像块和概率图谱块放大至预设尺寸;
根据分别放大后的图像块和概率图谱块,获取当前帧中该至少一物体在预设尺寸下的物体分割结果;
根据上述图像块和概率图谱块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
在另一个可选示例中,操作302中还可以包括:根据上述邻近帧与当前帧之间的光流图获取该物体对应的光流图块。相应地,操作304可以包括:根据图像块、概率图谱块和光流图块,确定当前帧中物体的物体分割结果。
在进一步示例中,根据图像块、概率图谱块和光流图块,确定当前帧中至少一物体的物体分割结果,可以通过如下方式实现:
例如通过传递网络中的第一神经网络,根据图像块和概率图谱块,获取当前帧中至少一物体的第一物体分割结果;以及例如通过传递网络中的第二神经网络,根据概率图谱块和光流图块,获取当前帧中至少一物体的第二物体分割结果;
通过传递网络中的计算模块,根据第一物体分割结果和第二物体分割结果,获取当前帧中该至少一物体的物体分割结果。
或者,在进一步示例中,根据图像块、概率图谱块和光流图块,确定当前帧中该至少一物体的物体分割结果,可以通过如下方式实现:
通过传递网络中的第一缩放单元,分别将上述图像块、概率图谱块和光流图块放大至预设尺寸;
根据分别放大后的图像块、概率图谱块和光流图块,获取当前帧中该至少一物体在预设尺寸下的物体分割结果。例如,通过传递网络中的第一神经网络,根据分别放大后的图像块和概率图谱块,获取当前帧中该至少一物体的第三物体分割结果;以及例如通过传递网络中的第二神经网络,根据分别放大后的概率图谱块和光流图块,获取当前帧中至少一物体的第四物体分割结果;通过传递网络中的计算模块,根据第三物体分割结果和第四物体分割结果,确定当前帧中至少一物体在预设尺寸下的物体分割结果;
例如通过传递网络中的第二缩放单元,根据图像块、概率图谱块和光流图块的放大比例,将至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
其中,当前帧、邻近帧是相对于传播方向的顺序而言的,具有相对性。其中的传播方向可以是视频的时序正方向或时序反方向。在传播方向上顺序靠前的帧为邻近帧,在传播方向上顺序靠后的帧为当前帧。例如,邻近帧可以是:当前帧在视频中沿时序正方向或时序反方向上的相邻帧或相邻关键帧,其中的关键帧可以是在视频中沿时序正方向或时序反方向上,与当前帧之间间隔在预设帧数范围内的帧。当传播方向变化时,邻近帧与当前帧相应变化。
另外,在上述任一实施例的操作302中,从当前帧获取的物体的图像块的大小可以大于该至少一物体的物体候选框,以便后续可以从该图像块中提取特征时可以提取到更多的上下文信息,有助于更准确的获取该至少一物体的物体分割结果。
图5为本申请视频物体分割方法再一个实施例的流程图。如图5所示,该实施例的视频物体分割方法包括:
402,从视频中的当前帧获取包括至少一物体的图像块;从当前帧的邻近帧的物体类别概率图谱获取该至少一物体对应物体类别的概率图谱块;以及根据上述邻近帧与当前帧之间的光流图获取该至少一物体对应的光流图块。
在一个可选示例中,该操作402可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一获取模块602执行。
404,通过传递网络中的第一缩放单元,分别将上述图像块、概率图谱块和光流图块放大至预设尺寸。
在一个可选示例中,该操作404可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一缩放单元702执行。
406,通过传递网络中的第一神经网络,根据分别放大后的图像块和概率图谱块,获取当前帧中该至少一物体的第三物体分割结果;以及通过传递网络中的第二神经网络,根据分别放大后的概率图谱块和光流图块,获取当前帧中该至少一物体的第四物体分割结果。
在一个可选示例中,该操作406可以由处理器调用存储器存储的相应指令执行,也可以分别由被处理器运行的第一神经网络704和第二神经网络708执行。
408,通过传递网络中的计算模块,根据第三物体分割结果和第四物体分割结果,确定当前帧中该至少一物体在预设尺寸下的物体分割结果。
在一个可选示例中,该操作408可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的计算单元710执行。
410,通过传递网络中的第二缩放单元,根据图像块、概率图谱块和光流图块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
在一个可选示例中,该操作410可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二缩放单元706执行。
本申请本实施例,基于从当前帧和光流图提取的一物体放大至预设尺寸的图像块、从邻近帧中提取的该物体放大至预设尺寸的概率图谱块获取当前帧的物体分割结果,可以有效捕获图像中的小尺寸物体和细节信息,更准确的获取当前帧的物体分割结果,从而实现精确物体分割结果的帧间传递,改善帧中物体尺寸较小、尺寸变化较大等原因导致的物体分割结果传递失败的情况,提高了视频物体分割结果的准确率。
图6为本申请实施例中将物体分割结果进行帧间传递的一个示例图。如图6所示,示出了本申请任一视频物体分割方法实施例中,通过传递网络将邻近帧(在先帧)的物体分割结果传递至当前帧(在后帧)的一个过程。
在本申请上述任一实施例的视频物体分割方法中,还可以包括:
基于样本视频对上述传递网络进行训练,其中的样本视频中的各帧标注有标注概率图谱。
在本申请任一实施例的其中一个可选实现方式中,可以采用迭代训练法或梯度更新法,基于样本视频的标注概率图谱和传递网络输出的概率图谱,对上述传递网络进行训练,调整传递网络中各网络参数的参数值。
其中,采用迭代训练法,基于样本视频的标注概率图谱和传递网络输出的概率图谱对上述传递网络进行训练时,在满足预设条件时,完成训练,其中的预设条件例如可以是训练次数达到预设次数阈值,或者传递网络针对样本视频输出的概率图谱与该样本图像的标注概率图谱之间的差异满足预设差值。
采用梯度更新法,基于样本视频的标注概率图谱和传递网络输出的概率图谱对上述传递网络进行训练时,可以获取传递网络针对样本视频输出的概率图谱与该样本图像的标注概率图谱之间的差异,利用梯度更新法调整传递网络中各网络参数的参数值,使得传递网络针对样本视频输出的概率图谱与该样本图像的标注概率图谱之间的差异最小化。
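上述梯度更新法"最小化输出概率图谱与标注概率图谱之间差异"的思想,可用一个极简的玩具示例说明(线性"网络"与均方差损失均为本文假设,并非传递网络的实际结构):

```python
import numpy as np

def train_step(w, inputs, target_map, lr=0.1):
    """单步梯度更新示意: 以输出概率图谱与标注概率图谱的
    均方差作为损失, 对单个线性参数 w 做一步梯度下降。"""
    pred = inputs * w                                   # 玩具"网络": 逐像素线性映射
    grad = 2 * np.mean((pred - target_map) * inputs)    # 均方差损失对 w 的梯度
    return w - lr * grad

inputs = np.ones((2, 2))
target = np.full((2, 2), 0.8)                           # "标注概率图谱"
w = 0.0
for _ in range(50):                                     # 迭代更新参数值
    w = train_step(w, inputs, target)
loss = float(np.mean((inputs * w - target) ** 2))       # 差异随训练而最小化
```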
在其中一个可选示例中,基于样本视频对传递网络进行训练的操作,可以包括:
基于样本视频对第一神经网络进行训练;以及基于样本视频对第二神经网络进行训练;
响应于第一神经网络和第二神经网络训练完成,基于样本视频对传递网络进行训练。
类似地,可以采用迭代训练法或梯度更新法,基于样本视频的标注概率图谱和待训练网络(第一神经网络、第二神经网络、和/或传递网络)输出的概率图谱,对各待训练网络进行训练,调整各待训练网络中各网络参数的参数值,此处不再赘述。对第一神经网络、第二神经网络、传递网络进行训练的方法可以相同,也可以不同。例如,可以采用迭代训练法对第一神经网络和第二神经网络进行训练,采用梯度更新法对传递网络进行训练。
基于该实施例,分别对第一神经网络和第二神经网络进行独立训练,在第一神经网络和第二神经网络训练完成后,再对包括第一神经网络和第二神经网络的整个传递网络进行训练,有助于提高传递网络的网络训练结果并提升网络训练效率。
本申请实施例提供的任一种视频物体分割方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本申请实施例提供的任一种视频物体分割方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本申请实施例提及的任一种视频物体分割方法。下文不再赘述。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
图7为本申请视频物体分割装置一个实施例的结构示意图。该实施例的视频物体分割装置可用于实现本申请上述图1-3所示任一视频物体分割方法实施例。如图7所示,该实施例的视频物体分割装置包括:传递网络502和物体再识别网络504。其中:
传递网络502,用于在视频的至少部分帧中,自参考帧开始顺序进行参考帧的物体分割结果的帧间传递,获得该至少部分帧中至少一其他帧的物体分割结果;以及将物体再识别网络504获得的目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧。
其中,参考帧的物体分割结果,例如可以通过人工分割或者物体分割网络,预先获得并输入给传递网络502。在本申请任一视频物体分割装置实施例中,物体分割结果可以表示为物体的概率图谱。例如,每帧的物体分割结果可以表示为一个概率图谱,该概率图谱中各像素的取值表示该像素对应的帧中物体的物体类别。另外,每帧的物体分割结果可以表示为多个概率图谱,每个概率图谱分别表示帧中一个物体类别的概率图谱,在每个概率图谱中,各像素对应的帧中物体的物体类别为该概率图谱表示的物体类别的,该像素点的取值可以为1;否则,各像素对应的帧中物体的物体类别不是该概率图谱表示的物体类别的,该像素点的取值可以为0。
在其中一个可选示例中,上述参考帧可以是上述至少部分帧中的第一帧。相应地,传递网络502用于将第一帧的物体分割结果在至少部分帧中沿时序正方向进行帧间传递,直至至少部分帧中的最后一帧。
在另一个可选示例中,上述参考帧可以是上述至少部分帧中的最后一帧。相应地,传递网络502用于将最后一帧的物体分割结果在至少部分帧中沿时序反方向进行帧间传递,直至至少部分帧中的第一帧。
在又一个可选示例中,上述参考帧可以是上述至少部分帧中位于第一帧与最后一帧之间的中间一帧。相应地,传递网络502用于将中间一帧的物体分割结果在至少部分帧中沿时序正方向进行帧间传递,直至至少部分帧中的最后一帧;和/或,将中间一帧的物体分割结果在至少部分帧中沿时序反方向进行帧间传递,直至至少部分帧中的第一帧。
物体再识别网络504,用于确定上述至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧,以确定的其他帧作为目标帧进行丢失物体的分割,以更新该目标帧的物体分割结果。
基于本实施例,可以将参考帧的物体分割结果传递到视频的至少部分帧中的其他帧上,使得视频物体分割结果在时序上更加连续;对传递中丢失物体的目标帧进行丢失物体的分割、以更新该目标帧的物体分割结果,并将该目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧,对传递到的其他帧的物体分割结果进行修正,可以改善因为遮挡和物体姿态大幅度变化造成的该物体分割结果传递失败的情况、以及多个物体运动重叠再分开后,物体分割结果中会混淆或丢失部分物体的情况,提高了视频物体分割结果的准确率。
在图7所示视频物体分割装置实施例的一个可选示例中,传递网络502自参考帧顺序进行参考帧的物体分割结果的帧间传递,获得至少部分帧中至少一其他帧的物体分割结果时,用于:根据沿参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定传播方向上在后帧的物体分割结果,其中的传播方向包括视频的时序正方向和/或时序反方向。其中的在先帧包括:该在后帧在至少部分帧中沿时序正方向或时序反方向上的相邻帧或相邻关键帧。
图8为本申请实施例中传递网络一个实施例的结构示意图。如图8所示,与图7所示实施例相比,该实施例中,传递网络502包括:
第一获取模块602,用于从在后帧获取包括至少一物体的图像块;以及从在先帧的物体类别概率图谱获取该物体对应物体类别的概率图谱块。
本申请任一实施例中,上述图像块可以大于该至少一物体的物体候选框且小于在后帧的图像大小,以便后续可以从图像块中提取更多的上下文信息,有助于更准确的获取该物体的概率图谱。
确定模块604,用于至少根据上述图像块和概率图谱块,确定在后帧中该至少一物体的物体分割结果。
图9为本申请实施例中传递网络另一个实施例的结构示意图。如图9所示,在一个可选示例中,确定模块604可以包括:
第一缩放单元702,用于分别将第一获取模块602获取到的图像块和概率图谱块放大至预设尺寸。
第一神经网络704,用于根据分别放大后的图像块和概率图谱块,确定在后帧中该至少一物体在预设尺寸下的物体分割结果。
第二缩放单元706,用于根据第一缩放单元702对图像块和概率图谱块的放大比例,将至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
在另一个可选示例中,第一获取模块602还用于根据在先帧与在后帧之间的光流图获取该至少一物体对应的光流图块。相应地,该实施例中,确定模块604用于:根据第一获取模块602获取到的图像块、概率图谱块和光流图块,获取在后帧中至少一物体的物体分割结果。如图10所示,为本申请实施例中传递网络又一个实施例的结构示意图。
再参见图10,与图8所示的实施例相比,该实施例中,确定模块604包括:
第一神经网络704,用于根据第一获取模块602获取到的图像块和概率图谱块,获取在后帧中该至少一物体的第一物体分割结果;
第二神经网络708,用于根据第一获取模块602获取到的概率图谱块和光流图块,获取在后帧中该至少一物体的第二物体分割结果;
计算单元710,用于根据上述第一物体分割结果和第二物体分割结果,获取在后帧中该至少一物体的物体分割结果。
图11为本申请实施例中传递网络再一个实施例的结构示意图。如图11所示,与图8所示的实施例相比,在该实施例的传递网络502中,确定模块604包括:
第一缩放单元702,用于分别将第一获取模块602获取到的图像块、概率图谱块和光流图块放大至预设尺寸;
获取单元712,用于根据分别放大后的图像块、概率图谱块和光流图块,获取在后帧中该至少一物体在预设尺寸下的物体分割结果;
第二缩放单元706,用于根据第一缩放单元702对图像块、概率图谱块和光流图块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
进一步地,再参见图11,在其中一个可选示例中,获取单元712可以包括:
第一神经网络704,用于根据分别放大后的图像块和概率图谱块,获取在后帧中该至少一物体的第三物体分割结果;
第二神经网络708,用于根据分别放大后的概率图谱块和光流图块,获取在后帧中该至少一物体的第四物体分割结果;
计算单元710,用于根据上述第三物体分割结果和第四物体分割结果,确定在后帧中该至少一物体在预设尺寸下的物体分割结果。
在本发明上述任一视频物体分割装置实施例的一个可选示例中,物体再识别网络504确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧时,用于:
以至少部分帧中的任一其他帧作为当前帧,对当前帧进行物体检测,获得当前帧的物体候选框集;
将当前帧的物体检测框集包括的至少一物体候选框分别与参考帧的物体分割结果对应的物体候选框进行匹配;
根据匹配结果确定当前帧是否是相对参考帧的物体分割结果丢失物体的其他帧。
在进一步示例中,物体再识别网络504将物体检测框集包括的至少一物体候选框与参考帧的物体分割结果对应的物体候选框进行匹配时,用于:分别对物体检测框集包括的至少一物体候选框进行特征提取;将物体检测框集包括的至少一物体候选框的特征,与参考帧中的物体分割结果对应的物体候选框的特征进行匹配。
相应地,物体再识别网络504根据匹配结果确定当前帧是否是相对参考帧的物体分割结果丢失物体的其他帧时,用于:根据匹配结果,确定物体检测框集包括的至少一物体候选框与参考帧中的物体分割结果对应的物体候选框中,是否存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;若存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框,确定当前帧是相对参考帧的物体分割结果丢失物体的其他帧;否则,确定当前帧不是相对参考帧的物体分割结果丢失物体的其他帧。
在本发明上述任一视频物体分割装置实施例的另一个可选示例中,物体再识别网络504确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧时,用于:
分别对至少部分帧中的至少一其他帧进行物体检测,得到物体候选框集;
将物体检测框集包括的至少一物体候选框与参考帧的物体分割结果对应的物体候选框进行匹配;
根据匹配结果确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧。
在进一步示例中,物体再识别网络504将物体检测框集包括的至少一物体候选框与参考帧的物体分割结果对应的物体候选框进行匹配时,用于:分别对物体检测框集包括的至少一物体候选框进行特征提取;将物体检测框集包括的至少一物体候选框的特征,与参考帧中的物体分割结果对应的物体候选框的特征进行匹配。
相应地,物体再识别网络504根据匹配结果确定至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧时,用于:根据匹配结果,获取物体检测框集包括的至少一物体候选框与参考帧中的物体分割结果对应的物体候选框中,特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;获取特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体检测框集中的物体候选框对应的帧为相对参考帧的物体分割结果丢失物体的其他帧。
在上述另一个可选示例中,物体再识别网络504以确定的其他帧作为目标帧时,用于:若至少部分帧中相对参考帧的物体分割结果丢失物体的其他帧包括多个,可以从相对参考帧的物体分割结果丢失物体的其他帧中选取一个其他帧作为目标帧。
在本申请上述至少一视频物体分割装置实施例的又一个可选示例中,传递网络502将目标帧更新后的物体分割结果顺序传递到视频中的至少一其他帧时,用于:
获取至少部分帧中丢失上述丢失物体的连续帧;以及
将目标帧更新后的物体分割结果顺序传递到连续帧中的至少一其他帧。
其中,该至少一其他帧包括:连续帧中的第一帧,相应地,传递网络502用于将目标帧更新后的物体分割结果在连续帧中沿时序正方向顺序传递到连续帧中的最后一帧;或者
该至少一其他帧包括:连续帧中的最后一帧,相应地,传递网络502用于:将目标帧更新后的物体分割结果在连续帧中沿时序反方向顺序传递到连续帧中的第一帧;或者
该至少一其他帧包括:连续帧中位于第一帧和最后一帧之间的中间帧,相应地,传递网络502用于:将目标帧更新后的物体分割结果在连续帧中沿时序正方向顺序传递到连续帧中的最后一帧;和/或,将目标帧更新后的物体分割结果在连续帧中沿时序反方向顺序传递到连续帧中的第一帧。
本申请实施例还提供了另一种视频物体分割装置。作为另一种视频物体分割装置的其中一个实施例,可以参见图8所示结构,其包括第一获取模块602和确定模块604。其中:
第一获取模块602,用于从视频中的当前帧获取包括至少一物体的图像块;从当前帧的邻近帧的物体类别概率图谱获取该至少一物体对应物体类别的概率图谱块。
其中,当前帧的邻近帧包括:视频中该当前帧沿时序正方向或时序反方向上的相邻帧或相邻关键帧。上述图像块可以大于该至少一物体的物体候选框且小于在后帧的图像大小,以便后续可以从该图像块中提取更多的上下文信息,有助于更准确的获取该至少一物体的概率图谱。
本申请本实施例,基于当前帧中包括物体的图像块和邻近帧的物体类别概率图谱中该物体对应物体类别的概率图谱块,来确定当前帧中物体的物体分割结果,可以有效捕获图像中的小尺寸物体和细节信息、降低图像中背景噪声的干扰,从而改善帧中任一物体尺寸较小、尺寸变化较大等原因导致的物体分割结果传递失败的情况,提高了视频物体分割结果的准确率。
确定模块604,用于至少根据上述图像块和概率图谱块,确定当前帧中该物体的物体分割结果。
参见图9,在该另一种视频物体分割装置实施例的其中一个可选示例中,确定模块604可以包括:
第一缩放单元702,用于分别将第一获取模块602获取到的图像块和概率图谱块放大至预设尺寸。
第一神经网络704,用于根据分别放大后的图像块和概率图谱块,获取当前帧中该至少一物体在预设尺寸下的物体分割结果。
第二缩放单元706,用于根据第一缩放单元702对图像块和概率图谱块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
本申请本实施例,基于从当前帧和光流图提取的一物体放大至预设尺寸的图像块、从邻近帧中提取的该物体放大至预设尺寸的概率图谱块获取当前帧的物体分割结果,可以有效捕获图像中的小尺寸物体和细节信息,更准确的获取当前帧的物体分割结果,从而实现精确物体分割结果的帧间传递,降低帧中物体尺寸较小、尺寸变化较大等原因导致的物体分割结果传递失败的可能性,提高了视频物体分割结果的准确率。
在上述另一种视频物体分割装置实施例的另一个可选示例中,第一获取模块602还用于根据邻近帧与当前帧之间的光流图获取至少一物体对应的光流图块。相应地,该实施例中,确定模块604用于:根据第一获取模块602获取到的图像块、概率图谱块和光流图块,获取当前帧中该至少一物体的物体分割结果。
本申请本实施例,基于从当前帧和光流图提取的至少一物体放大至预设尺寸的图像块、从邻近帧中提取的该至少一物体放大至预设尺寸的概率图谱块获取当前帧的物体分割结果,可以有效捕获图像中的小尺寸物体和细节信息,更准确的获取当前帧的物体分割结果,从而实现精确物体分割结果的帧间传递,降低帧中物体尺寸较小、尺寸变化较大等原因导致的物体分割结果传递失败的可能性,提高了视频物体分割结果的准确率。
参见图10,在该另一个可选示例中,确定模块604可以包括:
第一神经网络704,用于根据第一获取模块602获取到的图像块和概率图谱块,获取当前帧中该物体的第一物体分割结果;
第二神经网络708,用于根据第一获取模块602获取到的概率图谱块和光流图块,获取当前帧中该物体的第二物体分割结果;
计算单元710,用于根据上述第一物体分割结果和第二物体分割结果,获取当前帧中该物体的物体分割结果。
参见图11,在上述另一个可选示例中,确定模块604可以包括:
第一缩放单元702,用于分别将第一获取模块602获取到的图像块、概率图谱块和光流图块放大至预设尺寸。
获取单元712,用于根据分别放大后的图像块、概率图谱块和光流图块,获取当前帧中该至少一物体在预设尺寸下的物体分割结果。
第二缩放单元706,用于根据第一缩放单元702对图像块、概率图谱块和光流图块的放大比例,将该至少一物体在预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
其中,获取单元712可以包括:
第一神经网络704,用于根据分别放大后的图像块和概率图谱块,获取当前帧中该至少一物体的第三物体分割结果。
第二神经网络708,用于根据分别放大后的概率图谱块和光流图块,获取当前帧中至少一物体的第四物体分割结果。
计算单元710,用于根据上述第三物体分割结果和第四物体分割结果,确定当前帧中至少一物体在预设尺寸下的物体分割结果。
本申请实施例还提供了一种电子设备,包括本申请上述任一实施例的视频物体分割装置。
本申请实施例提供的另一种电子设备,包括:
存储器,用于存储可执行指令;以及
处理器,用于与存储器通信以执行可执行指令从而完成本申请上述任一实施例的视频物体分割方法的操作。
另外,本申请实施例还提供了一种计算机存储介质,用于存储计算机可读取的指令,指令被执行时实现本申请上述任一实施例的视频物体分割方法的操作。
另外,本申请实施例还提供了一种计算机程序,包括计算机可读取的指令,当计算机可读取的指令在设备中运行时,设备中的处理器执行用于实现本申请上述任一实施例的视频物体分割方法。
图12为本申请电子设备一个应用实施例的结构示意图。下面参考图12,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备的结构示意图。如图12所示,该电子设备包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)901,和/或一个或多个图像处理器(GPU)913等,处理器可以根据存储在只读存储器(ROM)902中的可执行指令或者从存储部分908加载到随机访问存储器(RAM)903中的可执行指令而执行至少一种适当的动作和处理。通信部912可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器902和/或随机访问存储器903通信以执行可执行指令,通过总线904与通信部912相连、并经通信部912与其他目标设备通信,从而完成本申请实施例提供的任一方法对应的操作,例如,在视频的至少部分帧中,自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一其他帧的物体分割结果;确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧;以确定的其他帧作为目标帧进行丢失物体的分割,以更新所述目标帧的物体分割结果;将所述目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧。再如,从视频中的当前帧获取包括至少一物体的图像块;从所述当前帧的邻近帧的物体类别概率图谱获取所述至少一物体对应物体类别的概率图谱块;至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果。
此外,在RAM 903中,还可存储有装置操作所需的至少一种程序和数据。CPU901、ROM902以及RAM903通过总线904彼此相连。在有RAM903的情况下,ROM902为可选模块。RAM903存储可执行指令,或在运行时向ROM902中写入可执行指令,可执行指令使处理器901执行本申请上述任一方法对应的操作。输入/输出(I/O)接口905也连接至总线904。通信部912可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口905:包括键盘、鼠标等的输入部分906;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分907;包括硬盘等的存储部分908;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分909。通信部分909经由诸如因特网的网络执行通信处理。驱动器99也根据需要连接至I/O接口905。可拆卸介质99,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器99上,以便于从其上读出的计算机程序根据需要被安装入存储部分908。
需要说明的,如图12所示的架构仅为一种可选实现方式,在实践过程中,可根据实际需要对上述图12的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请公开的保护范围。
特别地,根据本申请的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本申请的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,在视频的至少部分帧中,自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一其他帧的物体分割结果的指令;确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧的指令;以确定的其他帧作为目标帧进行丢失物体的分割,以更新所述目标帧的物体分割结果的指令;将所述目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧的指令。再如,从视频中的当前帧获取包括至少一物体的图像块的指令;从所述当前帧的邻近帧的物体类别概率图谱获取所述至少一物体对应物体类别的概率图谱块的指令;至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果的指令。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
本说明书中任一个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于装置、设备等实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本申请的方法和装置。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本申请的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明,本申请的方法的步骤不限于以上描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本申请实施为记录在记录介质中的程序,这些程序包括用于实现根据本申请的方法的机器可读指令。因而,本申请还覆盖存储用于执行根据本申请的方法的程序的记录介质。
本申请的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本申请限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本申请的原理和实际应用,并且使本领域的普通技术人员能够理解本申请从而设计适于特定用途的带有各种修改的各种实施例。

Claims (56)

  1. 一种视频物体分割方法,其特征在于,包括:
    在视频的至少部分帧中,自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一其他帧的物体分割结果;
    确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧;
    以确定的其他帧作为目标帧进行丢失物体的分割,以更新所述目标帧的物体分割结果;
    将所述目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧。
  2. 根据权利要求1所述的方法,其特征在于,所述参考帧包括:所述至少部分帧中的第一帧;所述自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,包括:将所述第一帧的物体分割结果在所述至少部分帧中沿时序正方向进行帧间传递,直至所述至少部分帧中的最后一帧;或者,
    所述参考帧包括:所述至少部分帧中的最后一帧;所述自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,包括:将所述最后一帧的物体分割结果在所述至少部分帧中沿时序反方向进行帧间传递,直至所述至少部分帧中的第一帧;或者,
    所述参考帧包括:所述至少部分帧中位于第一帧与最后一帧之间的中间一帧;所述自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,包括:将所述中间一帧的物体分割结果在所述至少部分帧中沿时序正方向进行帧间传递,直至所述至少部分帧中的最后一帧;和/或,将所述中间一帧的物体分割结果在所述至少部分帧中沿时序反方向进行帧间传递,直至所述至少部分帧中的第一帧。
  3. 根据权利要求1或2所述的方法,其特征在于,所述自参考帧顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一其他帧的物体分割结果,包括:
    根据沿所述参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定所述传播方向上在后帧的物体分割结果,所述传播方向包括所述视频的时序正方向和/或时序反方向。
  4. 根据权利要求3所述的方法,其特征在于,所述在先帧包括:所述在后帧在所述至少部分帧中沿时序正方向或时序反方向上的相邻帧或相邻关键帧。
  5. 根据权利要求3或4所述的方法,其特征在于,根据沿所述参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定所述传播方向上在后帧的物体分割结果,包括:
    从所述在后帧获取包括至少一物体的图像块;从所述在先帧的物体类别概率图谱获取所述至少一物体分别对应的物体类别的概率图谱块;
    至少根据所述图像块和所述概率图谱块,确定所述在后帧中所述至少一物体的物体分割结果。
  6. 根据权利要求5所述的方法,其特征在于,至少根据所述图像块和所述概率图谱块,确定所述在后帧中所述至少一物体的物体分割结果,包括:
    分别将所述图像块和所述概率图谱块放大至预设尺寸;
    根据分别放大后的所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    根据所述图像块和所述概率图谱块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  7. 根据权利要求5所述的方法,其特征在于,还包括:根据所述在先帧与所述在后帧之间的光流图获取所述至少一物体对应的光流图块;
    所述至少根据所述图像块和所述概率图谱块,确定所述在后帧中所述至少一物体的物体分割结果,包括:根据所述图像块、所述概率图谱块和所述光流图块,确定所述在后帧中所述至少一物体的物体分割结果。
  8. 根据权利要求7所述的方法,其特征在于,根据所述图像块、所述概率图谱块和所述光流图块,确定所述在后帧中所述至少一物体的物体分割结果,包括:
    根据所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体的第一物体分割结果;以及根据所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体的第二物体分割结果;
    根据所述第一物体分割结果和所述第二物体分割结果,获取所述在后帧中所述至少一物体的物体分割结果。
  9. 根据权利要求7所述的方法,其特征在于,根据所述图像块、所述概率图谱块和所述光流图块,确定所述在后帧中所述至少一物体的物体分割结果,包括:
    分别将所述图像块、所述概率图谱块和所述光流图块放大至预设尺寸;
    根据分别放大后的所述图像块、所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    根据所述图像块、所述概率图谱块和所述光流图块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  10. 根据权利要求9所述的方法,其特征在于,所述根据分别放大后的所述图像块、所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果,包括:
    根据分别放大后的所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体的第三物体分割结果;以及根据分别放大后的所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体的第四物体分割结果;
    根据所述第三物体分割结果和所述第四物体分割结果,确定所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果。
  11. 根据权利要求5-10任一所述的方法,其特征在于,所述图像块大于所述物体的物体候选框且小于所述在后帧的图像大小。
  12. 根据权利要求1-11任一所述的方法,其特征在于,所述确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧,包括:
    以所述至少部分帧中的任一其他帧作为当前帧,对所述当前帧进行物体检测,获得所述当前帧的物体候选框集;
    将所述当前帧的物体检测框集包括的至少一个物体候选框分别与所述参考帧的物体分割结果对应的物体候选框进行匹配;
    根据匹配结果确定所述当前帧是否是相对所述参考帧的物体分割结果丢失物体的其他帧。
  13. 根据权利要求12所述的方法,其特征在于,将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配,包括:分别对所述物体检测框集包括的至少一个物体候选框进行特征提取;将所述物体检测框集包括的至少一个物体候选框的特征,与所述参考帧中的物体分割结果对应的物体候选框的特征进行匹配;
    所述根据匹配结果确定所述当前帧是否是相对所述参考帧的物体分割结果丢失物体的其他帧,包括:根据匹配结果,确定所述物体检测框集包括的至少一个物体候选框与所述参考帧中的物体分割结果对应的物体候选框中,是否存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;若存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框,确定所述当前帧是相对所述参考帧的物体分割结果丢失物体的其他帧;否则,确定所述当前帧不是相对所述参考帧的物体分割结果丢失物体的其他帧。
  14. 根据权利要求1-11任一所述的方法,其特征在于,所述确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧,包括:
    分别对所述至少部分帧中的至少一个其他帧进行物体检测,得到物体候选框集;
    将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配;
    根据匹配结果确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧。
  15. 根据权利要求14所述的方法,其特征在于,将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配,包括:分别对所述物体检测框集包括的至少一个物体候选框进行特征提取;将所述物体检测框集包括的至少一个物体候选框的特征,与所述参考帧中的物体分割结果对应的物体候选框的特征进行匹配;
    所述根据匹配结果确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧,包括:根据匹配结果,获取所述物体检测框集包括的至少一个物体候选框与所述参考帧中的物体分割结果对应的物体候选框中,特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;获取特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的所述物体检测框集中的物体候选框对应的帧为相对所述参考帧的物体分割结果丢失物体的其他帧。
  16. 根据权利要求14或15所述的方法,其特征在于,所述以确定的其他帧作为目标帧,包括:
    若所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧包括多个,从相对所述参考帧的物体分割结果丢失物体的其他帧中选取一个其他帧作为目标帧。
  17. 根据权利要求1-16任一所述的方法,其特征在于,将所述目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧,包括:
    获取所述至少部分帧中丢失所述丢失物体的连续帧;
    将所述目标帧更新后的物体分割结果顺序传递到所述连续帧中的所述至少一其他帧。
  18. 根据权利要求17所述的方法,其特征在于,所述至少一其他帧包括:所述连续帧中的第一帧;将所述目标帧更新后的物体分割结果顺序传递到所述连续帧中的所述至少一其他帧,包括:将所述目标帧更新后的物体分割结果在所述连续帧中沿时序正方向顺序传递到所述连续帧中的最后一帧;或者
    所述至少一其他帧包括:所述连续帧中的最后一帧;将所述目标帧更新后的物体分割结果顺序传递到所述连续帧中的所述至少一其他帧,包括:将所述目标帧更新后的物体分割结果在所述连续帧中沿时序反方向顺序传递到所述连续帧中的第一帧;或者
    所述至少一其他帧包括:所述连续帧中位于第一帧和最后一帧之间的中间帧;将所述目标帧更新后的物体分割结果顺序传递到所述连续帧中的所述至少一其他帧,包括:将所述目标帧更新后的物体分割结果在所述连续帧中沿时序正方向顺序传递到所述连续帧中的最后一帧;和/或,将所述目标帧更新后的物体分割结果在所述连续帧中沿时序反方向顺序传递到所述连续帧中的第一帧。
  19. 根据权利要求1-18任一所述的方法,其特征在于,针对同一所述丢失物体,本次将所述目标帧更新后的物体分割结果传递到的其他帧,与之前将目标帧更新后的物体分割结果传递到的其他帧的范围不重叠。
  20. 一种视频物体分割方法,其特征在于,包括:
    从视频中的当前帧获取包括至少一物体的图像块;从所述当前帧的邻近帧的物体类别概率图谱获取所述至少一物体对应物体类别的概率图谱块;
    至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果。
  21. 根据权利要求20所述的方法,其特征在于,所述至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果,包括:
    分别将所述图像块和所述概率图谱块放大至预设尺寸;
    根据分别放大后的所述图像块和所述概率图谱块,获取所述当前帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    根据所述图像块和所述概率图谱块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  22. 根据权利要求20所述的方法,其特征在于,还包括:根据所述邻近帧与所述当前帧之间的光流图获取所述至少一物体对应的光流图块;
    所述至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果,包括:根据所述图像块、所述概率图谱块和所述光流图块,确定所述当前帧中所述至少一物体的物体分割结果。
  23. 根据权利要求22所述的方法,其特征在于,根据所述图像块、所述概率图谱块和所述光流图块,确定所述当前帧中所述至少一物体的物体分割结果,包括:
    根据所述图像块和所述概率图谱块,获取所述当前帧中所述至少一物体的第一物体分割结果;以及根据所述概率图谱块和所述光流图块,获取所述当前帧中所述至少一物体的第二物体分割结果;
    根据所述第一物体分割结果和所述第二物体分割结果,获取所述当前帧中所述至少一物体的物体分割结果。
  24. 根据权利要求22所述的方法,其特征在于,根据所述图像块、所述概率图谱块和所述光流图块,确定所述当前帧中所述至少一物体的物体分割结果,包括:
    分别将所述图像块、所述概率图谱块和所述光流图块放大至预设尺寸;
    根据分别放大后的所述图像块、所述概率图谱块和所述光流图块,获取所述当前帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    根据所述图像块、所述概率图谱块和所述光流图块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  25. 根据权利要求24所述的方法,其特征在于,所述根据分别放大后的所述图像块、所述概率图谱块和所述光流图块,确定所述当前帧中所述至少一物体在所述预设尺寸下的物体分割结果,包括:
    根据分别放大后的所述图像块和所述概率图谱块,获取所述当前帧中所述物体的第三物体分割结果;以及根据分别放大后的所述概率图谱块和所述光流图块,获取所述当前帧中所述至少一物体的第四物体分割结果;
    根据所述第三物体分割结果和所述第四物体分割结果,确定所述当前帧中所述至少一物体在所述预设尺寸下的物体分割结果。
  26. 根据权利要求20-25任一所述的方法,其特征在于,所述当前帧的邻近帧包括:所述视频中所述当前帧沿时序正方向或时序反方向上的相邻帧或相邻关键帧。
  27. 根据权利要求20-26任一所述的方法,其特征在于,所述图像块大于所述物体的物体候选框且小于所述邻近帧的图像大小。
  28. 一种视频物体分割装置,其特征在于,包括:传递网络和物体再识别网络;
    所述传递网络,用于在视频的至少部分帧中,自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一个其他帧的物体分割结果;以及将物体再识别网络获得的目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧;
    所述物体再识别网络,用于确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧;以确定的其他帧作为目标帧进行丢失物体的分割,以更新所述目标帧的物体分割结果。
  29. 根据权利要求28所述的装置,其特征在于,所述参考帧包括:所述至少部分帧中的第一帧;所述传递网络用于将所述第一帧的物体分割结果在所述至少部分帧中沿时序正方向进行帧间传递,直至所述至少部分帧中的最后一帧;或者,
    所述参考帧包括:所述至少部分帧中的最后一帧;所述传递网络用于将所述最后一帧的物体分割结果在所述至少部分帧中沿时序反方向进行帧间传递,直至所述至少部分帧中的第一帧;或者,
    所述参考帧包括:所述至少部分帧中位于第一帧与最后一帧之间的中间一帧;所述自参考帧开始顺序进行所述参考帧的物体分割结果的帧间传递,包括:将所述中间一帧的物体分割结果在所述至少部分帧中沿时序正方向进行帧间传递,直至所述至少部分帧中的最后一帧;和/或,将所述中间一帧的物体分割结果在所述至少部分帧中沿时序反方向进行帧间传递,直至所述至少部分帧中的第一帧。
  30. 根据权利要求28或29所述的装置,其特征在于,所述传递网络自参考帧顺序进行所述参考帧的物体分割结果的帧间传递,获得所述至少部分帧中至少一个其他帧的物体分割结果时,用于:
    根据沿所述参考帧的物体分割结果传播方向的在先帧的物体分割结果,确定所述传播方向上在后帧的物体分割结果,所述传播方向包括所述视频的时序正方向和/或时序反方向;
    所述在先帧包括:所述在后帧在所述至少部分帧中沿时序正方向或时序反方向上的相邻帧或相邻关键帧。
  31. 根据权利要求30所述的装置,其特征在于,所述传递网络包括:
    第一获取模块,用于从所述在后帧获取包括至少一物体的图像块;以及从所述在先帧的物体类别概率图谱获取所述至少一物体分别对应物体类别的概率图谱块;
    确定模块,用于至少根据所述图像块和所述概率图谱块,确定所述在后帧中所述至少一物体的物体分割结果。
  32. 根据权利要求31所述的装置,其特征在于,所述确定模块包括:
    第一缩放单元,用于分别将所述图像块和所述概率图谱块放大至预设尺寸;
    第一神经网络,用于根据分别放大后的所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    第二缩放单元,用于根据所述图像块和所述概率图谱块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  33. 根据权利要求31所述的装置,其特征在于,所述第一获取模块还用于根据所述在先帧与所述在后帧之间的光流图获取所述至少一物体对应的光流图块;
    所述确定模块用于:根据所述图像块、所述概率图谱块和所述光流图块,确定所述在后帧中所述至少一物体的物体分割结果。
  34. 根据权利要求33所述的装置,其特征在于,所述确定模块包括:
    第一神经网络,用于根据所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体的第一物体分割结果;
    第二神经网络,用于根据所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体的第二物体分割结果;
    计算单元,用于根据所述第一物体分割结果和所述第二物体分割结果,获取所述在后帧中所述至少一物体的物体分割结果。
  35. 根据权利要求33所述的装置,其特征在于,所述确定模块包括:
    第一缩放单元,用于分别将所述图像块、所述概率图谱块和所述光流图块放大至预设尺寸;
    获取单元,用于根据分别放大后的所述图像块、所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    第二缩放单元,用于根据所述图像块、所述概率图谱块和所述光流图块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  36. 根据权利要求35所述的装置,其特征在于,所述获取单元包括:
    第一神经网络,用于根据分别放大后的所述图像块和所述概率图谱块,获取所述在后帧中所述至少一物体的第三物体分割结果;
    第二神经网络,用于根据分别放大后的所述概率图谱块和所述光流图块,获取所述在后帧中所述至少一物体的第四物体分割结果;
    计算单元,用于根据所述第三物体分割结果和所述第四物体分割结果,确定所述在后帧中所述至少一物体在所述预设尺寸下的物体分割结果。
  37. 根据权利要求31-36任一所述的装置,其特征在于,所述图像块大于所述物体的物体候选框且小于所述在后帧的图像大小。
  38. 根据权利要求28-37任一所述的装置,其特征在于,所述物体再识别网络确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧时,用于:
    以所述至少部分帧中的任一其他帧作为当前帧,对所述当前帧进行物体检测,获得所述当前帧的物体候选框集;
    将所述当前帧的物体检测框集包括的至少一个物体候选框分别与所述参考帧的物体分割结果对应的物体候选框进行匹配;
    根据匹配结果确定所述当前帧是否是相对所述参考帧的物体分割结果丢失物体的其他帧。
  39. 根据权利要求38所述的装置,其特征在于,所述物体再识别网络将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配时,用于:分别对所述物体检测框集包括的至少一个物体候选框进行特征提取;将所述物体检测框集包括的至少一个物体候选框的特征,与所述参考帧中的物体分割结果对应的物体候选框的特征进行匹配;
    所述物体再识别网络根据匹配结果确定所述当前帧是否是相对所述参考帧的物体分割结果丢失物体的其他帧时,用于:根据匹配结果,确定所述物体检测框集包括的至少一个物体候选框与所述参考帧中的物体分割结果对应的物体候选框中,是否存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;若存在特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框,确定所述当前帧是相对所述参考帧的物体分割结果丢失物体的其他帧;否则,确定所述当前帧不是相对所述参考帧的物体分割结果丢失物体的其他帧。
  40. 根据权利要求28-37任一所述的装置,其特征在于,所述物体再识别网络确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧时,用于:
    分别对所述至少部分帧中的至少一个其他帧进行物体检测,得到物体候选框集;
    将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配;
    根据匹配结果确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧。
  41. 根据权利要求40所述的装置,其特征在于,所述物体再识别网络将所述物体检测框集包括的至少一个物体候选框与所述参考帧的物体分割结果对应的物体候选框进行匹配时,用于:分别对所述物体检测框集包括的至少一个物体候选框进行特征提取;将所述物体检测框集包括的至少一个物体候选框的特征,与所述参考帧中的物体分割结果对应的物体候选框的特征进行匹配;
    所述物体再识别网络根据匹配结果确定所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧时,用于:根据匹配结果,获取所述物体检测框集包括的至少一个物体候选框与所述参考帧中的物体分割结果对应的物体候选框中,特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的物体候选框;获取特征之间的相似度高于预设阈值、且根据物体分割结果对应的物体类别不一致的所述物体检测框集中的物体候选框对应的帧为相对所述参考帧的物体分割结果丢失物体的其他帧。
  42. 根据权利要求40或41所述的装置,其特征在于,所述物体再识别网络以确定的其他帧作为目标帧时,用于:若所述至少部分帧中相对所述参考帧的物体分割结果丢失物体的其他帧包括多个,从相对所述参考帧的物体分割结果丢失物体的其他帧中选取一个其他帧作为目标帧。
  43. 根据权利要求28-42任一所述的装置,其特征在于,所述传递网络将所述目标帧更新后的物体分割结果顺序传递到所述视频中的至少一其他帧时,用于:
    获取所述至少部分帧中丢失所述丢失物体的连续帧;
    将所述目标帧更新后的物体分割结果顺序传递到所述连续帧中的所述至少一其他帧。
  44. 根据权利要求43所述的装置,其特征在于,所述至少一其他帧包括:所述连续帧中的第一帧;所述传递网络用于将所述目标帧更新后的物体分割结果在所述连续帧中沿时序正方向顺序传递到所述连续帧中的最后一帧;或者
    所述至少一其他帧包括:所述连续帧中的最后一帧;所述传递网络用于:将所述目标帧更新后的物体分割结果在所述连续帧中沿时序反方向顺序传递到所述连续帧中的第一帧;或者
    所述至少一其他帧包括:所述连续帧中位于第一帧和最后一帧之间的中间帧;所述传递网络用于:将所述目标帧更新后的物体分割结果在所述连续帧中沿时序正方向顺序传递到所述连续帧中的最后一帧;和/或,将所述目标帧更新后的物体分割结果在所述连续帧中沿时序反方向顺序传递到所述连续帧中的第一帧。
  45. 一种视频物体分割装置,其特征在于,包括:
    第一获取模块,用于从视频中的当前帧获取包括至少一物体的图像块;从所述当前帧的邻近帧的物体类别概率图谱获取所述至少一物体对应物体类别的概率图谱块;
    确定模块,用于至少根据所述图像块和所述概率图谱块,确定所述当前帧中所述至少一物体的物体分割结果。
  46. 根据权利要求45所述的装置,其特征在于,所述确定模块包括:
    第一缩放单元,用于分别将所述图像块和所述概率图谱块放大至预设尺寸;
    第一神经网络,用于根据分别放大后的所述图像块和所述概率图谱块,获取所述当前帧中所述至少一物体在所述预设尺寸下的物体分割结果;
    第二缩放单元,用于根据所述图像块和所述概率图谱块的放大比例,将所述至少一物体在所述预设尺寸下的物体分割结果恢复为原始尺寸下的物体分割结果。
  47. 根据权利要求45所述的装置,其特征在于,所述第一获取模块还用于根据所述邻近帧与所述当前帧之间的光流图获取所述至少一物体对应的光流图块;
    所述确定模块用于:根据所述图像块、所述概率图谱块和所述光流图块,确定所述当前帧中所述至少一物体的物体分割结果。
  48. 根据权利要求47所述的装置,其特征在于,所述确定模块包括:
    第一神经网络,用于根据所述图像块和所述概率图谱块,获取所述当前帧中所述至少一物体的第一物体分割结果;
    第二神经网络,用于根据所述概率图谱块和所述光流图块,获取所述当前帧中所述至少一物体的第二物体分割结果;
    计算单元,用于根据所述第一物体分割结果和所述第二物体分割结果,获取所述当前帧中所述至少一物体的物体分割结果。
  49. The apparatus according to claim 47, wherein the determining module comprises:
    a first scaling unit, configured to enlarge the image block, the probability map block, and the optical flow map block to a preset size respectively;
    an obtaining unit, configured to obtain, according to the respectively enlarged image block, probability map block, and optical flow map block, an object segmentation result of the at least one object in the current frame at the preset size;
    a second scaling unit, configured to restore, according to the enlargement ratios of the image block, the probability map block, and the optical flow map block, the object segmentation result of the at least one object at the preset size to an object segmentation result at the original size.
  50. The apparatus according to claim 49, wherein the obtaining unit comprises:
    a first neural network, configured to obtain a third object segmentation result of the at least one object in the current frame according to the respectively enlarged image block and probability map block;
    a second neural network, configured to obtain a fourth object segmentation result of the at least one object in the current frame according to the respectively enlarged probability map block and optical flow map block;
    a calculation unit, configured to determine the object segmentation result of the at least one object in the current frame at the preset size according to the third object segmentation result and the fourth object segmentation result.
  51. The apparatus according to any one of claims 45-50, wherein the adjacent frame of the current frame comprises: an adjacent frame or an adjacent key frame of the current frame in the video in the forward temporal direction or the reverse temporal direction.
  52. The apparatus according to any one of claims 45-51, wherein the image block is larger than the object candidate box of the object and smaller than the image size of the adjacent frame.
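Claim 52 requires an image block larger than the object candidate box but no larger than the frame. This amounts to enlarging the box around its centre and clipping it to the frame boundaries; the 1.5 enlargement factor and the function name below are assumptions for illustration.

```python
def enlarged_crop(box, frame_w, frame_h, scale=1.5):
    """Return an image-block rectangle that is `scale` times the object
    candidate box `box` = (x1, y1, x2, y2), clipped so it never exceeds
    the frame. Illustrative sketch of the sizing rule in claim 52."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # box centre
    half_w = (x2 - x1) * scale / 2.0                   # enlarged half-extents
    half_h = (y2 - y1) * scale / 2.0
    # clip the enlarged block to the frame boundaries
    return (max(0, int(cx - half_w)), max(0, int(cy - half_h)),
            min(frame_w, int(cx + half_w)), min(frame_h, int(cy + half_h)))
```

Taking a block larger than the box gives the segmentation network surrounding context while keeping the crop cheaper than processing the whole frame.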
  53. An electronic device, comprising the video object segmentation apparatus according to any one of claims 28-52.
  54. An electronic device, comprising:
    a memory, configured to store executable instructions; and
    a processor, configured to communicate with the memory to execute the executable instructions so as to complete the operations of the method according to any one of claims 1-27.
  55. A computer storage medium for storing computer-readable instructions, wherein the instructions, when executed, implement the operations of the method according to any one of claims 1-27.
  56. A computer program, comprising computer-readable instructions, wherein, when the computer-readable instructions run in a device, a processor in the device executes executable instructions for implementing the steps of the method according to any one of claims 1-27.
PCT/CN2018/097106 2017-07-26 2018-07-25 Video object segmentation method and apparatus, electronic device, storage medium and program WO2019020062A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/236,482 US11222211B2 (en) 2017-07-26 2018-12-29 Method and apparatus for segmenting video object, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710619408.0 2017-07-26
CN201710619408.0A CN108229290B (zh) 2017-07-26 2017-07-26 Video object segmentation method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/236,482 Continuation US11222211B2 (en) 2017-07-26 2018-12-29 Method and apparatus for segmenting video object, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019020062A1 true WO2019020062A1 (zh) 2019-01-31

Family

ID=62655131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/097106 WO2019020062A1 (zh) 2017-07-26 2018-07-25 视频物体分割方法和装置、电子设备、存储介质和程序

Country Status (3)

Country Link
US (1) US11222211B2 (zh)
CN (1) CN108229290B (zh)
WO (1) WO2019020062A1 (zh)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229290B (zh) * 2017-07-26 2021-03-02 Beijing SenseTime Technology Development Co., Ltd. Video object segmentation method and apparatus, electronic device, and storage medium
CN108717701B (zh) * 2018-05-24 2021-03-02 Beijing Lemi Technology Co., Ltd. Method, apparatus, electronic device and medium for producing an afterimage special effect in a video
CN109493330A (zh) * 2018-11-06 2019-03-19 University of Electronic Science and Technology of China Cell nucleus instance segmentation method based on multi-task learning
CN109711354A (zh) * 2018-12-28 2019-05-03 Harbin Institute of Technology (Weihai) Target tracking method based on video attribute representation learning
CN109816611B (zh) * 2019-01-31 2021-02-12 Beijing SenseTime Technology Development Co., Ltd. Video inpainting method and apparatus, electronic device, and storage medium
KR20210061072A 2019-11-19 2021-05-27 Samsung Electronics Co., Ltd. Video segmentation method and apparatus
KR20210067442A (ko) * 2019-11-29 2021-06-08 LG Electronics Inc. Automatic labeling apparatus and method for object recognition
CN111178245B (zh) * 2019-12-27 2023-12-22 Youjia Innovation (Beijing) Technology Co., Ltd. Lane line detection method, apparatus, computer device and storage medium
CN111901600B (zh) * 2020-08-06 2021-06-11 Zhongbiao Huian Information Technology Co., Ltd. Low-loss video compression method
CN113570607B (zh) * 2021-06-30 2024-02-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Target segmentation method, apparatus and electronic device
CN113963305B (zh) * 2021-12-21 2022-03-11 Wangsi Technology Co., Ltd. Method for extracting video key frames and close-up clips
JP7391150B1 2022-08-02 2023-12-04 Mitsubishi Electric Corporation Identification device, identification method and identification program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637253A (zh) * 2011-12-30 2012-08-15 Tsinghua University Video foreground object extraction method based on visual saliency and superpixel segmentation
CN104134217A (zh) * 2014-07-29 2014-11-05 Institute of Automation, Chinese Academy of Sciences Video salient object segmentation method based on supervoxel graph cuts
US9390511B2 (en) * 2013-08-23 2016-07-12 Futurewei Technologies, Inc. Temporally coherent segmentation of RGBt volumes with aid of noisy or incomplete auxiliary data
CN106447689A (zh) * 2016-09-27 2017-02-22 Weimei Guangsu Capital Investment Management (Beijing) Co., Ltd. Holographic video stream segmentation method
CN108229290A (zh) * 2017-07-26 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Video object segmentation method and apparatus, electronic device, storage medium and program

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2576771B2 (ja) * 1993-09-28 1997-01-29 NEC Corporation Motion-compensated prediction device
AU5273496A (en) * 1995-03-22 1996-10-08 Idt International Digital Technologies Deutschland Gmbh Method and apparatus for coordination of motion determination over multiple frames
US7725825B2 (en) * 2004-09-28 2010-05-25 Ricoh Company, Ltd. Techniques for decoding and reconstructing media objects from a still visual representation
JP4741650B2 (ja) * 2005-03-17 2011-08-03 British Telecommunications public limited company Method of tracking objects in a video sequence
CN101299277B (zh) * 2008-06-25 2011-04-06 Beijing Vimicro Electronics Co., Ltd. Black-and-white image colorization processing method and ***
CN101389037B (zh) * 2008-09-28 2012-05-30 Hubei Kechuang Gaoxin Network Video Co., Ltd. Method and apparatus for spatio-temporal segmentation multi-state video coding
US20110109721A1 (en) * 2009-11-06 2011-05-12 Sony Corporation Dynamic reference frame reordering for frame sequential stereoscopic video encoding
US10212462B2 (en) * 2012-01-11 2019-02-19 Videonetics Technology Private Limited Integrated intelligent server based system for unified multiple sensory data mapped imagery analysis
KR20160016812A (ko) 2013-05-28 2016-02-15 Thomson Licensing Propagating image edits to an underlying video sequence via dense motion fields
CN103985114B (zh) 2014-03-21 2016-08-24 Nanjing University Method for foreground segmentation and classification of persons in surveillance video
CN104361601A (zh) 2014-11-25 2015-02-18 Shanghai University of Electric Power Probabilistic graphical model image segmentation method based on label fusion
KR102153607B1 (ko) * 2016-01-22 2020-09-08 Samsung Electronics Co., Ltd. Apparatus and method for foreground detection in an image
US9756248B1 (en) * 2016-03-02 2017-09-05 Conduent Business Services, Llc Methods and systems for camera drift correction
US10475186B2 (en) * 2016-06-23 2019-11-12 Intel Corporation Segmentation of objects in videos using color and depth information
EP3479160B1 (en) * 2016-06-30 2024-07-24 Magic Leap, Inc. Estimating pose in 3d space
CN106599789B (zh) * 2016-07-29 2019-10-11 Beijing SenseTime Technology Development Co., Ltd. Video category recognition method and apparatus, data processing apparatus and electronic device
CN106897742B (zh) * 2017-02-21 2020-10-27 Beijing SenseTime Technology Development Co., Ltd. Method, apparatus and electronic device for detecting objects in a video


Also Published As

Publication number Publication date
CN108229290B (zh) 2021-03-02
US20190138816A1 (en) 2019-05-09
US11222211B2 (en) 2022-01-11
CN108229290A (zh) 2018-06-29

Similar Documents

Publication Publication Date Title
WO2019020062A1 (zh) Video object segmentation method and apparatus, electronic device, storage medium and program
US10909380B2 (en) Methods and apparatuses for recognizing video and training, electronic device and medium
US11455782B2 (en) Target detection method and apparatus, training method, electronic device and medium
CN110543815B (zh) Training method for face recognition model, face recognition method, apparatus, device and storage medium
CN110569721B (zh) Recognition model training method, image recognition method, apparatus, device and medium
KR102319177B1 (ko) Method, apparatus, equipment and storage medium for determining object pose in an image
WO2018202089A1 (zh) Key point detection method and apparatus, storage medium and electronic device
WO2022111506A1 (zh) Video action recognition method and apparatus, electronic device and storage medium
WO2018192570A1 (zh) Temporal action detection method and ***, electronic device, and computer storage medium
WO2020215974A1 (zh) Method and apparatus for human body detection
US11270124B1 (en) Temporal bottleneck attention architecture for video action recognition
CN110853033B (zh) Video detection method and apparatus based on inter-frame similarity
WO2018054329A1 (zh) Object detection method and apparatus, electronic device, computer program and storage medium
EP2660753B1 (en) Image processing method and apparatus
US11030750B2 (en) Multi-level convolutional LSTM model for the segmentation of MR images
EP4085369A1 (en) Forgery detection of face image
CN109413510B (zh) Video summary generation method and apparatus, electronic device, and computer storage medium
JP7297989B2 (ja) Face liveness detection method, apparatus, electronic device and storage medium
US20230104262A1 (en) Panoptic segmentation refinement network
CN110399826B (zh) End-to-end face detection and recognition method
WO2020092276A1 (en) Video recognition using multiple modalities
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
US9081800B2 (en) Object detection via visual search
CN116761020A (zh) Video processing method, apparatus, device and medium
CN113705666B (zh) Segmentation network training method, usage method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18838926

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/06/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18838926

Country of ref document: EP

Kind code of ref document: A1