WO2024104239A1 - Video annotation method, device, equipment, medium and product - Google Patents

Video annotation method, device, equipment, medium and product (视频标注方法、装置、设备、介质及产品)

Info

Publication number: WO2024104239A1 (PCT/CN2023/130577, CN2023130577W)
Authority: WO (WIPO PCT)
Prior art keywords: frame, segment, video, annotation, sub
Application number: PCT/CN2023/130577
Other languages: English (en), French (fr)
Inventor: 颜鹏翔
Original Assignee: 北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Publication of WO2024104239A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular to a video annotation method, device, equipment, medium, and product.
  • Video processing can be applied to many technical fields, such as artificial intelligence, intelligent transportation, finance, and content recommendation.
  • the specific technologies involved may include target tracking, target detection, etc.
  • video annotation is generally performed manually frame by frame.
  • the manual annotation method has low annotation efficiency and high annotation cost.
  • the embodiments of the present disclosure provide a video annotation method, device, equipment, medium and product to overcome the technical problems of low annotation efficiency and high annotation cost in manual annotation.
  • an embodiment of the present disclosure provides a video annotation method, including:
  • determining a sub-segment to be labeled in the video to be labeled to obtain a target sub-segment; obtaining a first frame labeling result corresponding to the first frame of the target sub-segment; generating, based on the first frame labeling result, a tail frame labeling result corresponding to the tail frame of the target sub-segment; generating a labeling result of the intermediate frames of the target sub-segment according to the first frame labeling result and the tail frame labeling result, so as to obtain the labeling result of the target sub-segment; and, based on the labeling result of the target sub-segment, a target labeling result of the video to be labeled is generated.
  • an embodiment of the present disclosure provides a video annotation device, including:
  • a first determining unit is used to determine a sub-segment to be labeled in the video to be labeled, and obtain a target sub-segment;
  • a first frame marking unit used to obtain a first frame marking result corresponding to the first frame of the target sub-segment
  • a tail frame marking unit configured to generate a tail frame marking result corresponding to the tail frame of the target sub-segment based on the first frame marking result
  • a segment annotation unit configured to generate an annotation result of an intermediate frame of the target sub-segment according to the first frame annotation result and the last frame annotation result, so as to obtain an annotation result of the target sub-segment to be annotated;
  • the second determining unit is configured to generate a target annotation result of the video to be annotated based on the annotation result of the target sub-segment.
  • an embodiment of the present disclosure provides an electronic device, including: a processor and a memory;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor implements the video annotation method as described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video annotation method described in the first aspect and various possible designs of the first aspect.
  • the technical solution provided in this embodiment can determine the target sub-segment to be labeled from the segment dimension for the video to be labeled.
  • the first frame labeling result corresponding to the first frame of the target sub-segment can be obtained first, and then based on the first frame labeling result, the tail frame labeling result corresponding to the tail frame of the target sub-segment can be generated.
  • the first frame labeling result and the tail frame labeling result can be used to label the intermediate frames in the target sub-segment, so as to achieve the labeling of the intermediate frames of the target sub-segment and obtain the labeling result of the target sub-segment.
  • the tail frame labeling result can be obtained by automatic labeling based on the first frame labeling result
  • the intermediate frame labeling results can be obtained automatically from the first frame labeling result and the tail frame labeling result, so as to achieve efficient labeling of the intermediate frames.
  • the target labeling result of the video to be labeled can be determined.
  • FIG. 1 is a diagram showing an application example of a video annotation method provided by an embodiment of the present disclosure
  • FIG. 2 is a flow chart of an embodiment of a video annotation method provided by an embodiment of the present disclosure
  • FIG. 3 is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG. 4 is a diagram showing an example of feature propagation provided by an embodiment of the present disclosure.
  • FIG. 5 is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG. 6 is an example diagram of an update of a first frame annotation result provided by an embodiment of the present disclosure.
  • FIG. 7 is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG. 9 is a diagram showing an example of dividing a video sub-segment provided by an embodiment of the present disclosure.
  • FIG. 10 is an example diagram of extracting a key frame provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of an embodiment of a video annotation device provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • the technical solution disclosed in the present invention can be applied to video annotation scenarios.
  • By obtaining the annotation result of the first frame and automatically annotating the last frame based on the first frame annotation result, the other image frames can then be annotated automatically from the first frame annotation result and the last frame annotation result, thereby improving video annotation efficiency.
  • Video samples may include the video itself and the video labels.
  • the video labels generally refer to the labels of each image frame in the video.
  • The labels of each image frame, that is, the labeling results of each image frame in the video, are generally obtained by manual labeling.
  • Manual labeling frame by frame generally requires a lot of manual work, with low labeling efficiency and high labeling cost.
  • the present disclosure considers automatically completing the annotation of images.
  • Automatic annotation of images generally requires a region recognition model of the image. If the region recognition model is used directly, the annotation results obtained are not accurate enough.
  • some images can be manually annotated, and then the manually annotated images can be used to annotate the remaining images using a semi-supervised annotation method. The images annotated in this way are more accurate and the annotation efficiency is greatly improved.
  • FIG. 1 is an application example diagram of a video annotation method provided by an embodiment of the present disclosure.
  • the video annotation method can be applied to an electronic device 1, and the electronic device 1 can include a display device 2.
  • the display device 2 can display the video to be annotated.
  • the video to be annotated can be divided into at least one video sub-segment based on multiple key frames.
  • the video to be annotated can be annotated sub-segment by sub-segment; for example, for the target sub-segment 3, the electronic device 1 can display, in the display device 2, the segment annotation result of any image 4 in the target sub-segment 3.
  • the segment annotation result can be, for example, the area 5 where the vehicle is located in FIG. 1.
  • the vehicle area 5 shown in FIG. 1 is annotated with a rectangular frame.
  • This annotation method is only exemplary and should not constitute a specific limitation on the annotation method and annotation type. In actual applications, other shapes, such as the outline of the annotated object, a circle, or a polygon, can also be used for annotation.
  • the annotated target sub-segment can be used to determine the target annotation result of the video to be annotated.
  • FIG. 2 is a flow chart of an embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • the video annotation method can be performed by a video annotation device, and the video annotation device can be located in an electronic device.
  • the video annotation method can include the following steps:
  • the method may include: in response to a video annotation request, obtaining a video to be annotated.
  • the target sub-segment may be a sub-segment to be marked in at least one video sub-segment in the video to be marked.
  • the video to be marked may be divided into at least one video sub-segment; that is, the at least one video sub-segment may be obtained by dividing the video to be marked.
  • the first frame may be the first image of the target sub-segment, or may be any image of the target sub-segment.
  • the first frame annotation result can be obtained by manual annotation or extracted by the image annotation model. In order to improve the efficiency of first frame annotation, it can also be automatically annotated by the image annotation model, and then the annotation result of the image annotation model can be manually corrected to obtain the final first frame annotation result.
  • the tail frame may be the last image of the target sub-segment.
  • the end frame annotation result can be obtained by combining the annotation result of the first image of the target sub-segment with the semi-supervised annotation algorithm.
  • the semi-supervised annotation algorithm can use a forward propagation method to propagate the annotation result of the first image of the target sub-segment to the end frame to obtain the end frame annotation result of the end frame.
  • the intermediate frame may include an unlabeled image frame in the target sub-segment, and the annotation result of the intermediate frame may be obtained from the first frame annotation result and the last frame annotation result.
  • the target sub-segment may include multiple images or image frames, and each image may be annotated to obtain the annotation result of each image. After the multiple image frames in the target sub-segment are annotated, the segment annotation result of the target sub-segment composed of the annotation results corresponding to the multiple image frames of the target sub-segment may be obtained.
  • the video to be annotated may include at least one video sub-segment, each of which may be referred to as a target sub-segment during the annotation process, and the annotation result of the target sub-segment may be obtained after the annotation is completed.
  • the target annotation result of the video to be annotated may include annotation results corresponding to multiple video sub-segments.
  • the segment to be annotated can be determined from the segment dimension to obtain a target sub-segment.
  • the first frame annotation result corresponding to the first frame of the target sub-segment can be first obtained, and the last frame annotation result corresponding to the last frame can be generated by the first frame annotation result of the first frame.
  • the first frame annotation result and the last frame annotation result can be used to annotate the intermediate frames in the target sub-segment respectively, so as to realize automatic annotation of the target sub-segment and obtain the annotation result of the target sub-segment.
  • Each image in the target sub-segment can be automatically annotated through its first frame annotation results and last frame annotation results, achieving efficient annotation results.
  • the target annotation results of the video to be annotated can be determined.
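  • as an illustration only, the segment-level flow described above can be sketched as follows; split_into_subsegments, annotate_first_frame, propagate_forward and annotate_intermediate_frames are hypothetical stand-ins for the key frame division, first frame annotation and propagation steps detailed in the later embodiments, not functions defined by the present disclosure.

```python
def annotate_video(frames, split_into_subsegments, annotate_first_frame,
                   propagate_forward, annotate_intermediate_frames):
    """Sketch of the segment-level annotation flow; the four callables are
    hypothetical helpers passed in by the caller."""
    target_result = {}                        # frame index -> annotation result
    prev_tail_ann = None
    for seg_indices in split_into_subsegments(frames):       # each: ordered frame indices
        seg = [frames[i] for i in seg_indices]
        # First frame: manual/model annotation, or reuse the previous sub-segment's
        # tail frame result, since adjacent sub-segments share a key frame.
        first_ann = prev_tail_ann if prev_tail_ann is not None else annotate_first_frame(seg[0])
        # Tail frame: propagate the first frame annotation forward through the sub-segment.
        tail_ann = propagate_forward(seg, first_ann)
        # Intermediate frames: bidirectional propagation from the first and tail frames.
        middle_anns = annotate_intermediate_frames(seg, first_ann, tail_ann)
        for idx, ann in zip(seg_indices, [first_ann, *middle_anns, tail_ann]):
            target_result[idx] = ann          # shared key frames simply overwrite
        prev_tail_ann = tail_ann
    return target_result                      # target annotation result of the whole video
```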
  • the tail frame annotation result of the tail frame can be obtained by manual annotation.
  • the forward propagation algorithm can be used to determine the tail frame annotation result of the tail frame.
  • FIG. 3 is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • the difference between this method and the above-mentioned embodiment is that the annotation result of the last frame of the target sub-segment is generated based on the annotation result of the first frame, including:
  • the end frame annotation result of the end frame can be automatically determined based on the first frame annotation result and combined with the forward propagation algorithm.
  • the annotation efficiency of the end frame can be effectively improved.
  • a forward propagation algorithm is used to determine the last frame annotation result corresponding to the last frame, including:
  • the first frame annotation result is sequentially propagated to the unannotated image frames in the target sub-segment to obtain the annotation results of the unannotated image frames in the target sub-segment;
  • the annotation result of the last image frame of the target sub-segment is obtained as the tail frame annotation result corresponding to the tail frame.
  • a forward propagation algorithm is utilized to sequentially propagate the first frame annotation result to the unlabeled image frames in the target sub-segment, obtaining the annotation result of each unlabeled image frame, until the propagation reaches the last image frame of the target sub-segment and the tail frame annotation result corresponding to the tail frame is obtained.
  • the annotation is thus propagated frame by frame, so that the tail frame annotation draws on nearby frames, such as the annotation result of the image frame immediately before the tail frame, thereby improving the efficiency and accuracy of tail frame annotation.
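  • a minimal sketch of this frame-by-frame forward propagation is given below; propagate_mask is a hypothetical single-step callable wrapping the semi-supervised segmentation model and is not a function named by the present disclosure.

```python
def propagate_to_tail(frames, first_mask, propagate_mask):
    """Propagate the first frame annotation frame by frame until the tail frame.

    frames: the images of the target sub-segment, in order.
    first_mask: the annotation result of frames[0].
    propagate_mask(prev_frame, prev_mask, cur_frame): hypothetical one-step
        propagation, e.g. a semi-supervised segmentation model.
    """
    masks = [first_mask]
    for prev_frame, cur_frame in zip(frames[:-1], frames[1:]):
        # Each unannotated frame is annotated from its immediate predecessor,
        # so the tail frame result draws on the frames just before it.
        masks.append(propagate_mask(prev_frame, masks[-1], cur_frame))
    return masks[-1]          # tail frame annotation result
```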
  • a two-way propagation method can be used to annotate the intermediate frames of the target sub-segment.
  • the image can be automatically annotated according to the position differences between the intermediate frame and the first frame and the last frame, so as to improve the image annotation accuracy.
  • FIG. 4 is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • the difference from the above-mentioned embodiment is that, according to the first frame annotation result and the last frame annotation result, generating the annotation result of the middle frame of the target sub-segment may include:
  • the forward propagation algorithm may include a machine learning algorithm, a neural network algorithm, etc., which may be obtained through training.
  • the forward propagation algorithm may be used to propagate features of the first frame annotation result of the first frame to the intermediate frames after the first frame to obtain forward propagation features of the intermediate frames.
  • the target sub-segment may include N image frames, each of which may be annotated as an intermediate frame.
  • N is a positive integer greater than 1.
  • the first frame and the last frame may be annotated first, and then each image frame may be used as an intermediate frame in turn starting from the second image frame in the target sub-segment to obtain the annotation result of each intermediate frame until the annotation result of the previous image of the last frame of the target sub-segment is obtained, at which time the annotation of the target sub-segment is completed.
  • the forward propagation feature can refer to the image features obtained by propagating the label of the first frame, frame by frame, to the images located after it, starting from the first frame and stopping when the image corresponding to the image sequence number is reached.
  • the first frame annotation result is used as a feature propagation mask to participate in feature calculation.
  • the semi-supervised segmentation algorithm in the following embodiment can be used for feature transmission.
  • the back propagation algorithm may include a machine learning algorithm, a neural network algorithm, etc., which may be obtained through training.
  • the back propagation algorithm may propagate the end frame annotation result to the intermediate frame before the end frame to obtain the back propagation features of the intermediate frame.
  • the backward propagation feature can refer to the image features obtained by propagating the label of the last frame, frame by frame, to the images located before it, starting from the last frame and stopping when the image corresponding to the image sequence number is reached.
  • the last frame annotation result can also be used as a feature propagation mask to participate in feature calculation.
  • the target sub-segment may include one or more intermediate frames, each of which may be annotated to obtain an annotation result of each intermediate frame.
  • the segment annotation result of the target sub-segment may include the annotation results of each of the multiple intermediate frames.
  • the forward propagation feature of the intermediate frame can be obtained by using the forward propagation algorithm
  • the backward propagation feature of the intermediate frame can be obtained by using the backward propagation algorithm.
  • the fusion of the forward propagation feature and the backward propagation feature can make the target image feature fuse the first frame annotation result and the last frame annotation result.
  • the target image feature can better characterize the annotation feature of the intermediate frame, thereby improving the annotation precision and accuracy of the intermediate frame.
  • the step of extracting the forward propagation features may include: determining the forward propagation features of the intermediate frames using the forward propagation algorithm according to the first frame labeling result and the image sequence number.
  • the step of extracting the backward propagation features may include: determining the backward propagation features of the intermediate frames using the backward propagation algorithm according to the last frame labeling result and the image sequence number.
  • different categories can be set for image labeling according to different actual usage requirements.
  • Multiple types of labels can be marked at the same time.
  • vehicles and pedestrians in the video can be tracked. Therefore, vehicles and pedestrians can be used as two label categories for separate labeling.
  • labels of each category are not affected by other categories, and label features can be generated for each label category.
  • the elements of the label feature can represent the probability that each pixel belongs to the label category.
  • the element value of the coordinate can specifically include the probability values corresponding to the coordinate in at least one label category, and the label represented by the label category with the largest probability value is the label of the coordinate.
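  • as a sketch of the label feature representation just described (an assumed data layout, not a format specified by the present disclosure), a hard annotation can be expanded into one per-pixel probability map per label category:

```python
import numpy as np

def label_map_to_category_features(label_map, num_categories):
    """Expand an annotation (per-pixel category indices) into per-category label features.

    Each channel holds, for every pixel coordinate, the probability that the pixel
    belongs to that label category; for a hard annotation this is simply one-hot.
    """
    h, w = label_map.shape
    features = np.zeros((num_categories, h, w), dtype=np.float32)
    for c in range(num_categories):
        features[c] = (label_map == c).astype(np.float32)
    return features
```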
  • the forward propagation features and the backward propagation features can be subjected to feature fusion processing to obtain the target image features of the intermediate frame; the labeling results of the intermediate frame can be determined by feature recognition based on the target image features. Determining the forward propagation features of the intermediate frame based on the first frame labeling results and the image sequence number includes: extracting features of the first frame based on the first frame labeling results to obtain label features corresponding to at least one label category of the first frame, and forward propagating the label features corresponding to at least one label category of the first frame to obtain the forward label features corresponding to at least one label category of the intermediate frame, so as to obtain the forward propagation features corresponding to at least one label category.
  • the backward propagation features of the intermediate frame are determined, including: extracting features of the tail frame according to the tail frame labeling result to obtain label features of the tail frame corresponding to at least one label category, and backward propagating the label features corresponding to at least one label category of the tail frame to obtain the backward label features corresponding to at least one label category of the intermediate frame, so as to obtain the backward propagation features corresponding to at least one label category.
  • the forward propagation feature is obtained based on the first frame annotation result and the image sequence number, so that the forward propagation feature combines the characteristics of the first frame annotation result and the image sequence number.
  • the backward propagation feature is obtained based on the last frame annotation result and the image sequence number, and combines the characteristics of the last frame annotation result and the image sequence number.
  • the forward propagation feature and the backward propagation feature are the results of the image features propagated from the first frame and from the last frame, respectively.
  • the forward propagation feature and the backward propagation feature are used to perform feature fusion processing to obtain the target image feature of the intermediate frame.
  • the target image feature combines the propagation characteristics of the forward and backward directions.
  • the annotation result obtained using the target image feature is more accurate, which can improve the annotation efficiency and accuracy of the intermediate frame.
  • the forward propagation algorithm may include: a semi-supervised segmentation algorithm.
  • Back propagation algorithms may include: semi-supervised segmentation algorithms.
  • the semi-supervised segmentation algorithm can be used to perform forward feature propagation on the target sub-segment frame by frame starting from the first frame until the forward propagation feature at the image sequence number is obtained.
  • the semi-supervised segmentation algorithm can be used to perform backward feature propagation processing on the target sub-segment frame by frame starting from the last frame until the backward propagation feature at the image sequence number is obtained.
  • the semi-supervised segmentation algorithm may be a semi-supervised object segmentation algorithm.
  • the semi-supervised segmentation algorithm may be used to calculate the image features of the current frame using the image features of the previous frame for the target sub-segment starting from the first frame or the last frame, until the forward or backward propagation features corresponding to the image sequence number are obtained.
  • a semi-supervised segmentation algorithm can be used to perform forward feature propagation on the target sub-segment frame by frame starting from the first frame until the forward propagation feature at the image sequence number position is obtained.
  • the forward propagation of image features can thus be completed, so that the forward propagation features obtained by calculation integrate the features of the first frame and of the images before the current frame, and are more expressive.
  • the semi-supervised segmentation algorithm can also be used to propagate from the last frame, that is, to perform backward feature propagation processing frame by frame starting from the last frame until the backward propagation feature at the image sequence number is obtained.
  • image features can be forward or backward propagated to improve the calculation accuracy of image features.
  • feature fusion calculation can be performed based on the forward propagation features and the backward propagation features, so that the image features of the intermediate frame integrate the features of both the forward and backward aspects.
  • the forward propagation features and the backward propagation features are subjected to feature fusion processing to obtain the target image features of the intermediate frame, including:
  • the forward propagation weight and the backward propagation weight are determined according to the sequence number ratio
  • according to the forward propagation weights, the backward propagation weights, the forward propagation features and the backward propagation features, the target image features of the intermediate frame are obtained.
  • the image sequence number of the intermediate frame may refer to the order in which the intermediate frame appears in the target sub-segment.
  • the image sequence number of the first image in the target sub-segment may be 1, and the image sequence number of the second image that appears may be 2.
  • the position of the intermediate frame in the target sub-segment may be determined by the image sequence number.
  • Each image frame may determine the corresponding image sequence number according to its marking order, for example, the image sequence number of the first frame may be 1, and the image sequence number of the last frame may be N+1.
  • the intermediate frames and the image sequence numbers of the intermediate frames in the target sub-segment can be determined, and the sequence numbers of the intermediate frames can represent their positional relationship with the first frame and the last frame.
  • the annotation results of the first frame and the last frame can be combined with the image sequence numbers of the intermediate frames to determine the annotation results of the intermediate frames.
  • the annotation effect of the intermediate frames is associated with the positions of the intermediate frames in the target sub-segment, thereby improving the annotation accuracy.
  • determining the sequence number ratio according to the image sequence number may include:
  • the ratio of the image sequence number of the middle frame to the sequence number of the last frame corresponding to the last frame of the target sub-segment is calculated.
  • obtaining the target image features of the intermediate frame according to the forward propagation weights, the backward propagation weights, the forward propagation features, and the backward propagation features may include:
  • the forward propagation features and the backward propagation features are subjected to feature fusion processing and weighted summation to obtain the target image features of the intermediate frame.
  • determining the forward propagation weight and the backward propagation weight may include: determining the sequence number ratio K/N as the backward propagation weight, and determining the difference between the integer 1 and the sequence number ratio, that is, 1-K/N, as the forward propagation weight.
  • the weighted summation step of the target image feature may include:
  • the forward propagation feature may include forward label features corresponding to at least one label category.
  • the backward propagation feature may include backward label features corresponding to at least one label category. According to the forward propagation weight and the backward propagation weight, the forward label features and backward label features of each label category are weighted and summed to obtain the fusion features corresponding to each label category.
  • the fusion features corresponding to each label category are the target image features.
  • the weighted sum of the forward label features and the backward label features of each label category may include: for each label category, multiplying the first eigenvalue of each pixel coordinate in the forward label feature by the forward propagation weight, multiplying the second eigenvalue of the same pixel coordinate in the backward label feature by the backward propagation weight, and adding the two products, thereby obtaining the eigenvalue of the label category at each pixel coordinate.
  • the target image features can be represented as the feature values of each pixel coordinate of the intermediate frame in different label categories.
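  • the weighting just described can be sketched as follows, assuming the forward and backward label features are stored as arrays of shape (number of label categories, height, width); the helper name is hypothetical.

```python
import numpy as np

def fuse_bidirectional_features(forward_feat, backward_feat, k, n):
    """Weighted fusion of forward/backward label features for intermediate frame K.

    With sequence number ratio K/N, the backward propagation weight is K/N and the
    forward propagation weight is 1 - K/N, so frames closer to the tail frame rely
    more on the features propagated backward from the tail frame annotation.
    """
    w_backward = k / n
    w_forward = 1.0 - w_backward
    # Weighted sum per label category and per pixel coordinate.
    return w_forward * forward_feat + w_backward * backward_feat
```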
  • the first frame annotation result of the first frame 501 is 5011
  • the last frame annotation result of the last frame 502 is 5021.
  • the first frame annotation result 5011 of the first frame 501 is propagated forward to obtain the corresponding forward propagation feature
  • the last frame annotation result 5021 of the last frame 502 is propagated backward to obtain the corresponding backward propagation feature.
  • the intermediate frame 503 can perform feature fusion on the forward propagation feature and the backward propagation feature based on its sequence number to obtain the corresponding target image feature.
  • the target image feature can be identified by the image classification layer to obtain the target area 5031 of the intermediate frame.
  • the target area 5031 can be the annotation result of the intermediate frame.
  • the correlation between the image and the features of the forward propagation and the features of the backward propagation can be calculated according to the image sequence number, that is, the sequence number ratio corresponding to the image sequence number is calculated.
  • the sequence number ratio can be used to determine the forward propagation weight and the backward propagation weight.
  • determining the labeling result of the intermediate frame according to the target image feature may include:
  • the target area of the target image feature is identified
  • the target area is used as the annotation result of the intermediate frame.
  • identifying a target region of a target image feature may include: determining, for each pixel coordinate of the intermediate frame in the target image feature, the eigenvalues corresponding to at least one label category, and obtaining the maximum eigenvalue among the eigenvalues corresponding to the at least one label category, so as to obtain the maximum eigenvalue of each pixel coordinate.
  • according to the label category corresponding to the maximum eigenvalue of each pixel coordinate, the target pixel coordinates corresponding to each label category are determined, the label area formed by the target pixel coordinates of each label category is determined, and the target area composed of the label areas corresponding to at least one label category is obtained. That is, the label areas corresponding to at least one label category can be the annotation result of the intermediate frame.
  • the image classification layer can be a mathematical model for feature classification of image features.
  • the target region of the target image feature can be identified according to the image classification layer, and the target region is used as the labeling result of the intermediate frame.
  • the use of the image classification layer can accurately extract the label of the target image feature.
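  • a sketch of this classification step, under the same assumed (number of label categories, height, width) layout for the fused target image features, with category 0 assumed to be the background:

```python
import numpy as np

def classify_target_regions(target_feat, background_category=0):
    """Pick, for each pixel coordinate, the label category with the maximum feature value."""
    label_map = np.argmax(target_feat, axis=0)           # per-pixel label category
    # The pixels of each non-background category form that category's label area;
    # together the label areas form the target region, i.e. the intermediate frame annotation.
    regions = {int(c): np.argwhere(label_map == c)
               for c in np.unique(label_map) if c != background_category}
    return label_map, regions
```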
  • FIG. 6 is a flow chart of another embodiment of an image annotation method provided by an embodiment of the present disclosure.
  • the difference from the above-mentioned embodiment is that after determining the annotation result of the intermediate frame, it further includes:
  • the labeling result may include label areas corresponding to at least one label category.
  • in step 603, the label modification operation performed by the user on the annotation result of the intermediate frame is detected, and the modified annotation result of the intermediate frame is obtained.
  • the intermediate frame and its annotation results can be output simultaneously, and the automatic annotation results of the intermediate frame can be output for users to view.
  • the user can view the annotation result of the intermediate frame and check the marking effect of the annotation result. If the marking is unqualified, the annotation result can be modified. If the marking is qualified, the annotation result of the intermediate frame can be directly determined. By interactively displaying with the user, the annotation result of the intermediate frame can be made to better match the user's annotation requirements and the annotation accuracy can be higher.
  • obtaining the first frame annotation result corresponding to the first frame of the target sub-segment may include:
  • a previous video sub-segment of the target sub-segment is obtained, and a tail frame marking result corresponding to a tail frame of the previous video sub-segment is determined as a first frame marking result corresponding to a first frame of the target sub-segment.
  • the label setting operation performed by the user on the first frame of the target sub-segment can be detected to obtain the first frame annotation result when the setting is completed.
  • the target sub-segment is not the first video sub-segment
  • the tail frame annotation result of the tail frame of the previous video sub-segment of the target sub-segment is obtained as the first frame annotation result corresponding to the first frame of the target sub-segment.
  • the first frame annotation result corresponding to the annotation operation can be obtained, which better matches the user's annotation requirements; alternatively, the last frame annotation result of the previous video sub-segment can be used as the annotation result of the first frame, which can improve the efficiency of first frame annotation.
  • the first frame of the target sub-segment and the first frame annotation result corresponding to the first frame can also be obtained in the following manner:
  • the intermediate frame whose annotation result has been modified is updated to be the first frame
  • the modified annotation result of the intermediate frame is used as the annotation result of the first frame.
  • FIG. 7 shows an example diagram of an image frame annotation prompt provided by an embodiment of the present disclosure.
  • the intermediate frame 701 can be used as the first frame.
  • the original first frame 702 can no longer be used as the first frame.
  • the annotation prompt of the image frame in FIG. 7 is merely exemplary and does not have a limiting effect.
  • if the original first frame continues to be used for propagation after the user has modified the annotation of an intermediate frame, the propagation accuracy of the label is reduced, and the matching degree with the actual annotation requirements of the user is low.
  • therefore, the intermediate frame after the label modification is used as the first frame, and the modified annotation of the intermediate frame is used as the first frame annotation result, which can make label propagation more effective and improve the efficiency and accuracy of the propagation.
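  • a sketch of this first frame update, reusing the hypothetical propagate_mask helper from the earlier sketch: once the user corrects an intermediate frame, that frame acts as the new first frame and propagation restarts from it (only the forward pass is shown; the bidirectional variant follows the same idea).

```python
def repropagate_after_correction(frames, masks, corrected_idx, corrected_mask, propagate_mask):
    """Use a user-corrected intermediate frame as the new first frame and re-propagate."""
    masks = list(masks)
    masks[corrected_idx] = corrected_mask     # the corrected frame becomes the new first frame
    for t in range(corrected_idx + 1, len(frames)):
        # Frames after the corrected one are annotated again from their predecessor.
        masks[t] = propagate_mask(frames[t - 1], masks[t - 1], frames[t])
    return masks
```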
  • determining the sub-segment to be annotated in the video to be annotated and obtaining the target sub-segment includes:
  • the key frames of the video to be annotated can be grouped, with two adjacent key frames forming a group, so that at least one group of key frames is determined from the at least one key frame.
  • a group of key frames includes an adjacent first key frame and a second key frame, the first key frame is located before the second key frame, and the second key frame of the previous group of key frames is the same as the first key frame of the next group of key frames.
  • the video interval surrounded by two adjacent key frames can be used as a video sub-segment, that is, the video sub-segment can include two key frames and an intermediate frame between the two key frames.
  • the intermediate frame can be obtained by sampling at a preset sampling frequency.
  • the key frame may be an image that is significantly different from the image near it in the video to be annotated. For example, if there is no vehicle in the image at time t1, and there is a vehicle in the image at time t2, and the time difference between t1 and t2 is within the time constraint, then the image at time t2 is determined to be the key frame.
  • FIG. 9 is an example diagram of a video sub-segment division provided by an embodiment of the present disclosure.
  • the key frames of the video to be annotated are key frame 1, key frame 2, key frame 4, and key frame 6.
  • Two adjacent key frames can be regarded as a group.
  • key frame 1 and key frame 2 can be used as a group of adjacent key frames, and the image frames surrounded by the group of adjacent key frames can be video sub-segment 1.
  • Video sub-segment 1 can be composed of key frame 1, key frame 2, and image frame 3 between key frames 1 and 2.
  • Key frame 2 and key frame 4 may be a group of adjacent key frames, and the image frames enclosed by the group of adjacent key frames may be video sub-segment 2.
  • Video sub-segment 2 may be composed of key frame 2, key frame 4, and image frame 5 between key frames 2 and 4.
  • Key frame 4 and key frame 6 may be a group of adjacent key frames, and the image frames enclosed by the group of adjacent key frames may be video sub-segment 3.
  • Video sub-segment 3 may be composed of key frame 4, key frame 6, and image frame 7 between key frames 4 and 6.
  • key frame 2 can be the last frame of video sub-segment 1, or the first frame of video sub-segment 2.
  • Key frame 4 can be the last frame of video sub-segment 2, or the first frame of video sub-segment 3.
  • the key frames can be used to complete the acquisition of pairs of two adjacent key frames.
  • the video interval enclosed by two adjacent key frames in the video to be labeled can be a video sub-segment, so that at least one video sub-segment corresponding to the video to be labeled is obtained. The last frame of the former of two adjacent video sub-segments in the at least one video sub-segment is the same as the first frame of the latter, which completes a comprehensive and accurate segmentation of the video to be labeled and makes the segmentation into at least one video sub-segment more efficient.
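  • a minimal sketch of this division, assuming the key frame positions are already known; adjacent sub-segments deliberately share their boundary key frame.

```python
def split_by_keyframes(keyframe_indices):
    """Divide frame indices into sub-segments enclosed by adjacent key frames."""
    segments = []
    for start, end in zip(keyframe_indices[:-1], keyframe_indices[1:]):
        # Both enclosing key frames are included, so the tail frame of one
        # sub-segment is the first frame of the next sub-segment.
        segments.append(list(range(start, end + 1)))
    return segments
```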
  • At least one key frame may be extracted from the video to be labeled according to the key frame extraction frequency; or at least one key frame satisfying the image change condition may be extracted from the video to be labeled.
  • the key frame extraction frequency can be set according to the usage requirements and can be obtained by presetting.
  • the key frame extraction frequency is expressed as a number of frames: one key frame is extracted for every such interval of image frames. For example, when the key frame extraction frequency is 10, a key frame can be extracted every 10 frames, and the 1st frame and the 11th frame can both be key frames.
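  • for example (a sketch, using 0-based indices), a key frame extraction frequency of 10 simply takes every 10th frame:

```python
total_frames = 95                                   # example value, assumed for illustration
extraction_frequency = 10
# Frames 0, 10, 20, ... (the 1st, 11th, 21st, ... in 1-based counting) become key frames.
keyframe_indices = list(range(0, total_frames, extraction_frequency))
```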
  • extracting at least one key frame of the video to be labeled includes:
  • the motion amplitude value of each image frame in the video to be labeled is calculated, and at least one key frame among the image frames is obtained according to the motion amplitude value.
  • the image change condition may include: the motion amplitude value of the image frame is greater than the index threshold.
  • obtaining at least one key frame in the image frame according to the motion amplitude value may include:
  • the image frame is determined to be a key frame to obtain at least one key frame among the multiple image frames.
  • the motion amplitude value may refer to the amplitude difference between the image frame and its surrounding frames.
  • the amplitude value of the image frame and the amplitude value of the surrounding frames may be calculated by difference to obtain the motion amplitude value. If the motion amplitude value is greater than the index threshold, it means that the image frame is significantly different from the surrounding frames, and the image frame may be used as a key frame.
  • FIG. 10 shows an example diagram of key frame extraction provided in an embodiment of the present disclosure.
  • the amplitude of each image frame changes continuously starting from the first image frame 0, and the amplitude lines of each image frame form a curve 1001.
  • the motion amplitude value can be the amplitude difference between each image frame.
  • the amplitude difference of the image frame can be determined by the change of the curve 1001, that is, the image frame corresponding to the key point 1002 whose forward and backward motion amplitude is greater than the index threshold can be a key frame.
  • for the multiple image frames in the video to be annotated, the motion amplitude value of each image frame can be calculated, and the key frames can be screened according to the motion amplitude of each image frame.
  • the key frames can then be used to obtain the video sub-segments, with the motion amplitude value serving as the basis for dividing the video sub-segments, which can effectively improve the image annotation accuracy when images are annotated automatically.
  • the step of calculating the motion amplitude value of each image frame may include:
  • the inter-frame optical flow change amplitude value corresponding to the inter-frame optical flow change index of each image frame is calculated, and the inter-frame optical flow change amplitude value is determined as the motion amplitude value.
  • an intersection-over-union ratio of a segmentation result corresponding to each image frame is calculated, and the intersection-over-union ratio is determined as a motion amplitude value.
  • the indicator threshold can be determined according to the type of motion amplitude value.
  • the inter-frame difference may refer to a difference between respective pixel means of two image frames.
  • the optical flow variation value may refer to the difference between the optical flows of two or more image frames, and the optical flow variation value corresponding to each image frame may be calculated by an optical flow calculation formula.
  • the intersection-over-union ratio of segmentation results may refer to the ratio of the intersection to the union of the segmentation results of an image frame and those of its surrounding frames, obtained by performing image segmentation processing on the image frame and its surrounding frames respectively. If the overlap between the two is high, the intersection-over-union ratio is large; if the overlap between the two is low, the ratio is small.
  • by using the inter-frame difference, the inter-frame optical flow change amplitude value, or the intersection-over-union ratio of the segmentation results, a variety of methods can be applied to accurately calculate the motion amplitude value of each image frame.
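  • a sketch of motion-amplitude-based key frame screening using the simplest of the options above (the inter-frame difference of pixel means); optical flow magnitude or segmentation IoU could be substituted for the amplitude calculation.

```python
import numpy as np

def extract_keyframes_by_motion(frames, index_threshold):
    """Select key frames whose motion amplitude value exceeds the index threshold.

    frames: a sequence of image arrays. The motion amplitude of frame t is taken,
    for illustration, as the absolute difference between the pixel means of
    frame t and frame t-1 (the inter-frame difference described above).
    """
    keyframes = [0]                               # assume the first frame is a key frame
    prev_mean = float(np.mean(frames[0]))
    for t in range(1, len(frames)):
        cur_mean = float(np.mean(frames[t]))
        motion_amplitude = abs(cur_mean - prev_mean)
        if motion_amplitude > index_threshold:
            keyframes.append(t)
        prev_mean = cur_mean
    return keyframes
```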
  • determining a sub-segment to be labeled in a video to be labeled and obtaining a target sub-segment includes:
  • one video segment is selected in turn as a sub-segment to be marked to obtain a target sub-segment.
  • the segment sequence of each video sub-segment may be based on the segment sequence number corresponding to at least one video sub-segment.
  • the target sub-segment may be determined from at least one video sub-segment in sequence.
  • the marking scheme of the above embodiment may be executed for each target sub-segment until the annotation results of the at least one video sub-segment are all obtained.
  • the annotation results of all video sub-segments are obtained, and the annotation results of all video sub-segments are integrated to obtain the annotation result of the video to be annotated.
  • a segment number may be set for each video sub-segment obtained. For example, the segment number of the first obtained video sub-segment is 1, and the segment number of the second video sub-segment is 2.
  • the target sub-segment can be selected from the at least one video sub-segment in sequence according to the segment sequence corresponding to the at least one video sub-segment.
  • acquiring the target sub-segments according to the segment sequence can ensure that the corresponding target sub-segments are obtained in order and that the labeling of each target sub-segment is completed in turn, so as to achieve orderly labeling of the at least one video sub-segment and improve the comprehensiveness of the labeling of the video sub-segments.
  • the technical solution of the present disclosure can also be applied to the field of games, specifically, for example, it can include application fields such as the design and display of three-dimensional game scenes.
  • FIG. 11 is a schematic diagram of a structure of an embodiment of a video annotation device provided by an embodiment of the present disclosure.
  • the device may be located in an electronic device and may be configured to perform the above-mentioned video annotation method.
  • the video annotation device 1100 may include:
  • the first determining unit 1101 is used to determine a sub-segment to be marked in the video to be marked, and obtain a target sub-segment;
  • a first frame labeling unit 1102 is used to obtain a first frame labeling result corresponding to the first frame of the target sub-segment
  • the end frame marking unit 1103 is used to generate an end frame marking result corresponding to the end frame of the target sub-segment based on the first frame marking result;
  • the segment annotation unit 1104 is used to generate an annotation result of the middle frame of the target sub-segment according to the first frame annotation result and the last frame annotation result, so as to obtain an annotation result of the target sub-segment to be annotated;
  • the second determining unit 1105 is configured to generate a target annotation result of the video to be annotated based on the annotation result of the target sub-segment.
  • the target acquisition unit includes:
  • Key extraction module used to extract key frames of the video to be annotated
  • a segment obtaining module used for dividing a video interval surrounded by two adjacent key frames in the key frames into a video sub-segment, and obtaining at least one video sub-segment;
  • the target determination module is used to determine a target sub-segment to be marked from at least one video sub-segment.
  • the key extraction module includes:
  • the amplitude calculation submodule is used to calculate the motion amplitude value of each image frame in the video to be labeled
  • the key determination submodule is used to obtain at least one key frame in the image frame according to the motion amplitude value.
  • the tail frame marking unit may include:
  • a first frame acquisition module is used to obtain a first frame annotation result corresponding to the first frame of the target sub-segment
  • the tail frame generation module is used to determine the tail frame annotation result corresponding to the tail frame by using the forward propagation algorithm according to the first frame annotation result.
  • the tail frame generation module may include:
  • the label propagation submodule is used to sequentially propagate the labeling results of the first frame to the unlabeled image frames in the target sub-segment by using the forward propagation algorithm to obtain the labeling results of the unlabeled image frames in the target sub-segment;
  • the tail frame annotation submodule is used to obtain the annotation result of the last image frame of the target sub-segment as the tail frame annotation result corresponding to the tail frame.
  • the segment annotation unit includes:
  • the first extraction module is used to extract the forward propagation features of the intermediate frames of the target sub-segment based on the first frame annotation result and in combination with the forward propagation algorithm.
  • the second extraction module is used to extract the back-propagation features of the intermediate frames of the target sub-segment based on the tail frame label result and in combination with the back-propagation algorithm;
  • a feature fusion module is used to perform feature fusion processing on the forward propagation features and the backward propagation features to obtain the target image features of the intermediate frame;
  • the label determination module is used to determine the labeling result of the intermediate frame according to the target image features.
  • the feature fusion module may include:
  • a sequence number determination submodule used to determine the image sequence number of the intermediate frame in the target sub-segment
  • a ratio determination submodule used for determining a sequence number ratio according to an image sequence number
  • a weight determination submodule used to determine the forward propagation weight and the backward propagation weight according to the sequence number ratio
  • the feature weighting submodule is used to obtain the target image features of the intermediate frame according to the forward propagation weight, the backward propagation weight, the forward propagation features and the backward propagation features.
  • the first frame marking unit may include:
  • the first frame annotation module is used to detect the annotation operation performed by the user on the first frame and obtain the first frame annotation result corresponding to the annotation operation;
  • the first frame determination module is used to obtain the previous video sub-segment of the target sub-segment, and determine the last frame marking result corresponding to the last frame of the previous video sub-segment as the first frame marking result corresponding to the first frame of the target sub-segment.
  • the first determining unit may include:
  • a sequence determination module used to determine a sequence of segments corresponding to at least one video sub-segment according to a time sequence of at least one video sub-segment
  • the segment traversal module is used to select one video segment as a sub-segment to be marked in sequence starting from the first video segment according to the segment sequence corresponding to at least one video sub-segment, so as to obtain a target sub-segment.
  • the device provided in this embodiment can be used to execute the technical solution of the above method embodiment. Its implementation principle and technical effect are similar, and this embodiment will not be repeated here.
  • the embodiment of the present disclosure also provides an electronic device.
  • FIG. 12 shows a schematic diagram of the structure of an electronic device 1200 suitable for implementing the embodiment of the present disclosure
  • the electronic device 1200 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 12 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 1200 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 1201, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1202 or a program loaded from a storage device 1208 to a random access memory (RAM) 1203.
  • Various programs and data required for the operation of the electronic device 1200 are also stored in the RAM 1203.
  • the processing device 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204.
  • An input/output (I/O) interface 1205 is also connected to the bus 1204.
  • the following devices can be connected to the I/O interface 1205: input devices 1206 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 1207 such as a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 1208 such as a magnetic tape, a hard disk, etc.; and a communication device 1209.
  • the communication device 1209 can allow the electronic device 1200 to communicate with other devices wirelessly or by wire to exchange data.
  • FIG. 12 shows an electronic device 1200 with various devices, it should be understood that it is not required to implement or have all the devices shown. More or fewer devices may be implemented or provided instead.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 1209, or installed from the storage device 1208, or installed from the ROM 1202.
  • when the computer program is executed by the processing device 1201, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, device or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried.
  • This propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • Computer readable signal media may also be any computer readable medium other than computer readable storage media, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs.
  • when the one or more programs are executed by the electronic device, the electronic device is caused to execute the method shown in the above embodiments.
  • the present disclosure also provides a computer-readable storage medium, in which computer-executable instructions are stored.
  • a processor executes the computer-executable instructions, the video annotation method provided in any of the above embodiments is implemented.
  • the present disclosure also provides a computer program product, including a computer program, where the computer program, when executed by a processor, implements the video annotation method provided by any of the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • LAN Local Area Network
  • WAN Wide Area Network
  • each square box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the module, the program segment or a part of the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two square boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or flow chart, and the combination of the square boxes in the block diagram and/or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or hardware.
  • the name of a unit does not limit the unit itself in some cases.
  • the first acquisition unit may also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • FPGAs field programmable gate arrays
  • ASICs application specific integrated circuits
  • ASSPs application specific standard products
  • SOCs systems on chip
  • CPLDs complex programmable logic devices
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read-only memory
  • EPROM or flash memory erasable programmable read-only memory
  • CD-ROM portable compact disk read-only memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

本公开实施例提供一种视频标注方法、装置、设备、介质及产品,该方法包括:确定待标注视频中待标注的子片段,获得目标子片段;获取所述目标子片段的首帧对应的首帧标注结果;基于所述首帧标注结果,生成所述目标子片段的尾帧对应的尾帧标注结果;根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,以获得所述待标注的目标子片段的标注结果;基于所述目标子片段的标注结果,生成所述待标注视频的目标标注结果。本公开的技术方案提高了视频标注效率。

Description

视频标注方法、装置、设备、介质及产品
本申请要求2022年11月15日递交的申请号为202211430306.1、标题为“视频标注方法、装置、设备、介质及产品”的中国发明专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本公开实施例涉及计算机技术领域,尤其涉及一种视频标注方法、装置、设备、介质及产品。
背景技术
视频处理可以应用于诸多技术领域,例如,人工智能、智能交通、金融、内容推荐等多种技术领域,其具体涉及到的技术例如可以包括目标追踪、目标检测等。
相关技术中,视频的标注一般采用人工逐帧进行标注。但是,采用人工标注的方式,标注效率较低,标注成本过高。
发明内容
本公开实施例提供一种视频标注方法、装置、设备、介质及产品,以克服采用人工标注的方式,标注效率较低,标注成本过高的技术问题。
第一方面,本公开实施例提供一种视频标注方法,包括:
确定待标注视频中待标注的子片段,获得目标子片段;
获取所述目标子片段的首帧对应的首帧标注结果;
基于所述首帧标注结果,生成所述目标子片段的尾帧对应的尾帧标注结果;
根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,以获得所述待标注的目标子片段的标注结果;
基于所述目标子片段的标注结果,生成所述待标注视频的目标标注结果。
第二方面,本公开实施例提供一种视频标注装置,包括:
第一确定单元,用于确定待标注视频中待标注的子片段,获得目标子片段;
首帧标注单元,用于获取所述目标子片段的首帧对应的首帧标注结果;
尾帧标注单元,用于基于所述首帧标注结果,生成所述目标子片段的尾帧对应的尾帧标注结果;
片段标注单元,用于根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,以获得所述待标注的目标子片段的标注结果;
第二确定单元,用于基于所述目标子片段的标注结果,生成所述待标注视频的目标标注结果。
第三方面,本公开实施例提供一种电子设备,包括:处理器、存储器;
所述存储器存储计算机执行指令;
所述处理器执行所述存储器存储的计算机执行指令,使得所述处理器配置有如上第一方面以及第一方面各种可能的设计所述的视频标注方法。
第四方面,本公开实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如上第一方面以及第一方面各种可能的设计所述的视频标注方法。
第五方面,本公开实施例提供一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现如上第一方面以及第一方面各种可能的设计所述的视频标注方法。
本实施例提供的技术方案,针对待标注视频,可以从片段维度,确定待标注的目标子片段。对目标子片段进行详细标注时,可以先获取目标子片段的首帧对应的首帧标注结果,再基于首帧标注结果,生成目标子片段的尾帧对应的尾帧标注结果,可以利用首帧标注结果和尾帧标注结果对目标子片段中的中间帧进行标注,以实现对目标子片段的中间帧的标注,获得目标子片段的标注结果。尾帧可通过首帧自动标注获得,而中间帧可以通过首帧标注结果和尾帧标注结果自动标注获得,实现中间帧的高效标注。获得目标子片段的标注结果之后,可以确定待标注视频的目标标注结果,通过时间维度更小的片段标注,可以提高片段标注准确性,相比于直接对待标注视频进行标 注,效率更高,准确度更高。
附图说明
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1为本公开实施例提供的一种视频标注方法的一个应用示例图;
图2为本公开实施例提供的一种视频标注方法的一个实施例的流程图;
图3为本公开实施例提供的一种视频标注方法的又一个实施例的流程图;
图4为本公开实施例提供的一个特征传播示例图;
图5为本公开实施例提供的一种视频标注方法的又一个实施例的流程图;
图6为本公开实施例提供的一个首帧标注结果的更新示例图;
图7为本公开实施例提供的一种视频标注方法的又一个实施例的流程图;
图8为本公开实施例提供的一种视频标注方法的又一个实施例的流程图;
图9为本公开实施例提供的一个视频子片段的划分示例图;
图10为本公开实施例提供的一个关键帧的提取示例图;
图11为本公开实施例提供的一种视频标注装置的一个实施例的结构示意图;
图12为本公开实施例提供的一种电子设备的硬件结构示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
本公开的技术方案可以应用于视频标注场景中,通过获取首帧标注结果,并通过首帧标注结果自动标注尾帧,通过首帧标注结果和尾帧标注结果的获取可以对图像帧中的其他图像帧进行自动标注,提高视频的标注效率。
相关技术中,视频处理模型的训练需要大量的视频样本。视频样本可以包括视频本身以及视频的标签。视频的标签一般可以指视频中各图像帧的标签,各图像帧,也即视频中各图像帧的标注结果一般是人工标注获得。逐帧实现人工标注一般需要大量人工完成,标注效率较低,标注成本较高。
为了解决人工标注成本过高的问题,本公开考虑到自动完成对图像的标注。而图像的自动标注一般需要图像的区域识别模型,如果直接通过区域识别模型,获得的标注结果也不够准确。为了获得准确的标注结果,可以采用手动标注部分图像,再利用手动标注的图像,采用半监督标注方式,对剩余图像进行标注。通过此方式标注的图像准确度更高,标注效率也大大提升。
下面将以具体实施例对本公开的技术方案以及本公开的技术方案如何解决上述技术问题进行详细说明。下面几个具体实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。下面将结合附图对本发明的实施例进行详细描述。
图1为本公开实施例提供的一种视频标注方法的一个应用示例图,该视频标注方法可以应用于电子设备1中,电子设备1可以包括显示装置2。显示装置2可以显示待标注视频。待标注视频可以基于多个关键帧被划分为至少一个视频子片段。根据本公开的技术方案,可以对待标注视频按照各视频子片段进行标注,例如对目标子片段3进行片段标注,电子设备1可以在显示装置2中显示目标子片段3中任意图像4的片段标注结果,片段标注结果例如可以为图1中的车辆所在区域5,图像中的其他类型的对象,例如路灯6即可以不进行标注,获得该图像4的片段标注结果。其中,为了便于理解,图1所示的车辆区域5采用矩形框标注,该标注方式仅仅是示例性的,并不应构成对标注方式以及标注种类的具体限定,在实际应用中,还可以采用被标注对象的轮廓、圆形、多边形等其他形状进行标注。片段标注结果确定之后,可以利用标注的目标子片段,确定待标注视频的目标标注结果。
如图2所示,为本公开实施例提供的一种视频标注方法的一个实施例的流程图,该视频标注方法可以配置为一视频标注装置,视频标注装置可以位于电子设备中,视频标注方法可以包括以下几个步骤:
201:确定待标注视频中待标注的子片段,获得目标子片段。
可选地，确定待标注视频中待标注的子片段，获得目标子片段之前，还可以包括：响应于视频标注请求，获取待标注视频。
目标子片段可以为待标注视频中至少一个视频子片段中待标注的子片段。可以将待标注视频划分为至少一个视频子片段,至少一个视频子片段可以通过待标注视频片段划分获得。
202:获取目标子片段的首帧对应的首帧标注结果。
首帧可以为目标子片段的第一个图像,也可以为目标子片段的任意图像。
首帧标注结果可以通过人工标注获得,也可以通过图像标注模型提取获得。为了提高首帧标注效率,还可以先通过图像标注模型自动标注,之后通过人工对图像标注模型的标注结果进行修正,获得最终的首帧标注结果。
尾帧可以为目标子片段的最后一个图像。
203:基于首帧标注结果,生成目标子片段的尾帧对应的尾帧标注结果。
尾帧标注结果可以通过目标子片段的第一个图像的标注结果结合半监督标注算法提取获得。半监督标注算法可以采用前向传播方式,将目标子片段的第一个图像的标注结果传播至尾帧,获得尾帧的尾帧标注结果。
204:根据首帧标注结果和尾帧标注结果,生成目标子片段中间帧的标注结果,以获得待标注的目标子片段的标注结果。
中间帧可以包括目标子片段中未标注的图像帧,中间帧可以通过首帧标注结果和尾帧标注结果标注获得。目标子片段中可以包括多个图像或者图像帧,每个图像均可以进行标注,获得每个图像的标注结果。目标子片段中的多个图像帧均标注结束,可以获得目标子片段的多个图像帧分别对应的标注结果所构成目标子片段的片段标注结果。
205:基于目标子片段的标注结果，生成待标注视频的目标标注结果。
待标注视频可以包括至少一个视频子片段,每个视频子片段标注过程中可以称为目标子片段,标注结束即可以获得目标子片段的标注结果。待标注视频的目标标注结果可以包括多个视频子片段分别对应的标注结果。
本公开实施例中,针对待标注视频,可以从片段维度,确定待标注的片段,获得目标子片段。对目标子片段的标注中,可以先获取目标子片段的首帧对应的首帧标注结果,并通过首帧的首帧标注结果生成尾帧对应的尾帧标注结果,可以利用首帧标注结果和尾帧标注结果对目标子片段中的中间帧分别进行标注,以实现对目标子片段的自动标注,获得目标子片段的标注结果。 目标子片段中各图像可以通过其首帧标注结果和尾帧标注结果自动标注获得,获得高效的标注效果。获得目标子片段的片段标注结果之后,可以确定待标注视频的目标标注结果,通过时间维度更小的片段标注,可以提高片段标注效率,相比于直接对待标注视频进行标注,准确度更高。
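To make the segment-level flow of steps 201–205 concrete, the sketch below is a minimal, non-authoritative Python illustration of how the pieces could be wired together. All names (`split_into_subsegments`, `annotate_first_frame`, `propagate_forward`, `fuse_and_label`) are placeholders for operations detailed in the following embodiments, not identifiers defined by this disclosure.

```python
from typing import Callable, Dict, List, Sequence

import numpy as np

Frame = np.ndarray  # H x W x 3 image
Mask = np.ndarray   # H x W integer label map (annotation result)


def annotate_video(
    frames: Sequence[Frame],
    split_into_subsegments: Callable[[Sequence[Frame]], List[range]],
    annotate_first_frame: Callable[[Frame], Mask],
    propagate_forward: Callable[[Sequence[Frame], Mask], List[Mask]],
    fuse_and_label: Callable[[Sequence[Frame], Mask, Mask], List[Mask]],
) -> Dict[int, Mask]:
    """Segment-wise annotation: first frame -> tail frame -> intermediate frames."""
    results: Dict[int, Mask] = {}
    prev_tail_mask = None
    for seg in split_into_subsegments(frames):            # step 201: target sub-segment
        seg_frames = [frames[i] for i in seg]
        # step 202: first-frame result (manual/model), or reuse the previous tail result
        head_mask = prev_tail_mask if prev_tail_mask is not None else annotate_first_frame(seg_frames[0])
        # step 203: propagate the first-frame result forward to obtain the tail-frame result
        tail_mask = propagate_forward(seg_frames, head_mask)[-1]
        # step 204: label intermediate frames from both the head and the tail results
        seg_masks = fuse_and_label(seg_frames, head_mask, tail_mask)
        for idx, mask in zip(seg, seg_masks):
            results[idx] = mask
        prev_tail_mask = tail_mask                         # shared boundary frame between segments
    return results                                         # step 205: video-level annotation result
```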
在一般情况下,可以采用人工标注方式获得尾帧的尾帧标注结果。但是,为了提高尾帧的标注效率,可以利用前向传播算法确定尾帧的尾帧标注结果。
如图3所示,为本公开实施例提供的一种图像标注方法的一个实施例的流程图,该方法与上述实施例的不同之处在于,基于首帧的标注结果生成目标子片段尾帧的标注结果,包括:
301:获取目标子片段的首帧对应的首帧标注结果。
302:根据首帧标注结果,利用前向传播算法确定尾帧对应的尾帧标注结果。
本公开实施例中,可以根据首帧标注结果并结合前向传播算法,自动确定尾帧的尾帧标注结果。通过自动确定尾帧的尾帧标注结果,可以有效提升尾帧的标注效率。
在一种可能的设计中,根据首帧标注结果,利用前向传播算法确定尾帧对应的尾帧标注结果,包括:
利用前向传播算法,将首帧标注结果向目标子片段中未标注的图像帧进行标注结果的顺序传播,获得目标子片段中未标注图像帧的标注结果;
获取目标子片段的最后一个图像帧的标注结果作为尾帧对应的尾帧标注结果。
本公开实施例中,利用前向传播算法,将首帧标注结果向目标子片段中未标注的图像帧进行标注结果的顺序传播,获得未标注图像帧的标注结果,通过前向传播算法将首帧标注结果向未标注的图像帧进行传播,直至传播至目标视频片段的最后一个图像帧,获得尾帧对应的尾帧标注结果,通过标注结果的传播,使得尾帧的标注是不断传播获得,使得尾帧的标注参考到其附近,例如尾帧的前一个图像帧的标注结果,提高尾帧标注效率和准确性。
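As a hedged illustration of this sequential propagation, the sketch below passes the first-frame result frame by frame until the last frame of the sub-segment; `propagate_one_step` stands in for whichever semi-supervised segmentation model performs the single-step transfer and is an assumed interface, not one specified by this disclosure.

```python
from typing import Callable, List, Sequence

import numpy as np


def propagate_to_tail(
    frames: Sequence[np.ndarray],
    head_mask: np.ndarray,
    propagate_one_step: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
) -> List[np.ndarray]:
    """Propagate the first-frame annotation sequentially; the last entry is the tail-frame result."""
    masks = [head_mask]
    for prev_frame, cur_frame in zip(frames[:-1], frames[1:]):
        # each unlabeled frame is annotated from its predecessor's (frame, mask) pair
        masks.append(propagate_one_step(prev_frame, masks[-1], cur_frame))
    return masks


# tail_mask = propagate_to_tail(seg_frames, head_mask, model_step)[-1]
```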
在实际应用中,可以采用双向传播方式,对目标子片段的中间帧进行标注处理。对于不同位置的中间帧,可以按照中间帧分别与首帧和尾帧的位置差异对图像自动标注,以提高图像的标注精度。
因此,如图4所示,为本公开实施例提供的一种视频标注方法的又一个实施例的流程图,与前述实施例的不同之处在于,根据首帧标注结果和尾帧标注结果,生成目标子片段中间帧的标注结果,可以包括:
401:基于首帧标注结果,结合前向传播算法,提取目标子片段的中间帧的前向传播特征。
可选地,前向传播算法可以包括机器学习算法、神经网络算法等算法,可以通过训练获得。前向传播算法可以用于将首帧的首帧标注结果向位于首帧之后的中间帧进行特征传播,获得中间帧的前向传播特征。
目标子片段可以包括N个图像帧,每个图像帧均可以作为中间帧进行标注。N为大于1的正整数。可以先对首帧和尾帧进行标注,之后可以从目标子片段中的第二个图像帧开始,依次将每个图像帧作为中间帧,获得每个中间帧的标注结果,直至获得目标子片段的尾帧的前一个图像的标注结果,此时目标子片段标注结束。
其中,前向传播特征可以指从首帧开始,将首帧的标签逐帧向位于其之后的其他图像进行特征传播,传播至图像序号对应的图像即停止传播,获得的图像特征。首帧标注结果作为特征传播mask(掩码)参与特征计算。具体可以使用下述实施例中的半监督分割算法进行特征传输。
402:基于尾帧标签结果,结合后向传播算法,提取目标子片段的中间帧的后向传播特征。
可选地,后向传播算法可以包括机器学习算法、神经网络算法等算法,可以通过训练获得。后向传播算法可以将尾帧标注结果向尾帧之前的中间帧进行传播,获得中间帧的后向传播特征。
其中,后向传播特征可以指从尾帧开始,将尾帧的标签逐帧向位于其之前的其他图像进行特征传播,传播至图像序号对应的图像即停止传播,获得的图像特征。同样,尾帧标注结果也可以作为特征传播掩码参与特征计算。
403:将前向传播特征和后向传播特征进行特征融合处理,获得中间帧的目标图像特征。
404:根据目标图像特征,确定中间帧的标注结果。
目标子片段可以包括一个或多个中间帧,每个中间帧均可以进行标注,获得各中间帧的标注结果。目标子片段的片段标注结果可以包括多个中间帧各自的标注结果。
本公开实施例中,利用前向传播算法可以获得中间帧的前向传播特征,利用后向传播算法可以获得中间帧的后向传播特征。前向传播特征和后向传播特征的融合可以使得目标图像特征融合了首帧标注结果和尾帧标注结果,通过目标图像特征可以更好地表征中间帧的标注特征,提高中间帧的标注精度和准确性。
作为一个实施例,前向传播特征的提取步骤可以包括:可以根据首帧标注结果和图像序号,利用前向传播算法,确定中间帧的前向传播特征。后向传播特征的提取步骤可以包括:根据尾帧标注结果和图像序号,利用后向传播算法,确定中间帧的后向传播特征。
在实际应用中，对图像打标可以根据实际使用需求的不同设置不同的类别。在同一次打标中可以一次打多个类型的标签，例如，在自然图像处理场景中，可以对视频中的车辆、行人进行目标追踪，因此，车辆和行人可以作为两个标签类别，以分别进行打标。在图像特征提取过程中，为了更好地表征不同类别的标签，各类别的标签不受其他类别的影响，可以为各标签类别分别生成标签特征。标签特征的元素可以代表各像素点属于该标签类别的概率。对于同一个坐标，该坐标的元素值具体可以包括该坐标在至少一个标签类别分别对应的概率值，概率值最大的标签类别所代表的标签即为该坐标的标签。
在一种可能的设计中,获得前向传播特征和后向传播特征之后,可以对前向传播特征和后向传播特征进行特征融合处理,获得中间帧的目标图像特征;根据目标图像特征通过特征识别即可以确定中间帧的标注结果。根据首帧标注结果和图像序号,确定中间帧的前向传播特征包括:根据首帧标注结果对首帧进行特征提取,获得首帧在至少一个标签类别分别对应的标签特征,将首帧对应的至少一个标签类别分别对应的标签特征向后传播,获得中间帧在至少一个标签类别分别对应的前向标签特征,以获得至少一个标签类别分 别对应的前向传播特征。
可选地，根据尾帧标注结果和图像序号，确定中间帧的后向传播特征，包括：根据尾帧标注结果对尾帧进行特征提取，获得尾帧在至少一个标签类别分别对应的标签特征，将尾帧对应的至少一个标签类别分别对应的标签特征向前传播，获得中间帧的至少一个标签类别分别对应的后向标签特征，以获得至少一个标签类别分别对应的后向传播特征。
本公开实施例中,前向传播特征基于首帧标注结果和图像序号获得,使得前向传播特征综合了首帧标注结果和图像序号的特性。而后向传播特征基于尾帧标注结果和图像序号获得,综合了尾帧标注结果和图像序号的特性。前向传播特征和后向传播特征分别为图像特征从首帧传播和从尾帧传播获得的结果。利用前向传播特征和后向传播特征进行特征融合处理,获得中间帧的目标图像特征。目标图像特征综合了前向和后向两个方向的传播特性,利用目标图像特征获得的标注结果更准确,可以提高中间帧的标注效率和准确性。
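A small sketch of how an annotation mask could be expanded into per-class label features before propagation, so that each label class carries its own probability map, is given below. Treating class 0 as background and using hard 0/1 probabilities at the annotated frame are assumptions made for illustration only.

```python
import numpy as np


def mask_to_label_features(mask: np.ndarray, num_classes: int) -> np.ndarray:
    """One-hot encode a label map (H x W) into per-class label features (C x H x W).

    Each channel holds, per pixel, the probability (0/1 here) of belonging to that class,
    so every label class can be propagated independently of the others.
    """
    feats = np.zeros((num_classes, *mask.shape), dtype=np.float32)
    for cls in range(num_classes):
        feats[cls] = (mask == cls).astype(np.float32)
    return feats
```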
在一种可能的设计中,前向传播算法可以包括:半监督分割算法。
后向传播算法可以包括:半监督分割算法。
可以根据首帧标注结果,利用半监督分割算法对目标子片段从首帧开始逐帧进行前向特征传播,直至获得图像序号处的前向传播特征。可以根据尾帧标注结果,利用半监督分割算法对目标子片段从尾帧开始逐帧进行后向特征传播处理,直至获得图像序号处的后向传播特征。
其中,半监督分割算法具体可以为半监督物体分割算法。可以通过半监督分割算法对目标子片段从首帧或尾帧开始,利用前一帧的图像特征计算当前帧的图像特征。直至获得图像序号对应的前向或后向传播特征。
本公开实施例中，可以通过半监督分割算法，对目标子片段从首帧开始逐帧进行前向特征传播，直至获得图像序号位置处的前向传播特征。以半监督的分割算法，可以完成图像特征的前向传播，使得对应的计算获得的前向传播特征综合首帧及其之前的图像的前向特征，特征的表现度更高。通过半监督分割算法还可以从尾帧开始传播，也即从尾帧开始逐帧进行后向特征传播处理，直至获得图像序号处的后向传播特征。通过半监督分割算法，可以对图像特征进行前向或后向的传播，提高图像特征的计算准确度。
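The following sketch shows one way such frame-by-frame propagation could be driven in either direction; the `step` callable is a stand-in for the semi-supervised segmentation model, and its exact interface is an assumption rather than something fixed by this disclosure.

```python
from typing import Callable, List, Sequence

import numpy as np


def propagate_features(
    frames: Sequence[np.ndarray],
    anchor_feats: np.ndarray,
    step: Callable[[np.ndarray, np.ndarray, np.ndarray], np.ndarray],
    reverse: bool = False,
) -> List[np.ndarray]:
    """Frame-by-frame propagation of per-class features (C x H x W).

    Forward runs start from the head frame's label features; backward runs
    (reverse=True) start from the tail frame's label features, which act as
    the propagation mask.
    """
    order = list(range(len(frames)))
    if reverse:
        order.reverse()
    feats: List[np.ndarray] = [None] * len(frames)
    feats[order[0]] = anchor_feats.astype(np.float32)
    for prev_i, cur_i in zip(order[:-1], order[1:]):
        feats[cur_i] = step(frames[prev_i], feats[prev_i], frames[cur_i])
    return feats


# forward_feats  = propagate_features(seg_frames, head_label_feats, model_step)
# backward_feats = propagate_features(seg_frames, tail_label_feats, model_step, reverse=True)
```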
在获得前向传播特征和后向传播特征之后,可以根据前向传播特征和后向传播特征进行特征的融合计算,以使得中间帧的图像特征综合前向和后向两个方面的特征。在某些实施例中,将前向传播特征和后向传播特征进行特征融合处理,获得中间帧的目标图像特征,包括:
确定中间帧在目标子片段中的图像序号;
根据图像序号确定序号比值;
根据序号比值,确定前向传播权重和后向传播权重;
根据前向传播权重、后向传播权重、前向传播特征和后向传播特征,获得中间帧的目标图像特征。
可选地，中间帧的图像序号可以指中间帧在目标子片段中的出现顺序。例如，目标子片段中的第一个图像的图像序号可以为1，第二个出现的图像的图像序号可以为2。通过图像序号可以确定中间帧在目标子片段中的位置。每个图像帧可以按照其标注顺序，确定相应的图像序号，例如首帧的图像序号可以为1，尾帧的图像序号可以为N+1。
本公开实施例中,可以确定目标子片段中的中间帧和中间帧的图像序号,中间帧的序号可以代表其与首帧和尾帧的位置关系。通过首帧标注结果和尾帧标注结果并结合中间帧的图像序号,可以确定中间帧的标注结果。使得中间帧的标注效果与中间帧在目标子片段中的位置关联,提高标注准确性。
作为一个实施例,根据图像序号确定序号比值,可以包括:
计算中间帧的图像序号和目标子片段的尾帧对应的尾帧序号的序号比值。
作为又一个实施例,根据前向传播权重、后向传播权重、前向传播特征和后向传播特征,获得中间帧的目标图像特征可以包括:
根据前向传播权重和后向传播权重，对前向传播特征和后向传播特征进行加权求和（特征融合处理），获得中间帧的目标图像特征。
图像序号为K，尾帧序号为N，则序号比值为K/N。根据序号比值，确定前向传播权重和后向传播权重，可以包括：确定序号比值K/N为后向传播权重，确定整数1和序号比值的差，也即1-K/N，为前向传播权重。目标图像特征的加权求和步骤可以包括：
计算前向传播权重1-K/N与前向传播特征F_forward的乘积，获得第一特征；计算后向传播权重K/N与后向传播特征F_backward的乘积，获得第二特征；将第一特征和第二特征相加，获得目标图像特征F_current，即F_current = (1-K/N)·F_forward + (K/N)·F_backward。
可选地,前向传播特征可以包括至少一个标签类别分别对应的前向标签特征。后向传播特征可以包括至少一个标签类别分别对应的后向标签特征。根据前向传播权重和后向传播权重,将各标签类别的前向标签特征和后向标签特征加权求和,获得各标签类别分别对应的融合特征。而各标签类别分别对应的融合特征即为目标图像特征。
每个标签类别的前向标签特征和后向标签特征的加权求和可以包括:对于每个标签类别的前向标签特征和后向标签特征,将各像素坐标在前向标签特征的第一特征值和前向传播权重相乘,将在后向标签特征的第二取值和后向传播权重相乘,将两个乘积相加,获得各像素坐标在该标签类别的特征值,也即,获得该标签类别在各像素坐标的特征值。
目标图像特征可以表征为中间帧的各像素坐标在不同标签类别的特征值。
为了便于理解,如图5所示的特征传播示例图,假设首帧501的首帧标注结果为5011,尾帧502的尾帧标注结果为5021。其中,首帧501的首帧标注结果5011前向传播对应的前向传播特征,尾帧502的尾帧标注结果5021对应的后向传播特征。中间帧503可以基于其序号对前向传播特征和后向传播特征进行特征融合,获得相应的目标图像特征。目标图像特征经图像分类层识别可以获得中间帧的目标区域5031。该目标区域5031即可以为中间帧的标注结果。
本公开实施例中,可以根据图像序号对图像与前向传播的特征和后向传播的特征的关联度进行计算,也即计算图像序号对应的序号比值。该序号比值可以用于确定前向传播权重和后向传播权重。通过前向传播和后向传播的相关特性的计算,可以对图像的传播效率准确提升,提高图像特征传播的准确度。
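A minimal sketch of the weighted fusion F_current = (1 − K/N)·F_forward + (K/N)·F_backward described above, assuming the propagated features are numpy arrays of identical shape; the function and parameter names are illustrative, not defined by this disclosure.

```python
import numpy as np


def fuse_bidirectional(
    forward_feat: np.ndarray,   # C x H x W, propagated from the head frame
    backward_feat: np.ndarray,  # C x H x W, propagated from the tail frame
    frame_index: int,           # K: sequence number of the intermediate frame
    tail_index: int,            # N: sequence number of the tail frame
) -> np.ndarray:
    """F_current = (1 - K/N) * F_forward + (K/N) * F_backward."""
    ratio = frame_index / tail_index   # sequence-number ratio K/N
    # frames near the head trust forward propagation more; frames near the tail trust the backward pass
    return (1.0 - ratio) * forward_feat + ratio * backward_feat
```

At K close to 0 the fused feature is dominated by the forward (head-anchored) term, and at K close to N by the backward (tail-anchored) term, which is consistent with the two boundary frames already being annotated.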
作为一个实施例,根据目标图像特征,确定中间帧的标注结果可以包括:
根据图像分类层,识别目标图像特征的目标区域;
以目标区域作为中间帧的标注结果。
可选地,根据图像分类层,识别目标图像特征的目标区域,可以包括: 确定目标图像特征中中间帧的各像素坐标在至少一个标签类别分别对应的特征值,获取各像素坐标在至少一个标签类别分别对应的特征值中的最大特征值,以获得各像素坐标的最大特征值。根据各像素坐标的最大特征值对应的标签类别,确定各标签类别对应的目标像素坐标,确定各标签类别的目标像素坐标所形成的标签区域,获得至少一个标签类别分别对应的标签区域构成的目标区域。也即,至少一个标签类别分别对应的标签区域可以为中间帧的标注结果。图像分类层可以为对图像特征进行特征分类的数学模型。
本公开实施例中,确定中间帧的标注结果之后,可以根据图像分类层识别目标图像特征的目标区域,以该目标区域作为中间帧的标注结果。通过图像分类层的使用可以对目标图像特征进行准确的标签提取。
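One plausible reading of the image classification layer described above is a per-pixel argmax over the fused per-class features, with each class's winning pixels forming its label region. The sketch below illustrates that reading; treating channel 0 as background is an assumption.

```python
from typing import Dict

import numpy as np


def features_to_regions(fused_feat: np.ndarray) -> Dict[int, np.ndarray]:
    """Map fused per-class features (C x H x W) to per-class label regions."""
    label_map = np.argmax(fused_feat, axis=0)          # each pixel takes its highest-scoring class
    return {
        cls: (label_map == cls)                        # boolean mask = label region of that class
        for cls in range(1, fused_feat.shape[0])       # channel 0 assumed to be background
    }
```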
如图6所示,为本公开实施例提供的一种图像标注方法的又一个实施例的流程图,与前述实施例的不同之处在于,在确定中间帧的标注结果之后,还包括:
601:输出中间帧的标注结果。
标注结果可以包括至少一个标签类别分别对应的标签区域。
602:检测用户针对中间帧的标注结果执行的标签确认操作,维持中间帧的标注结果不变。
603:检测用户针对中间帧的标注结果执行的标签修改操作,获得中间帧修改后的标注结果。
可以同时输出中间帧和其标注结果,对中间帧的自动标注结果进行输出,供用户查看。
本公开实施例中,输出中间帧的标注结果之后,用户可以查看中间帧的标注结果,对标注结果的打标效果进行查看,若打标不合格,可以对标注结果进行修改,若打标合格,可以直接确定中间帧的标注结果。通过与用户交互显示,可以使得中间帧的标注结果与用户的标注需求更匹配,标注准确度更高。
作为一个实施例,获取目标子片段的首帧对应的首帧标注结果,可以包括:
检测用户针对首帧执行的标注操作,获得标注操作对应的首帧标注结果。
或,获取目标子片段的前一个视频子片段,并确定前一个视频子片段的尾帧对应的尾帧标注结果为目标子片段的首帧对应的首帧标注结果。
可选地,在首帧为目标子片段的第一个图像且目标子片段为待标注视频的第一个视频子片段,可以检测用户针对目标子片段的首帧执行的标签设置操作,获得设置结束时的首帧标注结果。或者,目标子片段不为第一个视频子片段时,获取目标子片段的前一个视频子片段的尾帧的尾帧标注结果作为目标子片段的首帧对应的首帧标注结果。
本公开实施例中，通过检测用户针对首帧执行的标注操作，可以获得标注操作对应的首帧标注结果，使首帧标注结果与用户标注需求更匹配；或者还可以将前一个视频子片段的尾帧标注结果作为首帧的标注结果，可以提高首帧标注效率。
作为又一个实施例,首帧对应的首帧标注结果的获取方式除上述实施例提供的技术方案之外,目标子片段的首帧以及首帧对应的首帧标注结果,还可以通过下列方式获得:
若检测到用户针对中间帧的标注结果执行的标签修改操作，则更新修改标注结果后的中间帧为首帧；
将中间帧修改后的标注结果作为首帧标注结果。
如图7所示,为本公开实施例提供的一种图像帧的标注提示示例图。参考图7,在获得中间帧701的标注结果7011之后,若检测到用户针对中间帧修改其标注结果例如修改为标注结果7012,可以将中间帧701作为首帧。而原首帧702则可以不再作为首帧。当然,图7的图像帧的标注提示仅仅是示例性的,并不具备限定作用。
本公开实施例中,在用户对中间帧执行标签修改操作时,可以说明标签的传播精度降低,与用户的实际标注需求匹配度较低。将标签修改后的中间帧作为首帧,中间帧修改后的图像标注作为首帧标注结果,可以提供更有效的图像传播,提高图像的传播效率和准确度。
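A hedged sketch of that re-anchoring rule follows: on a label confirmation the state is kept, and on a label modification the corrected intermediate frame becomes the new first frame from which propagation restarts. The `SegmentState` container is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class SegmentState:
    head_index: int         # index of the current "first frame" inside the sub-segment
    head_mask: np.ndarray   # its annotation result, used as the propagation source


def apply_user_feedback(state: SegmentState, frame_index: int,
                        edited_mask: Optional[np.ndarray]) -> SegmentState:
    """Promote a user-corrected intermediate frame to be the new first frame."""
    if edited_mask is None:                      # confirmation: keep the annotation unchanged
        return state
    return SegmentState(head_index=frame_index,  # modification: restart propagation from here
                        head_mask=edited_mask)
```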
为了获得准确的视频子片段,如图8所示,为本公开实施例提供的一种视频标注方法的又一个实施例的流程图,与前述实施例的不同之处在于,确定待标注视频中待标注的子片段,获得目标子片段,包括:
801:提取待标注视频的关键帧。
802:将关键帧中相邻的两个关键帧在待标注视频所围成的视频区间划分为一个视频子片段,获得至少一个视频子片段。
803:从至少一个视频子片段中确定待标注的目标子片段。
可选地,可以以组的方式对待标注视频的关键帧进行分组,相邻的两个关键帧可以作为一组,可以从至少一个关键帧中确定至少一组关键帧。一组关键帧包括相邻的第一关键帧和第二关键帧,第一关键帧位于第二关键帧之前,前一组关键帧的第二关键帧与后一组关键帧的第一关键帧相同。两个相邻的关键帧所围成的视频区间可以作为视频子片段,也即视频子片段可以包括两个关键帧,以及两个关键帧之间的中间帧,当然中间帧可以按照预设采样频率采样获得。
其中,关键帧可以为与待标注视频中在其附近的图像差异较大的图像。例如,在t1时间的图像不存在车辆,在t2时间的图像出现车辆,t1和t2的时间差在时间约束内,则确定t2时间的图像为关键帧。
为了便于理解，图9所示为本公开实施例提供的一个视频子片段的划分示例图。参考图9，待标注视频的关键帧分别为关键帧1、关键帧2、关键帧4以及关键帧6。可以将相邻的两个关键帧作为一组。
其中,关键帧1和关键帧2可以作为一组相邻的关键帧,该组相邻的关键帧之间所围成的图像帧可以为视频子片段1。视频子片段1可以由关键帧1、关键帧2以及关键帧1和2之间的图像帧3组成。
关键帧2和关键帧4可以作为一组相邻的关键帧,该组相邻的关键帧之间所围成的图像帧可以为视频子片段2。视频子片段2可以由关键帧2、关键帧4以及关键帧2和4之间的图像帧5组成。
关键帧4和关键帧6可以作为一组相邻的关键帧,该组相邻的关键帧之间所围成的图像帧可以为视频子片段3。视频子片段3可以由关键帧4、关键帧6以及关键帧4和6之间的图像帧7组成。
相邻两组关键帧存在关键帧重叠，参考图9，关键帧2可以为视频子片段1的尾帧，关键帧2还可以为视频子片段2的首帧。关键帧4可以为视频子片段2的尾帧，可以为视频子片段3的首帧。通过此关键帧的提取方式，各个关键帧既可以作为前一个视频子片段的尾帧，又可以作为后一个视频子片段的首帧，使相邻视频子片段共享边界关键帧。
本公开实施例中,通过提取待标注视频的关键帧,可以基于关键帧完成 相邻两个关键帧的获取。而相邻的两个关键帧在待标注视频中所围成的视频区间可以为一个视频子片段,进而获得待标注视频对应的至少一个视频子片段,使得至少一个视频子片段中相邻的两个视频子片段的前一个视频子片段的最后一帧与后一个视频子片段的第一帧相同,完成待标注视频的全面而准确的分割,使得至少一个视频子片段的分割效率更高。
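The sketch below divides a video into sub-segments bounded by adjacent keyframes, so that the tail keyframe of one sub-segment is also the head keyframe of the next; padding the keyframe list with the first and last frames is an assumption added here so the whole video is covered.

```python
from typing import List, Sequence, Tuple


def split_by_keyframes(num_frames: int, keyframe_indices: Sequence[int]) -> List[Tuple[int, int]]:
    """Each pair of adjacent keyframes bounds one sub-segment (boundary frames are shared)."""
    keys = sorted(set(keyframe_indices))
    if not keys or keys[0] != 0:
        keys.insert(0, 0)                  # assumed fallback: start at the first frame
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)        # assumed fallback: end at the last frame
    return list(zip(keys[:-1], keys[1:]))


# split_by_keyframes(8, [0, 1, 3, 5]) -> [(0, 1), (1, 3), (3, 5), (5, 7)]
```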
在某些实施例中,可以从待标注视频中,按照关键帧提取频率提取至少一个关键帧;或者,从待标注视频中,提取满足图像变化条件的至少一个关键帧。
关键帧提取频率可以根据使用需求设置,可以是预先设置获得的。关键帧提取频率的单位为帧/次。每间隔特征提取频率个图像帧,提取一个关键帧。例如,关键帧提取频率为10时,可以每10帧提取一个关键帧,第1帧、第11帧均可以为关键帧。
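For the fixed-frequency variant, a one-liner suffices; the example below is 0-indexed, whereas the passage above counts frames from 1 (frames 1, 11, ...).

```python
from typing import List


def keyframes_by_frequency(num_frames: int, every: int = 10) -> List[int]:
    """Sample one keyframe every `every` frames, e.g. frames 0, 10, 20, ..."""
    return list(range(0, num_frames, every))
```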
在一种可能的设计中,提取待标注视频的至少一个关键帧,包括:
针对待标注视频中的图像帧,计算各图像帧的运动幅度值;
根据运动幅度值获得图像帧中的至少一个关键帧。
图像变化条件可以包括:图像帧的运动幅度值大于指标阈值。
可选地，根据运动幅度值获得图像帧中的至少一个关键帧，可以包括：
若任意图像帧的运动幅度值大于指标阈值,则确定图像帧为关键帧,以获得多个图像帧中的至少一个关键帧。
运动幅度值可以指图像帧与其周围帧的幅度差异。可以将图像帧的幅度值与其周围帧的幅度值进行差值计算,获得运动幅度值。若运动幅度值大于指标阈值,则说明图像帧与周围帧差异较大,该图像帧可以作为关键帧。
为了便于理解，如图10所示，为本公开实施例提供的一个关键帧的提取示例图。以纵轴为各图像帧的运动幅值，横轴为待标注视频中的各个图像帧序号为例，从第一个图像帧0开始各个图像帧的幅值不断变化，各个图像帧的幅值连线形成曲线1001。运动幅度值可以为各图像帧之间的幅值差异。由曲线1001的变化情况可以确定图像帧的幅值差异，也即前后运动幅值大于指标阈值的关键点1002对应的图像帧可以为关键帧。
本公开实施例中,可以针对待标注视频中的多个图像帧,计算各图像帧在运动幅度指标的指标数据,根据各图像帧的运动幅度可以进行关键帧的筛 选。关键帧可以用于获取视频子片段,以将运动幅度作为视频子片段的获取基础,使得同一视频子片段的运动幅度值作为划分基础,在进行图像自动标注时,可以有效提升图像的标注精度。
本公开实施例中,各图像帧的运动幅度值的计算步骤,可以包括:
计算各图像帧在帧间幅度差异指标对应的帧间差值,确定帧间差值为运动幅度值;
或者,计算各图像帧在帧间光流变化指标对应的帧间光流变化幅度值,确定帧间光流变化幅度值为运动幅度值。
或者,基于预训练的分割模型,计算各图像帧对应的分割结果的交并比,确定交并比为运动幅度值。
采用不同种类的运动幅度值,指标阈值可以根据运动幅度值的类型确定。
可选地,帧间差值可以指两个图像帧各自的像素均值的差值。
光流变化幅度值可以指两个或两个以上的图像帧的光流之间的差值,可以通过光流计算公式计算获得各图像帧对应的光流浮动阈值。各图像帧之间的分割结果进行交并比计算,交并比可以指图像帧和其周围帧分别进行图像分割处理,获得的图像帧的分割结果和其周围帧的分割结果之间的交集和并集的比值,如果二者重叠度较高,交并比的值较大,如果二者的重叠度较低,交并比的值较小。
本公开实施例中,通过计算图像帧对应的帧间差值、帧间光流变化幅度值或者分割结果的交并比,可以应用多种方式对各图像帧的运动幅度值进行准确计算。
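A minimal sketch of two of these motion-amplitude measures, frame difference and segmentation IoU, together with thresholding the amplitude to pick keyframes, is shown below. The optical-flow variant is omitted; note that for the IoU measure the selection logic would be inverted (a low overlap, not a high one, indicates a keyframe), and the threshold value is application-specific.

```python
from typing import List, Sequence

import numpy as np


def frame_difference(a: np.ndarray, b: np.ndarray) -> float:
    """Inter-frame difference: gap between the mean pixel values of two frames."""
    return float(abs(a.astype(np.float32).mean() - b.astype(np.float32).mean()))


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of binary segmentation results of neighbouring frames."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum()) / float(union) if union else 1.0


def select_keyframes(frames: Sequence[np.ndarray], threshold: float) -> List[int]:
    """A frame whose motion amplitude w.r.t. its predecessor exceeds the threshold is a keyframe."""
    keyframes = [0]                                   # treat the first frame as a keyframe
    for i in range(1, len(frames)):
        if frame_difference(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes
```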
作为一个实施例,确定待标注视频中待标注的子片段,获得目标子片段,包括:
根据至少一个视频子片段的时间先后顺序,确定至少一个视频子片段分别对应的片段顺序;
按照至少一个视频子片段分别对应的片段顺序,从第一个视频片段开始,依次选择一个视频片段作为待标注的子片段,获得目标子片段。
可选地,可以基于至少一个视频子片段分别对应的片段序号为各视频子片段的片段顺序。可以依次从至少一个视频子片段中确定目标子片段。在获得目标子片段之后可以执行上述实施例的标注方案,直至至少一个视频子片 段遍历结束,获得所有视频子片段的标注结果,将所有视频子片段的标注结果综合获得待标注视频的标注结果。
在视频进行片段分割时,每获得一个视频子片段,可以为该视频子片段设置片段序号。例如第一个获取的视频子片段的片段序号为1,第二个视频子片段的片段序号为2。
本公开实施例中,可以根据至少一个视频子片段分别对应的片段顺序,依次从至少一个视频子片段中选择目标子片段。利用片段顺序进行目标子片段的获取可以确保依次获得相应的目标子片段,进而依次完成各目标子片段的标注,实现对至少一个视频子片段的顺序、依次标注,提高视频子片段的标注全面性。
此外,本公开的技术方案还可以应用于游戏领域,具体例如可以包括三维游戏场景的设计、显示等应用领域。
如图11所示,为本公开实施例提供的一种视频标注装置的一个实施例的结构示意图,该装置可以位于电子设备中,可以配置有上述视频标注方法,该视频标注装置1100可以包括:
第一确定单元1101,用于确定待标注视频中待标注的子片段,获得目标子片段;
首帧标注单元1102,用于获取目标子片段的首帧对应的首帧标注结果;
尾帧标注单元1103,用于基于首帧标注结果,生成目标子片段的尾帧对应的尾帧标注结果;
片段标注单元1104,用于根据首帧标注结果和尾帧标注结果,生成目标子片段中间帧的标注结果,以获得待标注的目标子片段的标注结果;
第二确定单元1105,用于基于目标子片段的标注结果,生成待标注视频的目标标注结果。
作为一个实施例，第一确定单元，包括：
关键提取模块,用于提取待标注视频的关键帧;
片段获得模块,用于将关键帧中相邻的两个关键帧在待标注视频所围成的视频区间划分为一个视频子片段,获得至少一个视频子片段;
目标确定模块,用于从至少一个视频子片段中确定待标注的目标子片段。
在某些实施例中,关键提取模块,包括:
幅值计算子模块,用于针对待标注视频中的图像帧,计算各图像帧的运动幅度值;
关键确定子模块,用于根据运动幅度值获得图像帧中的至少一个关键帧。
作为一个实施例,尾帧标注单元,可以包括:
首帧获取模块,用于获取目标子片段的首帧对应的首帧标注结果;
尾帧生成模块,用于根据首帧标注结果,利用前向传播算法确定尾帧对应的尾帧标注结果。
在一种可能的设计中,尾帧生成模块,可以包括:
标签传播子模块,用于利用前向传播算法,将首帧标注结果向目标子片段中未标注的图像帧进行标注结果的顺序传播,获得目标子片段中未标注图像帧的标注结果;
尾帧标注子模块,用于获取目标子片段的最后一个图像帧的标注结果作为尾帧对应的尾帧标注结果。
作为又一个实施例,片段标注单元,包括:
第一提取模块,用于基于首帧标注结果,结合前向传播算法,提取目标子片段的中间帧的前向传播特征。
第二提取模块,用于基于尾帧标签结果,结合后向传播算法,提取目标子片段的中间帧的后向传播特征;
特征融合模块,用于将前向传播特征和后向传播特征进行特征融合处理,获得中间帧的目标图像特征;
标签确定模块,用于根据目标图像特征,确定中间帧的标注结果。
在某些实施例中,特征融合模块,可以包括:
序号确定子模块,用于确定中间帧在目标子片段中的图像序号;
比值确定子模块,用于根据图像序号确定序号比值;
权重确定子模块,用于根据序号比值,确定前向传播权重和后向传播权重;
特征加权子模块,用于根据前向传播权重、后向传播权重、前向传播特征和后向传播特征获得中间帧的目标图像特征。
作为一个实施例,首帧标注单元,可以包括:
首帧标注模块,用于检测用户针对首帧执行的标注操作,获得标注操作 对应的首帧标注结果;
或者,首帧确定模块,用于获取目标子片段的前一个视频子片段,并确定前一个视频子片段的尾帧对应的尾帧标注结果为目标子片段的首帧对应的首帧标注结果。
作为一个实施例,第一确定单元,可以包括:
顺序确定模块,用于根据至少一个视频子片段的时间先后顺序,确定至少一个视频子片段分别对应的片段顺序;
片段遍历模块,用于按照至少一个视频子片段分别对应的片段顺序,从第一个视频片段开始,依次选择一个视频片段作为待标注的子片段,获得目标子片段。
本实施例提供的装置,可用于执行上述方法实施例的技术方案,其实现原理和技术效果类似,本实施例此处不再赘述。
为了实现上述实施例,本公开实施例还提供了一种电子设备。
参考图12,其示出了适于用来实现本公开实施例的电子设备1200的结构示意图,该电子设备1200可以为终端设备或服务器。其中,终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(Personal Digital Assistant,简称PDA)、平板电脑(Portable Android Device,简称PAD)、便携式多媒体播放器(Portable Media Player,简称PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图12示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图12所示,电子设备1200可以包括处理装置(例如中央处理器、图形处理器等)1201,其可以根据存储在只读存储器(Read Only Memory,简称ROM)1202中的程序或者从存储装置1208加载到随机访问存储器(Random Access Memory,简称RAM)1203中的程序而执行各种适当的动作和处理。在RAM 1203中,还存储有电子设备1200操作所需的各种程序和数据。处理装置1201、ROM 1202以及RAM 1203通过总线1204彼此相连。输入/输出(I/O)接口1205也连接至总线1204。
通常,以下装置可以连接至I/O接口1205:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置1206;包括 例如液晶显示器(Liquid Crystal Display,简称LCD)、扬声器、振动器等的输出装置1207;包括例如磁带、硬盘等的存储装置1208;以及通信装置1209。通信装置1209可以允许电子设备1200与其他设备进行无线或有线通信以交换数据。虽然图12示出了具有各种装置的电子设备1200,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置1209从网络上被下载和安装,或者从存储装置1208被安装,或者从ROM1202被安装。在该计算机程序被处理装置1201执行时,执行本公开实施例的方法中限定的上述功能。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的***、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行***、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行***、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意 合适的组合。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备执行上述实施例所示的方法。
本公开还提供一种计算机可读存储介质,计算机可读存储介质中存储有计算机执行指令,当处理器执行计算机执行指令时,实现如上述任一实施例所提供的视频标注方法。
本公开还提供一种计算机程序产品,包括计算机程序,计算机程序被处理器执行,以配置上述任一实施例所提供的视频标注方法。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向图像的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(Local Area Network,简称LAN)或广域网(Wide Area Network,简称WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的***、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的***来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定,例如,第一获取单元还可以被描述为“获取至少两个网际协议地址的单元”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上***(SOC)、复杂可编程逻辑设备(CPLD)等等。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行***、装置或设备使用或与指令执行***、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体***、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
以上描述仅为本公开的较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开中所涉及的公开范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述公开构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。
此外,虽然采用特定次序描绘了各操作,但是这不应当理解为要求这些操作以所示出的特定次序或以顺序次序执行来执行。在一定环境下,多任务和并行处理可能是有利的。同样地,虽然在上面论述中包含了若干具体实现细节,但是这些不应当被解释为对本公开的范围的限制。在单独的实施例的上下文中描述的某些特征还可以组合地实现在单个实施例中。相反地,在单个实施例的上下文中描述的各种特征也可以单独地或以任何合适的子组合的方式实现在多个实施例中。
尽管已经采用特定于结构特征和/或方法逻辑动作的语言描述了本主题,但是应当理解所附权利要求书中所限定的主题未必局限于上面描述的特定特征或动作。相反,上面所描述的特定特征和动作仅仅是实现权利要求书的示例形式。

Claims (13)

  1. 一种视频标注方法,其特征在于,包括:
    确定待标注视频中待标注的子片段,获得目标子片段;
    获取所述目标子片段的首帧对应的首帧标注结果;
    基于所述首帧标注结果,生成所述目标子片段的尾帧对应的尾帧标注结果;
    根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,以获得所述待标注的目标子片段的标注结果;
    基于所述目标子片段的标注结果,生成所述待标注视频的目标标注结果。
  2. 根据权利要求1所述的方法,其特征在于,所述确定所述待标注视频中待标注的子片段,获得目标子片段,包括:
    提取所述待标注视频的关键帧;
    将所述关键帧中相邻的两个关键帧在所述待标注视频所围成的视频区间划分为一个视频子片段,获得至少一个视频子片段;
    从所述至少一个视频子片段中确定待标注的所述目标子片段。
  3. 根据权利要求2所述的方法,其特征在于,所述提取待标注视频的至少一个关键帧,包括:
    针对所述待标注视频中的图像帧,计算各图像帧的运动幅度值;
    根据所述运动幅度值获得所述图像帧中的至少一个关键帧。
  4. 根据权利要求1所述的方法,其特征在于,所述基于所述首帧的标注结果生成所述目标子片段尾帧的标注结果,包括:
    获取所述目标子片段的首帧对应的首帧标注结果;
    根据所述首帧标注结果,利用前向传播算法确定所述尾帧对应的尾帧标注结果。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述首帧标注结果,利用前向传播算法确定所述尾帧对应的尾帧标注结果,包括:
    利用所述前向传播算法,将所述首帧标注结果向所述目标子片段中未标注的图像帧进行标注结果的顺序传播,获得所述目标子片段中未标注图像帧的标注结果;
    获取所述目标子片段的最后一个图像帧的标注结果作为所述尾帧对应的 尾帧标注结果。
  6. 根据权利要求1所述的方法,其特征在于,所述根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,包括:
    基于所述首帧标注结果,结合前向传播算法,提取所述目标子片段的中间帧的前向传播特征;
    基于所述尾帧标签结果,结合后向传播算法,提取所述目标子片段的中间帧的后向传播特征;
    将所述前向传播特征和所述后向传播特征进行特征融合处理,获得所述中间帧的目标图像特征;
    根据所述目标图像特征,确定所述中间帧的标注结果。
  7. 根据权利要求6所述的方法,其特征在于,所述将所述前向传播特征和所述后向传播特征进行特征融合处理,获得所述中间帧的目标图像特征,包括:
    确定所述中间帧在所述目标子片段中的图像序号;
    根据所述图像序号确定序号比值;
    根据所述序号比值,确定前向传播权重和后向传播权重;
    根据所述前向传播权重、所述后向传播权重、前向传播特征和所述后向传播特征获得所述中间帧的目标图像特征。
  8. 根据权利要求1所述的方法,其特征在于,所述获取所述目标子片段的首帧对应的首帧标注结果,包括:
    检测用户针对所述首帧执行的标注操作,获得所述标注操作对应的首帧标注结果;
    或,获取所述目标子片段的前一个视频子片段,并确定所述前一个视频子片段的尾帧对应的尾帧标注结果为所述目标子片段的首帧对应的首帧标注结果。
  9. 根据权利要求1所述的方法,其特征在于,所述确定待标注视频中待标注的子片段,获得目标子片段,包括:
    根据至少一个视频子片段的时间先后顺序,确定至少一个视频子片段分别对应的片段顺序;
    按照至少一个视频子片段分别对应的片段顺序,从第一个视频片段开始, 依次选择一个视频片段作为待标注的子片段,获得所述目标子片段。
  10. 一种视频标注装置,其特征在于,包括:
    第一确定单元,用于确定待标注视频中待标注的子片段,获得目标子片段;
    首帧标注单元,用于获取所述目标子片段的首帧对应的首帧标注结果;
    尾帧标注单元,用于基于所述首帧标注结果,生成所述目标子片段的尾帧对应的尾帧标注结果;
    片段标注单元,用于根据所述首帧标注结果和所述尾帧标注结果,生成所述目标子片段中间帧的标注结果,以获得所述待标注的目标子片段的标注结果;
    第二确定单元,用于基于所述目标子片段的标注结果,生成所述待标注视频的目标标注结果。
  11. 一种电子设备,其特征在于,包括:处理器、存储器;
    所述存储器存储计算机执行指令;
    所述处理器执行所述存储器存储的计算机执行指令,使得所述处理器配置有如权利要求1至9任一项所述的视频标注方法。
  12. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机执行指令,当处理器执行所述计算机执行指令时,实现如权利要求1至9任一项所述的视频标注方法。
  13. 一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器执行,以配置有如权利要求1至9任一项所述的视频标注方法。
PCT/CN2023/130577 2022-11-15 2023-11-08 视频标注方法、装置、设备、介质及产品 WO2024104239A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211430306.1 2022-11-15
CN202211430306.1A CN115905622A (zh) 2022-11-15 2022-11-15 视频标注方法、装置、设备、介质及产品

Publications (1)

Publication Number Publication Date
WO2024104239A1 true WO2024104239A1 (zh) 2024-05-23

Family

ID=86495049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/130577 WO2024104239A1 (zh) 2022-11-15 2023-11-08 视频标注方法、装置、设备、介质及产品

Country Status (2)

Country Link
CN (1) CN115905622A (zh)
WO (1) WO2024104239A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905622A (zh) * 2022-11-15 2023-04-04 北京字跳网络技术有限公司 视频标注方法、装置、设备、介质及产品
CN115757871A (zh) * 2022-11-15 2023-03-07 北京字跳网络技术有限公司 视频标注方法、装置、设备、介质及产品

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581433A (zh) * 2020-05-18 2020-08-25 Oppo广东移动通信有限公司 视频处理方法、装置、电子设备及计算机可读介质
CN112053323A (zh) * 2020-07-31 2020-12-08 上海图森未来人工智能科技有限公司 单镜头多帧图像数据物体追踪标注方法和装置、存储介质
US20210081671A1 (en) * 2019-09-12 2021-03-18 Beijing Xiaomi Mobile Software Co., Ltd. Video processing method and device, and storage medium
CN113378958A (zh) * 2021-06-24 2021-09-10 北京百度网讯科技有限公司 自动标注方法、装置、设备、存储介质及计算机程序产品
CN114117128A (zh) * 2020-08-29 2022-03-01 华为云计算技术有限公司 视频标注的方法、***及设备
CN115905622A (zh) * 2022-11-15 2023-04-04 北京字跳网络技术有限公司 视频标注方法、装置、设备、介质及产品

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210081671A1 (en) * 2019-09-12 2021-03-18 Beijing Xiaomi Mobile Software Co., Ltd. Video processing method and device, and storage medium
CN111581433A (zh) * 2020-05-18 2020-08-25 Oppo广东移动通信有限公司 视频处理方法、装置、电子设备及计算机可读介质
CN112053323A (zh) * 2020-07-31 2020-12-08 上海图森未来人工智能科技有限公司 单镜头多帧图像数据物体追踪标注方法和装置、存储介质
CN114117128A (zh) * 2020-08-29 2022-03-01 华为云计算技术有限公司 视频标注的方法、***及设备
CN113378958A (zh) * 2021-06-24 2021-09-10 北京百度网讯科技有限公司 自动标注方法、装置、设备、存储介质及计算机程序产品
CN115905622A (zh) * 2022-11-15 2023-04-04 北京字跳网络技术有限公司 视频标注方法、装置、设备、介质及产品

Also Published As

Publication number Publication date
CN115905622A (zh) 2023-04-04

Similar Documents

Publication Publication Date Title
CN109584276B (zh) 关键点检测方法、装置、设备及可读介质
CN111476309B (zh) 图像处理方法、模型训练方法、装置、设备及可读介质
WO2024104239A1 (zh) 视频标注方法、装置、设备、介质及产品
CN112184738B (zh) 一种图像分割方法、装置、设备及存储介质
CN110503074A (zh) 视频帧的信息标注方法、装置、设备及存储介质
CN111414879B (zh) 人脸遮挡程度识别方法、装置、电子设备及可读存储介质
CN111368668B (zh) 三维手部识别方法、装置、电子设备及存储介质
CN111783626B (zh) 图像识别方法、装置、电子设备及存储介质
CN112734873B (zh) 对抗生成网络的图像属性编辑方法、装置、设备及介质
WO2023185391A1 (zh) 交互式分割模型训练方法、标注数据生成方法及设备
WO2024083121A1 (zh) 一种数据处理方法及其装置
CN115578570A (zh) 图像处理方法、装置、可读介质及电子设备
CN113610034B (zh) 识别视频中人物实体的方法、装置、存储介质及电子设备
CN116844129A (zh) 多模态特征对齐融合的路侧目标检测方法、***及装置
CN111444335B (zh) 中心词的提取方法及装置
CN113140012B (zh) 图像处理方法、装置、介质及电子设备
WO2024104272A1 (zh) 视频标注方法、装置、设备、介质及产品
CN113111684B (zh) 神经网络模型的训练方法、装置和图像处理***
CN109816791B (zh) 用于生成信息的方法和装置
CN114625876B (zh) 作者特征模型的生成方法、作者信息处理方法和装置
CN111870954B (zh) 一种高度图生成方法、装置、设备及存储介质
CN111353470B (zh) 图像的处理方法、装置、可读介质和电子设备
CN111860209B (zh) 手部识别方法、装置、电子设备及存储介质
CN117275017A (zh) 文字提取方法、装置、设备及存储介质
CN114565586B (zh) 息肉分割模型的训练方法、息肉分割方法及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890675

Country of ref document: EP

Kind code of ref document: A1