WO2024104272A1 - Video annotation method, apparatus, device, medium and product - Google Patents

Video annotation method, apparatus, device, medium and product

Info

Publication number
WO2024104272A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
frame
annotation
sub-segment
Prior art date
Application number
PCT/CN2023/131040
Other languages
English (en)
French (fr)
Inventor
颜鹏翔
张晓鹤
朱思凝
刘豪
赵晴
吴捷
王一同
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2024104272A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular to a video annotation method, apparatus, device, medium, and product.
  • Video segmentation and annotation are usually done through traditional image segmentation: the video is sampled at a certain sampling frequency, and the sampled frames are distributed to annotators for manual annotation. Annotators complete the annotation through operations such as region selection and graphic drawing. However, manual annotation alone is quite limited and has low annotation efficiency.
  • the embodiments of the present disclosure provide a video annotation method, device, equipment, medium and product to overcome the problem that manual annotation alone is limited and has low annotation efficiency.
  • an embodiment of the present disclosure provides a video annotation method, including:
  • the video to be annotated is displayed on the video annotation page.
  • Video annotation results of the video to be annotated are displayed on the video annotation page.
  • an embodiment of the present disclosure provides a video annotation device, including:
  • a first responding unit, configured to determine, in response to a video annotation request, a first sub-segment among the sub-segments to be annotated in the video to be annotated;
  • a second responding unit, configured to obtain a first frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
  • a third responding unit, configured to generate a tail frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
  • a fourth responding unit, configured to generate an intermediate frame annotation result of the intermediate frame in response to an annotation request for the intermediate frame of the first sub-segment;
  • the first display unit is configured to display the video annotation result of the video to be annotated on a video annotation page according to the annotation result of each image frame of the first sub-segment.
  • an embodiment of the present disclosure provides an electronic device, including: a processor, a memory, and an output device;
  • the memory stores computer-executable instructions
  • the processor executes the computer-executable instructions stored in the memory, so that the processor implements the video annotation method as described in the first aspect and various possible designs of the first aspect, and the output device is used to output the video annotation page.
  • an embodiment of the present disclosure provides a computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the video annotation method described in the first aspect and various possible designs of the first aspect.
  • an embodiment of the present disclosure provides a computer program product, including a computer program, which, when executed by a processor, implements the video annotation method described in the first aspect and various possible designs of the first aspect.
  • the technical solution provided by this embodiment can determine the first sub-segment in the sub-segments to be annotated in the video to be annotated in response to a video annotation request, and start the annotation of the first sub-segment.
  • In response to the annotation operation performed by the user on the first frame of the first sub-segment, the first frame annotation result of the first frame can be obtained.
  • the last frame annotation result of the last frame can be generated, and in response to the annotation request initiated for the middle frame of the first sub-segment, the middle frame annotation result of the middle frame can be generated.
  • In this way, manual annotation of the first frame is combined with automatic annotation of the last frame and the middle frame.
  • Sequential annotation of each image frame in the first sub-segment is realized through the automatic annotation of the last frame and the middle frame, which adds an image annotation approach, effectively reduces the difficulty of annotating images, and improves annotation efficiency.
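  • As an illustration of this flow, the following is a minimal Python sketch of annotating one sub-segment; the function names (manual_annotate, propagate_tail, interpolate_mid) and the list-based result format are hypothetical, not the disclosed implementation.

```python
# Minimal sketch of the disclosed flow for one sub-segment: the first
# frame is annotated manually, while the tail frame and the intermediate
# frames are annotated automatically. All function names and the result
# format are illustrative assumptions, not the claimed implementation.

def annotate_sub_segment(frames, manual_annotate, propagate_tail, interpolate_mid):
    first = manual_annotate(frames[0])        # user-performed annotation operation
    tail = propagate_tail(first, frames[-1])  # generated from the first frame result
    mids = [interpolate_mid(first, tail, f)   # generated from first and tail results
            for f in frames[1:-1]]
    return [first, *mids, tail]               # per-frame annotation results
```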
  • FIG1 is a diagram showing an application example of a video annotation method provided by an embodiment of the present disclosure.
  • FIG2 is a flow chart of an embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG3 is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG4 is an example diagram of a video annotation page provided by an embodiment of the present disclosure.
  • FIG5 is an example diagram of a task creation page provided by an embodiment of the present disclosure.
  • FIG6 is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • FIG7 is a schematic diagram of the structure of an embodiment of a video annotation device provided by an embodiment of the present disclosure.
  • FIG8 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present disclosure.
  • the technical solution disclosed in the present invention can be applied to video annotation scenarios.
  • the manual annotation and automatic annotation methods are combined to annotate the video to be annotated segment by segment, thereby improving the annotation efficiency and accuracy of the video.
  • Video labeling is generally done manually: the video is divided into video frames, which are sent to annotators for manual annotation. In actual applications, after the video frames are sent to multiple annotators, the annotators can complete the annotation by drawing curves, annotating objects, setting object types or names, etc.
  • the above annotation methods are mainly completed by manual annotation, which is too limited, resulting in low image annotation efficiency.
  • the present disclosure relates to technical fields such as image processing and artificial intelligence, and specifically to a video annotation method, device, equipment, medium and product.
  • the first sub-segment in the sub-segments to be annotated of the video to be annotated can be determined from the video to be annotated in response to a video annotation request, the first frame annotation result of the first frame can be obtained in response to the annotation operation performed by the user on the first frame of the first sub-segment, the tail frame annotation result of the tail frame can be generated in response to the annotation request for the tail frame of the first sub-segment, and at the same time, the middle frame annotation result of the middle frame can be generated in response to the annotation request for the middle frame of the first sub-segment.
  • the first frame annotation result is obtained through manual annotation, and the annotation results of the tail frame and the middle frame are automatically generated, so as to reduce manual annotation and improve the annotation efficiency of the image frames.
  • through the annotation operation on the first frame and the annotation requests for the tail frame and the middle frame, interactive annotation with the user is realized, and the effectiveness of the annotation interaction is improved.
  • the video annotation results of the video to be annotated can be displayed on the video annotation page, so as to realize the visualization of the annotation process.
  • the annotation method provided by the present solution adds to the available image annotation approaches, effectively reduces the difficulty of image annotation, improves the annotation efficiency, and provides visual annotation interaction to improve the annotation effect and accuracy.
  • FIG1 is a diagram showing an application example of a video annotation method provided by an embodiment of the present disclosure, and the video annotation method can be configured in an electronic device 1.
  • the electronic device 1 can correspond to a display device 2.
  • the display device 2 can be used to display a video annotation interface 3.
  • the video annotation request can be triggered by the user in an interactive manner, and the electronic device 1 can detect the video annotation request and obtain the video to be annotated corresponding to the video annotation request.
  • the electronic device 1 can also determine the first sub-segment in the sub-segments to be annotated from the video to be annotated.
  • the electronic device 1 can start annotating from the first frame, obtaining the first frame annotation result through the annotation operation performed by the user.
  • For example, the face annotation box 4 is annotated in the frame, while other areas in the image, such as the area where the street lamp 5 is located, do not need to be annotated.
  • the annotation object of the middle frame can be the same as the annotation object of the first frame and the last frame, for example, both are annotating faces.
  • Referring to FIG2, it is a flow chart of an embodiment of a video annotation method provided by an embodiment of the present disclosure.
  • the method can be configured in a video annotation device, and the video annotation device can be located in an electronic device.
  • the video annotation method can include the following steps:
  • 201 In response to a video annotation request, determine a first sub-segment among sub-segments to be annotated in the video to be annotated.
  • the video annotation request may be an access request generated by an image annotation operation triggered by a user.
  • the video annotation request may specify a video to be annotated.
  • the video annotation request may be an annotation link initiated by a user for the video to be annotated, such as a URL link.
  • the video annotation request may include a storage address of the video to be annotated. The video to be annotated may be loaded according to the storage address of the video to be annotated.
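  • As a hedged example, the request could be handled as sketched below; the dict-shaped request and the storage_address field name are assumptions, and OpenCV is used only as an illustrative loader.

```python
import cv2  # OpenCV, used here only as an example video loader


def load_video_to_annotate(request):
    """Load the video to be annotated from the storage address carried in a
    video annotation request. The dict shape and the 'storage_address' key
    are illustrative assumptions, not the disclosed format."""
    capture = cv2.VideoCapture(request["storage_address"])
    frames = []
    while True:
        ok, frame = capture.read()
        if not ok:  # no more frames, or the file could not be read
            break
        frames.append(frame)
    capture.release()
    return frames
```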
  • a sub-segment to be annotated may be obtained from the video to be annotated as a first sub-segment, and the first sub-segment may be annotated.
  • the video to be labeled may include at least one video sub-segment to be labeled.
  • the at least one video sub-segment may be obtained by dividing the video to be labeled.
  • the first sub-segment may be the first video sub-segment in the at least one video sub-segment.
  • the image frames in the first sub-segment may be annotated manually or automatically.
  • the first frame may be annotated manually by a user performing an annotation operation.
  • the last frame and the middle frame may be annotated automatically in response to an annotation request.
  • the annotation operation may be an operation such as region selection and label setting performed manually on the image to be annotated.
  • the first frame annotation result of the first frame is obtained by detecting the manually performed annotation operation.
  • the unannotated tail frame can be automatically annotated according to the first frame annotation result of the annotated first frame to obtain the tail frame annotation result of the tail frame.
  • the efficiency and accuracy of tail frame annotation are improved.
  • the method may further include: displaying the tail frame and the tail frame annotation result on the video annotation page.
  • the method may further include: obtaining an updated tail frame annotation result in response to a modification operation performed on the tail frame annotation result.
  • the tail frame annotation result may be an annotation result confirmed by the user, thereby improving the accuracy and precision of the tail frame annotation result.
  • the annotation request for the last frame may be generated by a user triggering an annotation control, or may be automatically generated by the electronic device when the annotation of the previous image frame is completed.
  • the unannotated intermediate frames can be automatically annotated by using the first frame annotation results of the annotated first frame and the last frame annotation results of the last frame to obtain the intermediate frame annotation results of the intermediate frames.
  • the efficiency and accuracy of intermediate frame annotation can be improved.
  • a semi-supervised machine learning model, a deep neural network, or another computing model can be used to automatically annotate the last frame or the middle frame in combination with the annotation result of the first frame, so as to generate the annotation result automatically.
  • the intermediate frame may be an image frame between the first frame and the last frame, and each intermediate frame may perform the step of generating an intermediate frame annotation result of the intermediate frame in response to the annotation request for the intermediate frame of the first sub-segment.
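  • The disclosure leaves the automatic annotation model open (a semi-supervised machine learning model, a deep neural network, or another computing model). As one simple stand-in, the sketch below linearly interpolates a bounding box between the first frame result and the tail frame result; the (x, y, w, h) box format is an assumption.

```python
def interpolate_box(first_box, tail_box, index, total):
    """Interpolate an (x, y, w, h) box for the intermediate frame at position
    `index` in a span of `total` frames, where index 0 is the first frame and
    index total - 1 is the tail frame. Linear interpolation is only a simple
    stand-in for the model left open by the disclosure."""
    t = index / (total - 1)
    return tuple((1 - t) * a + t * b for a, b in zip(first_box, tail_box))


# e.g. a face box drifting right across a 5-frame sub-segment:
# interpolate_box((10, 20, 50, 50), (90, 20, 50, 50), 2, 5)
# -> (50.0, 20.0, 50.0, 50.0)
```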
  • the annotation request for the intermediate frame may be generated by a user triggering an annotation control, or may be automatically generated by the electronic device when the annotation of the previous image frame is completed.
  • annotation results of each image frame of the first sub-segment can be displayed on the video annotation page.
  • the annotation results of each image frame can be displayed after the annotation is completed.
  • the video annotation result of the video to be annotated may include: the annotation result of the image frame of each video sub-segment.
  • In this embodiment, the first sub-segment among the sub-segments to be annotated is determined from the video to be annotated in response to a video annotation request; in response to the annotation operation performed by the user on the first frame of the first sub-segment, the first frame annotation result of the first frame is obtained; and in response to the annotation request for the last frame of the first sub-segment, the last frame annotation result of the last frame can be generated.
  • In response to the annotation request for the middle frame of the first sub-segment, the middle frame annotation result of the middle frame can be generated.
  • the first frame annotation result is obtained through manual annotation, and the annotation results of the last frame and the middle frame are automatically generated, which reduces manual annotation and improves the annotation efficiency of the image frames.
  • through the annotation operation on the first frame and the annotation requests for the last frame and the middle frame, interactive annotation with the user is realized, and the effectiveness of the annotation interaction is improved.
  • the video annotation results of the video to be annotated can be displayed on the video annotation page, and the visualization of the annotation process is realized.
  • the annotation method provided by the present solution adds to the available image annotation approaches, effectively reduces the difficulty of image annotation, improves the annotation efficiency, and provides visual annotation interaction to improve the annotation effect and accuracy.
  • Referring to FIG3, it is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure, which differs from the embodiment shown in FIG2 in that it further includes:
  • 303 displaying intermediate frames in a third image area of the video annotation page, and the number of intermediate frames displayed in the third image area is a preset number of images to be displayed.
  • the video annotation page 400 may include a first image area 401, a second image area 402, and a third image area 403.
  • the first image area 401 is used to display the first frame
  • the second image area 402 may display the last frame.
  • the third image area 403 may be used to display at least one intermediate frame.
  • the number of intermediate frames displayed is the image display number preset for the third image area 403.
  • the image area may be a label prompt window of the image, and the label prompt window may display a thumbnail of the image.
  • the video label page may display a thumbnail of the first frame in the first image area 401 at the lower right.
  • the video label page may display a thumbnail of the last frame in the second image area 402 at the lower left.
  • the third image area 403 between the first image area 401 and the second image area 402 may be used to display thumbnails of the intermediate frames.
  • the third image area 403 may include multiple image prompt windows, such as 4031-4035 shown in FIG4. When any image prompt window is selected, for example the image prompt window 4033, the image corresponding to the selected window can be used as the intermediate frame to be annotated, and the selected window can be displayed in a manner different from the unselected windows.
  • a label confirmation control 404 can be displayed on the video labeling page. If the image prompt window of a certain image is selected, the label confirmation control 404 can establish a confirmation association with the selected image prompt window, such as the window 4033 of the middle frame. By detecting the click operation performed by the user on the label confirmation control, it can be determined that the middle frame corresponding to the image prompt window is the middle frame to be currently labeled, and a labeling request for the middle frame to be currently labeled is generated. In response to the labeling request, the labeling of the middle frame can be performed. For example, the middle frame corresponding to the selected image prompt window 4033 can be displayed in the image display area or window 407. By detecting the labeling operation performed by the user on the labeling control of the image to be labeled, the labeling operation on the image to be labeled can be started, for example, marking the face area in the image frame.
  • a video annotation page can be displayed, and targeted prompts can be given to the first frame, the last frame, and the middle frame through different image prompt areas in the video annotation page, so that users can view each image frame conveniently, thereby improving the annotation prompt efficiency and prompt accuracy of the image to be annotated.
  • an annotation task can be established for the videos to be annotated, so as to facilitate the annotation management of each video to be annotated.
  • a task creation page may be provided to implement the creation and processing of the annotation task.
  • the process further includes:
  • a labeling task for the video to be labeled is created.
  • the task display area on the task creation page displays the task prompt information of the labeling task of the video to be labeled.
  • a video annotation request for the video to be annotated is generated, and the page is switched to the video annotation page of the video to be annotated.
  • the labeling task may refer to a labeling process established for a video to be labeled.
  • the task establishment page may refer to a webpage for implementing the labeling task of the video to be labeled, which may be developed in programming languages such as HTML5, C++, JAVA, and IOS.
  • the task establishment page may include a task display area. The task display area may be used to display task prompt information of the labeling task of each video to be labeled.
  • the task creation page 500 may include multiple controls, such as a task management control 501 and a template preview control 502.
  • a task management subpage 503 may be displayed.
  • the task management subpage may include a task prompt control 5031 and a task display area 5032.
  • the control name of the task prompt control 5031 may prompt the establishment of a new annotation task.
  • the task display area 5032 may display the task information of the established annotation task.
  • the task information may include: task title, task ID, creation time, creator, and task operation control.
  • the task information of the annotation task may also include other types of information, such as annotation method, label type, etc., which will not be repeated here.
  • one task may correspond to one piece of task information.
  • the video annotation page 400 can include task prompt information 405 of the annotation task A: "Annotation task A".
  • the task prompt information 405 can prompt the annotation task corresponding to the annotation task A.
  • the video to be annotated corresponding to annotation task A can be divided into several video sub-segments, and video sub-segment 1 to video sub-segment N can be displayed in the segment prompt area 406, as shown in FIG4.
  • establishing a labeling task for a video to be labeled may refer to a task execution module established for the video to be labeled, and the user may perform labeling operations on the labeling task through the task operation controls in the labeling task for the video to be labeled.
  • the task operation controls may include: labeling controls, quality inspection controls, statistical controls, etc.
  • the labeling control may refer to a start prompt control for the labeling task for the video to be labeled, and detecting that a user triggers the labeling control may generate a video labeling request for the video to be labeled.
  • the quality inspection control may refer to a prompt control for performing quality inspection on the labeling results generated by the labeling task for the video to be labeled, and detecting that a user triggers the quality inspection control may start a quality inspection process for the labeling results of each image frame of the video to be labeled.
  • the statistical control may refer to a control for prompting labeling-related data such as the number of times the video to be labeled is labeled and the number of labels.
  • the statistical control that detects user triggering can display various data generated by the task of annotating the video to be annotated.
  • the task prompt control may include: a trigger control for creating a new labeling task.
  • the target path may be the storage path of the video to be annotated obtained through path selection.
  • the video to be annotated may be uploaded through the target path for annotation.
  • in response to a click operation performed on the task prompt control in the task establishment page, a video upload page can be displayed on the upper layer of the task establishment page.
  • the video upload page can include a path selection control for a video storage path, and the task prompt control can be used to prompt a newly created annotation task, so as to obtain a target path in response to a path selection operation performed on the path selection control.
  • the first sub-segment may be a video sub-segment that needs to be labeled.
  • the annotation of the video sub-segment may start with the first and last images, that is, the first frame and the last frame, and then the remaining intermediate frames are labeled in sequence.
  • the third image area may be a sliding window.
  • the method may also include:
  • in response to a click operation performed by the user on the first button on the left side of the sliding window, the intermediate frames displayed in the third image area are slid toward the left side of the sliding window in the order of the image frames, and the intermediate frames displayed in the third image area are updated;
  • in response to a click operation performed by the user on the second button on the right side of the sliding window, the intermediate frames displayed in the third image area are slid toward the right side of the sliding window in the order of the image frames, and the intermediate frames displayed in the third image area are updated.
  • the third image area 403 shown in FIG4 may include multiple image prompt windows, such as 4031-4035 shown in FIG4, and the number of image prompt windows M that can be displayed is, for example, 5, so 5 image prompt windows can be displayed in the third image area 403, and each image prompt window can display a thumbnail of an image frame to prompt the image frame. Starting from the second image frame of the target video sub-segment, 5 image frames are displayed in the third image area 403 in the order of the image frames.
  • the third image area 403 may include a first button and a second button in addition to the image prompt window.
  • the first button and the second button may be a triangle as shown in FIG4 , or may be other shapes, such as a circle, a square, etc.
  • Displaying the M image frames in the third image area in the order of the image frames may include: determining the M image frames currently displayed; displaying the M image frames in sequence in the third image area; and determining the first unlabeled image from the M image frames as the image to be labeled.
  • the third image area is a sliding window
  • the intermediate frames displayed in the third image in response to a user clicking the first button on the left side of the sliding window, can be sequentially slid to the left for display, and in response to a user clicking the second button on the right side of the sliding window, the intermediate frames displayed in the third image area can be slid to the right side of the sliding window in the order of the image frames.
  • the user can update the intermediate frames displayed in the sliding window by clicking the window button, and the displayed intermediate frames are continuously updated by sliding on the left and right sides, thereby realizing effective display of the intermediate frames.
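  • A minimal sketch of this sliding-window behavior, assuming a preset display count of M = 5 as in the FIG4 example; the class shape and the button-to-direction mapping are assumptions.

```python
class IntermediateFrameWindow:
    """Sliding window over the intermediate frames of a sub-segment, showing
    a preset number of image prompt windows at a time (M = 5 in FIG4)."""

    def __init__(self, intermediate_frames, display_count=5):
        self.frames = intermediate_frames  # intermediate frames in frame order
        self.m = display_count             # preset image display number (M)
        self.start = 0                     # index of the first visible frame

    def visible(self):
        return self.frames[self.start:self.start + self.m]

    def slide_left(self):
        # first button: slide the displayed frames toward the left,
        # revealing later intermediate frames (assumed mapping)
        self.start = min(self.start + 1, max(len(self.frames) - self.m, 0))

    def slide_right(self):
        # second button: slide the displayed frames toward the right,
        # revealing earlier intermediate frames (assumed mapping)
        self.start = max(self.start - 1, 0)
```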
  • the to-be-annotated object includes one or more objects
  • the annotation result of the image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
  • one or more objects may be annotated in an image frame, thereby achieving multi-object annotation for one image frame and improving annotation efficiency and accuracy.
  • Referring to FIG6, it is a flow chart of another embodiment of a video annotation method provided by an embodiment of the present disclosure, which differs from the above-mentioned embodiments in that displaying the video annotation results of the video to be annotated on the video annotation page according to the annotation results of each image frame of the first sub-segment includes:
  • a second sub-segment to be marked is determined from unmarked video sub-segments of at least one video sub-segment corresponding to the video to be marked.
  • the video to be labeled may be divided into at least one video sub-segment, and the target video sub-segment may be determined from the at least one video sub-segment in sequence according to the segment sequence corresponding to the at least one video sub-segment.
  • the second sub-segment may be the second video sub-segment or a video sub-segment subsequent to the second video sub-segment.
  • two adjacent video sub-segments may include the same video frame, and the image frames of the two adjacent video sub-segments are set to overlap.
  • the last image frame of the previous video sub-segment may overlap with the first image frame of the next video sub-segment. Therefore, from the annotation of the last image frame of the previous video sub-segment, the first frame annotation result of the first frame of the next sub-segment can be obtained automatically.
  • the step of dividing at least one video sub-segment may include: extracting multiple key frames of the video to be labeled, extracting one video sub-segment using two adjacent key frames, and obtaining at least one video sub-segment from the video to be labeled according to the multiple key frames.
  • the last image frame of the previous video sub-segment of the two adjacent video sub-segments is the same as the first image frame of the next video sub-segment.
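  • To make the overlapping division concrete, here is a minimal sketch that splits a video into sub-segments bounded by adjacent key frames, so that the last frame of each sub-segment is the same image as the first frame of the next; the index-based representation is an assumption.

```python
def split_into_sub_segments(key_frame_indices):
    """Divide a video into sub-segments using adjacent key frames, so that
    the last frame index of one sub-segment equals the first frame index of
    the next (overlapping boundary frames, as described above)."""
    return [list(range(a, b + 1))  # inclusive of both bounding key frames
            for a, b in zip(key_frame_indices, key_frame_indices[1:])]


# e.g. key frames at 0, 4, 8:
# split_into_sub_segments([0, 4, 8]) -> [[0, 1, 2, 3, 4], [4, 5, 6, 7, 8]]
```
  • Because the boundary frame is shared, the tail frame annotation result of one sub-segment can be reused directly as the first frame annotation result of the next sub-segment, which is what allows the later sub-segments to be annotated without further manual work.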
  • determining the second sub-segment to be labeled from the unlabeled video sub-segments of at least one video sub-segment may include: determining the second sub-segment to be labeled from the unlabeled video sub-segments according to the segment division order of the at least one video sub-segment.
  • 603 In response to the marking request initiated for the first frame of the second sub-segment, obtain the marking result of the last frame of the previous sub-segment of the second sub-segment as the first frame marking result of the first frame of the second sub-segment.
  • the annotation result of the last image frame of the previous video sub-segment can be read to obtain the first frame annotation result of the first frame.
  • the image annotation can be quickly transferred and the image annotation efficiency can be improved.
  • 604 In response to the marking request initiated for other frames of the second sub-segment, generate marking results for other frames of the second sub-segment.
  • generating labeling results for other frames of the second sub-segment may include: generating a tail frame labeling result of the tail frame in response to a labeling request for the tail frame of the second sub-segment; and generating an intermediate frame labeling result of the intermediate frame in response to a labeling request for the intermediate frame of the second sub-segment.
  • the method may further include: in response to a modification operation performed by a user on the annotation result of the intermediate frame, obtaining a final annotation result of the intermediate frame, updating the modified intermediate frame to be the first frame and the final annotation result to be the updated first frame annotation result, returning to execute in response to the annotation request for the intermediate frame of the second sub-segment, generating an intermediate frame annotation result of the intermediate frame.
  • 605 Outputting the annotation results of each image frame of the second sub-segment on the video annotation page.
  • the output interface and content of the video annotation page of the second sub-segment can refer to the output of the first sub-segment, and will not be repeated here.
  • after the annotation result of the first sub-segment is generated according to the annotation result of each image frame of the first sub-segment, a second sub-segment to be annotated is determined from the unannotated video sub-segments of the at least one video sub-segment corresponding to the video to be annotated.
  • the second sub-segment may be an unannotated video sub-segment.
  • the tail frame labeling result of the tail frame of the previous video sub-segment may be obtained as the first frame labeling result of the first frame of the second sub-segment, so that the first frame is automatically obtained.
  • the labeling results of the other frames may be generated.
  • the second sub-segment annotated after the first sub-segment may automatically obtain labeling results from the first frame to the tail frame, without the need for manual labeling, thereby achieving efficient labeling of the video sub-segment.
  • the labeling process may be visualized by outputting the labeling results of each image frame of the second sub-segment on the video labeling page.
  • determining a second sub-segment to be labeled from unlabeled video sub-segments of at least one video sub-segment corresponding to the video to be labeled includes:
  • At least one video sub-segment corresponding to the video to be labeled is determined.
  • the selected video sub-segment is obtained as a second sub-segment.
  • segment prompt information can be displayed for each of the at least one video sub-segment.
  • a second sub-segment corresponding to the segment prompt information clicked by the user is obtained.
  • the selection of the video sub-segment can be achieved through user triggering.
  • the selected video sub-segment in response to the user's selection operation on any video sub-segment, can be obtained as the second sub-segment.
  • through interactive response to the user, the second sub-segment selected by the user is obtained, realizing accurate selection of the second sub-segment.
  • the method may further include:
  • the segment prompt window of the video annotation page displays segment prompt information corresponding to at least one video sub-segment, and the segment prompt information is displayed in a sub-window in the segment display window.
  • the segment prompt information corresponding to at least one video sub-segment can be displayed in the video annotation page.
  • the segment prompt area 406 shown in FIG. 4 can display segment prompt information of several video sub-segments.
  • the segment prompt information can be provided with segment names of the video sub-segments, which are respectively video sub-segments 1-N, where N is the number of segments of at least one video sub-segment.
  • the segment names of the video sub-segments can be determined according to the division order of the video sub-segments.
  • the first video sub-segment extracted from the video to be annotated can be named video sub-segment 1.
  • the video sub-segment in the marked state can be the target video sub-segment.
  • the video sub-segment 2 can be the target video sub-segment in the marked state.
  • the video sub-segment 2 in the marked state can be in a selected state, and other video sub-segments, such as video sub-segment 1, video sub-segment 3-video sub-segment N, etc., are all in an unselected state.
  • the segment prompt information corresponding to each sub-segment of a video can be displayed in the segment prompt window of the video annotation page, and the segment display window uses a sub-window to display the corresponding segment prompt information.
  • the method further includes:
  • the first frame annotation result of the first frame is displayed in the result display area of the video annotation page.
  • after generating the tail frame annotation result of the tail frame, the method further includes:
  • the tail frame annotation result is displayed in the result display area of the video annotation page.
  • the annotation results can be effectively displayed.
  • in response to a labeling request initiated for an intermediate frame of the first sub-segment, after generating an intermediate frame labeling result of the intermediate frame, the method further includes:
  • in response to a modification operation performed by the user on the annotation result of the intermediate frame, a final annotation result of the intermediate frame is obtained; the modified intermediate frame is updated as the first frame and the final annotation result as the updated first frame annotation result, and the process returns to executing, in response to the annotation request for the intermediate frame of the first sub-segment, the generation of the intermediate frame annotation result of the intermediate frame.
  • the annotation result of the intermediate frame is determined to be the final annotation result of the intermediate frame.
  • the method may further include: regenerating the last frame annotation result of the last frame according to the final annotation result of the intermediate frame, and generating the intermediate frame annotation results of the intermediate frames between the modified intermediate frame and the last frame according to the final annotation result of the intermediate frame.
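  • A minimal sketch of this re-annotation loop; `interpolate` stands for whatever propagation model is used (for example, the interpolate_box stand-in sketched earlier), and the list-of-results representation is an assumption.

```python
def apply_intermediate_modification(results, modified_index, final_result, interpolate):
    """Treat the user-modified intermediate frame as an updated first frame
    and regenerate the annotations of the frames between it and the tail
    frame; the tail frame result could likewise be regenerated."""
    results[modified_index] = final_result
    tail = results[-1]
    count = len(results) - modified_index  # modified frame .. tail, inclusive
    for offset in range(1, count - 1):     # only the frames in between
        results[modified_index + offset] = interpolate(
            final_result, tail, offset, count)
    return results
```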
  • a result confirmation control set for the annotation result of the image to be annotated can also be displayed on the video annotation page; in response to a click operation triggered by the user on the result confirmation control, the annotation result currently corresponding to the currently annotated image frame can be determined as the final annotation result of the image frame.
  • a result modification control set for the annotation result of the image to be annotated may also be displayed in the video annotation page; in response to a click operation triggered by the user on the result modification control, an annotation modification operation performed by the user on the annotation result of the image to be annotated may be detected.
  • the modification operation may include modification operations performed on the annotation results such as the annotation area, label type, and the position of the annotation object of the intermediate frame.
  • the update of the annotation area can obtain the updated area through operations such as erasing, dragging, and sliding.
  • the update of the label type may refer to deleting the original label of the annotation area and adding a new label.
  • the annotation modification result may include at least one of the following results: the modification result of the label area, the modification result of the label type, and the update of the position of the annotation object.
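  • For illustration, the modifiable parts of an annotation result listed above could be grouped in a structure like the following; all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AnnotationSubResult:
    """One annotation sub-result for one annotated object in one image frame;
    the field names are illustrative assumptions, not the disclosed format."""
    area: Tuple[float, float, float, float]  # annotation area, e.g. a box (x, y, w, h)
    label_type: str                          # label type, e.g. "face"
    position: Tuple[float, float]            # position of the annotated object
```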
  • the user can manually modify the annotation result of the image to be annotated, realize secondary modification and annotation of the automatically generated annotation result, and improve the annotation accuracy.
  • the final annotation result of the intermediate frame can be obtained in response to the modification operation performed by the user on the annotation result of the intermediate frame.
  • the intermediate frame can be revised in time by interacting with the user, and the revision effect of the annotation result of the intermediate frame by the user can be achieved, thereby improving the annotation accuracy.
  • the annotation result of the intermediate frame can also be determined as the final annotation result of the intermediate frame in response to the confirmation operation performed by the user on the annotation result of the intermediate frame.
  • the video annotation results of the video to be annotated can be quality checked.
  • the method may also include:
  • the video annotation results of the video to be annotated are sent to the quality inspection party.
  • the quality inspector may include an electronic device corresponding to a user who performs quality inspection on the video to be annotated.
  • the annotation quality inspection result of the video to be annotated may be obtained in response to the quality inspection operation performed by the quality inspector on the video annotation result of the video to be annotated.
  • the video annotation result of the video to be annotated may include the annotation result corresponding to each image frame of at least one video sub-segment, that is, the video annotation result may include the annotation result corresponding to each image frame in the video to be annotated.
  • the annotation quality inspection result may include image frames with anomalies in the video annotation result.
  • the quality inspection party may detect the abnormal trigger operation performed by the quality inspection user on the image frame with anomalies, and obtain the image frame with anomalies.
  • the annotation anomaly may include that there is an error between the annotation result of the image frame and the annotation result required by the user.
  • the video annotation result of the video to be annotated can be sent to the quality inspection party, instructing the quality inspection party to perform quality inspection on the video to be annotated.
  • through the quality inspection of the annotation result of the video to be annotated, the annotation validity and reliability of the video to be annotated can be ensured.
  • the video annotation device 700 may include the following units:
  • the first responding unit 701 is configured to respond to a video labeling request and determine a first sub-segment in the sub-segments to be labeled in the video to be labeled.
  • the second responding unit 702 is configured to obtain a first frame marking result of the first frame in response to a marking operation performed by the user on the first frame of the first sub-segment.
  • the third responding unit 703 is configured to generate a tail frame marking result of the tail frame in response to the marking request for the tail frame of the first sub-segment.
  • the fourth responding unit 704 is configured to generate an intermediate frame labeling result of the intermediate frame in response to the labeling request for the intermediate frame of the first sub-segment.
  • the first display unit 705 is configured to display the video annotation result of the video to be annotated on the video annotation page according to the annotation result of each image frame of the first sub-segment.
  • the device may include:
  • a second display unit used to display the first frame in the first image area of the video annotation page
  • a third display unit used to display the last frame in the second image area of the video annotation page
  • the fourth display unit is used to display intermediate frames in the third image area of the video annotation page, and the number of intermediate frames displayed in the third image area is a preset image display number.
  • the third image area is a sliding window; and may further include:
  • a first response module configured to respond to a click operation performed by a user on a first button on the left side of the sliding window, slide the intermediate frame displayed in the third image area to the left side of the sliding window according to the sequence of the image frames, and update the intermediate frame displayed in the third image area;
  • the second response module is used to respond to the user's click operation on the second button on the right side of the sliding window, slide the intermediate frame displayed in the third image area to the right side of the sliding window according to the order of each image frame, and update the intermediate frame displayed in the third image area.
  • the to-be-annotated object includes one or more objects
  • the annotation result of the image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
  • the first display unit may include:
  • a result generating module configured to generate a labeling result of the first sub-segment according to the labeling results of each image frame of the first sub-segment;
  • a segment determination module configured to determine that the marking of the first sub-segment is completed, and then determine a second sub-segment to be marked from unmarked video sub-segments of at least one video sub-segment corresponding to the video to be marked;
  • a segment annotation module configured to, in response to the annotation request initiated for the first frame of the second sub-segment, obtain the annotation result of the last frame of the previous sub-segment of the second sub-segment as the first frame annotation result of the first frame of the second sub-segment;
  • the result generating module is configured to generate the marking results of other frames of the second sub-segment in response to the marking request initiated for other frames of the second sub-segment.
  • the result display module is used to output the annotation results of each image frame of the second sub-segment on the video annotation page.
  • the segment determination module includes:
  • the video segment submodule is used to determine at least one video sub-segment corresponding to the video to be labeled.
  • the segment selection submodule is used to obtain the selected video subsegment as the second subsegment in response to a user's selection operation on any video subsegment in the at least one video subsegment.
  • it further includes:
  • a segment prompt unit, used to display, in the segment prompt window of the video annotation page, the segment prompt information respectively corresponding to the at least one video sub-segment, where the segment prompt information is displayed in the segment display window using sub-windows.
  • it may also include:
  • a result modification unit configured to obtain a final annotation result of the intermediate frame in response to a modification operation performed by a user on the annotation result of the intermediate frame;
  • the first frame updating unit is used to update the modified intermediate frame as the first frame and the final marking result as the updated first frame marking result, and return to execute in response to the marking request for the intermediate frame of the first sub-segment to generate the intermediate frame marking result of the intermediate frame.
  • the annotation confirmation unit is configured to determine, in response to a confirmation operation performed by a user on the annotation result of the intermediate frame, that the annotation result of the intermediate frame is the final annotation result of the intermediate frame.
  • it further includes:
  • a result sending unit used to send the video annotation result of the video to be annotated to the quality inspection party
  • a quality inspection receiving unit used to receive the quality inspection results of the video to be annotated fed back by the quality inspector
  • the quality inspection display unit is used to display the labeling quality inspection results of the video to be labeled.
  • the device provided in this embodiment can be used to execute the technical solution of the above method embodiment. Its implementation principle and technical effect are similar, and this embodiment will not be repeated here.
  • the embodiment of the present disclosure also provides an electronic device.
  • Referring to FIG8, it shows a schematic diagram of the structure of an electronic device 800 suitable for implementing the embodiments of the present disclosure.
  • the electronic device 800 may be a terminal device or a server.
  • the terminal device may include but is not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle terminals (such as vehicle navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG8 is only an example and should not bring any limitation to the functions and scope of use of the embodiment of the present disclosure.
  • the electronic device 800 may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.) 801, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 to a random access memory (RAM) 803.
  • Various programs and data required for the operation of the electronic device 800 are also stored in the RAM 803.
  • the processing device 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804.
  • an input/output (I/O) interface 805 is also connected to the bus 804.
  • the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 808 including, for example, a magnetic tape, a hard disk, etc.; and communication devices 809.
  • the communication device 809 may allow the electronic device 800 to communicate with other devices wirelessly or by wire to exchange data.
  • While FIG8 shows an electronic device 800 having various devices, it should be understood that it is not required to implement or have all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from the network through the communication device 809, or installed from the storage device 808, or installed from the ROM 802.
  • When the computer program is executed by the processing device 801, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, device or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries a computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the electronic device, or may exist independently without being incorporated into the electronic device.
  • the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiments.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, C++, and conventional procedural programming languages such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
  • each box in the flowchart or block diagram can represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the boxes can also occur in an order different from that marked in the accompanying drawings. For example, two boxes shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram and/or flowchart, and combinations of boxes in the block diagram and/or flowchart, can be implemented with a dedicated hardware-based system that performs the specified function or operation, or with a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented by software or by hardware.
  • the unit name does not limit the unit itself in some cases.
  • the first acquisition unit can also be described as a "unit for acquiring at least two Internet Protocol addresses".
  • exemplary types of hardware logic components include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
  • a more specific example of a machine-readable storage medium may include an electrical connection based on one or more lines, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a video annotation method including: in response to a video annotation request, determining a first sub-segment among the sub-segments to be annotated in the video to be annotated; in response to an annotation operation performed by a user on the first frame of the first sub-segment, obtaining a first-frame annotation result of the first frame; in response to an annotation request for the tail frame of the first sub-segment, generating a tail-frame annotation result of the tail frame; in response to an annotation request for an intermediate frame of the first sub-segment, generating an intermediate-frame annotation result of the intermediate frame.
  • the video annotation results of the video to be annotated are displayed on the video annotation page according to the annotation results of the image frames of the first sub-segment.
  • the method further includes:
  • the intermediate frames are displayed in the third image area of the video annotation page, and the number of intermediate frames displayed is a preset image display number.
  • the third image area is a sliding window; further comprising:
  • in response to a click operation on a first button on the left side of the sliding window, the intermediate frames displayed in the third image area are slid toward the left side of the sliding window in the order of the image frames, and the intermediate frames displayed in the third image area are updated;
  • in response to a click operation on a second button on the right side of the sliding window, the intermediate frames displayed in the third image area are slid toward the right side of the sliding window in the order of the image frames, and the intermediate frames displayed in the third image area are updated.
  • the to-be-annotated objects include one or more objects, and the annotation result of an image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
  • displaying the video annotation results of the video to be annotated on the video annotation page according to the annotation results of each image frame of the first sub-segment includes:
  • a second sub-segment to be annotated is determined from the unannotated video sub-segments of at least one video sub-segment corresponding to the video to be annotated;
  • in response to an annotation request initiated for the first frame of the second sub-segment, the tail-frame annotation result of the previous sub-segment is obtained as the first-frame annotation result of the first frame of the second sub-segment;
  • annotation results of the other frames of the second sub-segment are generated.
  • the annotation results of each image frame of the second sub-segment are output on the video annotation page.
  • determining a second sub-segment to be annotated from the unannotated video sub-segments of at least one video sub-segment corresponding to the video to be annotated includes:
  • at least one video sub-segment corresponding to the video to be annotated is determined;
  • in response to a selection operation on any video sub-segment, the selected video sub-segment is obtained as the second sub-segment.
  • the method further includes:
  • the segment prompt window of the video annotation page displays segment prompt information corresponding to at least one video sub-segment, and the segment prompt information is displayed in a sub-window in the segment display window.
  • in response to an annotation request initiated for an intermediate frame of the first sub-segment, after the intermediate-frame annotation result of the intermediate frame is generated, the method further includes:
  • in response to a modification operation performed by the user on the annotation result of the intermediate frame, the final annotation result of the intermediate frame is obtained and displayed in the result display area;
  • or, in response to a confirmation operation performed by the user, the annotation result of the intermediate frame is determined to be the final annotation result of the intermediate frame.
  • in a second aspect, the present disclosure further provides:
  • a video annotation device including:
  • a first responding unit, configured to determine, in response to a video annotation request, a first sub-segment among the sub-segments to be annotated in the video to be annotated;
  • a second responding unit, configured to obtain a first-frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
  • a third responding unit, configured to generate a tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
  • a fourth responding unit, configured to generate an intermediate-frame annotation result of the intermediate frame in response to an annotation request for an intermediate frame of the first sub-segment;
  • a first display unit, configured to display the video annotation result of the video to be annotated on the video annotation page according to the annotation results of the image frames of the first sub-segment.
  • an electronic device comprising: at least one processor and a memory;
  • the memory stores computer-executable instructions;
  • At least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the video annotation method of the first aspect and various possible designs of the first aspect as described above.
  • a computer-readable storage medium in which computer-executable instructions are stored;
  • when a processor executes the computer-executable instructions, the video annotation method as described in the first aspect and various possible designs of the first aspect is implemented.
  • a computer program product including a computer program, which, when executed by a processor, implements the video annotation method of the first aspect and various possible designs of the first aspect.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present disclosure provide a video annotation method, apparatus, device, medium and product. The video annotation method may include: in response to a video annotation request, determining a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video; in response to an annotation operation performed by a user on the first frame of the first sub-segment, obtaining a first-frame annotation result of the first frame; in response to an annotation request for the tail frame of the first sub-segment, generating a tail-frame annotation result of the tail frame; in response to an annotation request for an intermediate frame of the first sub-segment, generating an intermediate-frame annotation result of the intermediate frame; and displaying the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment. The technical solution of the present disclosure combines manual and automatic annotation of the to-be-annotated video, enriches the ways in which images can be annotated, effectively reduces the difficulty of image annotation, and improves annotation efficiency.

Description

Video annotation method, apparatus, device, medium and product
This application claims priority to the Chinese invention patent application No. 202211430304.2, titled "Video annotation method, apparatus, device, medium and product" and filed on November 15, 2022, the entire content of which is incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a video annotation method, apparatus, device, medium and product.
Background
Video segmentation annotation is generally performed by way of traditional image segmentation: frames are typically sampled from a video at a certain sampling frequency, and the sampled frames are distributed to annotators for manual annotation. Annotators can complete the manual annotation by region selection, shape drawing and the like. However, relying solely on manual annotation is rather limited, and the annotation efficiency is low.
Summary
Embodiments of the present disclosure provide a video annotation method, apparatus, device, medium and product, so as to overcome the problem that relying solely on manual annotation is rather limited and the annotation efficiency is low.
In a first aspect, an embodiment of the present disclosure provides a video annotation method, including:
in response to a video annotation request, determining a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
in response to an annotation operation performed by a user on the first frame of the first sub-segment, obtaining a first-frame annotation result of the first frame;
in response to an annotation request for the tail frame of the first sub-segment, generating a tail-frame annotation result of the tail frame;
in response to an annotation request for an intermediate frame of the first sub-segment, generating an intermediate-frame annotation result of the intermediate frame;
displaying the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
In a second aspect, an embodiment of the present disclosure provides a video annotation apparatus, including:
a first responding unit, configured to determine, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
a second responding unit, configured to obtain a first-frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
a third responding unit, configured to generate a tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
a fourth responding unit, configured to generate an intermediate-frame annotation result of the intermediate frame in response to an annotation request for an intermediate frame of the first sub-segment;
a first display unit, configured to display the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and an output apparatus;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor is configured with the video annotation method described in the first aspect and the various possible designs of the first aspect, and the output apparatus is configured to output the video annotation page.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video annotation method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the video annotation method described in the first aspect and the various possible designs of the first aspect.
According to the technical solution provided by this embodiment, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video can be determined, and annotation of the first sub-segment is started. At this point, in response to an annotation operation performed by a user on the first frame of the first sub-segment, a first-frame annotation result of the first frame can be obtained. Afterwards, in response to an annotation request initiated for the tail frame of the first sub-segment, a tail-frame annotation result of the tail frame can be generated, and in response to an annotation request initiated for an intermediate frame of the first sub-segment, an intermediate-frame annotation result of the intermediate frame can be generated. By manually annotating the first frame and automatically annotating the tail frame and the intermediate frames, sequential annotation of the image frames of the first sub-segment is achieved. The automatic annotation of the tail frame and the intermediate frames enriches the ways in which images can be annotated, effectively reduces the difficulty of image annotation, and improves annotation efficiency.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a diagram of an application example of a video annotation method provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of an embodiment of a video annotation method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure;
FIG. 4 is an example diagram of a video annotation page provided by an embodiment of the present disclosure;
FIG. 5 is an example diagram of a task creation page provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of yet another embodiment of a video annotation method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an embodiment of a video annotation apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
The technical solution of the present disclosure can be applied to video annotation scenarios. By manually annotating the first frame of the images to be annotated and automatically annotating the tail frame and the intermediate frames, manual and automatic annotation are combined segment by segment for the to-be-annotated video, which increases the annotation efficiency of the to-be-annotated video and improves the efficiency and accuracy of video annotation.
In the related art, in video segmentation scenarios, annotated videos are needed to train a video segmentation model. Video annotation is generally completed manually: the video is distributed to annotators for manual annotation. In practical applications, after video frames are sent to multiple annotators, the annotators can complete the annotation by curve drawing, object labeling, setting the object type or name, and the like. However, the above annotation approaches are mainly completed by manual annotation, which is too limited and results in low image annotation efficiency.
The present disclosure relates to technical fields such as image processing and artificial intelligence, and specifically to a video annotation method, apparatus, device, medium and product.
In order to solve the above technical problem, in the technical solution of the present disclosure, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of the to-be-annotated video can be determined from the to-be-annotated video; in response to an annotation operation performed by a user on the first frame of the first sub-segment, a first-frame annotation result of the first frame is obtained; in response to an annotation request for the tail frame of the first sub-segment, a tail-frame annotation result of the tail frame can be generated; and, in response to an annotation request for an intermediate frame of the first sub-segment, an intermediate-frame annotation result of the intermediate frame can be generated. The first-frame annotation result is obtained manually, while the annotation results of the tail frame and the intermediate frames are generated automatically, which reduces manual annotation and improves the annotation efficiency of image frames. In addition, the annotation operation on the first frame and the annotation requests for the tail frame and the intermediate frames realize interactive annotation with the user and improve the effectiveness of the annotation interaction. According to the annotation results of the image frames of the first sub-segment, the video annotation result of the to-be-annotated video can be displayed on the video annotation page, realizing visualization of the annotation process. The annotation method provided by this solution enriches the ways in which images can be annotated, effectively reduces the difficulty of image annotation, improves annotation efficiency, and at the same time provides visual annotation interaction to improve the annotation effect and precision.
The technical solution of the present disclosure and how it solves the above technical problem will be described in detail below with specific embodiments. The following specific embodiments can be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, which is a diagram of an application example of a video annotation method provided by an embodiment of the present disclosure, the video annotation method can be deployed in an electronic device 1. The electronic device 1 may be associated with a display apparatus 2, and the display apparatus 2 can be used to display a video annotation interface 3.
Optionally, the video annotation request may be triggered interactively by the user. The electronic device 1 can detect the video annotation request and obtain the to-be-annotated video corresponding to the video annotation request. The electronic device 1 can also determine the first sub-segment among the to-be-annotated sub-segments from the to-be-annotated video. The electronic device 1 can start annotation from the first frame, obtaining the first-frame annotation result from the annotation operation performed by the user. After the first frame is obtained, it can be annotated to obtain the first-frame annotation result, for example the face annotation box 4 marked in the to-be-annotated image shown on the display apparatus 2, while other regions in the image, such as the region where the street lamp 5 is located, may be left unannotated. Afterwards, annotation can switch from the first frame to the tail frame, and the tail-frame annotation result of the tail frame is generated automatically, for example the face annotation box 6 shown in FIG. 1. Then annotation can switch from the tail frame to the intermediate frames, and the intermediate-frame annotation results of the intermediate frames are also generated automatically. The annotation strategy of manually annotating the first frame and automatically annotating the tail frame and the intermediate frames can improve the efficiency of video annotation. The annotation object of an intermediate frame may be the same as that of the first frame and the tail frame, for example, all annotating a face.
As shown in FIG. 2, which is a flowchart of an embodiment of a video annotation method provided by an embodiment of the present disclosure, the method can be configured as a video annotation apparatus, and the video annotation apparatus may be located in an electronic device. The video annotation method may include the following steps:
201: In response to a video annotation request, determine a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video.
Optionally, the video annotation request may be an access method generated by an image annotation operation triggered by the user. The video annotation request may specify the to-be-annotated video. Specifically, the video annotation request may be an annotation link initiated by the user for the to-be-annotated video, such as a URL link. The video annotation request may include the storage address of the to-be-annotated video, and the to-be-annotated video can be loaded according to this storage address.
In order to annotate the to-be-annotated video accurately, a to-be-annotated sub-segment can be obtained from the to-be-annotated video as the first sub-segment, and the first sub-segment is annotated.
Exemplarily, the to-be-annotated video may include at least one video sub-segment to be annotated. The at least one video sub-segment can be obtained by dividing the to-be-annotated video. The first sub-segment may be the first of the at least one video sub-segment.
202: In response to an annotation operation performed by the user on the first frame of the first sub-segment, obtain a first-frame annotation result of the first frame.
Optionally, the image frames in the first sub-segment can be annotated manually or automatically. The first frame can be annotated manually, with the user performing the annotation operation; the tail frame and the intermediate frames can be annotated automatically in response to annotation requests. The annotation operation may be an operation such as region selection or label setting performed manually on the to-be-annotated image. The first-frame annotation result of the first frame is obtained by detecting the annotation operation performed manually.
203: In response to an annotation request for the tail frame of the first sub-segment, generate a tail-frame annotation result of the tail frame.
During the generation of the tail-frame annotation result, the unannotated tail frame can be automatically annotated based on the first-frame annotation result of the already annotated first frame, to obtain the tail-frame annotation result of the tail frame. Automated annotation improves the efficiency and accuracy of tail-frame annotation.
Optionally, after the tail-frame annotation result is generated, the method may further include: displaying the tail frame and the tail-frame annotation result on the video annotation page. In addition, the method may further include: in response to a modification operation performed on the tail-frame annotation result, obtaining an updated tail-frame annotation result. The tail-frame annotation result may be an annotation result confirmed by the user, which improves the accuracy and precision of the tail-frame annotation result.
Optionally, the annotation request for the tail frame may be generated by the user triggering an annotation control, or may be generated automatically by the electronic device when the annotation of the previous image frame is finished.
204: In response to an annotation request for an intermediate frame of the first sub-segment, generate an intermediate-frame annotation result of the intermediate frame.
During the generation of the intermediate-frame annotation result, the unannotated intermediate frame can be automatically annotated based on the first-frame annotation result of the already annotated first frame and the tail-frame annotation result of the tail frame, to obtain the intermediate-frame annotation result of the intermediate frame. Automated annotation improves the efficiency and accuracy of intermediate-frame annotation.
Exemplarily, a computational model such as a semi-supervised machine learning model or a deep neural network can be used, in combination with the annotation result of the first frame, to automatically annotate the tail frame or the intermediate frames, realizing automatic generation of annotation results. For the algorithm for automatically learning the annotation results of image frames, reference may be made to implementations in the related art, which will not be repeated here.
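The disclosure defers the propagation algorithm itself to the related art. Purely as a minimal, hypothetical sketch of the data flow (the linear interpolation below is an editorial stand-in, not the model the disclosure uses), the result of an intermediate frame can be derived from the first-frame and tail-frame results:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned annotation box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

def lerp_box(first: Box, tail: Box, t: float) -> Box:
    """Linearly interpolate between the first-frame and tail-frame boxes, 0 <= t <= 1."""
    mix = lambda a, b: a + (b - a) * t
    return Box(mix(first.x, tail.x), mix(first.y, tail.y),
               mix(first.w, tail.w), mix(first.h, tail.h))

def annotate_intermediate_frames(first_box: Box, tail_box: Box, n_mid: int) -> list[Box]:
    """Generate results for the n_mid frames strictly between the first and tail frames."""
    return [lerp_box(first_box, tail_box, (i + 1) / (n_mid + 1)) for i in range(n_mid)]
```

In practice a semi-supervised segmentation or tracking model would replace lerp_box, but the shape of the computation — anchor results in, per-frame results out — stays the same.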
An intermediate frame may be an image frame located between the first frame and the tail frame, and for each intermediate frame the step of generating an intermediate-frame annotation result in response to an annotation request for the intermediate frame of the first sub-segment can be performed. There may be multiple intermediate frames between the first frame and the tail frame, and the intermediate frames can be annotated one by one in their order within the sub-segment, to obtain the intermediate-frame annotation result of each intermediate frame.
Optionally, the annotation request for an intermediate frame may be generated by the user triggering an annotation control, or may be generated automatically by the electronic device when the annotation of the previous image frame is finished.
205: Display the video annotation result of the to-be-annotated video on the video annotation page according to the annotation results of the image frames of the first sub-segment.
Optionally, the annotation results of the image frames of the first sub-segment can be displayed on the video annotation page. The annotation result of each image frame can be displayed as soon as its annotation is completed.
The video annotation result of the to-be-annotated video may include the annotation results of the image frames of each video sub-segment.
In the technical solution of the present disclosure, in response to a video annotation request, the first sub-segment among the to-be-annotated sub-segments of the to-be-annotated video is determined from the to-be-annotated video; in response to an annotation operation performed by the user on the first frame of the first sub-segment, the first-frame annotation result of the first frame is obtained; in response to an annotation request for the tail frame of the first sub-segment, the tail-frame annotation result of the tail frame can be generated; and, in response to an annotation request for an intermediate frame of the first sub-segment, the intermediate-frame annotation result of the intermediate frame can be generated. The first-frame annotation result is obtained manually, while the annotation results of the tail frame and the intermediate frames are generated automatically, which reduces manual annotation and improves the annotation efficiency of image frames. In addition, the annotation operation on the first frame and the annotation requests for the tail frame and the intermediate frames realize interactive annotation with the user and improve the effectiveness of the annotation interaction. According to the annotation results of the image frames of the first sub-segment, the video annotation result of the to-be-annotated video can be displayed on the video annotation page, realizing visualization of the annotation process. The annotation method provided by this solution enriches the ways in which images can be annotated, effectively reduces the difficulty of image annotation, improves annotation efficiency, and at the same time provides visual annotation interaction to improve the annotation effect and precision.
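Read as pseudocode, the single-segment flow of steps 201-205 might look like the following sketch; manual_annotate and propagate are assumed callables standing in for the user's annotation operation and the automatic model, and the segment is assumed to contain at least two frames:

```python
def annotate_segment(frames, manual_annotate, propagate):
    """Annotate one sub-segment: manual first frame, automatic tail and intermediates."""
    results = [None] * len(frames)
    results[0] = manual_annotate(frames[0])                    # step 202: user annotates the first frame
    results[-1] = propagate(frames[-1], anchors=[results[0]])  # step 203: tail frame generated automatically
    for i in range(1, len(frames) - 1):                        # step 204: intermediate frames, in order,
        results[i] = propagate(frames[i],                      # anchored on the first and tail results
                               anchors=[results[0], results[-1]])
    return results                                             # step 205: displayed on the annotation page
```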
To make it easy to view the image frames, different image frames can be displayed in different regions.
As shown in FIG. 3, which is a flowchart of another embodiment of a video annotation method provided by an embodiment of the present disclosure, the difference from the embodiment shown in FIG. 2 is that the method further includes:
301: Display the first frame in a first image area of the video annotation page.
302: Display the tail frame in a second image area of the video annotation page.
303: Display the intermediate frames in a third image area of the video annotation page, where the number of intermediate frames displayed in the third image area is a preset image display number.
For ease of understanding, the video annotation page 400 shown in FIG. 4 may include a first image area 401, a second image area 402 and a third image area 403. The first image area 401 is used to display the first frame, the second image area 402 can display the tail frame, and the third image area 403 can be used to display at least one intermediate frame. The number of the at least one intermediate frame is the image display number preset for the third image area; that is, the third image area 403 can display up to that preset number of frames.
Exemplarily, an image area may be, for example, an image annotation prompt window, and the annotation prompt window can display a thumbnail of the image. The video annotation page can display a thumbnail of the first frame in the first image area 401 at the lower right, and a thumbnail of the tail frame in the second image area 402 at the lower left. The third image area 403, located between the first image area 401 and the second image area 402, can be used to display thumbnails of the intermediate frames.
The third image area 403 may include multiple image prompt windows, such as 4031-4035 shown in FIG. 4. When any image prompt window is selected, the image corresponding to the selected image prompt window, for example 4033, can serve as the intermediate frame to be annotated, and the selected window can be displayed in a way that distinguishes it from the other, unselected image prompt windows. When it is detected that the user selects an intermediate frame and triggers the annotation control, it can be determined that an annotation request for the intermediate frame is detected. Of course, in practical applications, in order to annotate image frames efficiently, the annotation request for the next intermediate frame can be initiated directly once the annotation of the current intermediate frame is finished and the annotation of the next intermediate frame is started.
Optionally, an annotation confirmation control 404 can be displayed on the video annotation page. If the image prompt window of a certain image is selected, the annotation confirmation control 404 can establish a confirmation association with the selected image prompt window, for example 4033 among the intermediate frames. When a click operation performed by the user on the annotation confirmation control is detected, the intermediate frame corresponding to that image prompt window can be determined as the intermediate frame currently to be annotated, an annotation request for this intermediate frame is generated, and the annotation of the intermediate frame can be performed in response to the annotation request. For example, the intermediate frame corresponding to the selected image prompt window 4033 can be displayed in the image display area or window 407. When an annotation operation performed by the user on the annotation control of the to-be-annotated image is detected, the annotation operation on the to-be-annotated image can be started, for example, marking the face region in the image frame.
In this embodiment, a video annotation page can be displayed, and the first frame, the tail frame and the intermediate frames are prompted in a targeted manner through different image prompt areas of the video annotation page, which makes it convenient for the user to view the image frames and improves the efficiency and accuracy of annotation prompts for to-be-annotated images.
In practical applications, there may be multiple to-be-annotated videos, and the same video may need to be annotated multiple times. Therefore, to facilitate annotation management, an annotation task can be created for each to-be-annotated video.
Exemplarily, a task creation page can be provided to create and process annotation tasks. Before determining the to-be-annotated images in the to-be-annotated video in response to the video annotation request, the method further includes:
creating an annotation task for the to-be-annotated video in response to a click operation performed on a task prompt control in the task creation page;
displaying task prompt information of the annotation task of the to-be-annotated video in a task display area of the task creation page;
in response to a click operation performed by the user on the task prompt information of the to-be-annotated video, generating a video annotation request for the to-be-annotated video and switching to the video annotation page of the to-be-annotated video.
An annotation task may refer to an annotation process established for a to-be-annotated video. The task creation page may refer to a web page used to implement the tasks of the annotation program, and can be written in programming languages such as HTML5, C++, JAVA or IOS. The task creation page may include a task display area, which can be used to display the task prompt information of the annotation tasks of the to-be-annotated videos.
For ease of understanding, FIG. 5 is an example diagram of a task creation page 500. The task creation page 500 may include multiple controls, such as a task management control 501 and a template preview control 502. When it is detected that the task management control is triggered, a task management sub-page 503 can be displayed. The task management sub-page may include a task prompt control 5031 and a task display area 5032. The control name of the task prompt control 5031 can prompt the creation of a new annotation task. The task display area 5032 can display the task information of the created annotation tasks. The task information may include: task title, task ID, creation time, creator and task operation controls. In addition, the task information of an annotation task may also include other types of information, such as annotation mode and label type, which will not be detailed here.
One task may correspond to one piece of task information.
Exemplarily, suppose that the user performs a click operation on a certain piece of task prompt information A, a video annotation request for the to-be-annotated video is generated, and the page switches to the video annotation page of the to-be-annotated video. The video annotation page of the annotation task A corresponding to the task prompt information A may be as shown in FIG. 4: the video annotation page 400 may include the task prompt information 405 of annotation task A, "Annotation task A". The task prompt information 405 can prompt the annotation task corresponding to annotation task A. The to-be-annotated video corresponding to annotation task A can be divided into several video sub-segments, and video sub-segments 1-N can be displayed in the segment prompt area 406, for example video sub-segment 1 to video sub-segment N shown in FIG. 4.
Optionally, creating an annotation task for a to-be-annotated video may refer to creating a task execution module for the to-be-annotated video, and the user can perform annotation operations on the annotation task through the task operation controls in the annotation task of the to-be-annotated video. For example, the task operation controls may include an annotation control, a quality inspection control, a statistics control, and the like. The annotation control may be a start prompt control for the annotation task of the to-be-annotated video; when it is detected that the user triggers the annotation control, a video annotation request for the to-be-annotated video can be generated. The quality inspection control may be a prompt control for quality-inspecting the annotation results produced by the annotation task; when it is detected that the user triggers the quality inspection control, a quality inspection process for the annotation results of the image frames of the to-be-annotated video can be started. The statistics control may be a control that prompts annotation-related data such as the number of annotation passes and the annotation quantity of the to-be-annotated video; when it is detected that the user triggers the statistics control, the various data produced by the annotation task of the to-be-annotated video can be displayed.
Optionally, the task prompt control may include a trigger control for creating a new annotation task.
The target path may be the storage path of the to-be-annotated video selected through path selection. The to-be-annotated video can be uploaded into the annotation method through the target path for annotation.
In this embodiment, in response to a click operation performed on the task prompt control in the task creation page, a video upload page can be displayed on top of the task creation page. The video upload page may include a path selection control for the video storage path, and the task prompt control can be used to prompt the newly created annotation task, so that the target path is obtained in response to a path selection operation performed on the path selection control. By creating an annotation task for the to-be-annotated video corresponding to the target path, the creation of the annotation task of the to-be-annotated video is realized, so that the annotation task is created through path selection of the video, improving the efficiency and accuracy of task creation.
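As an illustration only — the field names and helper below are assumptions, not part of the disclosure — a task record carrying the task information listed above (task title, task ID, creation time, creator, video path) might be modeled as:

```python
from dataclasses import dataclass, field
from datetime import datetime
import uuid

@dataclass
class AnnotationTask:
    """Hypothetical record for one annotation task."""
    title: str
    creator: str
    video_path: str  # the target path obtained from the path selection control
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(default_factory=datetime.now)

# Usage: clicking the task prompt control could amount to
# task = AnnotationTask(title="Annotation task A", creator="annotator-1",
#                       video_path="/data/videos/clip.mp4")
```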
Exemplarily, the first sub-segment may be a video sub-segment that needs to be annotated. A video sub-segment can be annotated starting from the first image, that is, from the annotation of the first frame and the tail frame, after which the remaining intermediate frames are annotated in turn.
In practical applications, the third image area may be a sliding window. The method may further include:
in response to a click operation performed by the user on a first button on the left side of the sliding window, sliding the intermediate frames displayed in the third image area toward the left side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area;
in response to a click operation performed by the user on a second button on the right side of the sliding window, sliding the intermediate frames displayed in the third image area toward the right side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area.
The third image area 403 shown in FIG. 4 may include multiple image prompt windows, such as 4031-4035 shown in FIG. 4. The number M of image prompts that can be displayed takes a value of, for example, 5, so five image prompt windows can be displayed in the window prompt area, and each image prompt window can display a thumbnail of an image frame to prompt that frame. Starting from the second image frame of the target video sub-segment, five image frames are displayed in the third image area 403 in the order of the image frames.
Exemplarily, in addition to the image prompt windows, the third image area 403 may also include the first button and the second button. The first button and the second button may be triangles as shown in FIG. 4, or patterns of other shapes, such as circles or squares.
Displaying M image frames in the window prompt area in the order of the image frames may include: determining the M image frames currently displayed; displaying the M image frames in the window prompt area in sequence; and determining the first unannotated image among the M image frames as the to-be-annotated image.
In the embodiment of the present disclosure, when the third image area is a sliding window, in response to a click operation performed by the user on the first button on the left side of the sliding window, the intermediate frames displayed in the third image area can be slid and displayed toward the left in the order of the image frames; and, in response to a click operation performed by the user on the second button on the right side of the sliding window, the intermediate frames displayed in the third image area can be slid toward the right side of the sliding window in the order of the image frames. By clicking the window buttons, the user can update the intermediate frames displayed in the sliding window, and the displayed intermediate frames are continuously refreshed by sliding to the left and right, achieving effective presentation of the intermediate frames.
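A minimal sketch of the sliding-window bookkeeping, assuming frame indices 0..n-1 and a window of five thumbnails as in the FIG. 4 example (class and method names are illustrative, not from the disclosure):

```python
class ThumbnailWindow:
    """Sliding window over intermediate-frame thumbnails in the third image area."""

    def __init__(self, n_frames: int, size: int = 5):
        self.n_frames = n_frames
        self.size = size
        self.start = 0  # index of the left-most thumbnail currently shown

    def visible(self) -> range:
        """Indices of the intermediate frames currently displayed."""
        return range(self.start, min(self.start + self.size, self.n_frames))

    def click_left(self):
        """First button: slide the display one step toward earlier frames."""
        self.start = max(self.start - 1, 0)

    def click_right(self):
        """Second button: slide the display one step toward later frames."""
        self.start = min(self.start + 1, max(self.n_frames - self.size, 0))
```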
In one possible design, there are one or more objects to be annotated, and the annotation result of an image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
In the embodiment of the present disclosure, one or more objects can be annotated in an image frame, realizing multi-object annotation of a single image frame and improving annotation efficiency and accuracy.
As shown in FIG. 6, which is a flowchart of yet another embodiment of a video annotation method provided by an embodiment of the present disclosure, the difference from the foregoing embodiments is that displaying the video annotation result of the to-be-annotated video on the video annotation page according to the annotation results of the image frames of the first sub-segment includes:
601: Generate the annotation result of the first sub-segment according to the annotation results of the image frames of the first sub-segment.
602: When it is determined that the annotation of the first sub-segment is finished, determine a second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video.
Optionally, the to-be-annotated video can be divided into at least one video sub-segment, and the target video sub-segment is determined from the at least one video sub-segment in turn according to the segment order corresponding to each video sub-segment.
The second sub-segment may be the second video sub-segment or a video sub-segment after the second video sub-segment.
Optionally, two adjacent video sub-segments may include the same video frame, with the image frames of the two adjacent sub-segments set to overlap. To improve video annotation efficiency, for two adjacent video sub-segments, the last image frame of the former sub-segment can overlap with the first image frame of the latter sub-segment. Thus, through the automatic annotation of the last image frame of the former video sub-segment, the first-frame annotation result of the first frame of the latter segment can be obtained.
Optionally, the step of dividing into at least one video sub-segment may include: extracting multiple key frames of the to-be-annotated video, and, following the extraction strategy of one video sub-segment per pair of adjacent key frames, dividing the to-be-annotated video into at least one video sub-segment according to the multiple key frames. For two adjacent video sub-segments, the last image frame of the former sub-segment is the same as the first image frame of the latter sub-segment.
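A sketch of this keyframe-based division, under the stated convention that the boundary frame is shared by both neighboring sub-segments (the helper name is an assumption):

```python
def split_into_subsegments(keyframe_indices: list[int]) -> list[range]:
    """One sub-segment per pair of adjacent keyframes; the shared keyframe is
    the tail of one segment and the first frame of the next."""
    return [range(start, end + 1)  # end + 1 so the keyframe at `end` is included as the tail
            for start, end in zip(keyframe_indices, keyframe_indices[1:])]

# split_into_subsegments([0, 30, 60]) -> [range(0, 31), range(30, 61)];
# frame 30 is the tail of segment 1 and the first frame of segment 2.
```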
As a possible implementation, determining the second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment may include: determining the second sub-segment to be annotated from the unannotated video sub-segments according to the segment division order of the at least one video sub-segment.
603: In response to an annotation request initiated for the first frame of the second sub-segment, obtain the tail-frame annotation result of the sub-segment preceding the second sub-segment as the first-frame annotation result of the first frame of the second sub-segment.
The annotation result of the last image frame of the previous video sub-segment can be read to obtain the first-frame annotation result of the first frame. With overlapping frame sampling, image annotations can be passed on quickly, improving image annotation efficiency.
604: In response to annotation requests initiated for the other frames of the second sub-segment, generate the annotation results of the other frames of the second sub-segment.
Optionally, generating the annotation results of the other frames of the second sub-segment in response to annotation requests initiated for the other frames of the second sub-segment may include: generating the tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the second sub-segment; and generating the intermediate-frame annotation result of an intermediate frame in response to an annotation request for the intermediate frame of the second sub-segment.
Optionally, the method may further include: in response to a modification operation performed by the user on the annotation result of an intermediate frame, obtaining the final annotation result of the intermediate frame; updating the modified intermediate frame as the first frame and the final annotation result as the updated first-frame annotation result; and returning to the step of generating the intermediate-frame annotation result in response to the annotation request for the intermediate frame of the second sub-segment.
605: Output the annotation results of the image frames of the second sub-segment on the video annotation page.
For the output interface and content of the video annotation page of the second sub-segment, reference may be made to the output of the first sub-segment, which will not be repeated here.
In the embodiment of the present disclosure, the annotation result of the first sub-segment can be generated according to the annotation results of the image frames of the first sub-segment, and based on the annotation result of the first sub-segment, a second sub-segment to be annotated is determined from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video. The second sub-segment may be an unannotated video sub-segment. In response to an annotation request initiated for the first frame of the second sub-segment, the tail-frame annotation result of the previous video sub-segment can be obtained as the first-frame annotation result of the first frame of the second sub-segment, so that this first frame is annotated automatically. For the other frames, such as the tail frame and the intermediate frames, their annotation results can likewise be generated. A second sub-segment annotated after the first sub-segment can thus obtain annotation results automatically from the first frame to the tail frame without further manual annotation, realizing efficient annotation of video sub-segments. In addition, outputting the annotation results of the image frames of the second sub-segment on the video annotation page enables visual presentation of the annotation process.
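Putting steps 601-605 together with the carry-over of the shared boundary frame, the whole-video flow might be sketched as follows (again with manual_annotate and propagate as assumed stand-ins for the user interaction and the automatic model; only the very first frame of the video needs manual annotation):

```python
def annotate_video(segments, manual_annotate, propagate):
    """Annotate every sub-segment; later segments inherit their first-frame
    result from the previous segment's tail frame (shared boundary frame)."""
    all_results = []
    carry = None  # tail-frame result carried into the next segment
    for frames in segments:
        results = [None] * len(frames)
        # step 202 for the first segment, step 603 for every later one
        results[0] = manual_annotate(frames[0]) if carry is None else carry
        results[-1] = propagate(frames[-1], anchors=[results[0]])           # auto tail frame
        for i in range(1, len(frames) - 1):                                 # auto intermediate frames
            results[i] = propagate(frames[i], anchors=[results[0], results[-1]])
        carry = results[-1]
        all_results.append(results)                                         # step 605: output per segment
    return all_results
```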
As an embodiment, determining the second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video includes:
determining at least one video sub-segment corresponding to the to-be-annotated video;
in response to a selection operation performed by the user on any of the at least one video sub-segment, obtaining the selected video sub-segment as the second sub-segment.
Optionally, segment prompt information can be displayed for each of the at least one video sub-segment. In response to a click operation triggered by the user on any piece of segment prompt information, the second sub-segment corresponding to the clicked segment prompt information is obtained. Selection of a video sub-segment can thus be realized by user triggering.
In the embodiment of the present disclosure, in response to a selection operation performed by the user on any video sub-segment, the selected video sub-segment is obtained as the second sub-segment. Through user interaction, the second sub-segment selected by the user is obtained, realizing accurate selection of the second sub-segment.
As yet another embodiment, the method may further include:
displaying, in a segment prompt window of the video annotation page, segment prompt information corresponding to each of the at least one video sub-segment, where the segment prompt information is displayed in sub-windows of the segment display window.
In order to prompt the at least one video sub-segment, segment prompt information corresponding to each of the at least one video sub-segment can be displayed on the video annotation page. For example, the segment prompt area 406 shown in FIG. 4 can display the segment prompt information of several video sub-segments. The segment prompt information can, for example, prompt with the segment names of the video sub-segments, namely video sub-segment 1 to video sub-segment N, where N is the number of the at least one video sub-segment. The segment name of a video sub-segment can be determined according to the division order of the video sub-segments; the first video sub-segment extracted from the to-be-annotated video can be named video sub-segment 1. Of course, this naming is merely exemplary and should not constitute a specific limitation on segment naming. The video sub-segment in the annotating state may be the target video sub-segment. Referring to FIG. 4, video sub-segment 2 may be the target video sub-segment in the annotating state; video sub-segment 2 in the annotating state can be in the selected state, while the other video sub-segments, such as video sub-segment 1 and video sub-segments 3 to N, are in the unselected state.
In the embodiment of the present disclosure, the segment prompt information corresponding to each video sub-segment can be displayed in the segment prompt window of the video annotation page, with the segment display window using sub-windows to display the corresponding segment prompt information. Displaying segment prompt information enables effective segment prompting and improves the efficiency and effectiveness of segment prompts.
In some embodiments, after obtaining the first-frame annotation result, the method further includes:
displaying the first-frame annotation result of the first frame in a result display area of the video annotation page.
After generating the tail-frame annotation result of the tail frame, the method further includes:
switching the first-frame annotation result displayed in the result display area to displaying the tail-frame annotation result of the tail frame.
After generating the intermediate-frame annotation result of an intermediate frame, the method further includes:
switching the tail-frame annotation result displayed in the result display area to displaying the intermediate-frame annotation result of the intermediate frame.
In this embodiment, the annotation results are displayed in the result display area of the video annotation page. By displaying the annotation results of the first frame, the tail frame and the intermediate frames in turn, effective display of the annotation results can be achieved.
In one possible design, after generating the intermediate-frame annotation result in response to the annotation request initiated for an intermediate frame of the first sub-segment, the method further includes:
in response to a modification operation performed by the user on the annotation result of the intermediate frame, obtaining the final annotation result of the intermediate frame;
updating the modified intermediate frame as the first frame and the final annotation result as the updated first-frame annotation result, and returning to the step of generating the intermediate-frame annotation result in response to the annotation request for the intermediate frame of the first sub-segment;
or, in response to a confirmation operation performed by the user on the annotation result of the intermediate frame, determining the annotation result of the intermediate frame as the final annotation result of the intermediate frame.
After the first frame is updated, the intermediate-frame annotation results can be generated based on the tail frame and the tail-frame annotation result combined with the newly updated first frame and first-frame annotation result. Combining the annotation results of the new first frame and the tail frame makes it possible to generate the annotation results of the intermediate frames quickly and accurately.
Optionally, after obtaining the final annotation result of the intermediate frame in response to the modification operation performed by the user on its annotation result, the method may further include: regenerating the tail-frame annotation result of the tail frame according to the final annotation result of the intermediate frame, and generating the intermediate-frame annotation results of the intermediate frames between this intermediate frame and the tail frame according to the final annotation result of the intermediate frame.
Optionally, a result confirmation control set for the annotation result of the to-be-annotated image can also be displayed on the video annotation page; in response to a click operation triggered by the user on the result confirmation control, the current annotation result of the image frame currently being annotated can be determined as the final annotation result of that image frame.
Optionally, a result modification control set for the annotation result of the to-be-annotated image can also be displayed on the video annotation page; in response to a click operation triggered by the user on the result modification control, the annotation modification operation performed by the user on the annotation result of the to-be-annotated image can be detected.
The modification operation may include modifications to annotation results of the intermediate frame such as the annotation region, label type and annotated object position. For example, the annotation region can be updated by erasing, dragging, sliding and other operations to obtain the updated region. Updating the label type may refer to deleting the original label of an annotation region and adding a new label to the region. The annotation modification result may include at least one of a label region modification result, a label type modification result and an update of the annotated object position. The user can manually modify the annotation result of the to-be-annotated image, realizing a second round of modification annotation on the automatically generated annotation results and improving annotation precision.
In the embodiment of the present disclosure, the final annotation result of an intermediate frame can be obtained in response to a modification operation performed by the user on its annotation result. Interaction with the user enables timely revision of the intermediate frame, achieving the user's desired revision of the intermediate-frame annotation result and improving annotation accuracy. Alternatively, in response to a confirmation operation performed by the user on the annotation result of the intermediate frame, the annotation result of the intermediate frame is determined as its final annotation result. The user's confirmation or modification of the intermediate-frame annotation results allows personalized supervision, so that the final annotation results of the intermediate frames better match the user's needs with higher accuracy.
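A sketch of this re-anchoring step, assuming the frames after the corrected one are regenerated with the corrected frame acting as the new first frame (the tail is refreshed first, matching the optional design above; the function name and propagate callable are illustrative assumptions):

```python
def apply_user_correction(frames, results, corrected_idx, corrected_result, propagate):
    """Treat a user-corrected intermediate frame as the new first frame and
    regenerate every annotation result between it and the tail frame."""
    results[corrected_idx] = corrected_result                        # final result from the user's edit
    results[-1] = propagate(frames[-1], anchors=[corrected_result])  # regenerate the tail frame
    for i in range(corrected_idx + 1, len(frames) - 1):              # regenerate the frames in between
        results[i] = propagate(frames[i], anchors=[corrected_result, results[-1]])
    return results
```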
In practical applications, the video annotation result of the to-be-annotated video can be quality-inspected. As yet another embodiment, the method may further include:
sending the video annotation result of the to-be-annotated video to a quality inspection party;
receiving the annotation quality inspection result of the to-be-annotated video fed back by the quality inspection party;
displaying the quality inspection result of the to-be-annotated video.
Optionally, the quality inspection party may include the electronic device corresponding to the user who performs quality inspection on the to-be-annotated video. The quality inspection party can obtain the annotation quality inspection result of the to-be-annotated video in response to quality inspection operations performed by the inspecting user on the video annotation result of the to-be-annotated video.
The video annotation result of the to-be-annotated video may include the annotation results of the image frames of each of the at least one video sub-segment; that is, the video annotation result may include the annotation result corresponding to each image frame in the to-be-annotated video.
The annotation quality inspection result may include the image frames in the video annotation result that have annotation anomalies. The quality inspection party can detect the anomaly-triggering operations performed by the inspecting user on the image frames with annotation anomalies, to obtain those image frames. An annotation anomaly may include a discrepancy between the annotation result of an image frame and the annotation result required by the user.
In the embodiment of the present disclosure, the video annotation result of the to-be-annotated video can be sent to the quality inspection party, instructing the quality inspection party to quality-inspect the to-be-annotated video. Quality inspection of the annotation results of the to-be-annotated video ensures the validity and reliability of the annotation.
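The disclosure does not fix a transport or payload format for the exchange with the quality inspection party; as a hypothetical sketch, the round trip could be as simple as:

```python
from dataclasses import dataclass

@dataclass
class QualityInspectionResult:
    """Assumed feedback payload: frames whose results deviate from what is required."""
    video_id: str
    anomalous_frames: list[int]

def run_quality_inspection(video_id, results, send, receive) -> QualityInspectionResult:
    """Send per-frame annotation results to the inspection party and collect
    the anomaly feedback for display (send/receive are assumed transport callables)."""
    send(video_id, results)
    return receive(video_id)
```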
As shown in FIG. 7, which is a schematic structural diagram of an embodiment of an image annotation apparatus provided by an embodiment of the present disclosure, the image annotation apparatus 700 may include the following units:
a first responding unit 701, configured to determine, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
a second responding unit 702, configured to obtain a first-frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
a third responding unit 703, configured to generate a tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
a fourth responding unit 704, configured to generate an intermediate-frame annotation result of the intermediate frame in response to an annotation request for an intermediate frame of the first sub-segment;
a first display unit 705, configured to display the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
As an embodiment, the apparatus may include:
a second display unit, configured to display the first frame in a first image area of the video annotation page;
a third display unit, configured to display the tail frame in a second image area of the video annotation page;
a fourth display unit, configured to display the intermediate frames in a third image area of the video annotation page, where the number of intermediate frames displayed in the third image area is a preset image display number.
As yet another embodiment, the third image area is a sliding window; the apparatus may further include:
a first responding module, configured to, in response to a click operation performed by the user on a first button on the left side of the sliding window, slide the intermediate frames displayed in the third image area toward the left side of the sliding window in the order of the image frames, and update the intermediate frames displayed in the third image area;
a second responding module, configured to, in response to a click operation performed by the user on a second button on the right side of the sliding window, slide the intermediate frames displayed in the third image area toward the right side of the sliding window in the order of the image frames, and update the intermediate frames displayed in the third image area.
In some embodiments, there are one or more objects to be annotated, and the annotation result of an image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
In some embodiments, the first display unit may include:
a result generation module, configured to generate the annotation result of the first sub-segment according to the annotation results of the image frames of the first sub-segment;
a segment determination module, configured to, when it is determined that the annotation of the first sub-segment is finished, determine a second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video;
a segment annotation module, configured to, in response to an annotation request initiated for the first frame of the second sub-segment, obtain the tail-frame annotation result of the sub-segment preceding the second sub-segment as the first-frame annotation result of the first frame of the second sub-segment;
a result generation module, configured to generate the annotation results of the other frames of the second sub-segment in response to annotation requests initiated for the other frames of the second sub-segment;
a result display module, configured to output the annotation results of the image frames of the second sub-segment on the video annotation page.
In some embodiments, the segment determination module includes:
a video segment sub-module, configured to determine at least one video sub-segment corresponding to the to-be-annotated video;
a segment selection sub-module, configured to obtain the selected video sub-segment as the second sub-segment in response to a selection operation performed by the user on any of the at least one video sub-segment.
As yet another embodiment, the apparatus further includes:
a segment prompt unit, configured to display, in a segment prompt window of the video annotation page, segment prompt information corresponding to each of the at least one video sub-segment, where the segment prompt information is displayed in sub-windows of the segment display window.
As an embodiment, the apparatus may further include:
a result modification unit, configured to obtain the final annotation result of an intermediate frame in response to a modification operation performed by the user on the annotation result of the intermediate frame;
a first-frame update unit, configured to update the modified intermediate frame as the first frame and the final annotation result as the updated first-frame annotation result, and return to the step of generating the intermediate-frame annotation result in response to the annotation request for the intermediate frame of the first sub-segment;
or, an annotation confirmation unit, configured to determine the annotation result of an intermediate frame as its final annotation result in response to a confirmation operation performed by the user on the annotation result of the intermediate frame.
As yet another embodiment, the apparatus further includes:
a result sending unit, configured to send the video annotation result of the to-be-annotated video to a quality inspection party;
a quality inspection receiving unit, configured to receive the annotation quality inspection result of the to-be-annotated video fed back by the quality inspection party;
a quality inspection display unit, configured to display the annotation quality inspection result of the to-be-annotated video.
The apparatus provided by this embodiment can be used to execute the technical solutions of the above method embodiments; its implementation principle and technical effects are similar and will not be repeated here.
In order to implement the above embodiments, an embodiment of the present disclosure further provides an electronic device.
Referring to FIG. 8, which shows a schematic structural diagram of an electronic device 800 suitable for implementing embodiments of the present disclosure, the electronic device 800 may be a terminal device or a server. The terminal device may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Device, PAD), portable multimedia players (PMPs) and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device 800 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. Various programs and data required for the operation of the electronic device 800 are also stored in the RAM 803. The processing apparatus 801, the ROM 802 and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following apparatuses can be connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage apparatus 808 including, for example, a magnetic tape and a hard disk; and a communication apparatus 809. The communication apparatus 809 can allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows the electronic device 800 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from the network through the communication apparatus 809, or installed from the storage apparatus 808, or installed from the ROM 802. When the computer program is executed by the processing apparatus 801, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium can be transmitted using any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being incorporated into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device executes the method shown in the above embodiments.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partially on the user's computer, as a stand-alone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment or part of code, and the module, program segment or part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a unit does not in some cases limit the unit itself; for example, the first acquisition unit can also be described as a "unit for acquiring at least two Internet Protocol addresses".
The functions described above herein may be executed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, according to one or more embodiments of the present disclosure, a video annotation method is provided, including:
in response to a video annotation request, determining a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
in response to an annotation operation performed by a user on the first frame of the first sub-segment, obtaining a first-frame annotation result of the first frame;
in response to an annotation request for the tail frame of the first sub-segment, generating a tail-frame annotation result of the tail frame;
in response to an annotation request for an intermediate frame of the first sub-segment, generating an intermediate-frame annotation result of the intermediate frame;
displaying the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
According to one or more embodiments of the present disclosure, the method further includes:
displaying the first frame in a first image area of the video annotation page;
displaying the tail frame in a second image area of the video annotation page;
displaying the intermediate frames in a third image area of the video annotation page, where the number of intermediate frames displayed in the third image area is a preset image display number.
According to one or more embodiments of the present disclosure, the third image area is a sliding window; the method further includes:
in response to a click operation performed by the user on a first button on the left side of the sliding window, sliding the intermediate frames displayed in the third image area toward the left side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area;
in response to a click operation performed by the user on a second button on the right side of the sliding window, sliding the intermediate frames displayed in the third image area toward the right side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area.
According to one or more embodiments of the present disclosure, there are one or more objects to be annotated, and the annotation result of an image frame of the first sub-segment includes annotation sub-results corresponding to the multiple annotation objects respectively.
According to one or more embodiments of the present disclosure, displaying the video annotation result of the to-be-annotated video on the video annotation page according to the annotation results of the image frames of the first sub-segment includes:
generating the annotation result of the first sub-segment according to the annotation results of the image frames of the first sub-segment;
when it is determined that the annotation of the first sub-segment is finished, determining a second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video;
in response to an annotation request initiated for the first frame of the second sub-segment, obtaining the tail-frame annotation result of the sub-segment preceding the second sub-segment as the first-frame annotation result of the first frame of the second sub-segment;
in response to annotation requests initiated for the other frames of the second sub-segment, generating the annotation results of the other frames of the second sub-segment;
outputting the annotation results of the image frames of the second sub-segment on the video annotation page.
According to one or more embodiments of the present disclosure, determining the second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video includes:
determining at least one video sub-segment corresponding to the to-be-annotated video;
in response to a selection operation performed by the user on any of the at least one video sub-segment, obtaining the selected video sub-segment as the second sub-segment.
According to one or more embodiments of the present disclosure, the method further includes:
displaying, in a segment prompt window of the video annotation page, segment prompt information corresponding to each of the at least one video sub-segment, where the segment prompt information is displayed in sub-windows of the segment display window.
According to one or more embodiments of the present disclosure, after generating the intermediate-frame annotation result of the intermediate frame in response to the annotation request initiated for an intermediate frame of the first sub-segment, the method further includes:
in response to a modification operation performed by the user on the annotation result of the intermediate frame, obtaining the final annotation result of the intermediate frame;
displaying the final annotation result of the intermediate frame in a result display area;
or, in response to a confirmation operation performed by the user on the annotation result of the intermediate frame, determining the annotation result of the intermediate frame as the final annotation result of the intermediate frame.
According to one or more embodiments of the present disclosure, the method further includes:
sending the video annotation result of the to-be-annotated video to a quality inspection party;
receiving the annotation quality inspection result of the to-be-annotated video fed back by the quality inspection party;
displaying the annotation quality inspection result of the to-be-annotated video.
In a second aspect, according to one or more embodiments of the present disclosure, a video annotation apparatus is provided, including:
a first responding unit, configured to determine, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
a second responding unit, configured to obtain a first-frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
a third responding unit, configured to generate a tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
a fourth responding unit, configured to generate an intermediate-frame annotation result of the intermediate frame in response to an annotation request for an intermediate frame of the first sub-segment;
a first display unit, configured to display the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
In a third aspect, according to one or more embodiments of the present disclosure, an electronic device is provided, including: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the video annotation method of the first aspect and the various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the video annotation method of the first aspect and the various possible designs of the first aspect is implemented.
In a fifth aspect, according to one or more embodiments of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the video annotation method of the first aspect and the various possible designs of the first aspect.
The above description is only a description of the preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments individually or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. On the contrary, the specific features and actions described above are merely example forms of implementing the claims.

Claims (13)

  1. A video annotation method, comprising:
    in response to a video annotation request, determining a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
    in response to an annotation operation performed by a user on the first frame of the first sub-segment, obtaining a first-frame annotation result of the first frame;
    in response to an annotation request for the tail frame of the first sub-segment, generating a tail-frame annotation result of the tail frame;
    in response to an annotation request for an intermediate frame of the first sub-segment, generating an intermediate-frame annotation result of the intermediate frame;
    displaying the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
  2. The method according to claim 1, further comprising:
    displaying the first frame in a first image area of the video annotation page;
    displaying the tail frame in a second image area of the video annotation page;
    displaying the intermediate frames in a third image area of the video annotation page, wherein the number of intermediate frames displayed in the third image area is a preset image display number.
  3. The method according to claim 2, wherein the third image area is a sliding window; further comprising:
    in response to a click operation performed by the user on a first button on the left side of the sliding window, sliding the intermediate frames displayed in the third image area toward the left side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area;
    in response to a click operation performed by the user on a second button on the right side of the sliding window, sliding the intermediate frames displayed in the third image area toward the right side of the sliding window in the order of the image frames, and updating the intermediate frames displayed in the third image area.
  4. The method according to claim 1, wherein there are one or more objects to be annotated, and the annotation result of an image frame of the first sub-segment comprises annotation sub-results corresponding to the multiple annotation objects respectively.
  5. The method according to claim 1, wherein displaying the video annotation result of the to-be-annotated video on the video annotation page according to the annotation results of the image frames of the first sub-segment comprises:
    generating the annotation result of the first sub-segment according to the annotation results of the image frames of the first sub-segment;
    when it is determined that the annotation of the first sub-segment is finished, determining a second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video;
    in response to an annotation request initiated for the first frame of the second sub-segment, obtaining the tail-frame annotation result of the sub-segment preceding the second sub-segment as the first-frame annotation result of the first frame of the second sub-segment;
    in response to annotation requests initiated for the other frames of the second sub-segment, generating the annotation results of the other frames of the second sub-segment;
    outputting the annotation results of the image frames of the second sub-segment on the video annotation page.
  6. The method according to claim 5, wherein determining the second sub-segment to be annotated from the unannotated video sub-segments of the at least one video sub-segment corresponding to the to-be-annotated video comprises:
    determining at least one video sub-segment corresponding to the to-be-annotated video;
    in response to a selection operation performed by the user on any of the at least one video sub-segment, obtaining the selected video sub-segment as the second sub-segment.
  7. The method according to claim 5, further comprising:
    displaying, in a segment prompt window of the video annotation page, segment prompt information corresponding to each of the at least one video sub-segment, wherein the segment prompt information is displayed in sub-windows of the segment display window.
  8. The method according to claim 1, wherein after generating the intermediate-frame annotation result of the intermediate frame in response to the annotation request initiated for the intermediate frame of the first sub-segment, the method further comprises:
    in response to a modification operation performed by the user on the annotation result of the intermediate frame, obtaining the final annotation result of the intermediate frame;
    updating the modified intermediate frame as the first frame and the final annotation result as the updated first-frame annotation result, and returning to the step of generating the intermediate-frame annotation result of the intermediate frame in response to the annotation request for the intermediate frame of the first sub-segment;
    or, in response to a confirmation operation performed by the user on the annotation result of the intermediate frame, determining the annotation result of the intermediate frame as the final annotation result of the intermediate frame.
  9. The method according to claim 1, further comprising:
    sending the video annotation result of the to-be-annotated video to a quality inspection party;
    receiving the annotation quality inspection result of the to-be-annotated video fed back by the quality inspection party;
    displaying the annotation quality inspection result of the to-be-annotated video.
  10. An image annotation apparatus, comprising:
    a first responding unit, configured to determine, in response to a video annotation request, a first sub-segment among the to-be-annotated sub-segments of a to-be-annotated video;
    a second responding unit, configured to obtain a first-frame annotation result of the first frame in response to an annotation operation performed by a user on the first frame of the first sub-segment;
    a third responding unit, configured to generate a tail-frame annotation result of the tail frame in response to an annotation request for the tail frame of the first sub-segment;
    a fourth responding unit, configured to generate an intermediate-frame annotation result of the intermediate frame in response to an annotation request for an intermediate frame of the first sub-segment;
    a first display unit, configured to display the video annotation result of the to-be-annotated video on a video annotation page according to the annotation results of the image frames of the first sub-segment.
  11. An electronic device, comprising: a processor, a memory and an output apparatus;
    wherein the memory stores computer-executable instructions;
    the processor executes the computer-executable instructions stored in the memory, so that the processor is configured with the video annotation method according to any one of claims 1 to 9, and the output apparatus is configured to output a video annotation page.
  12. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the video annotation method according to any one of claims 1 to 9 is implemented.
  13. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the video annotation method according to any one of claims 1 to 9.
PCT/CN2023/131040 2022-11-15 2023-11-10 Video annotation method, apparatus, device, medium and product WO2024104272A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211430304.2A CN115757871A (zh) 2022-11-15 2022-11-15 Video annotation method, apparatus, device, medium and product
CN202211430304.2 2022-11-15

Publications (1)

Publication Number Publication Date
WO2024104272A1 true WO2024104272A1 (zh) 2024-05-23

Family

ID=85371790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/131040 WO2024104272A1 (zh) 2022-11-15 2023-11-10 视频标注方法、装置、设备、介质及产品

Country Status (2)

Country Link
CN (1) CN115757871A (zh)
WO (1) WO2024104272A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757871A (zh) * 2022-11-15 2023-03-07 北京字跳网络技术有限公司 视频标注方法、装置、设备、介质及产品

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503074A (zh) * 2019-08-29 2019-11-26 Tencent Technology (Shenzhen) Company Limited Information annotation method, apparatus, device and storage medium for video frames
US20200210706A1 (en) * 2018-12-31 2020-07-02 International Business Machines Corporation Sparse labeled video annotation
CN112053323A (zh) * 2020-07-31 2020-12-08 Shanghai Tusen Weilai Artificial Intelligence Technology Co., Ltd. Object tracking and annotation method and apparatus for single-shot multi-frame image data, and storage medium
CN113312951A (zh) * 2020-10-30 2021-08-27 Alibaba Group Holding Limited Dynamic video target tracking system, related method, apparatus and device
CN114117128A (zh) * 2020-08-29 2022-03-01 Huawei Cloud Computing Technologies Co., Ltd. Video annotation method, system and device
CN114973056A (zh) * 2022-03-28 2022-08-30 Huazhong Agricultural University Fast video image segmentation and annotation method based on information density
CN115757871A (zh) * 2022-11-15 2023-03-07 Beijing Zitiao Network Technology Co., Ltd. Video annotation method, apparatus, device, medium and product
CN115905622A (zh) * 2022-11-15 2023-04-04 Beijing Zitiao Network Technology Co., Ltd. Video annotation method, apparatus, device, medium and product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110996138B (zh) * 2019-12-17 2021-02-05 Tencent Technology (Shenzhen) Company Limited Video annotation method, device and storage medium
CN112004032B (zh) * 2020-09-04 2022-02-18 Beijing ByteDance Network Technology Co., Ltd. Video processing method, terminal device and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200210706A1 (en) * 2018-12-31 2020-07-02 International Business Machines Corporation Sparse labeled video annotation
CN110503074A (zh) * 2019-08-29 2019-11-26 Tencent Technology (Shenzhen) Company Limited Information annotation method, apparatus, device and storage medium for video frames
CN112053323A (zh) * 2020-07-31 2020-12-08 Shanghai Tusen Weilai Artificial Intelligence Technology Co., Ltd. Object tracking and annotation method and apparatus for single-shot multi-frame image data, and storage medium
CN114117128A (zh) * 2020-08-29 2022-03-01 Huawei Cloud Computing Technologies Co., Ltd. Video annotation method, system and device
CN113312951A (zh) * 2020-10-30 2021-08-27 Alibaba Group Holding Limited Dynamic video target tracking system, related method, apparatus and device
CN114973056A (zh) * 2022-03-28 2022-08-30 Huazhong Agricultural University Fast video image segmentation and annotation method based on information density
CN115757871A (zh) * 2022-11-15 2023-03-07 Beijing Zitiao Network Technology Co., Ltd. Video annotation method, apparatus, device, medium and product
CN115905622A (zh) * 2022-11-15 2023-04-04 Beijing Zitiao Network Technology Co., Ltd. Video annotation method, apparatus, device, medium and product

Also Published As

Publication number Publication date
CN115757871A (zh) 2023-03-07

Similar Documents

Publication Publication Date Title
US10127021B1 (en) Storing logical units of program code generated using a dynamic programming notebook user interface
WO2024104272A1 (zh) 视频标注方法、装置、设备、介质及产品
US8527863B2 (en) Navigating through cross-referenced documents
WO2022111591A1 (zh) 页面生成方法和装置、存储介质和电子设备
CN110070593B (zh) 图片预览信息的显示方法、装置、设备及介质
WO2022002066A1 (zh) 文档内表格浏览方法、装置、电子设备及存储介质
WO2022218034A1 (zh) 交互方法、装置和电子设备
CN113377366B (zh) 控件编辑方法、装置、设备、可读存储介质及产品
US12032816B2 (en) Display of subtitle annotations and user interactions
WO2020220776A1 (zh) 图片类评论数据的展示方法、装置、设备及介质
CN113268180A (zh) 数据标注方法、装置、设备、计算机可读存储介质及产品
WO2024104239A1 (zh) 视频标注方法、装置、设备、介质及产品
WO2023185391A1 (zh) 交互式分割模型训练方法、标注数据生成方法及设备
US20230239546A1 (en) Theme video generation method and apparatus, electronic device, and readable storage medium
WO2024099171A1 (zh) 视频生成方法和装置
WO2022184034A1 (zh) 一种文档处理方法、装置、设备和介质
US20190227634A1 (en) Contextual gesture-based image searching
CN113377365B (zh) 代码显示方法、装置、设备、计算机可读存储介质及产品
CN110673886B (zh) 用于生成热力图的方法和装置
WO2024067144A1 (zh) 图像处理方法、装置、设备、计算机可读存储介质及产品
CN111787188B (zh) 视频播放方法、装置、终端设备及存储介质
WO2022184037A1 (zh) 文档处理方法、装置、设备和介质
JP2024521940A (ja) マルチメディア処理方法、装置、デバイスおよび媒体
CN116170549A (zh) 视频处理方法及设备
CN111460769B (zh) 文章发布方法、装置、存储介质和电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23890706

Country of ref document: EP

Kind code of ref document: A1