CN112053323A - Single-lens multi-frame image data object tracking and labeling method and device and storage medium - Google Patents

Single-lens multi-frame image data object tracking and labeling method and device and storage medium

Info

Publication number
CN112053323A
Authority
CN
China
Prior art keywords
data
frame
marking
labeling
annotation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010757794.1A
Other languages
Chinese (zh)
Inventor
郑贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tusimple Inc
Original Assignee
Tusimple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tusimple Inc
Priority to CN202010757794.1A
Publication of CN112053323A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker
    • G06T2207/30208 Marker matrix

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for tracking and labeling objects in single-lens multi-frame image data, which are used to solve the problems of slow labeling speed and low efficiency caused by an annotator labeling objects in image data one by one in the related art. The labeling device displays the Nth frame of image data in the single-lens multi-frame image data, and receives and stores the Nth-frame labeling data of an object, wherein N is a natural number greater than or equal to 1; displays the (N+i)th frame of image data, and receives and stores the (N+i)th-frame labeling data of the object, wherein i is a natural number greater than or equal to 2; and determines, according to the Nth-frame labeling data and the (N+i)th-frame labeling data of the object, the intermediate-frame labeling data of each frame of image data between the Nth frame and the (N+i)th frame of the object, and stores the intermediate-frame labeling data of each frame.

Description

Single-lens multi-frame image data object tracking and labeling method and device and storage medium
Technical Field
The invention relates to the field of data annotation, and in particular to a method, a device and a storage medium for tracking and labeling objects in single-lens multi-frame image data.
Background
In the related art, when an annotator labels an object in image data, the annotator needs to observe and identify the object, superimpose a labeling frame on the image data to mark the object, and the labeling system records the position and size of the labeling frame in the image data as the labeling information of the object. When multi-frame image data acquired by a single lens is labeled, these operations are performed frame by frame. Therefore, when objects in single-lens multi-frame image data are labeled in the related art, labeling is slow and inefficient.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for tracking and labeling objects in single-lens multi-frame image data, so as to solve the problems of slow labeling speed and low efficiency caused by an annotator labeling objects in image data one by one in the related art.
In one aspect, an embodiment of the present application provides a method for tracking and labeling an object in single-lens multi-frame image data, including: a labeling device displays the Nth frame of image data in the single-lens multi-frame image data, and receives and stores the Nth-frame labeling data of an object, wherein N is a natural number greater than or equal to 1; displays the (N+i)th frame of image data, and receives and stores the (N+i)th-frame labeling data of the object, wherein i is a natural number greater than or equal to 2; and determines, according to the Nth-frame labeling data and the (N+i)th-frame labeling data of the object, the intermediate-frame labeling data of each frame of image data between the Nth frame and the (N+i)th frame of the object, and stores the intermediate-frame labeling data of each frame.
In one aspect, an embodiment of the present application provides an apparatus for tracking and labeling an object in single-lens multi-frame image data, including a processor and at least one memory storing at least one machine-executable instruction; the processor executes the at least one machine-executable instruction to perform the above method for tracking and labeling an object in single-lens multi-frame image data.
In one aspect, an embodiment of the present application further provides a non-transitory machine-readable storage medium, which stores at least one machine executable instruction, where the at least one machine executable instruction is executed by a processor to implement the method for tracking and labeling an object in single-lens multi-frame image data as described above.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a block diagram illustrating an apparatus for tracking and labeling an object in single-lens multi-frame image data according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating an architecture of object tracking and labeling processing in single-lens multi-frame image data according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating an object tracking and labeling process in single-lens multi-frame image data according to an embodiment of the present disclosure;
fig. 4 is another processing flow chart of object tracking and labeling processing in single-lens multi-frame image data according to the embodiment of the present application;
FIG. 5a is a flowchart illustrating an annotation process of an object in the Nth frame of image data in FIG. 3 or FIG. 4;
FIG. 5b is a flowchart illustrating another process of labeling an object in the Nth frame of image data in FIG. 3 or FIG. 4;
FIG. 5c is a flowchart illustrating another process of labeling an object in the Nth frame of image data in FIG. 3 or FIG. 4;
FIG. 6a is a flowchart illustrating the process of labeling an object in the N + i-th frame of image data in FIG. 3 or FIG. 4;
FIG. 6b is a flowchart illustrating another process of labeling an object in the N + i-th frame of image data in FIG. 3 or FIG. 4;
FIG. 6c is a flowchart illustrating another process of labeling an object in the N + i-th frame of image data in FIG. 3 or FIG. 4;
FIG. 7 is a flowchart illustrating another process for labeling objects in the N + i-th frame of image data in FIG. 3 or FIG. 4;
FIG. 8 is a flowchart illustrating an annotation process for an object in the intermediate frame image data shown in FIG. 3 or FIG. 4;
FIG. 9 is a flowchart illustrating another process for labeling objects in the intermediate frame image data of FIG. 3 or FIG. 4;
FIG. 10 is a flowchart illustrating another process for labeling objects in the intermediate frame image data of FIG. 3 or FIG. 4;
FIG. 11 is a flowchart illustrating another process for labeling objects in the intermediate frame image data shown in FIG. 3 or FIG. 4.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the related art, when an annotator labels objects in image data, the annotator generally needs to observe and identify the objects in the image data and label them one by one. The labeling operation consists of superimposing a labeling frame on the pixel area of the image data that represents an object, and recording the position and size of the labeling frame in association with the object type or attributes. When the annotator labels objects in multi-frame image data, the above operations have to be repeated frame by frame. This way of working is slow and inefficient, and an object cannot be tracked and labeled across the single-lens multi-frame image data.
To address this problem, an embodiment of the present application provides a scheme for tracking and labeling objects in single-lens multi-frame image data. In this scheme, a labeling device displays the Nth frame of image data in the single-lens multi-frame image data and receives and stores the Nth-frame labeling data of an object; it then displays the (N+i)th frame of image data and receives and stores the (N+i)th-frame labeling data of the object, with at least one intermediate frame of image data between the Nth frame and the (N+i)th frame; and the labeling device automatically generates, from the Nth-frame labeling data and the (N+i)th-frame labeling data, the labeling data of the object for each intermediate frame between the Nth frame and the (N+i)th frame.
The labeling device provided by the embodiment of the present application can successively display two frames of image data to an annotator, with at least one frame of image data between them, and automatically generate the labeling data of the object in each intermediate frame from the labeling data of the same object in the two frames labeled by the annotator. This allows fast and efficient labeling and solves the problems of slow labeling speed and low efficiency when an annotator labels objects in single-lens multi-frame image data in the prior art.
Some embodiments of the present application provide a solution for tracking and labeling an object in single-lens multi-frame image data. Fig. 1 shows a structure of a labeling apparatus provided in an embodiment of the present application, where the apparatus 1 includes a processor 11 and at least one memory 12.
In some embodiments, the at least one memory 12 may be a storage device of various modalities, such as a transitory or non-transitory storage medium. At least one machine executable instruction may be stored in the memory 12, and when executed by the processor 11, the at least one machine executable instruction implements the processing for tracking and labeling an object in single-shot multi-frame image data according to the embodiment of the present application.
In some embodiments, the annotation device 1 may be located on the server side. In other embodiments, the annotation device 1 may also be located in a cloud server. In other embodiments, the annotation device 1 may also be located in the client.
As shown in fig. 2, the object tracking and labeling process in single-lens multi-frame image data provided by the embodiment of the present application may include a front-end process 12 and a back-end process 14. The front end process 12 displays the relevant image data or other data and receives the relevant data or information input by the annotator, for example, the front end process 12 may be a process implemented via a web page or a process implemented via a separate application interface. The back-end processing 14 performs corresponding labeling processing according to the relevant data and information received by the front-end processing 12. After the annotation process is completed, the annotation device 1 may further provide the annotation result to other processes or applications on the client, the server, or the cloud server.
The following describes the tracking and labeling process of an object in single-lens multi-frame image data, which is realized by the labeling device 1 executing at least one machine-executable instruction.
In some embodiments of the present application, the single-lens multi-frame image data may be continuous multi-frame image data, for example, multiple consecutive frames captured within one second. In some application scenarios, the time interval between acquired frames is very short and the scene changes little from frame to frame; to reduce the processing load, it is not necessary to label the object in every frame, and only frames a certain time interval apart may be labeled. Therefore, in some embodiments, the multi-frame image data may consist of frames sampled at a fixed time interval. In other scenarios, only frames with obvious scene changes need to be labeled, and only part of the frames may be selected for labeling; therefore, in some embodiments, the multi-frame image data may consist of frames that are not equally spaced in time.
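As an illustration of the two frame-selection options above, the following sketch (Python; the helper names, parameters and the scene-change score are illustrative assumptions, not part of the disclosed embodiments) selects frames either at a fixed stride or only where a scene-change score exceeds a threshold.

```python
# Illustrative sketch (not from the patent text): two ways of choosing which
# frames of a single-lens sequence are submitted for labeling.

def frames_at_fixed_interval(frame_indices, stride):
    """Keep one frame out of every `stride` frames (equal time spacing)."""
    return frame_indices[::stride]

def frames_on_scene_change(frame_indices, change_scores, threshold):
    """Keep only frames whose scene-change score exceeds `threshold`
    (the selected frames are then not equally spaced in time)."""
    return [idx for idx, score in zip(frame_indices, change_scores)
            if score >= threshold]

# Example: 60 frames captured within one second, keep every 10th frame.
all_frames = list(range(60))
print(frames_at_fixed_interval(all_frames, stride=10))  # [0, 10, 20, 30, 40, 50]
```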
In the embodiment of the present application, when labeling objects in multi-frame image data, 1+i frames of image data may be used as one processing cycle. For example, in some embodiments, at the start of the labeling task the first frame may be set as the starting frame of the first processing cycle, the (1+i)th frame as the ending frame of the first processing cycle, and the i-1 frames in between as intermediate frames, in frame order. In the second processing cycle, the (1+i+1)th frame of image data is taken as the start frame, the (1+i+1+i)th frame as the end frame, and the i-1 frames in between as intermediate frames. The remaining processing cycles follow by analogy. In other embodiments, if part of the image data (for example, the first N-1 frames) has already been labeled, the object tracking and labeling process provided by the embodiment of the present application may start from a frame of image data still to be labeled (for example, the Nth frame). In other embodiments, when the annotator finds a new object in the current intermediate frame of image data, the frame order may be reset so that the current intermediate frame becomes the starting frame of a new processing cycle.
In the embodiment of the present application, the specific value of i may be set according to the requirements of the application scenario. A larger value of i may be used when the scenes expressed by the multi-frame image data change little, and a smaller value of i when they change significantly.
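Under the cycle layout described above (a start frame, i-1 intermediate frames, and an end frame, with the next cycle beginning after the previous end frame), one possible way to partition a frame sequence into processing cycles might look like the following sketch; the function name and the exact boundary convention are assumptions for illustration only.

```python
# Illustrative sketch: split 0-based frame indices into processing cycles of
# i+1 frames each -- a start frame, i-1 intermediate frames, and an end frame.

def split_into_cycles(num_frames, i):
    cycles = []
    start = 0
    while start + i < num_frames:          # a full cycle needs an end frame at start + i
        cycles.append({
            "start": start,
            "intermediate": list(range(start + 1, start + i)),
            "end": start + i,
        })
        start = start + i + 1              # the next cycle begins after the current end frame
    return cycles

for cycle in split_into_cycles(num_frames=13, i=5):
    print(cycle)
# {'start': 0, 'intermediate': [1, 2, 3, 4], 'end': 5}
# {'start': 6, 'intermediate': [7, 8, 9, 10], 'end': 11}
```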
The following describes different embodiments with one processing cycle as an example. It can be understood by those skilled in the art that, although one processing cycle is described as an example below, in an actual application scenario, when the number of frames of the image data is large and a plurality of processing cycles need to be executed, each processing cycle may adopt the processing of the object tracking annotation in the multi-frame image data provided in the embodiment of the present application.
Fig. 3 shows an object tracking and labeling process flow in single-lens multi-frame image data according to an embodiment of the present application, that is, a process of performing object tracking and labeling by a labeling device, where the process includes:
step 301, the labeling device displays the Nth frame of image data in the single-lens multi-frame image data, and receives and stores the Nth-frame labeling data of an object; wherein N is a natural number greater than or equal to 1;
step 303, displaying the (N+i)th frame of image data, and receiving and storing the (N+i)th-frame labeling data of the object; wherein i is a natural number greater than or equal to 2;
step 305, determining, according to the Nth-frame labeling data and the (N+i)th-frame labeling data of the object, the intermediate-frame labeling data of each frame of image data between the Nth frame and the (N+i)th frame of the object, and storing the intermediate-frame labeling data of each frame.
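The three steps above can be pictured with the following minimal sketch. It assumes the labeling data of a frame is a dictionary holding a position (x, y) and a size (w, h), and fills the intermediate frames using the difference-mean scheme described later with reference to FIGS. 8 to 10; all names and the box representation are illustrative.

```python
# Minimal sketch of one processing cycle (steps 301-305). The labeling box is
# assumed to carry a position (x, y) and a size (w, h).

def fill_intermediate_frames(box_n, box_n_plus_i, num_intermediate):
    """Estimate the box of each intermediate frame from the Nth-frame box and
    the (N+i)th-frame box, using the per-item difference means described with
    FIG. 8 (difference divided by the number of intermediate frames)."""
    means = {k: (box_n_plus_i[k] - box_n[k]) / num_intermediate for k in box_n}
    return [{k: box_n[k] + order * means[k] for k in box_n}
            for order in range(1, num_intermediate + 1)]

# Step 301: the annotator labels the object in the Nth frame.
box_n = {"x": 100.0, "y": 80.0, "w": 40.0, "h": 30.0}
# Step 303: the annotator labels the same object in the (N+i)th frame (here i = 5).
box_n_plus_i = {"x": 140.0, "y": 100.0, "w": 48.0, "h": 38.0}
# Step 305: the labeling device fills in the i-1 = 4 intermediate frames.
for box in fill_intermediate_frames(box_n, box_n_plus_i, num_intermediate=4):
    print(box)
```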
Through the processing shown in fig. 3, for single-lens multi-frame image data, the processing may start from the first frame of image data or from a later frame of image data.
Through the processing shown in fig. 3, the labeling device can successively display two frames of the single-lens multi-frame image data to an annotator, with at least one frame of image data between them, and automatically generate the labeling data of the object in each intermediate frame from the labeling data of the same object in the two frames labeled by the annotator. This allows fast and efficient object tracking and labeling and solves the problems of slow labeling speed and low efficiency when an annotator labels objects in single-lens multi-frame image data in the prior art.
Fig. 4 shows another processing flow of object tracking and annotation processing in single-lens multi-frame image data according to an embodiment of the present application, including:
step 401, the labeling device displays the Nth frame of image data in the single-lens multi-frame image data, and receives and stores the Nth-frame labeling data of an object; wherein N is a natural number greater than or equal to 1;
step 403, displaying the (N+i)th frame of image data, and receiving and storing the (N+i)th-frame labeling data of the object; wherein i is a natural number greater than or equal to 2;
step 405, determining, according to the Nth-frame labeling data and the (N+i)th-frame labeling data of the object, the intermediate-frame labeling data of each frame of image data between the Nth frame and the (N+i)th frame of the object;
step 407, displaying the current intermediate frame of image data;
step 409, receiving a frame-reset instruction, setting the current intermediate frame as the Nth frame of image data, storing the determined current-frame labeling data of the object as its Nth-frame labeling data, and receiving and storing the labeling data of another object;
step 411, displaying the (N+i)th frame of image data, and receiving and storing the (N+i)th-frame labeling data of the two objects;
step 413, determining, for each of the two objects, the intermediate-frame labeling data of each frame of image data between the Nth frame and the (N+i)th frame according to the Nth-frame labeling data and the (N+i)th-frame labeling data of that object.
It can be appreciated by those of ordinary skill in the art that, in the process illustrated in FIG. 4, multiple objects may be distinguished by assigning different identifiers to the objects. The identifier assigned to an object may be generated automatically by the labeling device or input by the annotator.
Through the processing shown in fig. 4, on the basis of the processing shown in fig. 3, the labeling device can also label an object that newly appears in an intermediate frame of image data, avoiding the problem that such an object could otherwise not be labeled. The process of FIG. 4 therefore provides a more flexible labeling method than that of FIG. 3.
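A minimal sketch of this reset-frame behaviour, under the assumption that each object's labeling data is kept in a per-object dictionary keyed by frame index (all identifiers and values below are illustrative):

```python
# Illustrative sketch of the reset-frame flow of FIG. 4: when a new object is
# found in an intermediate frame, that frame becomes the Nth frame of a new
# labeling pass for the new object, while the first object keeps its data.

annotations = {}   # object_id -> {frame_index: box}

def store(object_id, frame_index, box):
    annotations.setdefault(object_id, {})[frame_index] = box

# Object "obj-1" was labeled in frame 10 (Nth) and frame 15 (N+i), and its
# intermediate frames 11-14 were filled in automatically (values elided here).
store("obj-1", 10, {"x": 100, "y": 80, "w": 40, "h": 30})
store("obj-1", 15, {"x": 130, "y": 95, "w": 46, "h": 36})

# While reviewing intermediate frame 12, the annotator spots a new object and
# issues a frame-reset instruction: frame 12 becomes the Nth frame for "obj-2",
# and obj-1's already determined frame-12 data is stored as its Nth-frame data.
reset_frame = 12
store("obj-2", reset_frame, {"x": 300, "y": 50, "w": 20, "h": 20})
# Both objects are then labeled in the new (N+i)th frame and interpolated again.
```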
In the processing of fig. 3 or fig. 4 described above, the labeling of the object in the Nth frame of image data may be referred to as the first labeling, the labeling of the object in the (N+i)th frame of image data as the last labeling, and the labeling of the object in the intermediate frames of image data as the intermediate automatic labeling.
In some embodiments, a flow of the first labeling process is shown in FIG. 5a, comprising:
step 501, displaying the Nth frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels an object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 503, saving the Nth-frame labeling data of the object.
In step 501, the labeling data received by the labeling device may be input by the annotator in several ways: a specific parameter value may be typed directly into a data input box of the human-computer interface; a preset button or key on the human-computer interface may be clicked, the button or key carrying a corresponding preset instruction or data; or a corresponding option may be selected from a pull-down menu provided by the human-computer interface, where the pull-down menu may include one or more sub-menus and each sub-menu may include one or more options. The labeling device receives the labeling data input by the annotator through the human-computer interface.
In the processing flows of the following embodiments, the way the labeling device receives input labeling data or other data is similar to the receiving processing in step 501 and will not be described again.
In step 503, when the labeling device stores the Nth-frame labeling data of the object, the identifier of the object may be stored in association with the Nth-frame labeling data.
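One possible shape for the record stored in step 503, assuming the object identifier is kept alongside the frame index and the labeling data (field names are illustrative, not prescribed by this application):

```python
# Illustrative sketch: the Nth-frame labeling data stored in association with
# the object's identifier (step 503).

from dataclasses import dataclass

@dataclass
class LabelRecord:
    object_id: str      # identifier assigned by the labeling device or the annotator
    frame_index: int    # N, the index of the labeled frame
    x: float            # position data of the labeling frame
    y: float
    width: float        # size data of the labeling frame
    height: float

record = LabelRecord(object_id="obj-1", frame_index=7, x=100.0, y=80.0, width=40.0, height=30.0)
print(record)
```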
Through the process shown in fig. 5a, the annotating device can receive and store annotation data of the first annotation of an object by the annotator.
On the basis of the processing shown in fig. 5a, after receiving the labeling data, the labeling device can further generate and display the labeling frame so that the annotator can observe it conveniently. In some embodiments, another flow of the first labeling process is shown in FIG. 5b, including:
step 501, displaying the Nth frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels an object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 502, generating and displaying a corresponding labeling frame according to the labeling data;
step 503, saving the Nth-frame labeling data of the object.
In some embodiments, after the labeling device generates and displays the labeling frame according to the labeling data, the annotator may further adjust the generated labeling frame so as to label the object accurately. On the basis of the processing shown in fig. 5b, the labeling device can further adjust the labeling data according to the annotator's input to obtain adjusted labeling data. In other embodiments, another flow of the first labeling process is shown in FIG. 5c, including:
step 501, displaying the Nth frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels an object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 502, generating and displaying a corresponding labeling frame according to the received labeling data;
step 504, receiving input adjustment data of the labeling frame, wherein the adjustment data includes one or more of the following: position adjustment data of the labeling frame and size adjustment data of the labeling frame;
step 505, determining the adjusted labeling data according to the adjustment data and the received labeling data;
step 503, saving the Nth-frame labeling data of the object.
In step 505, the process of determining the adjusted labeling data according to the adjustment data and the labeling data may include the following adjustment methods, depending on the data contained in the adjustment data.
In the first adjustment method, when the adjustment data includes position adjustment data of the labeling frame, the labeling device determines the adjusted position data of the labeling frame according to the position adjustment data and the position data of the labeling frame in the received labeling data.
In some embodiments, when the position adjustment data contains the adjusted position data of the labeling frame, the labeling device takes that adjusted position as the position data of the adjusted labeling frame. For example, if the received position data is (x, y) and the position adjustment data contains the adjusted position (x', y'), the labeling device determines the coordinates (x', y') as the position data of the adjusted labeling frame.
In some embodiments, when the position adjustment data includes a position adjustment direction and a position offset of the labeling frame, the labeling device determines the adjusted position data of the labeling frame according to the position data of the labeling frame in the labeling data together with the position adjustment direction and the position offset. For example, if the received position data is (x, y), the adjustment direction in the position adjustment data is the x-axis direction, and the offset is a, then the adjusted position coordinate is (x', y) with x' = x + a.
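A small sketch of the first adjustment method, under the assumption that position data is an (x, y) pair and that the adjustment data is passed as a dictionary (the key names are illustrative):

```python
# Illustrative sketch of the first adjustment method: adjusting the labeling
# frame's position either by an absolute replacement or by a direction/offset.

def adjust_position(position, adjustment):
    x, y = position
    if "new_position" in adjustment:          # adjustment data carries the adjusted position
        return adjustment["new_position"]
    if adjustment.get("direction") == "x":    # adjustment data carries a direction and an offset
        return (x + adjustment["offset"], y)
    if adjustment.get("direction") == "y":
        return (x, y + adjustment["offset"])
    return (x, y)                             # no position adjustment supplied

print(adjust_position((100, 80), {"new_position": (120, 85)}))      # (120, 85)
print(adjust_position((100, 80), {"direction": "x", "offset": 5}))  # (105, 80)
```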
In the second adjustment method, when the adjustment data includes size adjustment data of the labeling frame, the labeling device determines the adjusted size data of the labeling frame according to the size adjustment data and the size data of the labeling frame in the labeling data.
In some embodiments, when the size adjustment data contains the adjusted size data of the labeling frame, the labeling device takes that adjusted size as the size data of the adjusted labeling frame. For example, if the received size data is s × r and the size adjustment data contains the adjusted size s' × r', the labeling device determines s' × r' as the size data of the adjusted labeling frame.
In some embodiments, when the size adjustment data includes size increase/decrease data, the labeling device determines the adjusted size data of the labeling frame according to the size data of the labeling frame in the labeling data and the size increase/decrease data. For example, if the received size data is s × r and the size increase/decrease data is (+i, -j), the adjusted size data of the labeling frame is s' × r' with s' = s + i and r' = r - j, and the labeling device determines s' × r' as the size data of the adjusted labeling frame.
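A matching sketch of the second adjustment method, assuming size data is an (s, r) pair; the increase/decrease values are named dw and dh here to avoid confusion with the frame offset i used elsewhere in this description:

```python
# Illustrative sketch of the second adjustment method: adjusting the labeling
# frame's size either by an absolute replacement or by increase/decrease data.

def adjust_size(size, adjustment):
    s, r = size
    if "new_size" in adjustment:                   # adjustment data carries the adjusted size s' x r'
        return adjustment["new_size"]
    if "dw" in adjustment or "dh" in adjustment:   # size increase/decrease data
        return (s + adjustment.get("dw", 0), r + adjustment.get("dh", 0))
    return (s, r)                                  # no size adjustment supplied

print(adjust_size((40, 30), {"new_size": (46, 36)}))  # (46, 36)
print(adjust_size((40, 30), {"dw": +4, "dh": -2}))    # (44, 28)
```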
Although some embodiments of the labeling device for performing the first labeling operation are listed in the embodiments of the present application, in a specific application scenario, other equivalent embodiments or alternative embodiments may also be included.
After the marking device executes the first marking, the last marking can be executed.
In some embodiments, FIG. 6a illustrates a process flow of a last annotation, comprising:
step 601, displaying the (N+i)th frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels the object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 603, saving the (N+i)th-frame labeling data of the object.
The way the labeling device receives the input labeling data in step 601 is similar to that in step 501 and is not repeated here. When the labeling device stores the (N+i)th-frame labeling data of an object in step 603, the identifier of the object may also be stored in association with the labeling data, as in step 503.
In some embodiments, similar to the process of fig. 5b, in order to make it easier for the annotator to observe and check the labeling, the labeling device further generates and displays a labeling frame according to the labeling data. FIG. 6b shows another processing flow of the last labeling, which includes:
step 601, displaying the (N+i)th frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels the object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 602, generating and displaying a corresponding labeling frame according to the received labeling data;
step 603, saving the (N+i)th-frame labeling data of the object.
In some embodiments, similar to the process of fig. 5c, in order to label more accurately, the labeling device further adjusts the generated labeling frame according to the adjustment data input by the annotator. FIG. 6c shows another processing flow of the last labeling, which includes:
step 601, displaying the (N+i)th frame of image data in the single-lens multi-frame image data, and receiving input labeling data of a labeling frame that labels the object; the labeling data includes position data of the labeling frame and size data of the labeling frame;
step 602, generating and displaying a corresponding labeling frame according to the received labeling data;
step 604, receiving input adjustment data of the labeling frame, wherein the adjustment data includes one or more of the following: position adjustment data of the labeling frame and size adjustment data of the labeling frame;
step 605, determining the adjusted labeling data according to the adjustment data and the received labeling data;
step 603, saving the (N+i)th-frame labeling data of the object.
In step 605, the adjusted labeling data may be determined according to the first adjustment method or the second adjustment method, and the details are not repeated here.
FIGS. 6a to 6c show the last labeling performed by the labeling device according to labeling data input by the annotator. An embodiment of the present application further provides a process in which the last labeling is performed automatically. In some embodiments, FIG. 7 shows another processing flow of the last labeling, including:
step 701, displaying the (N+i)th frame of image data in the single-lens multi-frame image data, and generating and displaying a corresponding labeling frame according to the stored Nth-frame labeling data;
step 703, when input labeling-frame adjustment data is received, determining the (N+i)th-frame labeling data according to the adjustment data and the Nth-frame labeling data, wherein the adjustment data includes one or more of the following: position adjustment data of the labeling frame and size adjustment data of the labeling frame; the process proceeds to step 705;
step 704, when no input labeling-frame adjustment data is received, determining the stored Nth-frame labeling data as the (N+i)th-frame labeling data; the process proceeds to step 705;
step 705, saving the (N+i)th-frame labeling data of the object.
In step 703, the (N+i)th-frame labeling data may be determined from the adjustment data according to the first adjustment method or the second adjustment method, and the details are not repeated here.
Through the processing shown in fig. 7, when a static object is labeled, or when the annotator considers the automatically generated result accurate, the object can be labeled automatically in the last labeling; when a moving object is labeled, the labeling of the Nth frame can be shown to the annotator during the last labeling, so that the annotator can conveniently observe and identify the object, and the labeling frame of the object in the (N+i)th frame can then be adjusted according to the adjustment data input by the annotator.
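A minimal sketch of the FIG. 7 flow, reusing the box representation assumed in the earlier sketches: the stored Nth-frame box is proposed for the (N+i)th frame and kept unchanged unless adjustment data is supplied (function and key names are illustrative):

```python
# Illustrative sketch of the automatic last labeling of FIG. 7: the stored
# Nth-frame box is proposed for the (N+i)th frame; if the annotator provides
# adjustment data, the proposal is adjusted, otherwise it is saved as-is.

def last_labeling(box_n, adjustment=None):
    box = dict(box_n)                              # step 701: show the Nth-frame box
    if adjustment is None:
        return box                                 # step 704: no adjustment received
    box["x"] = adjustment.get("x", box["x"])       # step 703: apply position adjustment
    box["y"] = adjustment.get("y", box["y"])
    box["w"] = box["w"] + adjustment.get("dw", 0)  # and/or size increase/decrease
    box["h"] = box["h"] + adjustment.get("dh", 0)
    return box                                     # step 705: save as (N+i)th-frame data

box_n = {"x": 100, "y": 80, "w": 40, "h": 30}
print(last_labeling(box_n))                                          # static object: unchanged
print(last_labeling(box_n, {"x": 130, "y": 95, "dw": 6, "dh": 6}))   # moving object: adjusted
```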
Although the embodiments of the present application have been described with reference to some embodiments in which the annotating device performs the last annotation operation, in a specific application scenario, other equivalent embodiments or alternative embodiments may also be included.
The marking device can execute intermediate automatic marking processing after executing the first marking and the last marking.
In some embodiments, as shown in FIG. 8, the intermediate automatic annotation process can include:
step 801, determining the difference mean of each item of data in the labeling data according to the (N+i)th-frame labeling data, the Nth-frame labeling data, and the number of intermediate frames of image data;
step 803, determining the value of each item of data in the labeling data of each intermediate frame of the object according to the value of each item of data in the Nth-frame labeling data, the order of the intermediate frames of image data, and the determined difference mean of each item of data in the labeling data;
step 805, storing the intermediate-frame labeling data of each frame.
Through the processing shown in fig. 8, the labeling device can automatically perform object labeling processing on each intermediate frame image data according to the result of the first labeling and the result of the last labeling.
In some embodiments, in the case where the annotation data includes annotation frame size data, the process shown in fig. 8 may be implemented as the process flow shown in fig. 9:
step 801a, determining the difference mean of the labeling-frame size data according to the labeling-frame size data contained in the (N+i)th-frame labeling data and in the Nth-frame labeling data, and the number of intermediate frames of image data;
step 803a, determining the value of the labeling-frame size data in the labeling data of each intermediate frame of the object according to the labeling-frame size data in the Nth-frame labeling data, the order of the intermediate frames of image data, and the determined difference mean of the labeling-frame size data;
step 805a, storing the intermediate-frame labeling data of each frame.
In step 801a, for example, if the labeling-frame size of the Nth frame of an object is s × r, the labeling-frame size of the (N+i)th frame is s' × r', and the number of intermediate frames of image data is n, the difference means of the labeling-frame size data are u and v, where u = (s' - s)/n and v = (r' - r)/n. In step 803a, when determining the labeling-frame size data in the labeling data of the 2nd intermediate frame, it may be determined as s2 × r2, where s2 = s + 2u and r2 = r + 2v.
In some embodiments, in the case where the annotation data includes annotation frame location data, the process shown in FIG. 8 can be implemented as the process flow shown in FIG. 10:
step 801b, determining the difference mean of the labeling-frame position data according to the labeling-frame position data contained in the (N+i)th-frame labeling data and in the Nth-frame labeling data, and the number of intermediate frames of image data;
step 803b, determining the value of the labeling-frame position data in the labeling data of each intermediate frame of the object according to the labeling-frame position data in the Nth-frame labeling data, the order of the intermediate frames of image data, and the determined difference mean of the labeling-frame position data;
step 805b, storing the intermediate-frame labeling data of each frame.
In step 801b, for example, if the labeling-frame position data of the Nth frame of an object is (x, y), the position data of the (N+i)th frame is (x', y'), and the number of intermediate frames of image data is n, the difference means of the labeling-frame position data are determined as p and q, where p = (x' - x)/n and q = (y' - y)/n. In step 803b, when determining the labeling-frame position data in the labeling data of the 2nd intermediate frame, it may be determined as (x2, y2), where x2 = x + 2p and y2 = y + 2q.
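The worked examples of steps 801a/803a and 801b/803b can be checked with a few lines, using illustrative values:

```python
# Numeric check of the worked examples above, with illustrative values:
# Nth frame: size s x r = 40 x 30 at position (x, y) = (100, 80);
# (N+i)th frame: size s' x r' = 48 x 38 at position (x', y') = (140, 100);
# n = 4 intermediate frames.
s, r, x, y = 40.0, 30.0, 100.0, 80.0
sp, rp, xp, yp = 48.0, 38.0, 140.0, 100.0
n = 4

u, v = (sp - s) / n, (rp - r) / n     # difference means of the size data (step 801a)
p, q = (xp - x) / n, (yp - y) / n     # difference means of the position data (step 801b)

# Labeling data of the 2nd intermediate frame (steps 803a and 803b).
s2, r2 = s + 2 * u, r + 2 * v
x2, y2 = x + 2 * p, y + 2 * q
print((s2, r2), (x2, y2))             # (44.0, 34.0) (120.0, 90.0)
```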
In other embodiments, when the labeling data includes both labeling-frame size data and labeling-frame position data, the processing of FIGS. 9 and 10 may be used to determine the values of the labeling-frame size data and of the labeling-frame position data contained in the intermediate-frame labeling data of the object.
Through the processing of FIGS. 8 to 10, estimated labeling data of the object can be determined. When the object is static, the difference between the first-labeled data and the last-labeled data of the object is small, and fairly accurate intermediate-frame labeling data can be obtained through the processing of FIGS. 8 to 10. When the object moves across consecutive frames, the time interval between consecutive frames is small (for example, when 60 frames of image data are captured in one second, the interval between adjacent frames is very short), so the change in the position of the moving object over such a small interval is also very small; the motion of the object can then be regarded as approximately linear, and fairly accurate intermediate-frame labeling data can likewise be obtained through the processing shown in FIGS. 8 to 10.
In other embodiments, for cases where the position or size of the object changes considerably across the image data, after the labeling device determines the intermediate-frame labeling data of the object, the determined intermediate-frame labeling data may be further adjusted to obtain labeling data that better matches the object in the intermediate-frame image data. As shown in fig. 11, on the basis of the process shown in fig. 8, the intermediate automatic labeling process can also be implemented as the following process:
step 801, determining the difference mean of each item of data in the labeling data according to the (N+i)th-frame labeling data, the Nth-frame labeling data, and the number of intermediate frames of image data;
step 803, determining the value of each item of data in the labeling data of each intermediate frame of the object according to the value of each item of data in the Nth-frame labeling data, the order of the intermediate frames of image data, and the determined difference mean of each item of data in the labeling data;
step 807, displaying, by the labeling device, the current intermediate frame of image data, and generating and displaying a corresponding labeling frame according to the determined current intermediate-frame labeling data;
step 809, receiving input adjustment data of the labeling frame, wherein the adjustment data includes one or more of the following: position adjustment data of the labeling frame and size adjustment data of the labeling frame;
step 810, determining the adjusted current intermediate-frame labeling data according to the determined current intermediate-frame labeling data and the adjustment data;
step 805, storing the current intermediate-frame labeling data.
The adjustment data received in step 809 may be processed according to the first adjustment method or the second adjustment method, and the details are not repeated here.
Through the processing shown in fig. 11, the labeling device can show the annotator the automatic labeling of the intermediate frames and adjust the labeling data estimated by the intermediate labeling according to the adjustment data input by the annotator, so as to obtain labeling data that better matches the object in the image data.
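A sketch tying the FIG. 11 flow together, reusing the conventions assumed in the earlier sketches: an intermediate-frame box is first estimated by the difference-mean scheme of FIG. 8 and then corrected with the annotator's adjustment data before being stored (all names are illustrative):

```python
# Illustrative sketch of FIG. 11: estimate an intermediate frame's box by the
# difference-mean scheme of FIG. 8, then correct it with the annotator's
# adjustment data before storing it.

def estimate_box(box_n, box_n_plus_i, num_intermediate, order):
    means = {k: (box_n_plus_i[k] - box_n[k]) / num_intermediate for k in box_n}
    return {k: box_n[k] + order * means[k] for k in box_n}          # steps 801/803

def apply_adjustment(box, adjustment):
    adjusted = dict(box)
    adjusted["x"] = adjustment.get("x", adjusted["x"])              # position adjustment data
    adjusted["y"] = adjustment.get("y", adjusted["y"])
    adjusted["w"] = adjusted["w"] + adjustment.get("dw", 0.0)       # size increase/decrease data
    adjusted["h"] = adjusted["h"] + adjustment.get("dh", 0.0)
    return adjusted                                                 # steps 809/810

box_n = {"x": 100.0, "y": 80.0, "w": 40.0, "h": 30.0}
box_n_plus_i = {"x": 140.0, "y": 100.0, "w": 48.0, "h": 38.0}
estimate = estimate_box(box_n, box_n_plus_i, num_intermediate=4, order=2)  # displayed in step 807
print(apply_adjustment(estimate, {"x": 123.0, "dw": 2.0}))                 # stored in step 805
```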
Embodiments of the subject matter and the functional operations described in this application can be implemented by various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their equivalents, or combinations of these structures. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded for storage in a tangible, non-transitory computer readable medium, for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of these. The term "data processing unit" or "data processing apparatus" includes all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. These means may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of these.
A computer program (also known as a program, software, software application, script, or code) can be written in any programming language, including compiled or interpreted languages; and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that also stores other programs or data (e.g., one or more scripts stored in a markup language document), or in a separate file dedicated to the program in question, or in coordinated files (e.g., files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed by one or more computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes or logic diagrams described in this specification can be executed by one or more programmable processors to execute one or more computer programs and perform processes on input data to generate output results. The processes or logic diagrams may be performed by, and various devices may be implemented as, special purpose logic circuitry, e.g., a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).
Processors for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. The basic unit of a computer includes a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices, including magnetic, magneto-optical disks, or optical disks. However, a computer need not include these devices. Computer-readable media for storing instructions and data include all forms of non-volatile memory, media and storage devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be replaced by, or incorporated in, special purpose logic circuitry.
While this document contains many specifics, these specifics should not be construed as limitations on the scope of the disclosure, but merely as descriptions of features that may be incorporated into specific embodiments of particular inventions. Some of the features described in separate embodiments in this application may also be combined and implemented in a single embodiment. Features which are described in the context of separate embodiments may also be provided in combination in a single embodiment, or in any suitable subcombination. Also, while features may be described above in certain combinations, one or more features may be deleted from one or more of the claimed combinations and the claimed combinations may be further combined or modified.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in such order to achieve desirable results. Also, the separation of various system components in the embodiments should not be understood as requiring such separation in all embodiments. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (17)

1. A method for tracking and labeling an object in single-lens multi-frame image data is characterized by comprising the following steps:
the marking device receives and stores the marking data of the Nth frame of an object; wherein N is a natural number greater than or equal to 1;
receiving and storing the labeling data of the (N + i) th frame of the object; wherein i is a natural number greater than or equal to 2;
and determining the intermediate frame marking data of each frame of image data between the Nth frame and the (N + i) th frame of the object according to the Nth frame marking data and the (N + i) th frame marking data of the object, and storing the intermediate frame marking data of each frame.
2. The method of claim 1, wherein receiving and storing the nth frame of annotation data for an object comprises:
receiving input marking data of a marking frame for marking the object; the marking data comprise position data of a marking frame and size data of the marking frame;
and saving the labeling data of the Nth frame of the object.
3. The method of claim 2, wherein after receiving the input label data for labeling the label box of the object, the method further comprises:
receiving input annotation box adjustment data, wherein the adjustment data comprises one or more of the following: position adjustment data of the marking frame and size adjustment data of the marking frame;
and determining to obtain the adjusted marking data according to the adjustment data and the received marking data.
4. The method of claim 1, wherein receiving and storing the N + i frame annotation data for the object comprises:
receiving input marking data of a marking frame for marking the object; the marking data comprise position data of a marking frame and size data of the marking frame;
and saving the marking data of the (N + i) th frame of the object.
5. The method of claim 4, further comprising, after receiving input label data for labeling a label box of the object:
receiving input annotation box adjustment data, wherein the adjustment data comprises one or more of the following: position adjustment data of the marking frame and size adjustment data of the marking frame;
and determining to obtain the adjusted marking data according to the adjustment data and the received marking data.
6. The method of claim 1, wherein receiving and storing the N + i frame annotation data for the object comprises:
under the condition that the input marking frame adjusting data is not received, determining the stored marking data of the Nth frame as marking data of the (N + i) th frame;
under the condition of receiving input marking frame adjusting data, determining to obtain N + i frame marking data according to the adjusting data and the N frame marking data; wherein the adjustment data comprises one or more of: position adjustment data of the marking frame and size adjustment data of the marking frame;
and saving the marking data of the (N + i) th frame of the object.
7. The method of claim 1, wherein generating the inter-frame labeling data of each frame of image data between the nth frame and the N + i frame of the object according to the nth frame labeling data and the N + i frame labeling data of the object comprises:
determining to obtain the difference mean value of each item of data in the annotation data according to the (N + i) th frame annotation data, the (N) th frame annotation data and the frame number of the intermediate frame image data;
determining to obtain the numerical value of each item of data included in the labeling data of each frame of the object according to the numerical value of each item of data included in the labeling data of the Nth frame, the sequence of the image data of the intermediate frames and the determined difference mean value of each item of data in the labeling data; the marking data comprises position adjusting data of the marking frame and size adjusting data of the marking frame.
8. The method of claim 7, wherein determining the value of each item included in the inter-frame annotation data of each frame of the object further comprises:
receiving input annotation box adjustment data, wherein the adjustment data comprises one or more of the following: position adjustment data of the marking frame and size adjustment data of the marking frame;
and determining to obtain the adjusted current intermediate frame marking data according to the determined current intermediate frame marking data and the adjusted data.
9. The method according to any one of claims 3, 5, 6, and 8, wherein in a case that the adjustment data includes position adjustment data of the annotation frame, determining to obtain adjusted annotation data according to the adjustment data and the annotation data includes:
and determining to obtain the position data of the adjusted marking frame according to the position adjustment data of the marking frame and the position data of the marking frame in the received marking data.
10. The method of claim 9, wherein the position adjustment data comprises position data of the adjusted label box;
determining to obtain the position data of the adjusted marking frame according to the position adjustment data of the marking frame and the position data of the marking frame in the marking data, and the method comprises the following steps: and determining the adjusted position data in the position adjustment data as the adjusted position data of the labeling frame.
11. The method of claim 9, wherein the position adjustment data includes a position adjustment direction and a position offset of the label box;
determining to obtain the position data of the adjusted marking frame according to the position adjustment data of the marking frame and the position data of the marking frame in the marking data, and the method comprises the following steps: and determining to obtain the adjusted position data of the marking frame according to the position data of the marking frame in the marking data and the position adjusting direction and the position offset of the marking frame.
12. The method according to any one of claims 3, 5, 6, and 8, wherein in a case that the adjustment data includes resizing data of the annotation frame, determining to obtain adjusted annotation data according to the adjustment data and the annotation data includes:
and determining to obtain the size data of the adjusted marking frame according to the size adjustment data of the marking frame and the size data of the marking frame in the marking data.
13. The method of claim 12, wherein the resizing data comprises resizing data of the adjusted callout box;
according to the size adjustment data of the labeling frame and the size data of the labeling frame in the labeling data, the size data of the adjusted labeling frame is determined and obtained, and the method comprises the following steps: and determining the adjusted size data as the size data of the adjusted marking frame.
14. The method of claim 12, wherein the resizing data comprises resizing data;
according to the size adjustment data of the labeling frame and the size data of the labeling frame in the labeling data, the size data of the adjusted labeling frame is determined and obtained, and the method comprises the following steps: and determining to obtain the adjusted size data of the marking frame according to the size data of the marking frame and the size increasing and decreasing data of the marking frame in the marking data.
15. The method of claim 1, wherein determining the inter-frame labeling data for each frame of image data between the nth frame and the N + i frame of the object further comprises:
the marking device displays the current intermediate frame image data;
and receiving a frame resetting instruction, and setting the current intermediate frame as the Nth frame of image data.
16. An apparatus for single-shot multi-frame image data object tracking annotation, comprising a processor and at least one memory, the at least one memory having at least one machine executable instruction stored therein, the processor executing the at least one machine executable instruction to perform the method of any one of claims 1 to 15.
17. A non-volatile storage medium having stored thereon at least one machine executable instruction, the at least one machine executable instruction when executed by a processor implementing a method as claimed in any one of claims 1 to 15.
CN202010757794.1A 2020-07-31 2020-07-31 Single-lens multi-frame image data object tracking and labeling method and device and storage medium Pending CN112053323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010757794.1A CN112053323A (en) 2020-07-31 2020-07-31 Single-lens multi-frame image data object tracking and labeling method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010757794.1A CN112053323A (en) 2020-07-31 2020-07-31 Single-lens multi-frame image data object tracking and labeling method and device and storage medium

Publications (1)

Publication Number Publication Date
CN112053323A true CN112053323A (en) 2020-12-08

Family

ID=73601953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010757794.1A Pending CN112053323A (en) 2020-07-31 2020-07-31 Single-lens multi-frame image data object tracking and labeling method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112053323A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520218A (en) * 2018-03-29 2018-09-11 深圳市芯汉感知技术有限公司 A kind of naval vessel sample collection method based on target tracking algorism
CN110866936A (en) * 2018-08-07 2020-03-06 阿里巴巴集团控股有限公司 Video labeling method, tracking method, device, computer equipment and storage medium
CN108986134A (en) * 2018-08-17 2018-12-11 浙江捷尚视觉科技股份有限公司 A kind of semi-automatic mask method of video object based on correlation filtering tracking
CN110084895A (en) * 2019-04-30 2019-08-02 上海禾赛光电科技有限公司 The method and apparatus that point cloud data is labeled
CN110705405A (en) * 2019-09-20 2020-01-17 阿里巴巴集团控股有限公司 Target labeling method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115757871A (en) * 2022-11-15 2023-03-07 北京字跳网络技术有限公司 Video annotation method, device, equipment, medium and product
WO2024104239A1 (en) * 2022-11-15 2024-05-23 北京字跳网络技术有限公司 Video labeling method and apparatus, and device, medium and product
WO2024104272A1 (en) * 2022-11-15 2024-05-23 北京字跳网络技术有限公司 Video labeling method and apparatus, and device, medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination