CN112053388A - Multi-camera multi-frame image data object tracking and labeling method and device and storage medium - Google Patents


Info

Publication number
CN112053388A
Authority
CN
China
Prior art keywords
data
frame
time point
marking
labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010757802.2A
Other languages
Chinese (zh)
Inventor
郑贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tusimple Inc
Original Assignee
Tusimple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tusimple Inc filed Critical Tusimple Inc
Priority to CN202010757802.2A priority Critical patent/CN112053388A/en
Publication of CN112053388A publication Critical patent/CN112053388A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/292 Multi-camera tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker
    • G06T2207/30208 Marker matrix

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device, and a storage medium for tracking and labeling an object in multi-camera multi-frame image data, to solve the problem that the related art cannot track and label an object across multi-camera multi-frame image data. A labeling device acquires and stores, in an associated manner, labeling data of the same object in the multi-frame image data at an N-th time point; the labeling data comprise an identification of the object and labeling-frame data of the object in each frame of image data, and the labeling-frame data comprise position data and size data of a labeling frame. The device then acquires and stores, in an associated manner, labeling data of the object in the multi-frame image data at an (N+i)-th time point; determines, from the labeling data at the N-th time point and the labeling data at the (N+i)-th time point, labeling data of the object at each intermediate time point between the N-th and (N+i)-th time points; and stores the labeling data of the object at each intermediate time point in an associated manner.

Description

Multi-camera multi-frame image data object tracking and labeling method and device and storage medium
Technical Field
The invention relates to the field of data annotation, and in particular to a method, a device, and a storage medium for tracking and annotating an object in multi-camera multi-frame image data.
Background
In the related art, when an annotator annotates an object in image data, the annotator needs to observe and identify the object, superimpose an annotation frame on the image data to mark it, and the annotation system records the position and size of the annotation frame in the image data as the annotation information of the object. When multi-frame image data are annotated, these operations are repeated frame by frame; this cannot associate the annotations of the same object across the multi-frame image data acquired from the same scene by a plurality of cameras, let alone track and annotate the object across multi-camera multi-frame image data. It can be seen that the related art cannot track and annotate an object in multi-camera multi-frame image data.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for tracking and labeling an object in multi-camera multi-frame image data, so as to solve the problem in the related art that the object in the multi-camera multi-frame image data cannot be tracked and labeled.
In one aspect, an embodiment of the present application provides a method for continuously labeling an object in multi-camera multi-frame image data, including:
the method comprises the steps that a labeling device acquires and stores, in an associated manner, labeling data of the same object in the multi-frame image data at an N-th time point, wherein the multi-frame image data at the N-th time point are image data acquired of the same scene by a plurality of cameras at the N-th time point, and N is a natural number; the labeling data comprise an identification of the object and labeling-frame data of the object in each frame of image data, and the labeling-frame data comprise position data and size data of a labeling frame;
acquiring and storing, in an associated manner, labeling data of the object in the multi-frame image data at an (N+i)-th time point, wherein the multi-frame image data at the (N+i)-th time point are image data acquired of the same scene by the plurality of cameras at the (N+i)-th time point, and i is a natural number greater than or equal to 2;
determining, from the labeling data of the object at the N-th time point and at the (N+i)-th time point, labeling data of the object at each intermediate time point between the N-th time point and the (N+i)-th time point, wherein the labeling data at one intermediate time point comprise labeling data of the object in the multi-frame image data acquired of the same scene by the multiple cameras at that time point; and storing the labeling data of the object at each intermediate time point in an associated manner.
In one aspect, an embodiment of the present application provides an apparatus for continuously labeling an object in multi-camera multi-frame image data, including: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the method for continuous labeling of an object in multi-camera multi-frame image data as described above.
In one aspect, the present application further provides a non-transitory machine-readable storage medium storing at least one machine executable instruction, where the at least one machine executable instruction is executed by a processor to implement the method for continuously labeling an object in multi-camera multi-frame image data as described above.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
Fig. 1 is a block diagram of a device for continuously labeling an object in multi-camera multi-frame image data according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of an architecture for continuous object labeling processing in multi-camera multi-frame image data according to an embodiment of the present disclosure;
fig. 3a is a flowchart of an embodiment of a process for tracking and labeling an object in multi-camera multi-frame image data;
fig. 3b is a flowchart of another processing of object tracking labeling in multi-camera multi-frame image data according to the embodiment of the present application;
fig. 4 is a flowchart of another processing of object tracking labeling in multi-camera multi-frame image data according to an embodiment of the present application;
FIG. 5a is a flowchart of a process for first labeling according to an embodiment of the present application;
FIG. 5b is another process flow diagram of the first annotation provided in the embodiments of the present application;
FIG. 5c is another process flow diagram of the first annotation provided in the embodiments of the present application;
FIG. 6a is a flowchart of a process for a last annotation provided in the embodiments of the present application;
FIG. 6b is another processing flow diagram of the last annotation provided in the embodiments of the present application;
FIG. 6c is another processing flow diagram of the last annotation provided in the embodiments of the present application;
FIG. 6d is another processing flow diagram of the last annotation provided in the embodiments of the present application;
FIG. 7a is a flowchart of a process for automatic annotation intermediation according to an embodiment of the present application;
FIG. 7b is a flowchart of another process for automatically labeling the middle of the document according to the embodiment of the present application;
FIG. 7c is a flowchart of another process for automatically labeling the middle of the document according to the embodiment of the present application;
fig. 7d is another processing flow diagram of the intermediate automatic labeling provided in the embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention; obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the related art, when an annotator annotates an object in image data, the annotator generally needs to observe and identify the object in the image data and annotate objects one by one. The annotation method is to superimpose an annotation frame on the pixel area representing the object in the image data, and to record, in an associated manner, the position and size of the annotation frame together with the object type or attributes. When the annotator annotates objects in multi-frame image data, these operations have to be performed repeatedly, which makes annotation slow and inefficient. Moreover, the related art cannot associate the annotations of the same object across the multi-frame image data acquired from the same scene by multiple cameras, let alone track and annotate such an object across multi-camera multi-frame image data acquired over a period of time.
The embodiment of the application provides a scheme for tracking and labeling an object in multi-camera multi-frame image data. In this scheme, the labeling device performs object tracking and labeling on multiple frames of image data acquired from the same scene by multiple cameras at multiple time points within a period of time. The labeling device acquires and stores, in an associated manner, the labeling data of an object in the multi-frame image data of the multiple cameras at the N-th time point (also called the start time point) and at the (N+i)-th time point (also called the end time point), determines from these the labeling data of the object in the multi-frame image data of the multiple cameras at at least one intermediate time point between the N-th and (N+i)-th time points, and stores the labeling data of the object at each intermediate time point in an associated manner. In this way, by acquiring and storing the labeling data of the object at the start time point and at the end time point, the labeling device can determine the labeling data of the object at each intermediate time point, so that the same object can be labeled in an associated manner across the multi-frame image data of the multiple cameras and tracked and labeled across that data over a period of time.
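To make the data relationships concrete, the following is a minimal sketch, in Python, of how the labeling data described above could be organized; the class names (AnnotationBox, ObjectAnnotation), field names, and the store layout are illustrative assumptions and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class AnnotationBox:
    """Labeling-frame data for one object in one camera's frame: position and size."""
    x: float       # position data of the labeling frame
    y: float
    width: float   # size data of the labeling frame
    height: float

@dataclass
class ObjectAnnotation:
    """Labeling data of one object at one time point, across all cameras."""
    object_id: str                                                  # identification of the object
    boxes: Dict[str, AnnotationBox] = field(default_factory=dict)   # camera id -> labeling frame

# Associated storage: time point index -> object id -> ObjectAnnotation, so the same
# object is stored in an associated manner across cameras and across time points.
AnnotationStore = Dict[int, Dict[str, ObjectAnnotation]]
```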
Some embodiments of the present application provide a scheme for tracking and labeling an object in multi-camera multi-frame image data. Fig. 1 shows a structure of an annotation device provided in an embodiment of the present application, where the annotation device 1 includes at least one processor 11 and at least one memory 12.
The at least one processor 11 and the at least one memory 12 may be provided in the same terminal. In some embodiments, the annotation device 1 may be located on the server side. In other embodiments, the annotation device 1 may also be located in a cloud server. In other embodiments, the annotation device 1 may also be located in the client.
The at least one processor 11 and the at least one memory 12 may also be provided in different terminals, respectively. In some embodiments, a portion of the processor 11 and a portion of the memory 12 are located on the server side, and other portions of the processor 11 and other portions of the memory 12 are located on the client side.
In some embodiments, the at least one memory 12 may be a storage device of various modalities, such as a transitory or non-transitory storage medium. The at least one memory 12 may store at least one machine executable instruction, and the at least one machine executable instruction, when executed by the at least one processor 11, may implement the processing for tracking and labeling an object in multiple frames of image data according to the embodiment of the present disclosure.
As shown in fig. 2, the object tracking and labeling process in the multi-camera multi-frame image data provided by the embodiment of the present application may include a front-end process 12 and a back-end process 14. The front-end processing 12 and the back-end processing 14 may be executed in the same terminal, or may be executed in cooperation in different terminals.
The front-end process 12 displays the relevant image data or other data and receives the relevant data or information input by the annotator; for example, the front-end process 12 may be implemented via a web page or via a separate application interface. The back-end process 14 performs the corresponding labeling processing according to the data and information received by the front-end process 12. After the annotation process is completed, the annotation device 1 may further provide the annotation result to other processes or applications on the client, the server, or the cloud server.
The following describes the tracking and labeling process of an object in multi-camera multi-frame image data, which is realized by the labeling device 1 executing at least one machine executable instruction.
In some embodiments of the present application, the annotation device needs to perform object tracking annotation on multiple frames of image data acquired by multiple cameras from the same scene at multiple time points within a period of time, where the period of time includes multiple time points at a fixed interval; the fixed interval may correspond to the frame rate at which the cameras acquire image data, or to an integral multiple of that frame interval. At each time point during this period, the multiple cameras correspondingly acquire multiple frames of image data, that is, each camera acquires one frame of image data at each time point; hereinafter this is described as the multi-frame image data of the multiple cameras at one time point, or briefly as the multi-frame image data at one time point.
In the embodiment of the present application, when labeling an object in multi-frame image data, 1+i time points may be treated as one processing cycle. For example, in some embodiments, when the annotation task starts, the first time point may be set as the starting time point of the first processing cycle, the (1+i)-th time point as its ending time point, and the i-1 time points in between as intermediate time points in chronological order. In the second processing cycle, the (1+i+1)-th time point is taken as the starting time point, the (1+i+1+i)-th time point as the ending time point, and the i-1 time points in between as intermediate time points; the other processing cycles follow by analogy. In other embodiments, if the multi-frame image data at some time points (for example, the first N-1 time points) have already been labeled, the object tracking labeling process provided by the embodiment of the present application may start from the time point to be labeled (for example, the N-th time point). In other embodiments, when the annotator finds a new object in a frame of image data at the current intermediate time point, the frame order may be reset so that the current intermediate time point becomes the starting time point of a new processing cycle.
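As a rough illustration of how such cycles could be laid out, the following Python helper splits a range of time-point indices into cycles of 1+i points each; the function name and return format are assumptions made for the example, not part of the patent.

```python
from typing import List, Tuple

def split_into_cycles(first_point: int, last_point: int, i: int) -> List[Tuple[int, List[int], int]]:
    """Split the time points first_point..last_point into processing cycles of 1+i
    points each, returned as (start point, intermediate points, end point)."""
    cycles = []
    start = first_point
    while start + i <= last_point:
        end = start + i
        intermediates = list(range(start + 1, end))   # the i-1 intermediate time points
        cycles.append((start, intermediates, end))
        start = end + 1                               # the next cycle begins right after this end point
    return cycles

# Example: time points 1..10 with i = 4 give (1, [2, 3, 4], 5) and (6, [7, 8, 9], 10).
print(split_into_cycles(1, 10, 4))
```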
The following describes different embodiments with one processing cycle as an example. It can be understood by those skilled in the art that, although one processing cycle is described as an example below, in an actual application scenario, when the number of frames of the image data is large and a plurality of processing cycles need to be executed, each processing cycle may adopt the processing of the object tracking annotation in the multi-frame image data provided in the embodiment of the present application.
In some embodiments, fig. 3a shows a processing flow of object tracking and labeling in multi-camera multi-frame image data provided by an embodiment of the present application, that is, a processing of object tracking and labeling performed by a labeling device, including:
303, acquiring and storing annotation data of the same object in the multi-frame image data at the nth time point in a related manner by the annotation device, wherein the annotation data comprises an identification of the object and annotation frame data of the object in each frame of image data, and the annotation frame data comprises position data and size data of an annotation frame;
307, acquiring and storing the annotation data of the object in the multi-frame image data of the (N + i) th time point in a related manner;
step 309, determining the labeling data of each intermediate time point of the object between the nth time point and the N + i th time point according to the labeling data of the nth time point and the labeling data of the N + i th time point of the object; the marking data of one middle time point comprises the marking data of the object in the multi-frame image data acquired by the multi-camera for the same scene at the time point;
and 311, correlating and saving the labeling data of the object at each intermediate time point.
The process shown in FIG. 3a may be a back-end process of the trace annotation process.
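A minimal sketch of the back-end flow of steps 303 to 311, assuming the hypothetical store layout from the earlier sketch; acquire_annotation and interpolate stand in for the annotator-input and estimation steps and are placeholder names, not names used by the patent.

```python
def track_and_annotate_cycle(store, object_id, n_point, i, acquire_annotation, interpolate):
    """Run one processing cycle: first labeling, last labeling, intermediate estimation.

    acquire_annotation(t) returns camera id -> labeling frame entered for time point t;
    interpolate(first, last, k) returns a list of k estimated camera id -> frame mappings."""
    first = acquire_annotation(n_point)                     # step 303: first labeling
    store.setdefault(n_point, {})[object_id] = first        # associated save
    last = acquire_annotation(n_point + i)                  # step 307: last labeling
    store.setdefault(n_point + i, {})[object_id] = last
    intermediates = interpolate(first, last, i - 1)         # step 309: estimate intermediates
    for k, boxes in enumerate(intermediates, start=1):      # step 311: associated save
        store.setdefault(n_point + k, {})[object_id] = boxes
    return store
```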
Fig. 3b shows another processing flow of object tracking and labeling in multi-camera multi-frame image data according to the embodiment of the present application, that is, a processing of object tracking and labeling performed by a labeling device, including:
301, displaying multi-frame image data of an nth time point by a labeling device, wherein the multi-frame image data of the nth time point is image data acquired by a plurality of cameras at the nth time point on the same scene, and N is a natural number;
303, acquiring and storing the labeling data of the same object in the multi-frame image data at the nth time point in a related manner, wherein the labeling data comprises the identification of the object and the labeling frame data of the object in each frame of image data, and the labeling frame data comprises the position data and the size data of a labeling frame;
step 305, displaying multi-frame image data of an N + i time point, wherein the multi-frame image data of the N + i time point is image data acquired by a plurality of cameras at the N + i time point on the same scene, and i is a natural number greater than or equal to 2;
307, acquiring and storing the annotation data of the object in the multi-frame image data of the (N + i) th time point in a related manner;
step 309, determining the labeling data of each intermediate time point of the object between the nth time point and the N + i th time point according to the labeling data of the nth time point and the labeling data of the N + i th time point of the object; the marking data of one middle time point comprises the marking data of the object in the multi-frame image data acquired by the multi-camera for the same scene at the time point;
and 311, correlating and saving the labeling data of the object at each intermediate time point.
The processing shown in FIG. 3b may be front-end processing and back-end processing of trace annotations.
By the processing shown in fig. 3a or fig. 3b, the multi-frame image data of the multi-camera at a plurality of time points may be processed from the multi-frame image data at the first time point, or from the multi-frame image data at any one of the time points.
Through the processing shown in fig. 3a or fig. 3b, the annotating device can successively display to the annotator the multi-frame image data of the starting time point and of the ending time point of one processing cycle, and automatically determine the labeling data of the object in the multi-frame image data at each intermediate time point from the labeling data of the same object that the annotator entered at those two time points. In this way the labeling device can perform object tracking and labeling of multi-camera multi-frame image data quickly and efficiently, solving the problem that in the related art an object cannot be tracked and labeled across the multi-frame image data of multiple cameras.
Fig. 4 shows another processing flow of the object tracking and labeling process in the multi-camera multi-frame image data provided by the embodiment of the present application, where the process includes a front-end process and a back-end process, and the process further includes, after step 311 of fig. 3 b:
313, displaying the multi-frame image data of the current middle time point; the current intermediate time point may be a time point between the start time point and the end time point, which is sequentially reached according to a time point sequence;
step 315, receiving a frame resetting instruction, setting the current intermediate time point as the N-th time point, saving the labeling data already determined for the object at this time point as its new N-th-time-point labeling data, and acquiring and storing, in an associated manner, the labeling data of another object at this time point;
step 317, displaying the multi-frame image data of the (N + i) th time point, and acquiring and respectively and correlatively storing the labeling data of the (N + i) th time point of the two objects;
319, respectively determining the labeling data of each intermediate time point between the nth time point and the N + i th time point of the two objects according to the labeling data of the nth time point and the labeling data of the N + i th time point of the two objects;
and step 321, storing the labeling data of the two objects at each intermediate time point in a correlated manner.
It will be appreciated by those of ordinary skill in the art that in the process illustrated in FIG. 4, multiple objects may be distinguished by assigning them different identifications. The identification of an object may be assigned automatically by the annotating device or entered by the annotator.
Through the processing shown in fig. 4, the labeling device can also label an object that newly appears in intermediate-frame image data, avoiding the problem that such an object would otherwise go unlabeled. The process shown in fig. 4 therefore provides a more flexible and efficient way of labeling than those shown in fig. 3a and 3b.
In the above-described processing of fig. 3a, 3b, or 4, the annotation of an object in the multi-frame image data at the nth time point may be referred to as a first annotation, the annotation of the object in the multi-frame image data at the N + i th time point may be referred to as a last annotation, and the annotation of the object in the multi-frame image data at each intermediate time point may be referred to as an intermediate automatic annotation.
The process flow for the first annotation in some embodiments is shown in FIG. 5a, which includes:
step 501, displaying multi-frame image data acquired by a plurality of cameras in the same scene at the Nth time point by a marking device;
step 503, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
and 505, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the nth time point.
In step 503, the annotating device receives the annotation instruction and obtains the annotation-frame data, which the annotator may input in various ways: entering a specific parameter value directly in a data input box of the human-computer interface; clicking a preset button or key on the interface, the button or key carrying a corresponding preset instruction or data; or selecting an option from a pull-down menu provided by the interface, where the menu may contain one or more sub-menus and each sub-menu one or more options. The annotating device thus obtains the annotation data entered by the annotator through the human-computer interface.
In the processing flows of the following embodiments, the way the annotation device acquires the input annotation data or other data is similar to the acquisition in step 503 and will not be described again.
Step 503 can be implemented in two labeling modes, sketched in code below.
Labeling mode one (one-by-one labeling): the received labeling instructions are individual labeling instructions, each supplying the object identification entered for the object together with the labeling-frame data of the object in one frame of image data.
Labeling mode two (merged labeling): the labeling-frame data of the object in each frame of image data are acquired first; a merge-labeling instruction is then received and the identification of the object is acquired.
On the basis of the processing shown in fig. 5a, after the annotation device obtains the annotation data, it may further generate an annotation frame and display the annotation frame, so that the annotator can observe the annotation frame conveniently. In some embodiments, another flow diagram of the first annotation process is shown in FIG. 5b, including:
step 501, displaying multi-frame image data acquired by a plurality of cameras in the same scene at the Nth time point by a marking device;
step 503, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
step 504, generating and displaying a corresponding labeling frame according to the labeling data;
and 505, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the nth time point.
In some embodiments, after the labeling device generates and displays the labeling frame according to the labeling data, the labeling staff may further adjust the generated labeling frame to realize accurate labeling of the object. Based on the processing shown in fig. 5b, the labeling device can further adjust the labeling data according to the input of the labeling person to obtain the adjusted labeling data. In other embodiments, another flow diagram of the first labeled process is shown in FIG. 5c, including:
step 501, displaying multi-frame image data acquired by a plurality of cameras in the same scene at the Nth time point by a marking device;
step 503, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
step 504, generating and displaying a corresponding labeling frame according to the labeling data;
step 506, obtaining the inputted annotation frame adjustment data of the object in the frame of image data, wherein the annotation frame adjustment data comprises the position adjustment data of the annotation frame and/or the size adjustment data of the annotation frame;
step 507, determining to obtain adjusted marking frame data according to the obtained marking frame data and marking frame adjustment data;
and 505, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the nth time point.
In step 507, the adjusted labeling-frame data are determined from the acquired labeling-frame data and the labeling-frame adjustment data; depending on which data the adjustment data contain, the following adjustment modes may be used.
In the first adjustment mode, under the condition that the marking frame adjustment data comprises position adjustment data of the marking frame, the marking device determines to obtain the adjusted position data of the marking frame according to the position adjustment data of the marking frame and the acquired position data of the marking frame in the marking data.
In some embodiments, when the position adjustment data contain the position data of the adjusted annotation frame, the annotation device determines that adjusted position as the position data of the adjusted annotation frame. For example, if the acquired position data is (x, y) and the position adjustment data contain the adjusted position (x', y'), the labeling device determines the coordinates (x', y') as the position data of the adjusted labeling frame.
In some embodiments, when the position adjustment data contain a position adjustment direction and a position offset of the labeling frame, the labeling device determines the adjusted position data of the labeling frame from the position data of the labeling frame in the labeling data together with that adjustment direction and offset. For example, if the acquired position data is (x, y), the adjustment direction in the position adjustment data is along the x axis, and the offset is a, then the adjusted position coordinates are (x', y) with x' = x + a.
And in the second adjustment mode, under the condition that the marking frame adjustment data comprise the size adjustment data of the marking frame, the marking device determines to obtain the size data of the adjusted marking frame according to the size adjustment data of the marking frame and the size data of the marking frame in the marking data.
In some embodiments, in a case where the size adjustment data contain the size data of the adjusted marking frame, the marking device determines that adjusted size as the size data of the adjusted marking frame. For example, if the acquired size data is s × r and the adjusted size data of the labeling box is s' × r', the labeling device determines s' × r' as the adjusted size data of the labeling box.
In some embodiments, in a case where the size adjustment data contain size increase/decrease data, the labeling device determines the size data of the adjusted labeling frame from the size data of the labeling frame in the labeling data and the size increase/decrease data. For example, if the acquired size data is s × r and the size increase/decrease data is (+i, -j), the adjusted size data of the labeling frame is s' × r' with s' = s + i and r' = r - j, and the labeling device determines s' × r' as the adjusted size data of the labeling frame.
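Both adjustment modes reduce to a small amount of arithmetic on the stored frame data. The following Python sketch applies them to a labeling frame represented as (x, y, w, h); the function name adjust_box and the parameter names are assumptions for illustration only.

```python
def adjust_box(box, position=None, offset=None, size=None, size_delta=None):
    """Apply labeling-frame adjustment data to stored labeling-frame data.

    position:   absolute adjusted position (x', y')        (adjustment mode one)
    offset:     (axis, amount), e.g. ('x', a)              (adjustment mode one)
    size:       absolute adjusted size (s', r')            (adjustment mode two)
    size_delta: increase/decrease data, e.g. (+i, -j)      (adjustment mode two)
    box is (x, y, w, h); a new tuple is returned."""
    x, y, w, h = box
    if position is not None:
        x, y = position                  # absolute position overrides the stored one
    elif offset is not None:
        axis, amount = offset
        if axis == 'x':
            x += amount                  # e.g. (x, y) with offset a along x gives (x + a, y)
        else:
            y += amount
    if size is not None:
        w, h = size                      # absolute size overrides the stored one
    elif size_delta is not None:
        dw, dh = size_delta
        w, h = w + dw, h + dh            # e.g. s x r with (+i, -j) gives (s + i) x (r - j)
    return (x, y, w, h)
```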
Although some embodiments of the labeling device for performing the first labeling operation are listed in the embodiments of the present application, in a specific application scenario, other equivalent embodiments or alternative embodiments may also be included. After the marking device executes the first marking, the last marking can be executed.
In some embodiments, FIG. 6a illustrates a process flow of a last annotation, comprising:
601, displaying multi-frame image data acquired by the multi-camera at the N + i th time point in the same scene by the marking device;
step 603, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
and step 605, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the (N + i) th time point.
The manner of acquiring the input annotation command and the annotation frame data by the annotation device in step 603 is similar to the manner of acquiring in step 503, and is not described here again.
In some embodiments, similar to the process of fig. 5b, in order to facilitate the annotator to observe and identify the annotation condition, the annotation device further generates and displays an annotation box according to the annotation data, and fig. 6b shows another processing flow of the last annotation, which includes:
601, displaying multi-frame image data acquired by the multi-camera at the N + i th time point in the same scene by the marking device;
step 603, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
step 604, generating and displaying a corresponding labeling frame according to the labeling data;
and step 605, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the (N + i) th time point.
In some embodiments, similar to the process of fig. 5c, in order to perform a more accurate annotation process, the annotation device further adjusts the generated annotation box according to the adjustment data input by the annotator, and fig. 6c shows a process flow of the last annotation, which includes:
601, displaying multi-frame image data acquired by the multi-camera at the N + i th time point in the same scene by the marking device;
step 603, receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
step 604, generating and displaying a corresponding labeling frame according to the labeling data;
step 606, obtaining the inputted annotation frame adjustment data of the object in the frame of image data, wherein the annotation frame adjustment data comprises the position adjustment data of the annotation frame and/or the size adjustment data of the annotation frame;
step 607, determining to obtain adjusted marking frame data according to the obtained marking frame data and marking frame adjustment data;
and step 605, storing the obtained identification of the object and the data of the labeling frame of the object in each frame of image data in an associated manner as the labeling data of the object at the (N + i) th time point.
In step 607, the adjusted labeling-frame data can be determined according to the first or second adjustment mode described above, which is not repeated here.
In some embodiments, the present application further provides a method for processing a last annotation, as shown in fig. 6d, including:
601, displaying multi-frame image data acquired by the multi-camera at the N + i th time point in the same scene by the marking device;
step 602, displaying a corresponding labeling frame according to the stored labeling data of the N-th time point;
step 608, in a case where the annotation frame adjustment data of the object in the input frame of image data at the N + i th time point is not acquired, determining the saved annotation frame data of the frame of image data at the N th time point of the object as the annotation frame data at the N + i th time point of the object;
step 609, under the condition of acquiring the input annotation frame adjustment data, determining and obtaining the annotation frame data of the frame of image data of the N + i time point of the object according to the annotation frame adjustment data and the annotation frame data of the frame of image data of the N time point; the marking frame adjusting data comprise position adjusting data of the marking frame and/or size adjusting data of the marking frame;
and 605, storing the identification of the object and the data of the labeling frame of the object in the multi-frame image data in a correlated manner as the labeling data of the (N + i) th time point of the object.
In step 609, reference may be made to the first adjustment method and the second adjustment method according to the processing of the label frame adjustment data and the label frame data of the frame of image data at the nth time point.
Through the processing shown in fig. 6d, when a static object is labeled, or when the annotator considers the automatically generated result accurate, the object can be labeled automatically in the last labeling processing; when a moving object is labeled, the labeling at the N-th time point can be displayed to the annotator in the last labeling processing, so that the annotator can conveniently observe and identify the object, and the labeling frame of the object at the (N+i)-th time point can be adjusted accordingly from the adjustment data input by the annotator.
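The carry-forward-or-adjust behaviour of fig. 6d can be sketched as follows, reusing a hypothetical adjust_box-style helper; the names and signatures are illustrative assumptions.

```python
def last_annotation_from_first(first_boxes, adjustments, apply_adjustment):
    """first_boxes:  camera id -> labeling frame saved for the N-th time point.
    adjustments:     camera id -> adjustment data entered by the annotator, if any.
    Returns camera id -> labeling frame for the (N+i)-th time point."""
    last_boxes = {}
    for camera_id, box in first_boxes.items():
        adj = adjustments.get(camera_id)
        if adj is None:
            last_boxes[camera_id] = box                          # no input: reuse N-th-time-point frame
        else:
            last_boxes[camera_id] = apply_adjustment(box, adj)   # input given: adjust it
    return last_boxes
```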
FIGS. 6a to 6d show the process of the final annotation by the annotating device according to the annotation data inputted by the annotator.
Although the embodiments of the present application have been described with reference to some embodiments in which the annotating device performs the last annotation operation, in a specific application scenario, other equivalent embodiments or alternative embodiments may also be included.
The marking device can execute intermediate automatic marking processing after executing the first marking and the last marking.
In some embodiments, as shown in FIG. 7a, the intermediate automatic annotation process may include:
701a, determining to obtain a difference mean value of each item of data in the labeling frame data of the image data corresponding to each camera according to the labeling data of the nth time point and the labeling data of the (N + i) th time point of the object and the number of the intermediate time points;
703a, determining to obtain the labeling frame data in the image data corresponding to each camera at each intermediate time point of the object according to the labeling frame data of the image data corresponding to each camera at the Nth time point, the sequence of the intermediate time points and the determined difference mean value of each item of data in the labeling frame data;
step 705a, storing the annotation data of the object at each intermediate time point in an associated manner.
Through the processing shown in fig. 7a, the labeling device can automatically perform object labeling processing on each intermediate frame image data according to the result of the first labeling and the result of the last labeling.
In some embodiments, in the case where the annotation data includes annotation frame size data, the process shown in FIG. 7a may be implemented as the process flow shown in FIG. 7 b:
701b, determining to obtain a difference mean value of the dimension data of the labeling frame of the image data corresponding to each camera according to the labeling data of the (N + i) th time point, the dimension data of the labeling frame included in the labeling data of the nth time point and the number of the intermediate time points;
703b, determining to obtain a numerical value of the size data of the labeling frame in the image data corresponding to each camera at each intermediate time point of the object according to the size data of the labeling frame of the image data corresponding to each camera at the Nth time point, the sequence of the intermediate time points and the determined difference mean value of the size data of the labeling frame;
step 705b, the annotation data of the object at each intermediate time point is saved in a correlated manner.
In step 701b, for example, if the labeling-frame size of the image data corresponding to one camera at the N-th time point of one object is s × r, the labeling-frame size of the image data corresponding to that camera at the (N+i)-th time point is s' × r', and the number of intermediate time points is n, then the difference means of the size data are u = (s' - s)/n and v = (r' - r)/n. In step 703b, when determining the labeling-frame size data of that camera at the 2nd intermediate time point, it can be determined as s2 × r2, where s2 = s + 2u and r2 = r + 2v.
In some embodiments, in the case where the annotation data includes annotation frame location data, the process shown in FIG. 7a can be implemented as the process flow shown in FIG. 7 c:
701c, determining to obtain a difference mean value of the position data of the labeling frame of the image data corresponding to each camera according to the labeling frame position data included in the (N + i) th time point labeling data and the nth time point labeling data and the number of the intermediate time points;
703c, determining to obtain a numerical value of the position data of the labeling frame in the image data corresponding to each camera at each intermediate time point of the object according to the position data of the labeling frame in the image data corresponding to each camera at the Nth time point, the sequence of the intermediate time points and the determined difference mean value of the position data of the labeling frame;
step 705c, storing the annotation data of the object at each intermediate time point in an associated manner.
In step 701c, for example, if the position data of the annotation frame of the image data corresponding to one camera at the N-th time point of one object is (x, y), the position data of the annotation frame of the image data corresponding to that camera at the (N+i)-th time point is (x', y'), and the number of intermediate time points is n, then the difference means of the position data are p = (x' - x)/n and q = (y' - y)/n. In step 703c, when determining the position data of the annotation frame of that camera at the 2nd intermediate time point, it can be determined as (x2, y2), where x2 = x + 2p and y2 = y + 2q.
In other embodiments, in the case that the annotation data includes annotation frame size data and annotation frame position data, the processing in fig. 7b and fig. 7c may be used to determine and obtain the value of the annotation frame size data and the value of the annotation frame position data included in the annotation data of the intermediate frame of the object.
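Putting the worked examples together, the intermediate automatic labeling amounts to linear interpolation of each item of frame data between the first and last labeling. Below is a Python sketch that follows the difference-mean formulas given above (dividing by the number of intermediate time points, as in the examples); the function name and box layout are assumptions for illustration.

```python
def interpolate_intermediate_boxes(first_box, last_box, n_intermediate):
    """Estimate labeling-frame data for the intermediate time points of one camera.

    first_box / last_box: (x, y, w, h) at the N-th and (N+i)-th time points.
    Returns one (x, y, w, h) tuple per intermediate time point, in time order."""
    x, y, w, h = first_box
    x2, y2, w2, h2 = last_box
    n = n_intermediate
    p, q = (x2 - x) / n, (y2 - y) / n        # difference means of the position data
    u, v = (w2 - w) / n, (h2 - h) / n        # difference means of the size data
    return [(x + k * p, y + k * q, w + k * u, h + k * v) for k in range(1, n + 1)]
```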
Through the processing of fig. 7a to 7c, estimated labeling data of the object can be obtained. When the object is static, the difference between its first labeling data and its last labeling data is small, and fairly accurate intermediate-time-point labeling data can be obtained through the processing of fig. 7a to 7c. When the object moves across consecutive frames, the time interval between consecutive frames is small (for example, when 60 frames of image data are acquired per second, the interval between adjacent frames is small), so the change in the position of the moving object over such a short interval is also very small; the motion can therefore be regarded as approximately linear, and fairly accurate intermediate-time-point labeling data can likewise be obtained through the processing shown in fig. 7a to 7c.
In other embodiments, for the case that the position or size of the object in the image data changes greatly, after the annotation device determines that the annotation data of the intermediate time point of an object is obtained, the annotation data of the determined intermediate time point may be further adjusted to obtain the annotation data corresponding to the object in the image data of the intermediate time point. As shown in fig. 7d, on the basis of the process shown in fig. 7a, the intermediate automatic labeling process can also be implemented as the following process:
701d, determining to obtain a difference mean value of each item of data in the labeling frame data of the image data corresponding to each camera according to the labeling data of the nth time point and the labeling data of the (N + i) th time point of the object and the number of the intermediate time points;
703d, determining to obtain the labeling frame data in the image data corresponding to each camera at each intermediate time point of the object according to the labeling frame data of the image data corresponding to each camera at the Nth time point, the sequence of the intermediate time points and the determined difference mean value of each item of data in the labeling frame data;
step 707d, the annotation device displays the multi-frame image data of the current intermediate time point, and displays a corresponding annotation frame on the displayed multi-frame image data according to the determined annotation data of the current intermediate time point;
step 709d, acquiring the input adjustment data of the annotation frame of the object in the frame of image data, where the adjustment data includes one or more of the following: position adjustment data of the marking frame and size adjustment data of the marking frame;
step 710d, determining to obtain the marking frame data of the current intermediate time point after the object is adjusted according to the determined marking frame data of the current intermediate time point and the marking frame adjustment data;
step 705d, storing the labeling data of the object at each intermediate time point in an associated manner.
For determining the processing of the adjustment data in step 710d, reference may be made to the first adjustment method and the second adjustment method, which are not described herein again.
Through the processing shown in fig. 7d, the annotation device can display the automatic labeling of the intermediate frames to the annotator and adjust the estimated intermediate labeling data according to the adjustment data the annotator inputs, so as to obtain labeling data that better match the object in the image data.
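A minimal sketch of that refinement loop of fig. 7d, assuming the interpolation output above and a hypothetical get_adjustment callback for the annotator's input; all names are illustrative, not from the patent.

```python
def refine_intermediate_annotations(estimated, get_adjustment, apply_adjustment):
    """estimated: intermediate time point -> camera id -> estimated labeling frame.
    get_adjustment(t, camera_id) returns the annotator's adjustment data for that frame,
    or None if nothing was entered. Returns the frames to be saved per time point."""
    refined = {}
    for t, boxes in estimated.items():
        refined[t] = {}
        for camera_id, box in boxes.items():
            adj = get_adjustment(t, camera_id)
            refined[t][camera_id] = box if adj is None else apply_adjustment(box, adj)
    return refined
```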
Embodiments of the subject matter and the functional operations described in this application can be implemented in various systems, in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their equivalents, or in combinations of these. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a tangible, non-transitory computer-readable medium for execution by, or to control the operation of, a data processing apparatus. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of these. The term "data processing unit" or "data processing apparatus" covers all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus may include code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of these.
A computer program (also known as a program, software, software application, script, or code) can be written in any programming language, including compiled or interpreted languages, and can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that also holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers located at one site or distributed across multiple sites and interconnected by a communication network.
The processes or logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs, operating on input data and generating output. The processes or logic flows can also be performed by, and the various devices can be implemented as, special purpose logic circuitry, e.g., a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory and/or a random access memory. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, one or more mass storage devices, such as magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Computer-readable media suitable for storing instructions and data include all forms of non-volatile memory, media, and storage devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this document contains many specifics, these should not be construed as limitations on the scope of the disclosure, but merely as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features described in this application in the context of separate embodiments may also be combined and implemented in a single embodiment; conversely, features described in the context of a single embodiment may also be provided separately or in any suitable subcombination. Also, while features may be described above as acting in certain combinations, one or more features may be removed from a claimed combination, and the claimed combination may be further combined or modified.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in such order to achieve desirable results. Also, the separation of various system components in the embodiments should not be understood as requiring such separation in all embodiments. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (17)

1. A method for tracking and labeling an object in multi-camera multi-frame image data is characterized by comprising the following steps:
the method comprises the steps that a labeling device acquires and stores labeling data of the same object in multi-frame image data of an Nth time point in a correlated mode, wherein the multi-frame image data of the Nth time point are image data acquired by a plurality of cameras at the Nth time point in the same scene, and N is a natural number; the marking data comprises the identification of the object and marking frame data of the object in each frame of image data, and the marking frame data comprises position data and size data of a marking frame;
acquiring and storing the annotation data of the object in the multi-frame image data of the (N + i) th time point in an associated manner, wherein the multi-frame image data of the (N + i) th time point is the image data acquired by a plurality of cameras at the (N + i) th time point on the same scene, and i is a natural number greater than or equal to 2;
according to the labeling data of the nth time point and the labeling data of the (N + i) th time point of the object, determining the labeling data of each intermediate time point of the object between the nth time point and the (N + i) th time point; the marking data of one middle time point comprises the marking data of the object in the multi-frame image data acquired by the multi-camera for the same scene at the time point; and storing the labeling data of the object at each intermediate time point in an associated manner.
2. The method of claim 1, wherein the labeling device acquiring and storing, in an associated manner, the labeling data of the same object in the multi-frame image data of a time point comprises:
the labeling device displaying the multi-frame image data acquired of the same scene by the plurality of cameras at the time point;
receiving a labeling instruction, and acquiring the identification of the object and the labeling frame data of the object in each frame of image data;
and storing, in an associated manner, the acquired identification of the object and the labeling frame data of the object in each frame of image data as the labeling data of the object at the time point.
3. The method of claim 2, wherein receiving a labeling instruction and acquiring the identification of the object and the labeling frame data of the object in each frame of image data comprises: receiving individual labeling instructions one by one, and respectively acquiring the identification of the object and the labeling frame data input for the object in each frame of image data.
4. The method of claim 2, wherein receiving a labeling instruction and acquiring the identification of the object and the labeling frame data of the object in each frame of image data comprises: acquiring the labeling frame data input for the object in each frame of image data, receiving a merge labeling instruction, and acquiring the identification of the object.
5. The method of claim 2, further comprising, after acquiring the labeling frame data of the object in a frame of image data:
acquiring input labeling frame adjustment data of the object in the frame of image data, wherein the labeling frame adjustment data comprises position adjustment data of the labeling frame and/or size adjustment data of the labeling frame;
and determining the adjusted labeling frame data according to the acquired labeling frame data and the labeling frame adjustment data.
6. The method of claim 1, wherein the labeling device acquiring and storing, in an associated manner, the labeling data of the object in the multi-frame image data of the (N+i)th time point comprises:
displaying corresponding labeling frames according to the stored labeling data of the Nth time point;
in a case that no input labeling frame adjustment data of the object in a frame of image data of the (N+i)th time point is acquired, determining the stored labeling frame data of the corresponding frame of image data of the Nth time point as the labeling frame data of the object at the (N+i)th time point;
in a case that input labeling frame adjustment data are acquired, determining the labeling frame data of the object in the frame of image data of the (N+i)th time point according to the labeling frame adjustment data and the labeling frame data of the corresponding frame of image data of the Nth time point, wherein the labeling frame adjustment data comprises position adjustment data of the labeling frame and/or size adjustment data of the labeling frame;
and storing, in an associated manner, the identification of the object and the labeling frame data of the object in the multi-frame image data as the labeling data of the object at the (N+i)th time point.
7. The method of claim 1, wherein determining the labeling data of the object at each intermediate time point between the Nth time point and the (N+i)th time point according to the labeling data of the object at the Nth time point and the labeling data of the object at the (N+i)th time point comprises:
determining a mean difference of each item of data in the labeling frame data of the image data corresponding to each camera according to the labeling data of the object at the Nth time point, the labeling data of the object at the (N+i)th time point, and the number of intermediate time points;
and determining the labeling frame data of the object in the image data corresponding to each camera at each intermediate time point according to the labeling frame data of the image data corresponding to that camera at the Nth time point, the ordinal position of the intermediate time point, and the determined mean difference of each item of data in the labeling frame data.
8. The method of claim 7, further comprising, after determining the labeling frame data of the object at each intermediate time point:
the labeling device displaying the multi-frame image data of the current intermediate time point;
displaying corresponding labeling frames on the displayed multi-frame image data according to the determined labeling data of the current intermediate time point; acquiring input labeling frame adjustment data of the object in a frame of image data, wherein the adjustment data comprises position adjustment data of the labeling frame and/or size adjustment data of the labeling frame;
and determining the adjusted labeling frame data of the object at the current intermediate time point according to the determined labeling frame data of the current intermediate time point and the labeling frame adjustment data.
9. The method according to any one of claims 5, 6 and 8, wherein, in a case that the adjustment data includes position adjustment data of the labeling frame, determining the adjusted labeling data according to the adjustment data and the labeling data comprises:
determining the position data of the adjusted labeling frame according to the position adjustment data of the labeling frame and the position data of the labeling frame in the acquired labeling data.
10. The method of claim 9, wherein the position adjustment data comprises position data of the adjusted labeling frame;
and determining the position data of the adjusted labeling frame according to the position adjustment data of the labeling frame and the position data of the labeling frame in the labeling data comprises: determining the adjusted position data in the position adjustment data as the position data of the adjusted labeling frame.
11. The method of claim 9, wherein the position adjustment data comprises a position adjustment direction and a position offset of the labeling frame;
and determining the position data of the adjusted labeling frame according to the position adjustment data of the labeling frame and the position data of the labeling frame in the labeling data comprises: determining the position data of the adjusted labeling frame according to the position data of the labeling frame in the labeling data, the position adjustment direction and the position offset of the labeling frame.
12. The method according to any one of claims 5, 6 and 8, wherein, in a case that the adjustment data includes size adjustment data of the labeling frame, determining the adjusted labeling data according to the adjustment data and the labeling data comprises:
determining the size data of the adjusted labeling frame according to the size adjustment data of the labeling frame and the size data of the labeling frame in the labeling data.
13. The method of claim 12, wherein the size adjustment data comprises size data of the adjusted labeling frame;
and determining the size data of the adjusted labeling frame according to the size adjustment data of the labeling frame and the size data of the labeling frame in the labeling data comprises: determining the adjusted size data as the size data of the adjusted labeling frame.
14. The method of claim 12, wherein the size adjustment data comprises size increase or decrease data of the labeling frame;
and determining the size data of the adjusted labeling frame according to the size adjustment data of the labeling frame and the size data of the labeling frame in the labeling data comprises: determining the size data of the adjusted labeling frame according to the size data of the labeling frame in the labeling data and the size increase or decrease data of the labeling frame.
15. The method of claim 1, further comprising, after determining the labeling data of the object at each intermediate time point between the Nth time point and the (N+i)th time point:
the labeling device displaying the multi-frame image data of the current intermediate time point;
and receiving a frame reset instruction, and setting the multi-frame image data of the current intermediate time point as the multi-frame image data of the Nth time point.
16. An apparatus for tracking and labeling an object in multi-camera multi-frame image data, comprising at least one processor and at least one memory, wherein at least one machine-executable instruction is stored in the at least one memory, and the at least one processor executes the at least one machine-executable instruction to perform the method according to any one of claims 1 to 15.
17. A non-volatile storage medium having stored thereon at least one machine-executable instruction which, when executed by a processor, implements the method according to any one of claims 1 to 15.
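The claims describe the labeling data only abstractly: an identification of the object plus, for each camera's frame at a time point, position and size data of a labeling frame, all stored in an associated manner. The Python sketch below is one possible in-memory representation, offered purely for illustration; the names AnnotationBox and LabelStore and the field layout are assumptions of this sketch and are not taken from the patent.

    from dataclasses import dataclass, field

    @dataclass
    class AnnotationBox:
        x: float        # labeling frame position data (assumed: top-left x, in pixels)
        y: float        # labeling frame position data (assumed: top-left y, in pixels)
        width: float    # labeling frame size data
        height: float   # labeling frame size data

    @dataclass
    class LabelStore:
        """Stores labeling data keyed by time point and object identification.

        records[time_point][object_id][camera_id] -> AnnotationBox, so the boxes
        of the same object in the multi-frame image data of one time point stay
        associated with one another, as claim 1 requires.
        """
        records: dict = field(default_factory=dict)

        def store(self, time_point: int, object_id: str,
                  boxes: dict[str, AnnotationBox]) -> None:
            # Associate the object identification with its labeling frame data
            # in every camera's frame at this time point.
            self.records.setdefault(time_point, {})[object_id] = dict(boxes)

        def get(self, time_point: int, object_id: str) -> dict[str, AnnotationBox]:
            return self.records[time_point][object_id]

A labeling tool built this way would call store() once for the Nth time point and once for the (N+i)th time point, then fill in the intermediate time points as sketched next.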
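Claims 7 and 8 recite deriving the labeling frame data of the intermediate time points from the per-item mean difference between the two annotated time points. A minimal linear-interpolation sketch follows, reusing the hypothetical AnnotationBox type from the previous sketch; the claims do not prescribe this exact formula beyond the mean-difference wording, so treat it as one plausible reading rather than the patented implementation.

    def interpolate_boxes(box_n: AnnotationBox, box_n_i: AnnotationBox,
                          num_intermediate: int) -> list[AnnotationBox]:
        """Boxes of one object in one camera's image at the intermediate time points.

        box_n is the labeling frame at the Nth time point, box_n_i the one at the
        (N+i)th time point, and num_intermediate the number of time points between
        them (i - 1 when every time point produced a frame).
        """
        steps = num_intermediate + 1
        # Mean difference of each item of data in the labeling frame data (claim 7).
        dx = (box_n_i.x - box_n.x) / steps
        dy = (box_n_i.y - box_n.y) / steps
        dw = (box_n_i.width - box_n.width) / steps
        dh = (box_n_i.height - box_n.height) / steps
        return [
            AnnotationBox(box_n.x + k * dx, box_n.y + k * dy,
                          box_n.width + k * dw, box_n.height + k * dh)
            for k in range(1, num_intermediate + 1)   # k is the ordinal position
        ]

    def interpolate_object(boxes_n: dict[str, AnnotationBox],
                           boxes_n_i: dict[str, AnnotationBox],
                           num_intermediate: int) -> list[dict[str, AnnotationBox]]:
        """Interpolate independently per camera, then regroup per intermediate time point."""
        per_camera = {cam: interpolate_boxes(boxes_n[cam], boxes_n_i[cam], num_intermediate)
                      for cam in boxes_n}
        return [{cam: seq[k] for cam, seq in per_camera.items()}
                for k in range(num_intermediate)]

Claim 8 then lets an annotator correct any interpolated box; in this sketch that would simply mean replacing the corresponding AnnotationBox in the returned list before storing it.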
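Claims 9 to 14 distinguish two kinds of position adjustment (an absolute replacement position, or a direction plus offset) and two kinds of size adjustment (an absolute size, or an increase/decrease amount). The sketch below applies such adjustment data to a stored labeling frame, again reusing the hypothetical AnnotationBox type; the field names are assumptions, and the direction-plus-offset case is expressed here as signed x/y deltas.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class BoxAdjustment:
        new_x: Optional[float] = None        # absolute adjusted position (cf. claim 10)
        new_y: Optional[float] = None
        offset_x: float = 0.0                # signed position offset (cf. claim 11)
        offset_y: float = 0.0
        new_width: Optional[float] = None    # absolute adjusted size (cf. claim 13)
        new_height: Optional[float] = None
        delta_width: float = 0.0             # size increase/decrease (cf. claim 14)
        delta_height: float = 0.0

    def apply_adjustment(box: AnnotationBox, adj: BoxAdjustment) -> AnnotationBox:
        """Return the adjusted labeling frame derived from the stored one."""
        x = adj.new_x if adj.new_x is not None else box.x + adj.offset_x
        y = adj.new_y if adj.new_y is not None else box.y + adj.offset_y
        w = adj.new_width if adj.new_width is not None else box.width + adj.delta_width
        h = adj.new_height if adj.new_height is not None else box.height + adj.delta_height
        return AnnotationBox(x, y, w, h)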
CN202010757802.2A (filed 2020-07-31): Multi-camera multi-frame image data object tracking and labeling method and device and storage medium. Legal status: Pending. Published as CN112053388A.

Priority Applications (1)

Application Number: CN202010757802.2A; Priority Date: 2020-07-31; Filing Date: 2020-07-31; Title: Multi-camera multi-frame image data object tracking and labeling method and device and storage medium

Publications (1)

Publication Number: CN112053388A; Publication Date: 2020-12-08

Family

ID=73601846

Country Status (1)

Country: CN; Publication: CN112053388A

Patent Citations (5)

* Cited by examiner, † Cited by third party

CN108986134A * (Priority 2018-08-17, Published 2018-12-11) 浙江捷尚视觉科技股份有限公司: Semi-automatic video object labeling method based on correlation filter tracking
CN109727312A * (Priority 2018-12-10, Published 2019-05-07) 广州景骐科技有限公司: Point cloud labeling method, apparatus, computer device and storage medium
CN109934851A * (Priority 2019-03-28, Published 2019-06-25) 新华三技术有限公司: Labeling method, apparatus and machine-readable storage medium
CN110210328A * (Priority 2019-05-13, Published 2019-09-06) 北京三快在线科技有限公司: Method, apparatus and electronic device for labeling objects in an image sequence
CN110991491A * (Priority 2019-11-12, Published 2020-04-10) 苏州智加科技有限公司: Image labeling method, apparatus, device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination