CN110781797A - Labeling method and device and electronic equipment - Google Patents

Labeling method and device and electronic equipment

Info

Publication number
CN110781797A
CN110781797A (application CN201911004862.0A)
Authority
CN
China
Prior art keywords
neighborhood, pictures, track, picture, optimal
Prior art date
Legal status
Granted
Application number
CN201911004862.0A
Other languages
Chinese (zh)
Other versions
CN110781797B (en)
Inventor
赵拯
管永来
郑东
赵五岳
Current Assignee
Hangzhou Pan Intelligent Technology Co Ltd
Original Assignee
Hangzhou Pan Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Pan Intelligent Technology Co Ltd filed Critical Hangzhou Pan Intelligent Technology Co Ltd
Priority to CN201911004862.0A priority Critical patent/CN110781797B/en
Publication of CN110781797A publication Critical patent/CN110781797A/en
Application granted granted Critical
Publication of CN110781797B publication Critical patent/CN110781797B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the disclosure provide a labeling method, a labeling device and an electronic device, belonging to the technical field of image processing. The method comprises the following steps: acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise a basic optimal picture; searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary; searching, in each neighborhood device respectively, the neighborhood pictures of all neighborhood tracks within a preset time; acquiring the neighborhood optimal picture of each neighborhood track; extracting the global features of all neighborhood optimal pictures and of the basic optimal picture; acquiring, among all the neighborhood optimal pictures, the target optimal pictures whose global features match the basic optimal picture; and labeling the track pictures corresponding to all the target optimal pictures as track pictures of the target object. With this scheme, pictures of the same target object shot by different target acquisition devices can be labeled to obtain a picture set corresponding to the target object, and the track of the target object can be quickly searched in the corresponding scene.

Description

Labeling method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an annotation method, an annotation device, and an electronic device.
Background
Data annotation, as part of the computer vision field, is an essential step in data processing. For example, a set of pictures of the same person captured by different cameras in a corresponding scene is labeled, where the corresponding scene may be a mall, a construction site, or another place where people gather and multiple cameras are installed. Most existing pedestrian-identification samples are labeled either purely manually or by extracting facial feature points to match the same pedestrian. Purely manual labeling requires finding the pictures of the same pedestrian in a large amount of data, so the labor cost is high and mistakes are easy to make. Matching the same pedestrian by extracting facial feature points can greatly reduce the labor cost, but this method has many defects: algorithm matching is error-prone, and for many pedestrians the facial feature points are difficult to capture, so their data cannot be annotated. This results in a large amount of data being wasted.
Therefore, the existing pedestrian identification and labeling methods suffer from the technical problems that mistakes are easy to make, faces are difficult to capture, and the labor cost is high.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a labeling method, apparatus, and electronic device, which at least partially solve the problems in the prior art.
In a first aspect, an embodiment of the present disclosure provides an annotation method, including:
acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise basic optimal pictures;
searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
respectively searching neighborhood pictures of all neighborhood tracks in preset time in each neighborhood device;
acquiring a neighborhood optimal picture of each neighborhood track;
extracting global features of all neighborhood optimal pictures and global features of basic optimal pictures;
acquiring, among all the neighborhood optimal pictures, the target optimal pictures whose global features match those of the basic optimal picture;
and marking the track pictures corresponding to all the target optimal pictures as the track pictures of the target object.
According to a specific implementation manner of the embodiment of the present disclosure, the step of respectively searching neighborhood pictures of all neighborhood tracks within a preset time in each neighborhood device includes:
determining the starting moment when the target object enters the visual field range of the target acquisition device;
determining an acquisition period, wherein the acquisition period comprises a first sub-period of a preset period before the start time and a second sub-period of a preset period after the start time;
searching all neighborhood tracks in the acquisition time period in each neighborhood device;
and acquiring neighborhood pictures in all neighborhood tracks.
According to a specific implementation manner of the embodiment of the present disclosure, the step of searching all neighborhood tracks in the acquisition period in each neighborhood device includes:
and searching all neighborhood tracks within the acquisition period in each neighborhood device except the path neighborhood devices, wherein a path neighborhood device is a neighborhood device, among all neighborhood devices of the target acquisition device, from which neighborhood tracks have already been acquired.
According to a specific implementation manner of the embodiment of the present disclosure, the step of obtaining neighborhood pictures in all neighborhood tracks includes:
sequentially setting storage frames of a device number father node, a track number father node and a picture number father node of a neighborhood device according to a preset hierarchy of the feature dictionary;
correspondingly storing all the neighborhood tracks in the same neighborhood device into a device number father node of the same neighborhood device, and correspondingly storing all the neighborhood pictures under one group of tracks into the same track number father node;
and sequentially pulling the neighborhood pictures according to the hierarchy of the feature dictionary.
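The three-level storage frame above (device number node, track number node, picture number node) can be sketched with nested dictionaries; the function names and layout below are illustrative assumptions, not the patent's actual implementation:

```python
# Hypothetical three-level feature dictionary:
# device number -> track number -> picture number -> picture data.
feature_dict = {}

def store_picture(feature_dict, device_id, track_id, picture_id, picture):
    """Store a neighborhood picture under its device-number and track-number parent nodes."""
    feature_dict.setdefault(device_id, {}) \
                .setdefault(track_id, {})[picture_id] = picture

def pull_pictures(feature_dict, device_id, track_id):
    """Pull all neighborhood pictures of one track, following the hierarchy in order."""
    return list(feature_dict.get(device_id, {}).get(track_id, {}).values())

store_picture(feature_dict, device_id=1, track_id=0, picture_id=0, picture="frame_000.jpg")
store_picture(feature_dict, device_id=1, track_id=0, picture_id=1, picture="frame_001.jpg")
print(pull_pictures(feature_dict, 1, 0))  # -> ['frame_000.jpg', 'frame_001.jpg']
```

All neighborhood pictures of one track sit under the same track-number parent node, so pulling a whole track is a single two-key lookup.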
According to a specific implementation manner of the embodiment of the present disclosure, before the step of searching all neighborhood devices of the target collection device according to a preset neighborhood dictionary, the method further includes:
constructing a neighborhood dictionary storage frame;
arranging a preset number of neighborhood devices in the neighborhood of the current target acquisition device;
respectively marking the identification information of the current target acquisition device and the identification information of the neighborhood devices, and correspondingly storing the identification information of the current target acquisition device and of the neighborhood devices to form the neighborhood dictionary.
According to a specific implementation manner of the embodiment of the present disclosure, the step of obtaining the neighborhood optimal picture of each neighborhood track includes:
respectively extracting the global features of each neighborhood picture in each group of neighborhood track pictures according to a preset feature extractor;
computing the distance between the global feature of each neighborhood picture and the global feature of every other neighborhood picture in the same neighborhood track, and accumulating these distances to obtain the distance sum corresponding to each neighborhood picture;
and determining the neighborhood picture corresponding to the minimum value of the sum of distances in all the neighborhood pictures as the neighborhood optimal picture in the neighborhood track.
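The distance-sum selection above can be sketched as follows; the Euclidean distance and the array layout are illustrative assumptions (the patent does not fix a particular distance measure):

```python
import numpy as np

def neighborhood_optimal_picture(features):
    """Pick the picture whose global feature has the smallest summed distance
    to all other pictures' global features in the same neighborhood track.

    features: (n, d) array-like, one global feature vector per picture.
    Returns the index of the neighborhood optimal picture.
    """
    features = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances between every two feature vectors.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Sum of distances from each picture to all the others.
    dist_sums = dists.sum(axis=1)
    return int(np.argmin(dist_sums))

feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
print(neighborhood_optimal_picture(feats))  # -> 1
```

The picture minimizing the distance sum is the most "central" one of the track, which is one reasonable reading of "most representative".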
According to a specific implementation manner of the embodiment of the present disclosure, the step of obtaining a group of track pictures in the target acquisition device includes:
acquiring a group of track pictures, wherein the track pictures comprise a plurality of original pictures;
according to a preset target object detection algorithm, removing the original pictures whose score is smaller than a preset value, to obtain pending pictures;
calculating the ratio of the height to the width of the target object in each pending picture;
and removing, from all pending pictures of the group of track pictures, the pending pictures whose height-to-width ratio of the target object is within a preset range, to obtain the basic pictures of the group of tracks.
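A minimal sketch of the two-stage filtering above. The threshold value, the ratio bounds, the dict field names, and the direction of the ratio filter are all illustrative assumptions (the translated claim is ambiguous on whether in-range or out-of-range ratios are removed; here out-of-range pictures are dropped, as would be typical for standing pedestrians):

```python
def filter_track_pictures(pictures, score_threshold=0.5, ratio_range=(1.5, 4.0)):
    """Two-stage filter for a group of track pictures.

    pictures: list of dicts with 'score' (detector confidence) and the
    'height' and 'width' of the detected target object.
    """
    # Stage 1: drop originals the target-object detector scores below threshold.
    pending = [p for p in pictures if p["score"] >= score_threshold]
    # Stage 2: drop pending pictures with an implausible height-to-width ratio.
    return [p for p in pending
            if ratio_range[0] <= p["height"] / p["width"] <= ratio_range[1]]

pics = [
    {"score": 0.9, "height": 200, "width": 80},   # ratio 2.5 -> kept
    {"score": 0.3, "height": 200, "width": 80},   # low score -> removed
    {"score": 0.8, "height": 100, "width": 100},  # ratio 1.0 -> removed
]
print(len(filter_track_pictures(pics)))  # -> 1
```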
In a second aspect, an embodiment of the present disclosure provides an annotation apparatus, including:
the first acquisition module is used for acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise a basic optimal picture;
the first searching module is used for searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
the second searching module is used for respectively searching neighborhood pictures of all neighborhood tracks in preset time in each neighborhood device;
the second acquisition module is used for acquiring a neighborhood optimal picture of each neighborhood track;
the extraction module is used for extracting the global features of all the neighborhood optimal pictures and the global features of the basic optimal pictures;
the third acquisition module is used for acquiring, among all the neighborhood optimal pictures, the target optimal pictures whose global features match those of the basic optimal picture;
and the marking module is used for marking all the track pictures corresponding to the target optimal pictures as the track pictures of the target object.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the labeling method of the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the annotation method of the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the annotation method of the first aspect or any of the implementations of the first aspect.
The labeling method, the labeling device and the electronic device in the embodiments of the disclosure comprise: acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise a basic optimal picture; searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary; searching, in each neighborhood device respectively, the neighborhood pictures of all neighborhood tracks within a preset time; acquiring the neighborhood optimal picture of each neighborhood track; extracting the global features of all neighborhood optimal pictures and of the basic optimal picture; acquiring, among all the neighborhood optimal pictures, the target optimal pictures whose global features match the basic optimal picture; and labeling the track pictures corresponding to all the target optimal pictures as track pictures of the target object. With this scheme, pictures of the same target object shot by different target acquisition devices can be labeled to obtain the picture set corresponding to the target object, and the track of the target object can be quickly searched in the corresponding scene.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a labeling method according to an embodiment of the present disclosure;
fig. 2 is a schematic layout diagram of a neighborhood device of a labeling method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of another annotation method provided in the embodiments of the present disclosure;
FIG. 4 is a schematic flow chart of another annotation method provided in the embodiments of the present disclosure;
FIG. 5 is a schematic flow chart of another annotation method provided in the embodiments of the present disclosure;
FIG. 6 is a schematic flow chart of another annotation method provided in the embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of a labeling apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic view of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the disclosure provides a labeling method. The annotation method provided by the embodiment can be executed by a computing device, the computing device can be implemented as software, or implemented as a combination of software and hardware, and the computing device can be integrated in a server, a terminal device, and the like.
Referring to fig. 1, an embodiment of the present disclosure provides an annotation method, including:
s101, acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise basic optimal pictures;
the labeling method provided by the embodiment of the disclosure can be applied to scenes captured by a plurality of different cameras by the same person. And defining a camera or other equipment with a shooting function as a target acquisition device. The scene may be a mall, construction site or other place where people gather. At present, a market is taken as an example, and a preset number of target acquisition devices are covered in different areas of the market, so that pedestrians can be continuously captured by the target acquisition devices in the moving process. And defining a certain moving pedestrian captured by the target acquisition device as a target object. Defining a path formed by the target object from the picture entering a certain target acquisition device to the disappearance as a track, and defining pictures of the target object captured continuously in the track as a group of corresponding track pictures. Specific embodiments will now be described with the target object being a pedestrian.
Specifically, a group of track pictures in a time period in which a pedestrian is most likely to appear in a corresponding scene, or a group of track pictures at a certain specified time point or time period, are acquired. The target capture device is capable of capturing a plurality of target pictures of a target object in one trajectory. And defining the most representative picture which can most summarize the behavior state of the current track in the plurality of target pictures as a basic optimal picture of the group of track pictures. In this embodiment, the group of track pictures includes a basic optimal picture.
S102, searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
Specifically, in a corresponding scene such as a mall, different areas are covered with a certain number of cameras. Each camera is disposed at a fixed position and has a specific image capturing range. To avoid blind areas, when arranging target acquisition devices such as cameras, the distance and placement angle between a given target acquisition device and the other cameras in its neighborhood are considered, so that the whole sight range of an area can be covered. The other cameras in the neighborhood of a target acquisition device are defined as its neighborhood devices. A target acquisition device may have more than one neighborhood device.
The pedestrian starts to collect pictures from the capture range of the target collection device, the pedestrian moves, the target collection device continuously tracks and collects the pictures until the pedestrian moves out of the image capture range of the target collection device, and the group of shooting tracks is finished. The pedestrian continues to move and enters the capturing range of the first neighborhood device of the target acquisition device, the first neighborhood device starts to continue to acquire pictures of the pedestrian until the pedestrian moves out of the image capturing range of the first neighborhood device, and the group of shooting tracks is finished. The pedestrian continues to move, and the second neighborhood device of the target acquisition device continues to acquire the picture. Thus, in the region, the target acquisition device and the neighborhood device thereof continuously shoot a plurality of groups of track pictures for the pedestrians.
Alternatively, in different scenarios, there may be multiple target acquisition devices, with each target acquisition device corresponding to multiple neighborhood devices. In order to improve the query speed, a neighborhood dictionary is set, and the target acquisition device and all neighborhood devices thereof are correspondingly stored in the neighborhood dictionary. In this way, all neighborhood devices of the target acquisition device are searched according to a preset neighborhood dictionary.
In an embodiment of the present disclosure, before the step of searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary, the method further includes:
constructing a neighborhood dictionary storage frame;
arranging a preset number of neighborhood devices in the neighborhood of the current target acquisition device;
respectively marking the identification information of the current target acquisition device and the identification information of the neighborhood devices, and correspondingly storing the identification information of the current target acquisition device and of the neighborhood devices to form the neighborhood dictionary.
Specifically, a neighborhood dictionary storage frame is built, with different levels preset to correspondingly store data of different levels, improving the query speed. A preset number of neighborhood devices are arranged in the neighborhood of the current target acquisition device. As shown in Fig. 2, cam0 to cam10 are different target acquisition devices. The current target acquisition device cam0 has six devices in its neighborhood: cam1, cam2, cam3, cam4, cam5 and cam6. That is, when a pedestrian appears in the current target acquisition device cam0, cam1 to cam6 are all the neighborhood devices in which the pedestrian may appear next. The Arabic numerals 0 to 6 are used as the identification information of the target acquisition device and its neighborhood devices, and the current target acquisition device cam0 and the neighborhood devices cam1 to cam6 are stored correspondingly to form the neighborhood dictionary. During searching, the number 0 of the current target acquisition device cam0 is input into the neighborhood dictionary as the key of the dictionary, and the neighborhood devices cam1 to cam6 of cam0 can be found quickly. Of course, in other embodiments, the identification information of the target acquisition device and its neighborhood devices may be an equipment serial number or other identification information, which is not limited.
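The cam0 example above can be sketched as a plain key-to-list dictionary; the numeric IDs follow Fig. 2, while the structure and function name are illustrative assumptions:

```python
# Neighborhood dictionary: device ID -> IDs of its neighborhood devices.
# Built once from the physical camera layout (here, Fig. 2's cam0 example).
neighborhood_dict = {
    0: [1, 2, 3, 4, 5, 6],  # cam0's six neighborhood devices cam1..cam6
    # entries for cam1..cam10 would be added the same way
}

def find_neighborhood_devices(neighborhood_dict, device_id):
    """Look up all neighborhood devices of a target acquisition device by its ID."""
    return neighborhood_dict.get(device_id, [])

print(find_neighborhood_devices(neighborhood_dict, 0))  # -> [1, 2, 3, 4, 5, 6]
```

The device ID acts as the dictionary key, so the lookup is a single constant-time operation, which is what makes the query fast.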
S103, respectively searching neighborhood pictures of all neighborhood tracks in preset time in each neighborhood device;
optionally, for example, in a shopping mall, a construction site or other places where people gather, the amount of image data collected in a unit time is large, which brings great difficulty to subsequent data analysis. In implementation, the neighborhood images of all neighborhood tracks in a preset time are usually searched in each neighborhood device. The longer the preset time is, the larger the amount of the acquired picture information is. In the implementation operation, the time period corresponding to the most probable occurrence of a pedestrian in a scene or a specified time period is specifically set.
S104, acquiring a neighborhood optimal picture of each neighborhood track;
through the steps, all neighborhood pictures of the neighborhood tracks in the preset time are found in each neighborhood device. Each group of neighborhood tracks comprises a plurality of neighborhood pictures, and the plurality of neighborhood pictures capture continuous behavior of the target object under the path. And defining the most representative picture which can most summarize the behavior state of the current track in the plurality of neighborhood pictures as a neighborhood optimal picture.
There are various ways to extract the neighborhood optimal picture from the neighborhood pictures. Optionally, the optimal picture is designated by the user within a group of neighborhood pictures: for example, in a group of captured pedestrian neighborhood pictures, the picture in which the pedestrian is captured most completely, or whose angle best reflects the current behavior state, is designated as the neighborhood optimal picture. Alternatively, an extraction algorithm for the neighborhood optimal picture may be preset in the device processor, and the neighborhood optimal picture is obtained through the extraction algorithm.
S105, extracting the global features of all the neighborhood optimal pictures and the global features of the basic optimal pictures;
in the above steps S101 and S104, the basic optimal picture and all the neighborhood optimal pictures are respectively obtained, and then, the global features of all the neighborhood optimal pictures and the basic optimal picture are respectively extracted.
The global features refer to overall attributes of the image, and common global features include color features, texture features and shape features, such as intensity histograms and the like. In the device processor, an extraction algorithm of global features is preset, and the global features of all the neighborhood optimal pictures and the basic optimal pictures are respectively obtained through the extraction algorithm. And extracting a global feature from each basic optimal picture, and extracting a global feature from each neighborhood optimal picture.
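As a hedged illustration of one such global feature, the intensity histogram mentioned above can be computed as follows; the bin count and normalization are assumptions, not values fixed by the disclosure:

```python
import numpy as np

def intensity_histogram(image, bins=32):
    """Global feature: normalized intensity histogram of a grayscale image.

    image: 2-D array of pixel intensities in [0, 255].
    Returns a length-`bins` vector summing to 1, describing the image
    as a whole, usable for comparing two pictures.
    """
    hist, _ = np.histogram(np.asarray(image).ravel(), bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

img = np.random.default_rng(0).integers(0, 256, size=(64, 64))
feat = intensity_histogram(img)
print(feat.shape)  # -> (32,)
```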
S106, obtaining a target optimal picture of which the global characteristic is matched with the basic optimal picture in all the neighborhood optimal pictures;
The global features of all the neighborhood optimal pictures and of the basic optimal picture are extracted; taking the global features of the basic optimal picture as the reference, the global feature information of the basic optimal picture is analyzed and compared with that of all the neighborhood optimal pictures, and the optimal pictures whose global features match those of the basic optimal picture are defined as target optimal pictures. It is to be understood that "match" is a general concept; in practice, the comparison can be performed against a preset matching similarity value. For example, the matching similarity value can be adaptively adjusted according to different shooting light: the same camera shooting the same pedestrian on a cloudy day or a sunny day may collect neighborhood pictures of different definition, so the parameters of the corresponding neighborhood optimal picture and its global features may also differ. During the comparison, the matching similarity value is adaptively adjusted to obtain a preset number of target optimal pictures, so as to provide enough reference data for the subsequent labeling step.
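A minimal sketch of this matching step, assuming cosine similarity against an adjustable threshold; the similarity measure and the threshold value are assumptions, since the disclosure does not fix either:

```python
import numpy as np

def match_target_optimal(base_feature, neighborhood_features, threshold=0.8):
    """Return indices of neighborhood optimal pictures whose global feature
    matches the basic optimal picture's feature under a similarity threshold.

    The threshold could be adapted to shooting conditions (e.g. lighting),
    as the description suggests.
    """
    base = np.asarray(base_feature, dtype=float)
    matches = []
    for i, feat in enumerate(neighborhood_features):
        feat = np.asarray(feat, dtype=float)
        # Cosine similarity between the two global feature vectors.
        sim = base @ feat / (np.linalg.norm(base) * np.linalg.norm(feat))
        if sim >= threshold:
            matches.append(i)
    return matches

base = [1.0, 0.0]
cands = [[0.9, 0.1], [0.0, 1.0]]
print(match_target_optimal(base, cands))  # -> [0]
```

The track pictures behind each matched index would then be labeled as track pictures of the target object, per step S107.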
And S107, marking the track pictures corresponding to all the target optimal pictures as the track pictures of the target object.
Optionally, a preset number of target optimal pictures are obtained according to the preset matched similarity value, and the track pictures corresponding to all the preset number of target optimal pictures are labeled as the track pictures of the target object.
In the labeling method provided by the embodiment of the present disclosure, a group of track pictures and basic optimal pictures in the target acquisition device are obtained, neighborhood pictures and neighborhood optimal pictures of all neighborhood devices of the target acquisition device are obtained, global features of all neighborhood optimal pictures and basic optimal pictures are extracted, and the track pictures corresponding to all the target optimal pictures are labeled as the track pictures of the target object through analysis and comparison. By the scheme, pictures of the same target object shot by different target acquisition devices can be marked to obtain a picture set corresponding to the target object; the track of the target object is quickly searched in the corresponding scene; and further identifying the target object through the target optimal picture.
According to a specific implementation manner of the embodiment of the present disclosure, as shown in fig. 3, the step of respectively searching neighborhood pictures of all neighborhood tracks within a preset time in each neighborhood device includes:
S301, determining the starting time of the target object entering the visual field range of the target acquisition device;
Optionally, the starting time is specifically set at a time point or some specified time point at which the target object is most likely to appear in the corresponding scene.
S302, determining an acquisition time period, wherein the acquisition time period comprises a first sub-time period of a preset time period before the starting time and a second sub-time period of the preset time period after the starting time;
S303, searching all neighborhood tracks in the acquisition time period in each neighborhood device;
S304, acquiring neighborhood pictures in all neighborhood tracks.
Specifically, taking the target object as a pedestrian with start time t0, this step is described as an example.
The start time t0 is the time point when the pedestrian enters the visual field range of the target acquisition device; the first sub-period is [t0 − Δt1, t0] and the second sub-period is [t0, t0 + Δt2]. Since the pedestrian is within the field of view of the target acquisition device at time t0, the pedestrian is likely to appear in the neighborhood of the target acquisition device during the period Δt1 before time t0 and the period Δt2 after it. In step S102, after all neighborhood devices of the target acquisition device are found according to the preset neighborhood dictionary, all neighborhood tracks within the acquisition period [t0 − Δt1, t0 + Δt2] are searched for in each neighborhood device, and the neighborhood pictures in all those neighborhood tracks are obtained. In the disclosed embodiment, Δt1 = Δt2 = 60 s. Of course, in other embodiments, t0, Δt1 and Δt2 may have different settings, which is not limited.
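The acquisition period [t0 − Δt1, t0 + Δt2] and the time-window filtering of neighborhood tracks can be sketched as follows; the track representation (an identifier paired with a timestamp) is an illustrative assumption of this sketch, not part of the patent.

```python
from datetime import datetime, timedelta

def acquisition_window(t0, dt1_s=60, dt2_s=60):
    """Build the acquisition period [t0 - dt1, t0 + dt2] around the start time.
    The 60 s defaults follow the disclosed embodiment."""
    return t0 - timedelta(seconds=dt1_s), t0 + timedelta(seconds=dt2_s)

def tracks_in_window(tracks, window):
    """Keep the tracks whose timestamp falls inside the acquisition period.
    Each track is assumed to be a (track_id, timestamp) pair."""
    lo, hi = window
    return [tid for tid, ts in tracks if lo <= ts <= hi]
```

A neighborhood device's track list would be filtered this way before its neighborhood pictures are pulled.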
According to a specific implementation manner of the embodiment of the present disclosure, the step of searching all neighborhood tracks in the acquisition period in each neighborhood device further includes:
and searching all neighborhood tracks of the acquisition time period in each neighborhood device except for a path neighborhood device, wherein the path neighborhood device is a neighborhood device which has acquired the neighborhood tracks in all neighborhood devices of the target acquisition device.
The labeling method according to the embodiment of the present disclosure aims to find as many pictures of the same pedestrian as possible, without repetition. Therefore, in order to avoid repeatedly selecting pictures of a pedestrian from the same camera, a further path constraint is imposed on step S303.
Specifically, this is elaborated again in connection with fig. 2. Suppose that the pedestrian moves from cam0 towards cam3, with cam0 as the current target acquisition device and the position of cam0 as the track starting point. There are six devices in the neighborhood of cam0: cam1, cam2, cam3, cam4, cam5 and cam6; during searching, the acquisition period is set to T. For the first search, cam0 is taken as the current target acquisition device, and cam0 is stored in the track library. All neighborhood tracks within the acquisition period T1 are then searched for in the neighborhood devices cam1 to cam6, where T1 < T. Within this time constraint, the picture of the pedestrian's second appearance must be found among all the neighborhood devices cam1 to cam6 of cam0. Assuming that the second-appearance picture is found in cam3, the second search takes cam3 as the current acquisition device and searches the neighborhood devices of cam3, namely cam0, cam2, cam4, cam8 and cam9, after excluding cam0, which is already stored in the track library. The neighborhood set of cam3 therefore no longer contains cam0: the pedestrian has moved from cam0 to cam3, and if cam0 were kept as a neighborhood device of cam3, the previously found pictures would likely be found again in the second search, forming an infinite loop. The path constraint described above is thus added to prevent finding repeated pictures and to increase the search speed.
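The path constraint above — excluding neighborhood devices already stored in the track library — can be sketched as follows. The `find_next` callback stands in for the picture search within each candidate device and is an assumption of this sketch, not part of the patent.

```python
def next_neighbors(neighbor_dict, current, track_library):
    """Neighborhood devices of the current device, excluding path-neighborhood
    devices already stored in the track library (prevents re-searching cam0
    after the pedestrian has moved cam0 -> cam3)."""
    return [d for d in neighbor_dict.get(current, []) if d not in track_library]

def search_path(neighbor_dict, start, find_next):
    """Follow the pedestrian device by device. find_next(candidates) returns
    the device where the next appearance was found, or None to stop."""
    track_library = [start]
    current = start
    while True:
        candidates = next_neighbors(neighbor_dict, current, track_library)
        nxt = find_next(candidates)
        if nxt is None:
            return track_library
        track_library.append(nxt)
        current = nxt
```

With the cam0/cam3 example from the text, the second search over cam3's neighborhood no longer offers cam0 as a candidate, so no infinite loop can form.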
According to a specific implementation manner of the embodiment of the present disclosure, as shown in fig. 4, the step of obtaining neighborhood pictures in all neighborhood tracks includes:
S401, sequentially setting storage frames of a device number father node, a track number father node and a picture number father node of a neighborhood device according to a preset hierarchy of a feature dictionary;
S402, correspondingly storing all groups of neighborhood tracks in the same neighborhood device into the device number father node of that neighborhood device, and correspondingly storing all neighborhood pictures under a group of tracks into the same track number father node;
and S403, sequentially pulling the neighborhood pictures according to the hierarchy of the feature dictionary.
Specifically, 2048-dimensional features are extracted from each neighborhood picture by using a basic pedestrian recognition model, the features are reduced to 256 dimensions, and a feature dictionary is constructed for subsequent global feature queries. According to the preset hierarchy of the feature dictionary, a three-layer structure of image acquisition device number father node, track number father node and picture number father node is set in sequence; that is, all tracks under the same camera share one image acquisition device number father node, and all track pictures under the same track share one track number father node. The feature dictionary stores the neighborhood pictures and their corresponding global features in the form of a tree, which can greatly improve the query speed.
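The three-level feature dictionary (device number → track number → picture number) can be sketched as a nested mapping; the function names here are illustrative assumptions.

```python
from collections import defaultdict

def build_feature_dict():
    """Three-level tree: device number -> track number -> picture number -> feature."""
    return defaultdict(lambda: defaultdict(dict))

def store(feature_dict, device_id, track_id, pic_id, feature):
    """All tracks of one device share its device-number father node; all
    pictures of one track share its track-number father node."""
    feature_dict[device_id][track_id][pic_id] = feature

def pull_pictures(feature_dict, device_id):
    """Pull all (track, picture, feature) entries under one device,
    level by level, following the dictionary hierarchy."""
    out = []
    for track_id, pics in feature_dict[device_id].items():
        for pic_id, feature in pics.items():
            out.append((track_id, pic_id, feature))
    return out
```

Because lookups descend one level per key, querying all pictures of a device or a track touches only the relevant subtree, which is the speed advantage the text refers to.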
According to a specific implementation manner of the embodiment of the present disclosure, as shown in fig. 5, the step of obtaining the neighborhood optimal picture of each neighborhood track includes:
S501, extracting the global feature of each neighborhood picture in each group of neighborhood track pictures respectively according to a preset feature extractor;
Specifically, the preset feature extractor may include a plurality of cascaded feature extraction modules, where each feature extraction module includes a convolution layer and a fully connected layer. The convolution layer extracts local features from the input neighborhood picture; the fully connected layer is connected to the convolution layer in the same feature extraction module and extracts the global features of the neighborhood picture from the extracted local features.
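As a toy illustration of the convolution layer / fully connected layer split — local features first, then one global feature vector per picture — the following sketch uses a naive single-channel convolution and global average pooling. The pooling step and all shapes are assumptions of this sketch; a real extractor would be a trained deep network.

```python
import numpy as np

def conv_local_features(image, kernels):
    """Valid 2D convolution of a single-channel image with each kernel:
    the convolution layer extracting local features (toy version)."""
    H, W = image.shape
    kh, kw = kernels[0].shape
    maps = []
    for k in kernels:
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
        maps.append(out)
    return np.stack(maps)  # shape: (num_kernels, H', W')

def fc_global_feature(local_maps, weight):
    """Fully connected layer turning pooled local features into one global
    feature vector for the picture."""
    pooled = local_maps.mean(axis=(1, 2))  # global average pool per map
    return weight @ pooled
```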
S502, respectively carrying out distance summation on the global features of each neighborhood picture and the global features of all other neighborhood pictures in the neighborhood track picture, and accumulating summation results to obtain the distance sum corresponding to each neighborhood picture;
It can be understood that the global feature reflects the overall attributes of a neighborhood picture. If the distance between the global features of two neighborhood pictures is smaller, the difference between the corresponding pictures is smaller; that is, the similarity of the two neighborhood pictures is higher, and the probability that they are track pictures of the same target object is higher. If the distance between the global features of two neighborhood pictures is larger, the difference between the corresponding pictures is larger; that is, the similarity of the two neighborhood pictures is lower, and the probability that they are track pictures of the same target object is smaller.
The global feature of the current neighborhood picture is distance-summed against the global features of all the other neighborhood pictures in the neighborhood track. As long as no two neighborhood pictures are completely identical, the accumulated distance sums obtained for the different neighborhood pictures will generally differ.
S503, determining the neighborhood picture corresponding to the minimum value of the sum of distances in all the neighborhood pictures as the neighborhood optimal picture in the neighborhood track.
It is understood again that the global feature reflects the overall attribute of the neighborhood picture. If the sum of the distances corresponding to a certain neighborhood picture is small, the difference between the neighborhood picture and other neighborhood pictures in the neighborhood track is small on the whole. That is, the neighborhood picture is the most representative neighborhood optimal picture in the set of neighborhood track pictures, which can most summarize the behavior state of the current track.
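Selecting the neighborhood optimal picture by the minimum distance sum (steps S502–S503) can be sketched as follows, assuming Euclidean distance between global feature vectors (the patent does not fix the distance metric):

```python
import numpy as np

def neighborhood_optimal_picture(features):
    """Pick the picture whose summed distance to all other pictures in the
    track is smallest, i.e. the most representative picture of the track."""
    feats = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances, then sum over each row.
    diffs = feats[:, None, :] - feats[None, :, :]
    dist_sum = np.linalg.norm(diffs, axis=-1).sum(axis=1)
    return int(np.argmin(dist_sum))
```

The picture minimizing the distance sum is, on the whole, the one that differs least from the rest of the track, matching the "most representative" criterion in the text.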
According to a specific implementation manner of the embodiment of the present disclosure, as shown in fig. 6, the step of obtaining a group of track pictures in the target acquisition device includes:
S601, acquiring a group of track pictures, wherein the track pictures comprise a plurality of original pictures;
S602, according to a preset target object detection algorithm, removing original pictures with scores smaller than a preset value to obtain pictures to be determined;
S603, calculating the ratio of the height to the width of the target object in each picture to be determined;
S604, removing the pictures to be determined whose height-to-width ratio falls within a preset range from all pictures to be determined of the group of track pictures, to obtain the basic pictures of the group of tracks.
In the embodiment of the present disclosure, the preset range is less than 1 or greater than 4.5.
This step is specifically described by taking a pedestrian as the target object.
Specifically, the target acquisition device continuously acquires track pictures of the pedestrian in the moving process, wherein the track pictures comprise a plurality of original pictures. If the pedestrian does not appear in the shooting range of the target acquisition device at some point during the movement, the original picture acquired at that point does not include the pedestrian.
The original pictures with scores smaller than the preset value are removed according to a preset target object detection algorithm, leaving the pictures to be determined. The target object detection algorithm contains a calculation formula and screening conditions, and removes the original pictures whose calculated score is smaller than the preset value. It can be understood that the target object detection algorithm measures the integrity of the target object in the original picture, i.e. the score reflects how complete the target object is. In this embodiment, the preset value is 0.7; that is, when the image integrity of the pedestrian in an original picture is lower than 0.7, the picture is removed, and the remaining pictures become the pictures to be determined. Optionally, the target object detection algorithm assigns different weights to certain important parts of the pedestrian during calculation, such as a face usable for recognition. When the pedestrian's image in an original picture contains a complete face capable of identifying the pedestrian, while other body parts are not fully inside the frame, the overall image integrity is low, yet the picture may still reach the preset score and become a picture to be determined. The target object detection algorithm, its calculation formula and its screening conditions can be set adaptively for different scenes, without limitation.
Further, the ratio of the height to the width of the target object in each picture to be determined is calculated, and the pictures to be determined whose height-to-width ratio falls within the preset removal range are removed from the group of track pictures. Generally, the height-to-width ratio of a normal pedestrian is about 2 to 3; pictures in which the target object is far too wide or far too tall are dirty pictures and need to be removed to obtain the base pictures of the group of tracks. In the embodiment of the present disclosure, the preset range is less than 1 or greater than 4.5. Of course, in other embodiments, the preset range may be adaptively adjusted, which is not limited.
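The score and aspect-ratio filtering of steps S602–S604 can be sketched as follows, with the 0.7 score threshold and the (1, 4.5) ratio band taken from this embodiment; the per-picture dict layout is an assumption of the sketch.

```python
def filter_base_pictures(pictures, score_threshold=0.7, min_ratio=1.0, max_ratio=4.5):
    """Keep original pictures that pass the detection-score threshold and whose
    target height/width ratio is plausible for a pedestrian.
    Each picture is assumed to be a dict with 'score', 'height', 'width'."""
    base = []
    for pic in pictures:
        if pic["score"] < score_threshold:
            continue  # target too incomplete in this original picture
        ratio = pic["height"] / pic["width"]
        if ratio < min_ratio or ratio > max_ratio:
            continue  # dirty picture: target too wide or too tall
        base.append(pic)
    return base
```

A normal pedestrian (ratio around 2–3) passes both filters; low-score or extreme-ratio pictures are dropped, leaving the base pictures of the track.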
In correspondence with the above method embodiment, referring to fig. 7, the disclosed embodiment further provides a labeling apparatus 70, the apparatus comprising:
a first obtaining module 701, configured to obtain a set of track pictures in a target collection device, where the track pictures include a basic optimal picture;
a first searching module 702, configured to search all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
a second searching module 703, configured to search, in each of the neighborhood devices, neighborhood pictures of all neighborhood tracks in a preset time respectively;
a second obtaining module 704, configured to obtain a neighborhood optimal picture of each neighborhood track;
the extraction module is used for extracting the global features of all the neighborhood optimal pictures and the global features of the basic optimal pictures;
a third obtaining module 705, configured to obtain a target optimal picture, in all the neighborhood optimal pictures, where global features of the target optimal picture are matched with the base optimal picture;
and a labeling module 706, configured to label the track pictures corresponding to all the target optimal pictures as the track pictures of the target object.
The apparatus shown in fig. 7 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
Referring to fig. 8, an embodiment of the present disclosure also provides an electronic device 80, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the tagging method of the preceding method embodiment.
The disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the labeling method in the aforementioned method embodiments.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the annotation method of the aforementioned method embodiments.
Referring now to FIG. 8, a block diagram of an electronic device 80 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 80 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 80 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, or the like; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 80 to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device 80 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. A method of labeling, comprising:
acquiring a group of track pictures in a target acquisition device, wherein the track pictures comprise basic optimal pictures;
searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
respectively searching neighborhood pictures of all neighborhood tracks in preset time in each neighborhood device;
acquiring a neighborhood optimal picture of each neighborhood track;
extracting global features of all neighborhood optimal pictures and global features of basic optimal pictures;
acquiring a target optimal picture with global characteristics matched with the basic optimal picture in all the neighborhood optimal pictures;
and marking the track pictures corresponding to all the target optimal pictures as the track pictures of the target object.
2. The labeling method according to claim 1, wherein the step of searching the neighborhood pictures of all neighborhood tracks within a preset time in each of the neighborhood devices respectively comprises:
determining the starting moment when the target object enters the visual field range of the target acquisition device;
determining an acquisition period, wherein the acquisition period comprises a first sub-period of a preset period before the start time and a second sub-period of a preset period after the start time;
searching all neighborhood tracks in the acquisition time period in each neighborhood device;
and acquiring neighborhood pictures in all neighborhood tracks.
3. The labeling method of claim 2, wherein said step of finding all neighborhood trajectories within said acquisition period within each neighborhood device comprises:
and searching all neighborhood tracks of the acquisition time period in each neighborhood device except for a path neighborhood device, wherein the path neighborhood device is a neighborhood device which has acquired the neighborhood tracks in all neighborhood devices of the target acquisition device.
4. The labeling method according to claim 2, wherein the step of obtaining the neighborhood pictures in all the neighborhood tracks comprises:
sequentially setting storage frames of a device number father node, a track number father node and a picture number father node of a neighborhood device according to a preset hierarchy of the feature dictionary;
correspondingly storing all the neighborhood tracks in the same neighborhood device into a device number father node of the same neighborhood device, and correspondingly storing all the neighborhood pictures under one group of tracks into the same track number father node;
and sequentially pulling the neighborhood pictures according to the hierarchy of the feature dictionary.
5. The labeling method of claim 1, wherein before the step of searching all neighborhood devices of the target collection device according to a preset neighborhood dictionary, the method further comprises:
constructing a neighborhood dictionary storage frame;
arranging a preset number of neighborhood devices in the neighborhood of the current target acquisition device;
respectively marking the identification information of the current target acquisition device and the identification information of the neighborhood device, and correspondingly storing the identification information of the current image acquisition device and the identification information of the neighborhood device to form the neighborhood dictionary.
6. The labeling method according to any one of claims 1 to 5, wherein the step of obtaining a neighborhood optimal picture for each neighborhood track comprises:
respectively extracting the global features of each neighborhood picture in each group of neighborhood track pictures according to a preset feature extractor;
respectively carrying out distance summation on the global features of each neighborhood picture and the global features of all other neighborhood pictures in the neighborhood track picture, and accumulating the summation results to obtain the distance sum corresponding to each neighborhood picture;
and determining the neighborhood picture corresponding to the minimum value of the sum of distances in all the neighborhood pictures as the neighborhood optimal picture in the neighborhood track.
7. The labeling method according to any one of claims 1 to 5, wherein the step of obtaining a set of track pictures in the target acquisition device comprises:
acquiring a group of track pictures, wherein the track pictures comprise a plurality of original pictures;
according to a preset target object detection algorithm, removing the original picture with the score smaller than a preset value to obtain a picture to be determined;
calculating the ratio of the height to the width of a target object in each pending picture;
and removing the to-be-determined pictures of which the ratio of the height to the width of the target object in all to-be-determined pictures of the group of track pictures is within a preset range to obtain the basic pictures of the group of tracks.
8. A marking device, the device comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a group of track pictures in a target acquisition device, and the track pictures comprise basic optimal pictures;
the first searching module is used for searching all neighborhood devices of the target acquisition device according to a preset neighborhood dictionary;
the second searching module is used for respectively searching neighborhood pictures of all neighborhood tracks in preset time in each neighborhood device;
the second acquisition module is used for acquiring a neighborhood optimal picture of each neighborhood track;
the extraction module is used for extracting the global features of all the neighborhood optimal pictures and the global features of the basic optimal pictures;
the third acquisition module is used for acquiring a target optimal picture of which the global characteristics are matched with the basic optimal picture in all the neighborhood optimal pictures;
and the marking module is used for marking all the track pictures corresponding to the target optimal pictures as the track pictures of the target object.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the annotation method of any one of the preceding claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the annotation method of any one of the preceding claims 1-7.
CN201911004862.0A 2019-10-22 2019-10-22 Labeling method and device and electronic equipment Active CN110781797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911004862.0A CN110781797B (en) 2019-10-22 2019-10-22 Labeling method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911004862.0A CN110781797B (en) 2019-10-22 2019-10-22 Labeling method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110781797A true CN110781797A (en) 2020-02-11
CN110781797B CN110781797B (en) 2021-04-06

Family

ID=69386205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911004862.0A Active CN110781797B (en) 2019-10-22 2019-10-22 Labeling method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110781797B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674306A (en) * 2021-07-29 2021-11-19 杭州宇泛智能科技有限公司 Pedestrian trajectory acquisition method, system, device and medium based on fisheye lens

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123732A (en) * 2014-07-14 2014-10-29 中国科学院信息工程研究所 Online target tracking method and system based on multiple cameras
CN106295594A (en) * 2016-08-17 2017-01-04 北京大学 A kind of based on dynamic route tree across photographic head method for tracking target and device
CN106709436A (en) * 2016-12-08 2017-05-24 华中师范大学 Cross-camera suspicious pedestrian target tracking system for rail transit panoramic monitoring
CN106887014A (en) * 2017-01-13 2017-06-23 中山大学 A kind of pedestrian track matching process across camera
US20180176541A1 (en) * 2016-12-20 2018-06-21 Gopro, Inc. Compact array of imaging devices with supplemental imaging unit
US20180330170A1 (en) * 2017-05-12 2018-11-15 Canon Kabushiki Kaisha Information processing apparatus, information processing system, information processing method, and storage medium
CN109033011A (en) * 2018-06-19 2018-12-18 东软集团股份有限公司 Calculate method, apparatus, storage medium and the electronic equipment of track frequency
CN109558831A (en) * 2018-11-27 2019-04-02 成都索贝数码科技股份有限公司 It is a kind of fusion space-time model across camera shooting head's localization method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SU JEONG YOU et al.: "A cooperative multi-camera system for tracking a fast moving object", The 4th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent *
XIA Tian et al.: "Cross-camera pedestrian tracking based on deep learning and spatio-temporal constraints", Computer & Digital Engineering *
WANG Xianbin et al.: "A survey of cross-camera target tracking", Modern Computer (Professional Edition) *

Also Published As

Publication number Publication date
CN110781797B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN109272530B (en) Target tracking method and device for space-based monitoring scene
CN109886078B (en) Retrieval positioning method and device for target object
CN111047621B (en) Target object tracking method, system, equipment and readable medium
JP2009003415A (en) Method and device for updating map data
CN111160243A (en) Passenger flow volume statistical method and related product
CN101095149A (en) Image comparison
CN106663196A (en) Computerized prominent person recognition in videos
CN112118395B (en) Video processing method, terminal and computer readable storage medium
KR101645959B1 (en) The Apparatus and Method for Tracking Objects Based on Multiple Overhead Cameras and a Site Map
CN111597953A (en) Multi-path image processing method and device and electronic equipment
CN111310728B (en) Pedestrian re-identification system based on monitoring camera and wireless positioning
US20220301317A1 (en) Method and device for constructing object motion trajectory, and computer storage medium
CN111563398A (en) Method and device for determining information of target object
CN109754034A (en) A kind of terminal device localization method and device based on two dimensional code
CN110645999A (en) Navigation method, navigation device, server, terminal and storage medium
KR20190124436A (en) Method for searching building based on image and apparatus for the same
CN112232311A (en) Face tracking method and device and electronic equipment
CN111832579A (en) Map interest point data processing method and device, electronic equipment and readable medium
CN110781797B (en) Labeling method and device and electronic equipment
CN108540817B (en) Video data processing method, device, server and computer readable storage medium
CN116912517B (en) Method and device for detecting camera view field boundary
CN111310595A (en) Method and apparatus for generating information
CN110781796B (en) Labeling method and device and electronic equipment
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN110177256B (en) Tracking video data acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant