CN113408339A - Label construction method and related device - Google Patents

Label construction method and related device

Info

Publication number
CN113408339A
CN113408339A CN202110513555.6A
Authority
CN
China
Prior art keywords
image
training
images
body part
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110513555.6A
Other languages
Chinese (zh)
Inventor
Hu Haibo (胡海波)
Tang Bangjie (唐邦杰)
Liu Zhonggeng (刘忠耿)
Pan Huadong (潘华东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110513555.6A priority Critical patent/CN113408339A/en
Publication of CN113408339A publication Critical patent/CN113408339A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a label construction method and a related device, wherein the label construction method comprises the following steps: obtaining a training set, wherein the training set comprises a plurality of training images and each training image contains a specific target; performing body part segmentation on the specific target in each training image to obtain local images of a plurality of body parts; obtaining features of the corresponding training image from the local images of all the body parts of each training image; obtaining similarities of each training image relative to a plurality of reference images using the features of that training image, wherein each reference image also contains the specific target; and setting a label for each training image according to the plurality of similarities corresponding to it. In this way, the present application avoids the intervention of manual label definition, reflects the actual degree of completeness of the target more objectively, is more robust, and is better suited to downstream algorithm models.

Description

Label construction method and related device
Technical Field
The application belongs to the technical field of image recognition, and particularly relates to a label construction method and a related device.
Background
Most image optimization methods in the prior art focus on evaluating image quality and neglect the degree of completeness of the target. In the field of video structuring, however, a sufficiently complete target in the image is often a precondition for accurate attribute identification, and completeness evaluation is missing from most image optimization schemes. In the completeness scoring stage, a neural network usually outputs the completeness score of the target directly, and the completeness score labels are usually calibrated manually and subjectively against a set of standards; they therefore cannot accurately reflect the actual degree of completeness of the target, which affects the accuracy of subsequent attribute identification and pedestrian re-identification tasks. For example, fig. 1 is a schematic diagram of the change in completeness of the upper body of a human body: the degree of occlusion of the upper body changes continuously across consecutive frames, and score labels matching such slight changes are difficult to assign via manually defined standards.
Disclosure of Invention
The application provides a label construction method and a related device in order to avoid the intervention of manual label definition.
In order to solve the technical problem, the application adopts a technical scheme that: a label construction method is provided, including: obtaining a training set, wherein the training set comprises a plurality of training images and each training image contains a specific target; performing body part segmentation on the specific target in each training image to obtain local images of a plurality of body parts; obtaining features of the corresponding training image using the local images of all the body parts of each training image; obtaining similarities of each training image relative to a plurality of reference images using the features of that training image, wherein each reference image contains the specific target; and setting a label for each training image according to the plurality of similarities corresponding to it.
Wherein the step of body part segmentation of the specific target in each of the training images to obtain local images of a plurality of the body parts comprises: obtaining a plurality of human body part key points of the specific target in each training image; obtaining a plurality of local images of the body part according to the plurality of key points of the body part; wherein the local image of each body part comprises a plurality of key points of the body part.
Wherein the step of obtaining local images of a plurality of body parts from the plurality of human body part key points comprises: acquiring the plurality of human body part key points belonging to the current body part, and taking the image corresponding to their minimum bounding rectangle as the local image of the current body part.
Before the step of taking the image corresponding to the minimum bounding rectangle as the local image of the current body part, the method further comprises: judging whether all the human body part key points of the current body part have been detected with a confidence greater than or equal to a threshold; if yes, directly entering the step of taking the image corresponding to the minimum bounding rectangle as the local image of the current body part; otherwise, updating the human body part key points that were not detected, or were detected with a confidence smaller than the threshold, using existing data, and then entering the step of taking the image corresponding to the minimum bounding rectangle as the local image of the current body part.
Wherein the plurality of human body part key points comprise: a head, a neck, a left shoulder, a right shoulder, a left elbow, a right elbow, a left wrist, a right wrist, a left waist, a right waist, a left knee, a right knee, a left ankle, and a right ankle; the body part comprises an upper half body, a middle part of the body and a lower half body; wherein the local image of the upper half comprises a head, a neck, a left shoulder, a right shoulder, a left elbow, a right elbow, a left wrist, a right wrist, a left waist and a right waist; the local image of the middle part of the body comprises a left waist, a right waist, a left knee and a right knee; the partial image of the lower body includes a left knee, a right knee, a left ankle, and a right ankle.
Wherein the step of obtaining features of the corresponding training images using the local images of all the body parts of each of the training images comprises: performing feature extraction on the local images of all the body parts; and performing one-dimensional feature vector splicing on the local images of all the body parts of each training image to obtain the corresponding features of the training images.
Wherein the step of extracting the features of the corresponding training images using the local images of all the body parts of each of the training images comprises: extracting features of the local images of all the body parts and the whole image of the specific target; and performing one-dimensional feature vector splicing on the features of all the body parts of each training image and the overall features of a specific target to obtain the corresponding features of the training images.
Wherein, before the step of obtaining the similarity of the training image relative to the plurality of reference images by using the features of each training image, the method comprises: taking the training images which meet preset conditions in the training set as the reference images; wherein the preset condition comprises at least one of the following: the outline definition of the specific target in the training image exceeds a first preset value, the quality of the training image exceeds a second preset value, the body part of the specific target in the training image is complete, and only the specific target is contained in the training image.
Wherein the step of obtaining the similarity of the training images with respect to the plurality of reference images using the features of each of the training images comprises: for each training image, obtaining the similarity between the features of the training image and the features of each reference image; when the training image and the reference image are the same image, the similarity is 0; and when the training image and the reference image are different images, the similarity is cosine similarity.
Wherein the step of obtaining, for each of the training images, a similarity between the features of the training image and the features of each of the reference images comprises: aligning features of all of the reference images in a column to form a first matrix and aligning features of all of the training images in a row to form a second matrix; obtaining a similarity matrix by using the first matrix and the second matrix, wherein when the reference image and the training image are the same image, the value of the reference image and the value of the training image in the similarity matrix are 0; when the reference image and the training image are different images, the value of the reference image and the training image in the similarity matrix is the cosine similarity of the features of the reference image and the features of the training image.
Wherein the step of setting labels of the training images according to the plurality of similarities corresponding to each of the training images includes: for each of the training images, taking a maximum value of the plurality of the similarities corresponding to the training image as its label.
In order to solve the above technical problem, another technical solution adopted by the present application is: a label construction apparatus is provided, comprising a processor and a memory, wherein the processor is coupled to the memory and is configured to implement the label construction method in any of the above embodiments.
In order to solve the above technical problem, another technical solution adopted by the present application is: a device having a storage function is provided, on which program data is stored, the program data being executable by a processor to implement the label construction method described in any of the above embodiments.
Different from the prior art, the beneficial effects of the present application are as follows: in the label construction method provided by the application, a specific target in a training image is divided into body parts, and the features of the training image are obtained from the local images of all the body parts; labels are then set by computing the similarity between the features of the training images and those of the reference images. The approach provided by the application reduces the intervention of manual annotation, reflects the actual degree of completeness of the target more objectively, is more robust, and is better suited to downstream algorithm models. In addition, it reduces the time required to label a large number of training images and improves efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a schematic diagram of the change in the integrity of the upper body of a human body;
FIG. 2 is a schematic flowchart of an embodiment of a label construction method according to the present application;
FIG. 3 is a schematic diagram of an embodiment of a plurality of training images;
FIG. 4 is a flowchart illustrating an embodiment corresponding to step S102 in FIG. 2;
FIG. 5a is a schematic diagram of an embodiment of a training image;
FIG. 5b is a schematic diagram of an embodiment of the training image of FIG. 5a after passing through a human body key point detection model;
FIG. 5c is a schematic diagram of an embodiment of the partial image of the upper body in FIG. 5a;
FIG. 5d is a schematic diagram of an embodiment of the partial image of the middle of the body in FIG. 5a;
FIG. 5e is a schematic diagram of an embodiment of the partial image of the lower body in FIG. 5a;
FIG. 6 is a flowchart illustrating an embodiment corresponding to step S202 in FIG. 4;
FIG. 7 is a schematic structural diagram of an embodiment of a label construction framework of the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a label construction apparatus according to the present application;
fig. 9 is a schematic structural diagram of an embodiment of a device with a storage function according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an embodiment of a label construction method according to the present application, where the label construction method specifically includes:
s101: a training set is obtained, wherein the training set comprises a plurality of training images, and each training image comprises a specific target.
Specifically, in this embodiment, the training images in the training set may be sequences from the same video stream that contain a specific target. And each training image can contain other similar or non-similar objects besides the specific object. Optionally, the specific target is a person; the training set may contain images of the front, sides, or back of the particular target.
For example, as shown in fig. 3, fig. 3 is a schematic diagram of an embodiment of a plurality of training images. When the specific target is the pedestrian reading a book with lowered head in fig. 3, a training image may contain only the specific target, or the specific target together with non-specific targets, and so on.
S102: body part segmentation is performed on a specific target in each training image to obtain local images of a plurality of body parts.
Specifically, referring to fig. 4, fig. 4 is a schematic flowchart illustrating an embodiment corresponding to step S102 in fig. 2. The step S102 specifically includes:
s201: a plurality of human body part key points of a specific target in each training image are obtained.
Specifically, in the present embodiment, an existing human body key point detection model may be employed to obtain all human body part key points of a specific target in each training image. Optionally, the following 14 key points of the human body part can be obtained through the above step S201: head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left waist, right waist, left knee, right knee, left ankle, and right ankle.
S202: obtaining local images of a plurality of body parts according to a plurality of key points of the human body parts; wherein the local image of each body part comprises a plurality of key points of the body part.
Specifically, in this embodiment, the process of implementing step S202 may be: acquiring the plurality of human body part key points belonging to the current body part, and taking the image corresponding to their minimum bounding rectangle as the local image of the current body part. Using the minimum bounding rectangle in this way filters out non-specific targets other than the specific target well. Of course, in other embodiments, the image corresponding to the minimum circumscribed circle of the plurality of human body part key points may instead be used as the local image of the current body part.
Optionally, the plurality of body parts may include an upper body, a mid-body, and a lower body. The partial image of the upper half body comprises a head, a neck, a left shoulder, a right shoulder, a left elbow, a right elbow, a left wrist, a right wrist, a left waist and a right waist; the partial image of the middle part of the body comprises a left waist, a right waist, a left knee and a right knee; the partial image of the lower body includes a left knee, a right knee, a left ankle, and a right ankle.
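The part division described above can be sketched in a few lines of Python. This is only an illustrative sketch of the minimum-bounding-rectangle cropping, not code from the application; the keypoint names, the `PART_KEYPOINTS` mapping, and the `crop_part` helper are assumptions made for the example.

```python
import numpy as np

# Hypothetical names for the 14 keypoints listed above, grouped by body part
# (upper body, middle of the body, lower body).
PART_KEYPOINTS = {
    "upper": ["head", "neck", "l_shoulder", "r_shoulder", "l_elbow",
              "r_elbow", "l_wrist", "r_wrist", "l_waist", "r_waist"],
    "middle": ["l_waist", "r_waist", "l_knee", "r_knee"],
    "lower": ["l_knee", "r_knee", "l_ankle", "r_ankle"],
}

def crop_part(image, keypoints, part):
    """Crop the minimum bounding rectangle of a body part's keypoints.

    `keypoints` maps a keypoint name to its (x, y) position in the image.
    """
    pts = np.array([keypoints[name] for name in PART_KEYPOINTS[part]])
    x0, y0 = pts.min(axis=0).astype(int)
    x1, y1 = pts.max(axis=0).astype(int)
    # Clamp the rectangle to the image bounds before slicing.
    h, w = image.shape[:2]
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, w), min(y1, h)
    return image[y0:y1, x0:x1]
```

Because the rectangle is the tightest one containing the part's keypoints, pixels belonging to other targets mostly fall outside the crop, which is the filtering effect described above.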
As shown in figs. 5a-5b, fig. 5a is a schematic diagram of an embodiment of a training image, and fig. 5b is a schematic diagram of an embodiment of the training image in fig. 5a after passing through a human body key point detection model. The 14 human body part key points mentioned above can be detected in the training image by the human body key point detection model; body part division is then carried out on the current specific target according to a preset rule, with the division result shown as the rectangular frames in fig. 5b; finally, the local image of each body part is output, as shown in figs. 5c, 5d and 5e: fig. 5c is a schematic diagram of an embodiment of the partial image of the upper body in fig. 5a, fig. 5d of the middle of the body, and fig. 5e of the lower body.
Of course, in some cases the specific target contained in a training image may be unclear, which can result in some human body part key points not being detected by the human body key point detection model, or in detected key points whose confidence is below a threshold. In that case, please refer to fig. 6, which is a flowchart illustrating an embodiment corresponding to step S202 in fig. 4. Step S202 then specifically includes:
s301: a plurality of human body part key points belonging to a current body part are obtained.
S302: and judging whether all human body part key points of the current body part are detected or not, wherein the confidence coefficient of each human body part key point is greater than or equal to a threshold value.
For example, when the current body part is the upper half of the body, it is determined whether all the human body part key points of the head, the neck, the left shoulder, the right shoulder, the left elbow, the right elbow, the left wrist, the right wrist, the left waist and the right waist are detected, and the confidence of the human body part key points is greater than or equal to a threshold; the specific threshold value can be set according to actual requirements.
When the current body part is the middle part of the body, judging whether the human body part key points of the left waist, the right waist, the left knee and the right knee are all detected, wherein the confidence coefficient of each human body part key point is greater than or equal to a threshold value; the specific threshold value can be set according to actual requirements.
When the current body part is the lower half of the body, judging whether all the human body part key points of the left knee, the right knee, the left ankle and the right ankle are detected, wherein the confidence coefficient of each human body part key point is greater than or equal to a threshold value; the specific threshold value can be set according to actual requirements.
If yes, go directly to step S303: taking the image corresponding to the minimum bounding rectangle of the plurality of human body part key points as the local image of the current body part.
Otherwise, go to step S304: updating, using existing data, the human body part key points in the current body part that were not detected or were detected with a confidence smaller than the threshold; then proceed to step S303.
Specifically, for a human body part key point in the current body part that was not detected, or was detected with a confidence smaller than the threshold, the existing data in step S304 may be a data set of the same human body part key point as detected in other training images with a confidence greater than or equal to the threshold, and the position of the key point in the current body part may be obtained from this data set. For example, the average position over the existing data set may be used as the position of the key point in the current body part. As another example, since the specific target has a motion trajectory in the current training set, the motion trajectory of the key point can be derived from the existing data set, and the position of the key point in the current body part can be obtained from that trajectory.
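The average-position fallback just described can be sketched as follows. This is a minimal illustration under the assumption that detections of the same keypoint have been collected across the sequence; the `fill_missing_keypoint` name and the tuple format are hypothetical.

```python
import numpy as np

def fill_missing_keypoint(detections, conf_threshold=0.5):
    """Estimate a missing (or low-confidence) keypoint's position as the
    average position of reliable detections of the same keypoint gathered
    from other training images of the sequence.

    `detections` is a list of (x, y, confidence) tuples; entries whose
    confidence falls below the threshold are ignored.
    """
    reliable = [(x, y) for x, y, c in detections if c >= conf_threshold]
    if not reliable:
        return None  # no reliable observation of this keypoint anywhere
    x_mean, y_mean = np.mean(reliable, axis=0)
    return (float(x_mean), float(y_mean))
```

The trajectory-based variant mentioned above would replace the plain mean with a position extrapolated from the target's motion across frames.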
Of course, in other embodiments, when some human body part key points in the current body part are not detected, or their confidence is smaller than the threshold, the region of the current body part can be obtained directly from existing data. The existing data may be the regions of the same body part that were correctly detected and recognized in other training images, and the average of several such existing regions may be used as the region of the current body part.
S103: the local images of all body parts of each training image are used to obtain the features of the corresponding training image.
Specifically, the implementation of step S103 may be: A. extracting the features of the local images of all body parts; for example, an existing pedestrian re-identification (ReID) model may be used as the feature extractor for the local images. B. performing one-dimensional feature vector concatenation on the features of all body parts of each training image to obtain the feature of the corresponding training image.
For example, the training set may be represented as $\Phi_g = \{I_{q1}, I_{q2}, \ldots, I_{qm}, I_{g1}, I_{g2}, \ldots, I_{gn}\}$. For a certain training image $I_{gk}$, the corresponding upper-body, mid-body and lower-body partial images are $I_{gku}$, $I_{gkm}$ and $I_{gkd}$, respectively. Extracting features from these partial images with the ReID model as feature extractor yields the upper-body, mid-body and lower-body features, denoted $f_{gku}$, $f_{gkm}$ and $f_{gkd}$. The feature of the training image is then
$$f_{gk} = f_{gku} \oplus f_{gkm} \oplus f_{gkd},$$
where $\oplus$ denotes the concatenation of one-dimensional feature vectors. Pedestrian re-identification and video structuring tasks are generally more concerned with the features of human body parts, and content other than the human target in the image often interferes with subsequent recognition and classification; this feature concatenation scheme allows such content to be ignored.
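The part-feature concatenation can be sketched as follows. This is a minimal illustration, assuming each part feature is already a one-dimensional vector from some feature extractor; the `image_feature` name is hypothetical.

```python
import numpy as np

def image_feature(part_features):
    """Concatenate per-part 1-D feature vectors (e.g. upper, middle and
    lower body, as produced by a ReID feature extractor) into the single
    feature vector of the training image."""
    return np.concatenate([np.ravel(f) for f in part_features])
```

Appending an overall-image feature, as in the variant below, is just one more entry in the `part_features` list.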
Of course, in other embodiments, the implementation of step S103 may also be: A. extracting the features of the local images of all body parts and of the overall image of the specific target; the overall image of the specific target can be obtained from the training image by conventional image recognition, and the feature extraction method is similar to that of the above embodiments and is not repeated here. B. performing one-dimensional feature vector concatenation on the features of all body parts of each training image and the overall feature of the specific target to obtain the feature of the corresponding training image.
For example, assume the upper-body, mid-body and lower-body features and the overall feature of the specific target are denoted $f_{gku}$, $f_{gkm}$, $f_{gkd}$ and $f_{gkp}$, respectively. The feature of the training image is then
$$f_{gk} = f_{gku} \oplus f_{gkm} \oplus f_{gkd} \oplus f_{gkp},$$
where $\oplus$ again denotes the concatenation of one-dimensional feature vectors. This feature concatenation scheme likewise allows content other than the human target in the image to be ignored.
S104: and obtaining the similarity of the training image relative to a plurality of reference images by using the characteristics of each training image, wherein each reference image contains a specific target.
Specifically, in this embodiment, a plurality of reference images need to be selected before step S104. The reference images may be selected from the training set as follows: the training images in the training set that meet a preset condition are taken as reference images, wherein the preset condition comprises at least one of the following: the outline definition of the specific target in the training image exceeds a first preset value, the quality of the training image exceeds a second preset value, the body parts of the specific target in the training image are complete, and the training image contains only the specific target. Since the reference images are images in the training set, their features do not need to be computed separately later, which reduces the amount of calculation. Of course, in other embodiments, the plurality of reference images may also be images outside the current training set that meet the preset condition.
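The reference selection step amounts to filtering the training set by a conjunction of preset conditions, which can be sketched as follows; the `select_references` name and the predicate interface are assumptions for the example, standing in for the sharpness, quality, completeness and single-target checks named above.

```python
def select_references(candidates, checks):
    """Keep only the candidate images that satisfy every preset condition.

    `checks` is a list of predicates; each takes an image and returns a
    bool (e.g. outline sharpness above a threshold, image quality above a
    threshold, all body parts present, exactly one target detected).
    """
    return [img for img in candidates if all(check(img) for check in checks)]
```

Any subset of the conditions can be used, matching the "at least one of the following" wording, by passing only the desired predicates.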
Further, the specific implementation process of step S104 may be: for each training image, obtaining the similarity between the features of the training image and the features of each reference image; when the training image and the reference image are the same image, the similarity is 0; when the training image and the reference image are different images, the similarity is cosine similarity.
For example, assume the training-set image feature set is $F_g = \{f_{q1}, f_{q2}, \ldots, f_{qm}, f_{g1}, f_{g2}, \ldots, f_{gn}\}$ and the reference-set image feature set is $F_q = \{f_{q1}, f_{q2}, \ldots, f_{qm}\}$. The similarity between each training image and each reference image in the training set is computed as
$$\mathrm{sim}(f_{qi}, f_{gj}) = \begin{cases} 0, & \text{if } I_{qi} \text{ and } I_{gj} \text{ are the same image,} \\ \dfrac{f_{qi} \cdot f_{gj}}{\lVert f_{qi} \rVert \, \lVert f_{gj} \rVert}, & \text{otherwise.} \end{cases}$$
Optionally, to simplify the calculation, matrix computation can be used directly; for example, the features of all reference images may first be arranged by column to form a first matrix, and the features of all training images arranged by row to form a second matrix; a similarity matrix is then obtained from the first matrix and the second matrix, wherein when the reference image and the training image are the same image, their value in the similarity matrix is 0, and when they are different images, their value in the similarity matrix is the cosine similarity of their features.
For example, assume the training-set image feature set is $F_g = \{f_{q1}, f_{q2}, \ldots, f_{qm}, f_{g1}, f_{g2}, \ldots, f_{gn}\}$ and the reference-set image feature set is $F_q = \{f_{q1}, f_{q2}, \ldots, f_{qm}\}$; matrix multiplication and pairwise similarity calculation then yield a similarity matrix with $m$ rows, one per reference image, and one column per training image:
$$S = \big[ s_{ij} \big], \qquad s_{ij} = \mathrm{sim}(f_{qi}, f_{gj}).$$
Each column in the similarity matrix holds the similarities computed pairwise between one training image and the different reference images.
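The matrix form can be sketched as follows: L2-normalizing the features turns the matrix product into cosine similarity in one step, after which self-pairs are zeroed out. The function name and the use of explicit image identifiers to detect self-pairs are assumptions for the example.

```python
import numpy as np

def similarity_matrix(ref_feats, train_feats, ref_ids, train_ids):
    """m x n cosine-similarity matrix between reference features (rows)
    and training features (columns); entries pairing an image with itself
    are set to 0, as in the scheme above.

    `ref_ids`/`train_ids` are image identifiers used only to detect
    self-pairs.
    """
    # Normalize rows so that R @ T.T directly gives cosine similarities.
    R = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    T = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    S = R @ T.T
    for i, rid in enumerate(ref_ids):
        for j, tid in enumerate(train_ids):
            if rid == tid:
                S[i, j] = 0.0  # an image is not compared with itself
    return S
```

Since the reference images are drawn from the training set, their rows reuse features already computed for the training images, matching the calculation-saving remark above.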
S105: and setting labels of the training images according to a plurality of similarities corresponding to each training image.
Specifically, in this embodiment, step S105 may be implemented as follows: for each training image, the maximum value of the plurality of similarities corresponding to that training image is taken as its label. Alternatively, when the similarity matrix described in step S104 is available, the maximum value of each column may be taken directly. This way of setting labels is simple and accurate.
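The column-wise maximum can be sketched in one line; the `labels_from_similarity` name is an assumption for the example.

```python
import numpy as np

def labels_from_similarity(S):
    """Label each training image with the maximum similarity it attains
    against any reference image (column-wise max of the similarity matrix,
    whose rows are reference images and columns training images)."""
    return S.max(axis=0)
```

A training image that is itself a reference still receives a meaningful label because its self-pair entry was zeroed, so the maximum comes from a different reference image.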
In summary, in the label construction method provided by the present application, a specific target in a training image is divided into body parts, and the features of the training image are obtained from the local images of all the body parts; labels are then set by computing the similarity between the features of the training images and those of the reference images. The approach provided by the present application reduces the intervention of manual annotation, reflects the actual degree of completeness of the target more objectively, is more robust, and is better suited to downstream algorithm models. In addition, it reduces the time required to label a large number of training images and improves efficiency.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of a label construction framework according to the present application, where the label construction framework includes a first obtaining module 10, a segmentation module 12, a second obtaining module 14, a third obtaining module 16 and a setting module 18. The first obtaining module 10 is configured to obtain a training set, where the training set includes a plurality of training images and each training image contains a specific target. The segmentation module 12 is coupled to the first obtaining module 10 and is configured to perform body part segmentation on the specific target in each training image to obtain local images of a plurality of body parts. The second obtaining module 14 is coupled to the segmentation module 12 and is configured to obtain the features of the corresponding training image using the local images of all body parts of each training image. The third obtaining module 16 is coupled to the second obtaining module 14 and is configured to obtain the similarities of each training image relative to a plurality of reference images using the features of that training image, where each reference image contains the specific target. The setting module 18 is coupled to the third obtaining module 16 and is configured to set the label of each training image according to its plurality of corresponding similarities.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an embodiment of a label construction apparatus according to the present application. The label construction apparatus comprises a processor 20 and a memory 22 coupled to each other, which cooperate to implement the label construction method described in any of the above embodiments. In this embodiment, the processor 20 may also be referred to as a CPU (Central Processing Unit). The processor 20 may be an integrated circuit chip having signal processing capabilities. The processor 20 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In addition, the label construction apparatus provided in the present application may further include other components, such as a display screen and a communication circuit, which are not detailed here.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of a device with a storage function according to the present application. The device 30 with a storage function stores program data 300, and the program data 300 can be executed by a processor to implement the label construction method described in any of the above embodiments. The program data 300 may be stored in the storage device in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage device includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as a computer, a server, a mobile phone, or a tablet.
The above embodiments are merely examples and are not intended to limit the scope of the present disclosure; all equivalent structural or flow modifications made using the contents of the specification and drawings of the present disclosure, or their direct or indirect application in other related technical fields, are likewise included in the scope of the present disclosure.

Claims (13)

1. A label construction method, characterized by comprising:
obtaining a training set, wherein the training set comprises a plurality of training images, and each training image comprises a specific target;
performing body part segmentation on the specific target in each training image to obtain a plurality of local images of the body parts;
obtaining features of the corresponding training images using local images of all the body parts of each of the training images;
obtaining the similarity of the training image relative to a plurality of reference images by using the characteristics of each training image, wherein each reference image comprises the specific target;
and setting labels of the training images according to a plurality of similarities corresponding to each of the training images.
2. The label construction method according to claim 1, wherein the step of performing body part segmentation on the specific target in each of the training images to obtain a plurality of local images of the body parts comprises:
obtaining a plurality of human body part key points of the specific target in each training image;
obtaining a plurality of local images of the body part according to the plurality of key points of the body part; wherein the local image of each body part comprises a plurality of key points of the body part.
3. The label construction method according to claim 2, wherein the step of obtaining a plurality of local images of the body parts from the plurality of human body part key points comprises:
acquiring the plurality of human body part key points belonging to the current body part, and taking the image corresponding to their minimum circumscribed rectangle as the local image of the current body part.
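For illustration, the minimum circumscribed rectangle recited in claim 3 can be sketched as the axis-aligned bounding box of the keypoints of one body part (the claim does not fix a coordinate convention; here keypoints are (x, y) pixel coordinates and the image is a row-major list of rows).

```python
# Sketch of claim 3: the local image of a body part is the crop of the
# smallest axis-aligned rectangle enclosing that part's keypoints.

def part_bounding_box(keypoints):
    """Return (x_min, y_min, x_max, y_max) of the smallest axis-aligned
    rectangle enclosing all keypoints of the current body part."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return min(xs), min(ys), max(xs), max(ys)

def crop(image, box):
    """Crop a row-major 2-D image (list of rows) to the bounding box;
    the crop is the local image of the current body part."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]
```

In practice a small margin is often added around the box so that limbs are not clipped at the keypoint centers, but the claim does not require this.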
4. The label construction method according to claim 3, wherein before the step of taking the image corresponding to the minimum circumscribed rectangle as the local image of the current body part, the method further comprises:
judging whether all the human body part key points of the current body part are detected with a confidence greater than or equal to a threshold;
if yes, directly entering the step of taking the image corresponding to the minimum circumscribed rectangle as the local image of the current body part;
otherwise, updating, with existing data, the human body part key points of the current body part that are not detected or that are detected with a confidence smaller than the threshold, and then entering the step of taking the image corresponding to the minimum circumscribed rectangle as the local image of the current body part.
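The claim leaves open what "existing data" fills in a missing or low-confidence keypoint. One plausible choice, sketched below purely for illustration, is to reuse the position of the symmetric counterpart keypoint; the `SYMMETRIC` table and the keypoint names are assumptions, not part of the claim.

```python
# Sketch of claim 4's confidence check with one hypothetical fallback:
# a missing/low-confidence keypoint borrows its mirror keypoint's position.

SYMMETRIC = {
    "left_knee": "right_knee", "right_knee": "left_knee",
    "left_ankle": "right_ankle", "right_ankle": "left_ankle",
}

def complete_keypoints(detected, required, threshold=0.5):
    """detected maps name -> (x, y, confidence). Return name -> (x, y) for
    every required keypoint, filling low-confidence or missing ones from
    their symmetric counterpart when one is confidently detected."""
    out = {}
    for name in required:
        kp = detected.get(name)
        if kp is not None and kp[2] >= threshold:
            out[name] = (kp[0], kp[1])
            continue
        twin = detected.get(SYMMETRIC.get(name, ""))
        if twin is not None and twin[2] >= threshold:
            out[name] = (twin[0], twin[1])  # fallback: mirror keypoint
        # otherwise the keypoint remains absent
    return out
```

Other fallbacks (e.g. interpolating from adjacent keypoints or reusing a previous frame) fit the same interface.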
5. The label construction method according to claim 2, wherein
the plurality of human body part key points comprise: a head, a neck, a left shoulder, a right shoulder, a left elbow, a right elbow, a left wrist, a right wrist, a left waist, a right waist, a left knee, a right knee, a left ankle, and a right ankle; and the body parts comprise an upper body, a middle body, and a lower body;
wherein the local image of the upper body comprises the head, neck, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left waist, and right waist; the local image of the middle body comprises the left waist, right waist, left knee, and right knee; and the local image of the lower body comprises the left knee, right knee, left ankle, and right ankle.
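The part-to-keypoint grouping recited in claim 5 can be written down directly as a lookup table; the string identifiers below are illustrative names for the fourteen keypoints, and the waist and knee points are deliberately shared between adjacent parts, as in the claim.

```python
# Direct encoding of the grouping in claim 5.

PART_KEYPOINTS = {
    "upper_body": ["head", "neck", "left_shoulder", "right_shoulder",
                   "left_elbow", "right_elbow", "left_wrist", "right_wrist",
                   "left_waist", "right_waist"],
    "mid_body":   ["left_waist", "right_waist", "left_knee", "right_knee"],
    "lower_body": ["left_knee", "right_knee", "left_ankle", "right_ankle"],
}

def keypoints_for_part(all_keypoints, part):
    """Select, from a name -> (x, y) mapping of detected keypoints, those
    belonging to the given body part."""
    return {k: all_keypoints[k] for k in PART_KEYPOINTS[part] if k in all_keypoints}
```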
6. The label construction method according to claim 1, wherein the step of obtaining the corresponding feature of the training image by using the local images of all the body parts of each of the training images comprises:
performing feature extraction on the local images of all the body parts;
and splicing the extracted features of the local images of all the body parts of each training image into a one-dimensional feature vector to obtain the features of the corresponding training image.
7. The label construction method according to claim 1, wherein the step of obtaining the features of the corresponding training image using the local images of all the body parts of each of the training images comprises:
performing feature extraction on the local images of all the body parts and on the whole image of the specific target;
and splicing the features of all the body parts of each training image and the overall feature of the specific target into a one-dimensional feature vector to obtain the features of the corresponding training image.
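The feature-splicing step of claims 6 and 7 amounts to concatenating per-part feature vectors, optionally appending a global feature of the whole target. In the sketch below `extract()` is a hypothetical stand-in for the feature-extraction network.

```python
# Sketch of claims 6-7: concatenate per-part features into one 1-D vector,
# optionally appending the whole-target feature (claim 7).

def concat_features(part_images, extract, whole_image=None):
    """Concatenate the feature vector of each local image; when
    whole_image is given, append its global feature as well."""
    feat = []
    for part in part_images:
        feat.extend(extract(part))
    if whole_image is not None:
        feat.extend(extract(whole_image))
    return feat
```

Because the parts are always processed in the same order, the resulting vector has a fixed layout and can be compared position-by-position across images.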
8. The label construction method according to claim 1, wherein the step of obtaining the similarity of the training image with respect to the plurality of reference images using the features of each of the training images is preceded by the steps of:
taking the training images in the training set that meet a preset condition as the reference images; wherein the preset condition comprises at least one of the following: the contour definition of the specific target in the training image exceeds a first preset value, the quality of the training image exceeds a second preset value, the body parts of the specific target in the training image are complete, and the training image contains only the specific target.
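A minimal sketch of this reference-image selection follows, assuming each candidate carries precomputed scores (the claim does not specify how clarity or quality are measured, and only requires at least one of the conditions; for illustration this sketch applies all four jointly, which yields the strictest reference set).

```python
# Sketch of claim 8: filter the training set down to high-quality,
# complete, single-target images to serve as references. The dict keys
# are hypothetical names for precomputed per-image scores.

def select_references(candidates, clarity_min, quality_min):
    """Keep images whose target contour definition and image quality
    exceed the preset values, whose body parts are all present, and
    which contain only the specific target."""
    return [c for c in candidates
            if c["clarity"] > clarity_min
            and c["quality"] > quality_min
            and c["parts_complete"]
            and c["num_targets"] == 1]
```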
9. The label construction method according to claim 8, wherein the step of obtaining the similarity of the training image with respect to a plurality of reference images using the features of each of the training images comprises:
for each training image, obtaining the similarity between the features of the training image and the features of each reference image; when the training image and the reference image are the same image, the similarity is 0; and when the training image and the reference image are different images, the similarity is cosine similarity.
10. The label construction method according to claim 9, wherein the step of obtaining, for each of the training images, a similarity between the features of the training image and the features of each of the reference images comprises:
arranging the features of all of the reference images column-wise to form a first matrix and arranging the features of all of the training images row-wise to form a second matrix;
and obtaining a similarity matrix using the first matrix and the second matrix, wherein when a reference image and a training image are the same image, their entry in the similarity matrix is 0; and when a reference image and a training image are different images, their entry in the similarity matrix is the cosine similarity between the features of the reference image and the features of the training image.
11. The label construction method according to claim 9, wherein the step of setting labels of the training images according to the plurality of similarities corresponding to each of the training images includes:
for each of the training images, taking the maximum value of the plurality of similarities corresponding to the training image as its label.
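Claims 9-11 together can be sketched as building a cosine-similarity matrix between reference and training features, zeroing entries where the two images coincide, and taking each training image's column maximum as its label. The `same(i, j)` predicate below is an assumed interface for the "same image" test in claim 10.

```python
# Sketch of claims 9-11: similarity matrix with a zeroed self-match,
# then label = column maximum.

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def similarity_matrix(ref_feats, train_feats, same):
    """Row i, column j holds the similarity between reference image i and
    training image j; same(i, j) tells whether they are the same image,
    in which case the entry is 0."""
    return [[0.0 if same(i, j) else cosine(r, t)
             for j, t in enumerate(train_feats)]
            for i, r in enumerate(ref_feats)]

def labels_from_matrix(sim):
    """Claim 11: each training image's label is its column maximum."""
    n_train = len(sim[0])
    return [max(row[j] for row in sim) for j in range(n_train)]
```

Zeroing the self-match ensures a training image that doubles as a reference is scored against the other references rather than trivially against itself.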
12. A label construction apparatus, characterized by comprising: a processor and a memory coupled to each other, wherein the processor cooperates with the memory to implement the label construction method according to any one of claims 1-11.
13. A device with a storage function, characterized in that program data are stored thereon, the program data being executable by a processor to implement the label construction method according to any one of claims 1-11.
CN202110513555.6A 2021-05-11 2021-05-11 Label construction method and related device Pending CN113408339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110513555.6A CN113408339A (en) 2021-05-11 2021-05-11 Label construction method and related device


Publications (1)

Publication Number Publication Date
CN113408339A 2021-09-17

Family

ID=77678391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110513555.6A Pending CN113408339A (en) 2021-05-11 2021-05-11 Label construction method and related device

Country Status (1)

Country Link
CN (1) CN113408339A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130128076A1 (en) * 2011-11-23 2013-05-23 Chi-cheng Ju Method, device, and machine readable medium for image capture and selection
CN104933436A (en) * 2014-03-19 2015-09-23 通用汽车环球科技运作有限责任公司 Vision-based multi-camera factory monitoring including dynamic integrity grading
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again
WO2020037932A1 (en) * 2018-08-20 2020-02-27 深圳云天励飞技术有限公司 Image quality assessment method, apparatus, electronic device and computer readable storage medium
CN111079695A (en) * 2019-12-30 2020-04-28 北京华宇信息技术有限公司 Human body key point detection and self-learning method and device
US20200279124A1 (en) * 2019-02-28 2020-09-03 Canon Kabushiki Kaisha Detection Apparatus and Method and Image Processing Apparatus and System
CN112200157A (en) * 2020-11-30 2021-01-08 成都市谛视科技有限公司 Human body 3D posture recognition method and system for reducing image background interference



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination