CN111739060A - Identification method, device and storage medium - Google Patents

Identification method, device and storage medium

Info

Publication number
CN111739060A
CN111739060A (application CN201910706446.9A)
Authority
CN
China
Prior art keywords
image
images
target
objects
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910706446.9A
Other languages
Chinese (zh)
Other versions
CN111739060B (en)
Inventor
刘武
鲍慊
梅涛
阮威健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910706446.9A priority Critical patent/CN111739060B/en
Publication of CN111739060A publication Critical patent/CN111739060A/en
Application granted granted Critical
Publication of CN111739060B publication Critical patent/CN111739060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose an identification method, a device, and a storage medium. The method includes: obtaining at least two images; obtaining key point information of each object in each image; determining a feature sequence of each object in each image based at least on the key point information of each object in each image; obtaining, based at least on the feature sequences of the objects in the images, a first parameter between objects in the at least two images, the first parameter characterizing the degree of similarity between objects from different images of the at least two images; and determining a target object based at least on the first parameter between objects in the at least two images, the target object being a similar object across the at least two images.

Description

Identification method, device and storage medium
Technical Field
The present application relates to identification technologies, and in particular, to an identification method, device, and storage medium.
Background
Object tracking obtains, by computer vision techniques, the position and tracking identity of each object in a video, such as the position and tracking number of each pedestrian. In the related art, tracking the objects appearing in two adjacent frames of a video generally requires two steps, for example pedestrian recognition followed by association of the tracking data. Taking the current frame and the previous frame as an example, these two steps must at least distinguish pedestrians in the current frame who are identical to pedestrians in the previous frame, pedestrians who newly appear in the current frame, and pedestrians who appeared in the previous frame but have disappeared in the current frame. To ensure the accuracy of the two steps, separate network models are usually used for pedestrian recognition and for the association of tracking data. It can be understood that before these two network models are used, each must be trained separately, and only the trained models are then used for pedestrian recognition and for associating tracking data. Adopting two independent network models to realize object tracking therefore has two drawbacks: on the one hand, two different network models have to be trained, which invisibly increases the amount of computation; on the other hand, the output of the pedestrian-recognition network model is usually the input of the tracking-data association model, and because the two independent models are processed in different ways, the association model may not receive the input it expects, that is, the two network models cannot be effectively connected.
Disclosure of Invention
To solve the above technical problem, embodiments of the present application provide an identification method, an identification device, and a storage medium, which at least avoid the problems in the related art that the amount of computation is large and the models cannot be effectively connected when two independent network models are adopted.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides an identification method, which comprises the following steps:
obtaining at least two images;
obtaining key point information of each object in each image;
determining a characteristic sequence of each object in each image at least based on the key point information of each object in each image;
at least obtaining a first parameter between the objects in the at least two images based on the characteristic sequence of the objects in the images, wherein the first parameter is used for representing the similarity degree between the objects of different images in the at least two images;
at least a target object is determined based on a first parameter between objects in the at least two images, the target object being a similar object between the at least two images.
In the above scheme, the method includes:
obtaining at least two global feature maps for respective objects in respective images, wherein different global feature maps of the same object are at least partially different;
correspondingly, the determining the feature sequence of each object in each image based on at least the key point information of each object in each image includes:
and determining the local feature sequence of each object based on the at least two global feature maps and the key point information of each object.
In the foregoing solution, the determining a local feature sequence of each object based on at least two global feature maps and the keypoint information of each object includes:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
In the above solution, the obtaining at least a first parameter between objects in the at least two images based on the feature sequence of each object in each image includes:
combining the feature sequences of the objects in the t_i-th image with the feature sequences of the objects in the t_{i-1}-th image pairwise to obtain feature tensor information, wherein the t_i-th image and the t_{i-1}-th image are two adjacent images of the at least two images;
performing convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least some elements of each target matrix represent the degree of similarity between any two objects of the t_i-th image and the t_{i-1}-th image;
and obtaining, based at least on the two target matrices, first parameters between the objects in the first image and the objects in the second image.
In the foregoing solution, the obtaining at least a first parameter between each object in the first image and each object in the second image based on the two target matrices includes:
performing a normalized exponential function (softmax) operation column-wise on a first target matrix of the two target matrices to obtain a first matching probability matrix, wherein at least some elements of the first matching probability matrix represent matching probabilities from objects of the t_i-th image to objects of the t_{i-1}-th image;
performing a softmax operation row-wise on a second target matrix of the two target matrices to obtain a second matching probability matrix, wherein at least some elements of the second matching probability matrix represent matching probabilities from objects of the t_{i-1}-th image to objects of the t_i-th image;
and taking, for each position, the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects of the first image and the second image.
An embodiment of the present application provides an identification device, which includes:
a first obtaining unit configured to obtain at least two images;
a second obtaining unit configured to obtain key point information of each object in each image;
a first determining unit, configured to determine a feature sequence of each object in each image based on at least the key point information of each object in each image;
a third obtaining unit, configured to obtain at least a first parameter between objects in the at least two images based on a feature sequence of each object in each image, where the first parameter is used to characterize a degree of similarity between objects in different images of the at least two images;
a second determining unit, configured to determine at least a target object based on at least the first parameter between the objects in the at least two images, where the target object is a similar object between the at least two images.
In the above solution, the apparatus further includes:
a fourth obtaining unit, configured to obtain at least two global feature maps for each object in each image, where different global feature maps of the same object are at least partially different;
correspondingly, the first determining unit is configured to determine the local feature sequence of each object based on the at least two global feature maps and the keypoint information of each object.
In the foregoing solution, the first determining unit is configured to:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
In the foregoing scheme, the third obtaining unit is configured to:
combine the feature sequences of the objects in the t_i-th image with the feature sequences of the objects in the t_{i-1}-th image pairwise to obtain feature tensor information, wherein the t_i-th image and the t_{i-1}-th image are two adjacent images of the at least two images;
perform convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least some elements of each target matrix represent the degree of similarity between any two objects of the t_i-th image and the t_{i-1}-th image;
and obtain, based at least on the two target matrices, first parameters between the objects in the first image and the objects in the second image.
In the foregoing scheme, the third obtaining unit is configured to:
perform a normalized exponential function (softmax) operation column-wise on a first target matrix of the two target matrices to obtain a first matching probability matrix, wherein at least some elements of the first matching probability matrix represent matching probabilities from objects of the t_i-th image to objects of the t_{i-1}-th image;
perform a softmax operation row-wise on a second target matrix of the two target matrices to obtain a second matching probability matrix, wherein at least some elements of the second matching probability matrix represent matching probabilities from objects of the t_{i-1}-th image to objects of the t_i-th image;
and take, for each position, the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects of the first image and the second image.
An embodiment of the application provides an identification device, which comprises a processor and a storage medium for storing a computer program; wherein the processor is configured to execute at least the aforementioned identification method when executing the computer program.
The storage medium stores a computer program, and the computer program performs at least the foregoing identification method when executed.
The identification method, device and storage medium provided by the embodiments of the present application involve: obtaining at least two images; obtaining key point information of each object in each image; determining a feature sequence of each object in each image based at least on the key point information of each object in each image; obtaining, based at least on the feature sequences of the objects in the images, a first parameter between objects in the at least two images, the first parameter characterizing the degree of similarity between objects from different images of the at least two images; and determining a target object based at least on the first parameter between objects in the at least two images, the target object being a similar object across the at least two images.
In the embodiment of the present application, the feature sequence of each object in each image is determined based on the key point information of each object in each image, the degree of similarity between the objects of different images is obtained based at least on these feature sequences, and which objects are similar objects across the at least two images is determined according to the obtained degree of similarity. Compared with the related-art scheme that uses two independent network models to realize object tracking, this method requires no hand-off between models and involves a smaller amount of computation. In addition, because the target object is determined based on the degree of similarity between objects in different images, the identification accuracy of the target object can be ensured.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart illustrating an implementation of a first embodiment of an identification method provided in the present application;
fig. 2 is a schematic flow chart illustrating an implementation of a second embodiment of the identification method provided in the present application;
FIG. 3 is a schematic diagram illustrating a network model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating the operation of the network model provided herein;
FIG. 5 is a schematic illustration of the eigen-representation tensors provided herein;
fig. 6 is a schematic structural diagram illustrating a first embodiment of an identification device according to the present application;
fig. 7 is a schematic structural diagram of a second embodiment of an identification device provided in the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present application. In the present application, the embodiments and the features of the embodiments may be combined with each other arbitrarily provided there is no conflict. The steps illustrated in the flow charts of the figures may be performed in a computer system such as one executing a set of computer-executable instructions. Also, although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one here.
In a first embodiment of the identification method provided in the present application, as shown in fig. 1, the method includes:
step 101: obtaining at least two images;
in this step, the at least two images may be adjacent images or non-adjacent images.
Step 102: obtaining key point information of each object in each image;
step 103: determining a characteristic sequence of each object in each image at least based on the key point information of each object in each image;
step 104: at least obtaining a first parameter between the objects in the at least two images based on the characteristic sequence of the objects in the images, wherein the first parameter is used for representing the similarity degree between the objects of different images in the at least two images;
step 105: at least a target object is determined based on a first parameter between objects in the at least two images, the target object being a similar object between the at least two images.
The entity performing steps 101-105 is any device that can be used to identify a target object.
In the foregoing solution, a feature sequence of each object in each image is determined based on the key point information of each object in each image, at least a similarity degree between each object in different images is obtained based on the feature sequence of the object, and which objects are similar objects between at least two images is determined according to the obtained similarity degree. Compared with the scheme of realizing object tracking by adopting two independent network models in the related technology, the method does not need the connection between the models and has small calculation amount. In addition, the target object is determined based on the similarity degree between the objects in different images, and the identification accuracy of the target object can be ensured.
In a second embodiment of the identification method provided in the present application, as shown in fig. 2, the method includes:
step 201: obtaining at least two images;
step 202: obtaining key point information of each object in each image;
step 203: obtaining at least two global feature maps for respective objects in respective images, wherein different global feature maps of the same object are at least partially different;
step 204: determining a local feature sequence of each object based on the at least two global feature maps and the key point information of each object;
step 205: at least obtaining a first parameter between each object in the at least two images based on the local feature sequence of each object in each image, wherein the first parameter is used for representing the similarity degree between the objects of different images in the at least two images;
step 206: at least a target object is determined based on a first parameter between objects in the at least two images, the target object being a similar object between the at least two images.
The entity performing steps 201-206 is any device that can be used to identify a target object.
In the foregoing solution, a local feature sequence of each object in each image is determined based on at least two global feature maps and keypoint information of each object in each image, at least a degree of similarity between each object in different images is obtained based on the local feature sequences of the objects, and which objects are similar objects between at least two images is determined according to the obtained degree of similarity. Compared with the scheme of realizing object tracking by adopting two independent network models in the related technology, the method does not need the connection between the models and has small calculation amount. In addition, the target object is determined based on the local features of the objects in each image, and the granularity is considered to be finer from the local features of the objects, so that the identification accuracy of the target object can be ensured.
Based on the foregoing second embodiment of the method, the determining a local feature sequence of each object based on at least two global feature maps and the keypoint information of each object includes:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
In this scheme, the local features of an object are determined by combining the object's global feature maps with the position information of its key parts, which ensures the accuracy of the local features; since the target object is determined from the local features of the objects, a finer granularity is considered, so the identification accuracy of the target object can be ensured.
In a first and/or second embodiment of the foregoing method, the obtaining at least a first parameter between the objects in the at least two images based on the feature sequences of the objects in the images includes:
combining the feature sequences of the objects in the t_i-th image with the feature sequences of the objects in the t_{i-1}-th image pairwise to obtain feature tensor information, wherein the t_i-th image and the t_{i-1}-th image are two adjacent images of the at least two images;
performing convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least some elements of each target matrix represent the degree of similarity between any two objects of the t_i-th image and the t_{i-1}-th image;
and obtaining, based at least on the two target matrices, first parameters between the objects in the first image and the objects in the second image.
In the foregoing scheme, the first parameter is obtained based on the feature sequences of the objects in the adjacent images, so that the accuracy of obtaining the first parameter can be ensured.
In the foregoing solution, the obtaining at least a first parameter between each object in the first image and each object in the second image based on the two target matrices includes:
performing a normalized exponential function (softmax) operation column-wise on a first target matrix of the two target matrices to obtain a first matching probability matrix, wherein at least some elements of the first matching probability matrix represent matching probabilities from objects of the t_i-th image to objects of the t_{i-1}-th image;
performing a softmax operation row-wise on a second target matrix of the two target matrices to obtain a second matching probability matrix, wherein at least some elements of the second matching probability matrix represent matching probabilities from objects of the t_{i-1}-th image to objects of the t_i-th image;
and taking, for each position, the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects of the first image and the second image.
In the above scheme, some elements of the first matching probability matrix represent matching probabilities from objects of the t_i-th image to objects of the t_{i-1}-th image, so it can be viewed as a backward matching probability matrix. Some elements of the second matching probability matrix represent matching probabilities from objects of the t_{i-1}-th image to objects of the t_i-th image, so it can be viewed as a forward matching probability matrix. The first parameter is therefore obtained from two probability matrices, the forward matching probability matrix and the backward matching probability matrix, rather than from a single probability matrix, which improves the calculation accuracy of the first parameter and further ensures the identification accuracy of the target object.
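As a small illustration of taking the larger element at each position of the two matching probability matrices (the matrix values below are invented purely for this example and are not taken from the application):

```python
import numpy as np

# Hypothetical matching probability matrices between the objects of two images
# (rows: objects of one image, columns: objects of the other image).
A_backward = np.array([[0.3, 0.1],
                       [0.2, 0.7]])
A_forward = np.array([[0.8, 0.05],
                      [0.1, 0.6]])

# The first parameter between two corresponding objects is the larger of the
# element values at the same position in the two matching probability matrices.
first_parameter = np.maximum(A_backward, A_forward)
print(first_parameter)  # [[0.8 0.1] [0.2 0.7]]
```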
The technical solution of the embodiment of the present application will be described with reference to fig. 3 to 5.
As shown in fig. 3, an embodiment of the present application provides a network model for identifying a target object. For objects, such as pedestrians, appearing in two adjacent frames of images, the network model is used to obtain the matching probabilities between the pedestrians in the two frames. Based on these matching probabilities it can be determined which pedestrian(s) in the current frame are the same persons as which pedestrian(s) in the previous frame, which pedestrians newly appear in the current frame, and which pedestrians appeared in the previous frame but have disappeared in the current frame. That is, the network model is used to realize the matching between the pedestrians in two adjacent frames of images.
In the embodiment of the application, the network model at least comprises a feature learning network and a measurement network. It should be understood by those skilled in the art that before the feature learning network and the measurement network are used to identify the target object, the network model, specifically, the feature learning network and the measurement network, needs to be trained (training phase), and the trained network model is used to identify the target object (testing/applying phase).
The training phase and the application (test) phase are described separately below:
Training stage: it should be noted that, in the training phase, two adjacent frames of images, the t_{i-1}-th frame image and the t_i-th frame image, are used to train the network model.
The specific scheme is as follows:
step 500: collecting two adjacent frames of images, and carrying out human body detection on each collected (frame) image so as to identify all pedestrians in each frame of image;
in this step, the t_{i-1}-th frame image and the t_i-th frame image are acquired. As shown in FIG. 4, both the t_{i-1}-th frame image and the t_i-th frame image contain persons together with some background. Human body detection is carried out on each of the two frame images with a target detection method such as SSD (Single Shot MultiBox Detector), Faster R-CNN, or YOLO (You Only Look Once), so that the persons in the two frame images are identified and each pedestrian in each frame image is marked with a detection box.
Considering its high detection accuracy, Faster R-CNN is preferably adopted for human body detection.
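As an illustration only (the application does not prescribe a particular implementation), pedestrian detection with an off-the-shelf Faster R-CNN could be sketched as follows; the torchvision model, the COCO person label id (1), and the score threshold are assumptions of this sketch, not values from the application.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Minimal sketch: detect pedestrians in one frame with a pretrained Faster R-CNN.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_pedestrians(image_path, score_thresh=0.5):
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        output = model([image])[0]
    boxes = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        # COCO label 1 corresponds to "person" in torchvision's detection models.
        if label.item() == 1 and score.item() >= score_thresh:
            boxes.append(box.tolist())  # [x1, y1, x2, y2] detection box
    return boxes
```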
Step 501: calculating key point information of each pedestrian in each image of two adjacent frames of images;
in this step, key point information is calculated for each pedestrian in each frame image. Since the objects in this application scenario are pedestrians, the key point information of an object consists of the position coordinates of the pedestrian's body parts within that pedestrian's detection box. Specifically, the key point information of a pedestrian in the embodiment of the present application comprises the position coordinates, within the pedestrian's detection box, of 14 human body key points such as the head, the neck, the left and right shoulders, the left and right hips, the left and right elbows, the left and right knees, and the left and right ankles.
The human body key points can be calculated using a human body key point detection model such as an Hourglass network. On this basis, as those skilled in the art can appreciate, the network model of the embodiment of the present application includes, in addition to the feature learning network and the measurement network, a human body key point detection model. In the embodiment of the present application, the human body key point detection model can be used directly without further training.
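As a small, hedged sketch of how such a key point model is typically used (the concrete model and its output format are assumptions here and are not specified by the application): an Hourglass-style network usually returns one heatmap per key point, and the coordinate of each key point can be taken as the peak of its heatmap.

```python
import torch

def heatmaps_to_keypoints(heatmaps):
    """heatmaps: tensor of shape (14, H, W), one heatmap per body key point
    (head, neck, shoulders, hips, elbows, knees, ankles, ...).
    Returns a (14, 2) tensor of (x, y) coordinates inside the detection box."""
    num_kp, h, w = heatmaps.shape
    flat = heatmaps.reshape(num_kp, -1)
    idx = flat.argmax(dim=1)          # index of the peak of each heatmap
    ys = (idx // w).float()
    xs = (idx % w).float()
    return torch.stack([xs, ys], dim=1)

# Usage (assuming `hourglass` is a trained key point model and `crop` a pedestrian crop):
# heatmaps = hourglass(crop.unsqueeze(0))[0]   # shape (14, H, W), an assumed output format
# keypoints = heatmaps_to_keypoints(heatmaps)
```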
Step 502: inputting the two adjacent frames of images and the detection frames of all pedestrians in each frame of image into a network model, specifically a feature learning network, and obtaining at least two global feature maps for each pedestrian in each frame of image;
step 503: obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
step 504: determining a sequence of local features, such as (final) feature representation vectors, of each object from each target image of each object;
the steps 502-504 are implemented based on at least a feature learning network. The feature learning network may be specifically a vgg (visual Geometry group) neural network or a residual error (ResNet) network. With ResNet, considering that the deeper the ResNet network structure (the more residual blocks) can extract relatively more abundant features, in particular, with a ResNet101 network, the ResNet101 includes at least two residual blocks, each of which includes at least two convolutional layers, and those skilled in the art should understand that the convolutional layers are used for performing feature map calculation on images input to the convolutional layers.
As shown in fig. 4, in a specific implementation, each frame image input to the feature learning network, specifically the ResNet101 network, passes through the residual blocks of ResNet101, and the last convolutional layer of each residual block outputs a feature map. In the embodiment of the present application, for each pedestrian in each frame image, the feature maps output by the last convolutional layers of 4 residual blocks are extracted: the feature map conv2 output by the last convolutional layer of the 2nd residual block, the feature map conv3 output by the last convolutional layer of the 3rd residual block, the feature map conv4 output by the last convolutional layer of the 4th residual block, and the feature map conv5 output by the last convolutional layer of the 5th residual block, as the four feature maps of the corresponding pedestrian. These four feature maps can be regarded as global feature representations of the pedestrian at four levels: the convolutional layers of different residual blocks can be regarded as different levels, so the convolutional layers of the four residual blocks correspond to four levels. In general, the feature maps output at different levels carry different feature information; the deeper the level, the richer the image feature information it expresses. It can be understood that as the level deepens, the size of the feature map at that level becomes smaller, but the image feature information it expresses becomes richer. The feature map output by the last convolutional layer of each residual block is typically a global feature map. It can thus be understood that, in the embodiment of the present application, at least two global feature maps are obtained for each pedestrian in each frame image based on the feature learning network. A global feature map contains a feature description of the pedestrian together with its surroundings and is usually not fine enough. Therefore, in the embodiment of the present application, combining the human body key point information obtained in step 501, for each pedestrian and for each level, the image (target image) features at the positions corresponding to the several body parts are cropped from that level's global feature map and used as the local feature vectors at that level.
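A minimal sketch of extracting such multi-level feature maps, under the assumption that torchvision's ResNet-101 is used as the feature learning network (its stage outputs layer1-layer4 correspond to the conv2-conv5 feature maps mentioned above):

```python
import torch
import torchvision

backbone = torchvision.models.resnet101(weights="DEFAULT")
backbone.eval()

def four_level_feature_maps(image_batch):
    """image_batch: float tensor of shape (B, 3, H, W).
    Returns the outputs of the four residual stages (conv2..conv5 levels)."""
    x = backbone.conv1(image_batch)
    x = backbone.bn1(x)
    x = backbone.relu(x)
    x = backbone.maxpool(x)
    c2 = backbone.layer1(x)   # conv2-level global feature map
    c3 = backbone.layer2(c2)  # conv3-level
    c4 = backbone.layer3(c3)  # conv4-level
    c5 = backbone.layer4(c4)  # conv5-level
    return c2, c3, c4, c5
```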
Taking the aforementioned 14 key parts as an example, 14 local feature vectors are obtained for each pedestrian at each level; each local feature vector is the feature vector at the position of the corresponding body part in that level's global feature map. In total, each pedestrian thus has 14 local feature vectors at each of the four levels. At each level, the 14 local feature vectors of a pedestrian are averaged to obtain the pedestrian's feature representation at that level; the feature representation at each level is then processed by a convolutional layer with a 1 × 1 convolution kernel to obtain a dimension-reduced feature representation, and the dimension-reduced feature representations of all levels are concatenated to obtain the (final) feature representation vector of the pedestrian. The (final) feature representation vector of an object, such as a pedestrian, can be regarded as the local feature sequence of that object in the embodiment of the present application.
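A minimal sketch of this keypoint-guided local feature step, under the assumptions that the key point coordinates have already been mapped into each feature map's resolution and that the channel widths of the 1 × 1 reduction layers are illustrative choices:

```python
import torch
import torch.nn as nn

class LocalFeatureHead(nn.Module):
    """Builds a pedestrian's feature representation vector from several
    global feature maps and 14 key point locations."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), reduced=64):
        super().__init__()
        # One 1x1 convolution per level for dimension reduction.
        self.reduce = nn.ModuleList(nn.Conv2d(c, reduced, kernel_size=1)
                                    for c in in_channels)

    def forward(self, feature_maps, keypoints_per_level):
        level_features = []
        for fmap, kps, reduce in zip(feature_maps, keypoints_per_level, self.reduce):
            # fmap: (C, H, W); kps: (14, 2) integer (x, y) positions in fmap coordinates.
            locals_ = torch.stack([fmap[:, y, x] for x, y in kps])  # (14, C)
            pooled = locals_.mean(dim=0)                            # average of 14 vectors
            reduced_vec = reduce(pooled.view(1, -1, 1, 1)).flatten()  # 1x1 conv reduction
            level_features.append(reduced_vec)
        return torch.cat(level_features)  # final feature representation vector
```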
In this step, the local features of the human body are taken into account, which realizes a fine-grained treatment, so that quantities obtained from these local features, such as the feature tensor information and the first parameter, are more accurate.
In this step, the number of levels is 4 and the number of key parts is 14; it can be understood that both values can be set flexibly according to actual conditions. Similarly, the size of the convolution kernel used for the dimension reduction of the feature representations at each level can take any reasonable value and is not particularly limited.
Step 505: combining the feature sequences of the objects in the t_i-th image with those of the objects in the t_{i-1}-th image pairwise to obtain feature tensor information;
through the scheme of step 504, the (final) feature representation vectors of all pedestrians in the t_{i-1}-th frame image and the t_i-th frame image are obtained. Since the number of pedestrians contained in each frame image is not necessarily the same, in order to obtain a uniform representation the number of pedestrians per frame is expanded to N (for example, N = 60); when the number of actual pedestrians in a frame image is less than N, the feature representation vectors of the missing pedestrians are filled with zero vectors. The pedestrian feature representation vectors of the t_{i-1}-th frame image and the t_i-th frame image are then combined pairwise, in an arbitrary manner, to obtain the feature representation tensor S (the feature tensor information) between the pedestrians of the two consecutive frames.
In a specific implementation, since the two adjacent frame images have a temporal order (the t_i-th frame image is later than the t_{i-1}-th frame image), the meaning of a combination differs depending on whether the feature representation vector of a pedestrian from the t_{i-1}-th frame image is placed in front of or behind the combination result. The arbitrary combination in the embodiment of the present application covers both cases: the feature representation vector of a pedestrian from the t_{i-1}-th frame image may be placed either in front of or behind the combination result. As an example, assume the t_{i-1}-th frame image and the t_i-th frame image each contain three pedestrians. Let A1, A2, A3 be the feature representation vectors of the three persons in the t_{i-1}-th frame image, and B1, B2, B3 the feature representation vectors of the three persons in the t_i-th frame image. The feature representation vectors of the P = 3 pedestrians of the two frame images are combined pairwise, and the resulting feature representation tensor S is shown in fig. 5. The dimension of the feature representation tensor S is P × P × N_c, where N_c denotes the length of the feature representation vector.
Step 506: inputting the feature representation tensor S into a measurement network to obtain a first target matrix (represented by M1) and a second target matrix (represented by M2);
the measurement network in this step is a similarity measurement network and can be realized by at least two convolution operations; for example, the measurement network is realized with 5 convolutional layers whose convolution kernels are 1 × 1. The feature representation tensor S is processed by these convolutional layers to obtain a similarity measurement matrix M whose dimension is N × N. During pedestrian tracking, pedestrians may enter or leave the scene, a situation the similarity measurement matrix M does not account for; therefore, one row or one column is appended to M to indicate, respectively, that a pedestrian enters or leaves the current frame, and the elements of the appended row or column take the value of a hyper-parameter σ set according to experience. The similarity measurement matrix after appending the row is M1 (the first target matrix), with dimension (N+1) × N; the similarity measurement matrix after appending the column is M2 (the second target matrix), with dimension N × (N+1).
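A minimal sketch of such a measurement network, under the assumptions that the input tensor S is laid out as (batch, 2·N_c, N, N) and that the intermediate channel widths and the value of σ are illustrative choices rather than values given by the application:

```python
import torch
import torch.nn as nn

class MetricNetwork(nn.Module):
    """Maps the feature representation tensor S to a similarity matrix M (N x N),
    then pads a row (M1) and a column (M2) with the hyper-parameter sigma."""
    def __init__(self, in_channels=512, sigma=0.5):
        super().__init__()
        chans = [in_channels, 256, 128, 64, 32, 1]   # five 1x1 convolutional layers
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=1), nn.ReLU(inplace=True)]
        self.convs = nn.Sequential(*layers[:-1])     # no ReLU after the last layer
        self.sigma = sigma

    def forward(self, S):
        M = self.convs(S).squeeze(1)                 # (batch, N, N)
        b, n, _ = M.shape
        pad = torch.full((b, 1, n), self.sigma, device=M.device)
        M1 = torch.cat([M, pad], dim=1)              # (batch, N+1, N): appended row
        M2 = torch.cat([M, pad.transpose(1, 2)], dim=2)  # (batch, N, N+1): appended column
        return M1, M2
```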
Step 507: carrying out normalized exponential function (softmax) operation on the first target matrix M1 according to columns, and carrying out softmax operation on the second target matrix M2 according to rows to obtain a first matching probability matrix and a second matching probability matrix;
in this step, a column-wise softmax operation is applied to the similarity measurement matrix M1 and a row-wise softmax operation is applied to M2, thereby obtaining a first matching probability matrix (denoted A_b) and a second matching probability matrix (denoted A_f). A_b encodes the matching probabilities from the t_i-th frame image to the t_{i-1}-th frame image and is the backward matching probability matrix; A_f encodes the matching probabilities from the t_{i-1}-th frame image to the t_i-th frame image and is the forward matching probability matrix. For the concrete implementation of the softmax operation, please refer to the related description.
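Continuing the sketch above (again only illustrative), the two matching probability matrices can be obtained from M1 and M2 as follows:

```python
import torch

def matching_probabilities(M1, M2):
    """M1: (N+1, N) similarity matrix with an appended 'enter/leave' row;
    M2: (N, N+1) similarity matrix with an appended 'enter/leave' column."""
    A_b = torch.softmax(M1, dim=0)  # column-wise softmax: backward matching probabilities
    A_f = torch.softmax(M2, dim=1)  # row-wise softmax: forward matching probabilities
    return A_b, A_f
```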
Step 508: obtaining the true matching incidence matrix G of the t_i-th frame image and the t_{i-1}-th frame image, and obtaining the trained network model based on the true matching incidence matrix G, the first matching probability matrix A_b, and the second matching probability matrix A_f.
In this step, in the training phase, the t-th labeliFrame image and ti-1And the dimension of a real matching incidence matrix G of the frame image is (N +1) × (N +1), and the matrix element is 0 or 1. When a certain pedestrian is at the t-th positioniFrame image and ti-1And when the frame images exist, the value of the element at the corresponding position is 1, otherwise, the value is 0. Wherein the first (N-1) row and the first (N-1) column of the matrix G represent the matching relationship between two consecutive frames of pedestrians. Line N and column N give the current frame, tthiPedestrian indices of presence and departure in the frame image. If a certain pedestrian leaves at the current frame, the corresponding position of the Nth row is 1, otherwise, the corresponding position is 0; and if a certain pedestrian appears in the current frame, setting the corresponding position of the Nth row as 1, otherwise, setting the corresponding position as 0.
The loss function of the network model is realized by the average of the following four loss functions.
Loss function one, loss function two, loss function three, and loss function four are given in the original publication as equation images (not reproduced here). In these formulas, G_f and G_b denote the matrices obtained from the true matching incidence matrix G by removing its last row and its last column, respectively; G_w denotes the matrix obtained by removing both the last row and the last column of G; Â_f denotes A_f with its last row removed, and Â_b denotes A_b with its last column removed. The ⊙ operation is the element-wise product of two matrices, |·| denotes the norm of a matrix, log is the logarithm, Σ_ij sums over all elements of a matrix, and max(·,·) takes the larger of its two arguments. The final loss function of the network model is the average of the four loss functions.
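For readability only, a plausible reconstruction of the four losses, written purely from the textual description above, is given below; the exact formulas in the application appear as images and may differ in detail, and in the third and fourth expressions the matrices are assumed to be restricted to their common N × N part.

```latex
% Hedged reconstruction; symbols follow the description above, not the original images.
\begin{aligned}
\mathcal{L}_1 &= \frac{\sum_{ij} G_f \odot \left(-\log \hat{A}_f\right)}{\sum_{ij} G_f},\qquad
\mathcal{L}_2  = \frac{\sum_{ij} G_b \odot \left(-\log \hat{A}_b\right)}{\sum_{ij} G_b},\\
\mathcal{L}_3 &= \frac{\sum_{ij} G_w \odot \left|\hat{A}_f - \hat{A}_b\right|}{\sum_{ij} G_w},\qquad
\mathcal{L}_4  = \frac{\sum_{ij} G_w \odot \left(-\log \max(\hat{A}_f, \hat{A}_b)\right)}{\sum_{ij} G_w},\\
\mathcal{L} &= \tfrac{1}{4}\left(\mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3 + \mathcal{L}_4\right).
\end{aligned}
```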
Those skilled in the art should understand that the training process of steps 500-508 in fact searches for the values of the convolution weight parameters of the feature learning network (specifically, the convolutional layers of ResNet101) and of the convolutional layers in the measurement network that minimize the final loss function of the network model. The feature learning network and the measurement network obtained when the final loss function reaches its minimum can be used as the trained networks, and the network model comprising at least the feature learning network and the measurement network is then the trained model.
Testing/application stage: the pedestrians of two adjacent frames are matched by using the trained network model.
Specifically, let the two adjacent frames at this stage be the t_i-th frame image and the t_{i+1}-th frame image. Steps 500-507 are executed on the t_i-th frame image and the t_{i+1}-th frame image to obtain their first matching probability matrix A_b and second matching probability matrix A_f. A matrix A* is then obtained by taking, at each position, the larger of the corresponding elements of A_b and A_f. The elements of A* represent the similarity probabilities (matching probabilities) between the pedestrians of the two consecutive frames. Based on these matching similarities it is determined which pedestrian(s) in the t_{i+1}-th frame image are the same pedestrians as which pedestrian(s) in the t_i-th frame image, which pedestrians newly appear in the t_{i+1}-th frame image, and which pedestrian(s) have disappeared in the t_{i+1}-th frame image.

For example, suppose the element in the first row and first column of one matching probability matrix is 0.3, representing that the similarity probability between user A1 in the t_i-th frame image and user B1 in the t_{i+1}-th frame image is 0.3, and the element at the same position in the other matching probability matrix is 0.8, representing that the similarity probability between user B1 in the t_{i+1}-th frame image and user A1 in the t_i-th frame image is 0.8. Then the element of A* in the first row and first column is the maximum of 0.3 and 0.8, namely 0.8, representing that the probability that user A1 in the t_i-th frame image and user B1 in the t_{i+1}-th frame image are the same person is 80%; if this value is greater than or equal to a first preset probability, user A1 and user B1 are the same person appearing in both the t_i-th frame image and the t_{i+1}-th frame image. If, based on A*, a user appearing in the t_{i+1}-th frame image has a similarity probability smaller than a second threshold with every user appearing in the t_i-th frame image, that user is regarded as a user newly appearing in the t_{i+1}-th frame image. If, based on A*, a user appearing in the t_i-th frame image has a similarity probability smaller than a third threshold with every user appearing in the t_{i+1}-th frame image, that user is regarded as a user who appeared in the t_i-th frame image but has disappeared in the t_{i+1}-th frame image.
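A minimal sketch of this test-stage decision logic; the threshold values and the row/column orientation are placeholders chosen for illustration, since the application only names a first preset probability, a second threshold, and a third threshold.

```python
import numpy as np

def match_frames(A_b, A_f, p_same=0.5, p_new=0.3, p_gone=0.3):
    """A_b, A_f: matching probability matrices between pedestrians of frame t_i (rows)
    and frame t_{i+1} (columns), restricted to the actually detected pedestrians."""
    A_star = np.maximum(A_b, A_f)            # element-wise maximum

    same_pairs = [(i, j) for i in range(A_star.shape[0])
                         for j in range(A_star.shape[1])
                         if A_star[i, j] >= p_same]           # same person in both frames
    new_in_next = [j for j in range(A_star.shape[1])
                   if A_star[:, j].max() < p_new]             # newly appearing pedestrians
    gone_from_prev = [i for i in range(A_star.shape[0])
                      if A_star[i, :].max() < p_gone]         # pedestrians that disappeared
    return same_pairs, new_in_next, gone_from_prev
```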
Compared with the related-art scheme in which two independent network models are adopted to realize object tracking and both models require independent training, the scheme of the embodiment of the present application uses a single network model to match the similarity between pedestrians across images, and only this one network model needs to be trained, which greatly reduces the amount of computation. In addition, the technical scheme of the embodiment of the present application is realized in one go and requires no hand-off between models, avoiding the problem in the related art that two independent models cannot be effectively connected. Furthermore, the degree of similarity is obtained based on the local feature sequences of the objects; since local features are considered at a finer granularity, the identification accuracy of the target object can be ensured. The degree of similarity between pedestrians is obtained from two probability matrices, namely the forward matching probability matrix and the backward matching probability matrix, which likewise ensures the identification accuracy of the target object.
An embodiment of the present application further provides an identification device, as shown in fig. 6, the identification device includes: a first obtaining unit 601, a second obtaining unit 602, a first determining unit 603, a third obtaining unit 604, and a second determining unit 605; wherein,
a first obtaining unit 601 for obtaining at least two images;
a second obtaining unit 602, configured to obtain key point information of each object in each image;
a first determining unit 603 configured to determine a feature sequence of each object in each image based on at least the key point information of each object in each image;
a third obtaining unit 604, configured to obtain at least a first parameter between the objects in the at least two images based on the feature sequence of each object in each image, where the first parameter is used to characterize a similarity degree between the objects;
a second determining unit 605, configured to determine at least a target object based on at least the first parameter between the objects in the at least two images, where the target object is a similar object between the at least two images.
In an optional embodiment, the apparatus further comprises:
a fourth obtaining unit, configured to obtain at least two global feature maps for each object in each image, where different global feature maps of the same object are at least partially different;
correspondingly, the first determining unit is configured to determine the local feature sequence of each object based on the at least two global feature maps and the keypoint information of each object.
In an optional embodiment, the first determining unit 603 is configured to:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
In an alternative embodiment, the third obtaining unit 604 is configured to:
combine the feature sequences of the objects in the t_i-th image with the feature sequences of the objects in the t_{i-1}-th image pairwise to obtain the feature tensor information, wherein the t_i-th image and the t_{i-1}-th image are two adjacent images of the at least two images;
perform convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least some elements of each target matrix represent the degree of similarity between any two objects of the t_i-th image and the t_{i-1}-th image;
and obtain, based at least on the two target matrices, first parameters between the objects in the first image and the objects in the second image.
In an alternative embodiment, the third obtaining unit 604 is configured to:
perform a normalized exponential function (softmax) operation column-wise on a first target matrix of the two target matrices to obtain a first matching probability matrix, wherein at least some elements of the first matching probability matrix represent matching probabilities from objects of the t_i-th image to objects of the t_{i-1}-th image;
perform a softmax operation row-wise on a second target matrix of the two target matrices to obtain a second matching probability matrix, wherein at least some elements of the second matching probability matrix represent matching probabilities from objects of the t_{i-1}-th image to objects of the t_i-th image;
and take, for each position, the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects of the first image and the second image.
It should be noted that, in the identification device according to the embodiment of the present application, because the principle of solving the problem is similar to that of the identification method, the implementation process and the implementation principle of the identification device can be described by referring to the implementation process and the implementation principle of the identification method, and repeated details are not repeated.
In practical applications, the first obtaining unit 601, the second obtaining unit 602, the first determining unit 603, the third obtaining unit 604, and the second determining unit 605 may be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU), or a Field Programmable Gate Array (FPGA) in the identification device.
The present application also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, perform at least the steps of the identification method of the foregoing embodiment. The computer readable storage medium may be specifically a memory. The memory may be the memory 72 as shown in fig. 7.
The embodiment of the present application also provides the identification device. Fig. 7 is a schematic diagram of a hardware structure of an identification device according to an embodiment of the present application. As shown in fig. 7, the identification device includes: a communication component 73 for data transmission, at least one processor 71, and a memory 72 for storing a computer program capable of running on the processor 71. The various components in the device are coupled together by a bus system 74. It will be appreciated that the bus system 74 is used to enable communication among the connected components. In addition to a data bus, the bus system 74 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are labeled as bus system 74 in fig. 7.
Wherein the processor 71 performs the steps of the identification method of the previous embodiment.
It will be appreciated that the memory 72 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a ferroelectric random access memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 72 described in the embodiments of the present application is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to, or implemented by, the processor 71. The processor 71 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 71 or by instructions in the form of software. The processor 71 may be a general purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium being located in the memory 72; the processor 71 reads the information in the memory 72 and performs the steps of the foregoing method in combination with its hardware.
In an exemplary embodiment, the identification device may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general purpose processors, controllers, MCUs, microprocessors, or other electronic components, for performing the aforementioned identification method.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other division ways in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by program instructions and related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
Alternatively, if the integrated units of the present application are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the portions thereof contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. An identification method, characterized in that the method comprises:
obtaining at least two images;
obtaining key point information of each object in each image;
determining a characteristic sequence of each object in each image at least based on the key point information of each object in each image;
at least obtaining a first parameter between the objects in the at least two images based on the feature sequence of each object in each image, wherein the first parameter is used for representing the degree of similarity between the objects of different images in the at least two images;
determining at least a target object based on at least the first parameter between the objects in the at least two images, wherein the target object is a similar object between the at least two images.
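By way of illustration only, the following Python sketch outlines the flow of claim 1. The helpers detect_keypoints, extract_feature_sequence and pairwise_similarity are hypothetical placeholders for the steps elaborated in claims 2 to 5; this is a minimal sketch of the claimed flow under those assumptions, not the implementation of the application.

    # Minimal sketch of the flow of claim 1; detect_keypoints, extract_feature_sequence and
    # pairwise_similarity are hypothetical placeholders, not modules defined by the application.
    def identify(images, detect_keypoints, extract_feature_sequence, pairwise_similarity):
        # Key point information of each object in each image.
        keypoints = [detect_keypoints(img) for img in images]          # one {object_id: key points} per image
        # Feature sequence of each object, determined at least from its key point information.
        features = [{obj: extract_feature_sequence(img, kps) for obj, kps in kp.items()}
                    for img, kp in zip(images, keypoints)]
        # First parameter: degree of similarity between objects of adjacent images.
        matches = []
        for prev, curr in zip(features[:-1], features[1:]):
            sims = pairwise_similarity(prev, curr)                      # {(prev_obj, curr_obj): similarity}
            if sims:
                # Target object: the most similar object pair between the two images.
                matches.append(max(sims, key=sims.get))
        return matches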
2. The method according to claim 1, characterized in that the method further comprises:
obtaining at least two global feature maps for respective objects in respective images, wherein different global feature maps of the same object are at least partially different;
correspondingly, the determining the feature sequence of each object in each image based on at least the key point information of each object in each image includes:
and determining the local feature sequence of each object based on the at least two global feature maps and the key point information of each object.
3. The method of claim 2, wherein the determining the local feature sequence of each object based on the at least two global feature maps and the keypoint information of each object comprises:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
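As a hedged illustration of claims 2 and 3, the sketch below treats the target image of an object as the region of a global feature map spanned by two of its key parts, pools that region to a fixed size, and concatenates the pooled regions over all global feature maps and key-part pairs into a local feature sequence. The adaptive average pooling and the 4x4 output size are assumptions, not details stated in the application.

    # Assumed sketch of claims 2 and 3; the pooling operator and output size are illustrative choices.
    import torch
    import torch.nn.functional as F

    def local_feature(global_map, part_a, part_b, out_size=(4, 4)):
        # global_map: (C, H, W) feature map; part_a, part_b: (x, y) key-part positions in map coordinates.
        x0, x1 = sorted((int(part_a[0]), int(part_b[0])))
        y0, y1 = sorted((int(part_a[1]), int(part_b[1])))
        region = global_map[:, y0:y1 + 1, x0:x1 + 1]                    # "target image" for this part pair
        pooled = F.adaptive_avg_pool2d(region.unsqueeze(0), out_size)   # (1, C, 4, 4)
        return pooled.flatten()

    def local_feature_sequence(global_maps, part_pairs):
        # Concatenate pooled regions over all global feature maps and key-part pairs of one object.
        return torch.cat([local_feature(g, a, b) for g in global_maps for a, b in part_pairs])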
4. The method according to any one of claims 1 to 3, wherein the obtaining at least a first parameter between each object in the at least two images based on the feature sequence of each object in each image comprises:
combining, pairwise, the feature sequences of the objects in a t_i-th image and the feature sequences of the objects in a t_(i-1)-th image to obtain feature tensor information, wherein the t_i-th image and the t_(i-1)-th image are two adjacent images of the at least two images;
performing convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least part of the elements in the target matrices represent the degree of similarity of any two objects in the t_i-th image and the t_(i-1)-th image;
and at least obtaining first parameters between each object in the first image and each object in the second image based on the two target matrices.
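A hedged sketch of claim 4 follows: the feature sequences of the objects in two adjacent images are combined pairwise into a feature tensor, and two small convolutional heads map that tensor to two target matrices whose elements score the similarity of any two objects. The 1x1-convolution heads and the hidden width are assumptions standing in for the "at least two" convolution passes of the claim.

    # Assumed sketch of claim 4; the two-head, 1x1-convolution arrangement is an illustrative choice.
    import torch
    import torch.nn as nn

    class TargetMatrixHeads(nn.Module):
        def __init__(self, feat_dim, hidden=64):
            super().__init__()
            def head():
                return nn.Sequential(nn.Conv2d(2 * feat_dim, hidden, kernel_size=1),
                                     nn.ReLU(),
                                     nn.Conv2d(hidden, 1, kernel_size=1))
            self.head_a, self.head_b = head(), head()

        def forward(self, feats_curr, feats_prev):
            # feats_curr: (N, D) objects of the t_i-th image; feats_prev: (M, D) objects of the t_(i-1)-th image.
            N, D = feats_curr.shape
            M = feats_prev.shape[0]
            pairs = torch.cat([feats_curr[:, None, :].expand(N, M, D),
                               feats_prev[None, :, :].expand(N, M, D)], dim=-1)  # pairwise combination (N, M, 2D)
            tensor = pairs.permute(2, 0, 1).unsqueeze(0)                         # feature tensor (1, 2D, N, M)
            target_a = self.head_a(tensor)[0, 0]                                 # first target matrix (N, M)
            target_b = self.head_b(tensor)[0, 0]                                 # second target matrix (N, M)
            return target_a, target_b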
5. The method of claim 4, wherein obtaining at least a first parameter between each object in the first image and each object in the second image based on the two target matrices comprises:
performing a normalized exponential function (softmax) operation on a first target matrix of the two target matrices according to columns to obtain a first matching probability matrix, wherein at least part of the elements of the first matching probability matrix represent matching probabilities from objects in the t_i-th image to objects in the t_(i-1)-th image;
performing a softmax operation on a second target matrix of the two target matrices according to rows to obtain a second matching probability matrix, wherein at least part of the elements of the second matching probability matrix represent matching probabilities from objects in the t_(i-1)-th image to objects in the t_i-th image;
and taking the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects in the first image and the second image.
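Finally, a minimal sketch of claim 5, assuming the two target matrices index objects of one image along rows and objects of the adjacent image along columns: a column-wise softmax yields the first matching probability matrix, a row-wise softmax yields the second, and the larger value at each position is taken as the first parameter.

    # Sketch of claim 5 under the row/column orientation assumed in the lead-in above.
    import torch

    def first_parameters(target_a, target_b):
        # target_a, target_b: the two (N, M) target matrices obtained as in claim 4.
        p_col = torch.softmax(target_a, dim=0)   # softmax according to columns: first matching probability matrix
        p_row = torch.softmax(target_b, dim=1)   # softmax according to rows: second matching probability matrix
        return torch.max(p_col, p_row)           # element-wise larger value = first parameter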
6. An identification device, characterized in that the device comprises:
a first obtaining unit configured to obtain at least two images;
a second obtaining unit configured to obtain key point information of each object in each image;
a first determining unit, configured to determine a feature sequence of each object in each image based on at least the key point information of each object in each image;
a third obtaining unit, configured to obtain at least a first parameter between objects in the at least two images based on a feature sequence of each object in each image, where the first parameter is used to characterize a degree of similarity between objects in different images of the at least two images;
a second determining unit, configured to determine at least a target object based on at least the first parameter between the objects in the at least two images, where the target object is a similar object between the at least two images.
7. The apparatus of claim 6, further comprising:
a fourth obtaining unit, configured to obtain at least two global feature maps for each object in each image, where different global feature maps of the same object are at least partially different;
correspondingly, the first determining unit is configured to determine the local feature sequence of each object based on the at least two global feature maps and the keypoint information of each object.
8. The apparatus of claim 7, wherein the first determining unit is configured to:
the key point information of each object is at least represented as position information of at least two key parts of each object;
obtaining each target image of each object from each global feature map of each object, wherein the target image of each object is an image corresponding to the position relation of at least two key parts of each object in the global feature map of each object;
based on the respective target images of the respective objects, a local feature sequence of the respective objects is determined.
9. The apparatus according to any one of claims 6 to 8, wherein the third obtaining unit is configured to:
combining, pairwise, the feature sequences of the objects in a t_i-th image and the feature sequences of the objects in a t_(i-1)-th image to obtain feature tensor information, wherein the t_i-th image and the t_(i-1)-th image are two adjacent images of the at least two images;
performing convolution processing on the feature tensor information at least twice to obtain two target matrices, wherein at least part of the elements in the target matrices represent the degree of similarity of any two objects in the t_i-th image and the t_(i-1)-th image;
and at least obtaining first parameters between each object in the first image and each object in the second image based on the two target matrices.
10. The apparatus of claim 9, wherein the third obtaining unit is configured to:
performing a normalized exponential function (softmax) operation on a first target matrix of the two target matrices according to columns to obtain a first matching probability matrix, wherein at least part of the elements of the first matching probability matrix represent matching probabilities from objects in the t_i-th image to objects in the t_(i-1)-th image;
performing a softmax operation on a second target matrix of the two target matrices according to rows to obtain a second matching probability matrix, wherein at least part of the elements of the second matching probability matrix represent matching probabilities from objects in the t_(i-1)-th image to objects in the t_i-th image;
and taking the larger of the element values at the same position in the first matching probability matrix and the second matching probability matrix as the first parameter between the corresponding objects in the first image and the second image.
11. An identification device comprising a processor and a storage medium for storing a computer program; wherein the processor is adapted to perform at least the identification method of any of claims 1 to 5 when executing the computer program.
12. A storage medium storing a computer program which, when executed, performs at least the identification method of any one of claims 1 to 5.
CN201910706446.9A 2019-08-01 2019-08-01 Identification method, equipment and storage medium Active CN111739060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910706446.9A CN111739060B (en) 2019-08-01 2019-08-01 Identification method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910706446.9A CN111739060B (en) 2019-08-01 2019-08-01 Identification method, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111739060A true CN111739060A (en) 2020-10-02
CN111739060B CN111739060B (en) 2024-07-19

Family

ID=72646014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910706446.9A Active CN111739060B (en) 2019-08-01 2019-08-01 Identification method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111739060B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040017944A1 (en) * 2002-05-24 2004-01-29 Xiaoging Ding Method for character recognition based on gabor filters
CN109492551A (en) * 2018-10-25 2019-03-19 腾讯科技(深圳)有限公司 The related system of biopsy method, device and application biopsy method
CN109492550A (en) * 2018-10-25 2019-03-19 腾讯科技(深圳)有限公司 The related system of biopsy method, device and application biopsy method
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again
CN109377539A (en) * 2018-11-06 2019-02-22 北京百度网讯科技有限公司 Method and apparatus for generating animation
CN109697446A (en) * 2018-12-04 2019-04-30 北京字节跳动网络技术有限公司 Image key points extracting method, device, readable storage medium storing program for executing and electronic equipment
CN109840500A (en) * 2019-01-31 2019-06-04 深圳市商汤科技有限公司 A kind of 3 D human body posture information detection method and device
CN110009800A (en) * 2019-03-14 2019-07-12 北京京东尚科信息技术有限公司 A kind of recognition methods and equipment
CN110070010A (en) * 2019-04-10 2019-07-30 武汉大学 A kind of face character correlating method identified again based on pedestrian

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHENGHO HSIN et al.: "Affine Invariant Local Features based on Novel Keypoint Detection and Grouping", 2012 Fourth International Conference on Communications and Electronics (ICCE), 31 December 2012 (2012-12-31), pages 296-301 *
CHENG Xianghao et al.: "Coarse-to-Fine 3D Face Feature Point Localization Based on Key Points", Chinese Journal of Scientific Instrument (仪器仪表学报), vol. 39, no. 10, 31 October 2018 (2018-10-31), pages 256-264 *

Also Published As

Publication number Publication date
CN111739060B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
Wu et al. FMD-Yolo: An efficient face mask detection method for COVID-19 prevention and control in public
US11200424B2 (en) Space-time memory network for locating target object in video content
US11151725B2 (en) Image salient object segmentation method and apparatus based on reciprocal attention between foreground and background
WO2021248859A1 (en) Video classification method and apparatus, and device, and computer readable storage medium
US20210158023A1 (en) System and Method for Generating Image Landmarks
US11048948B2 (en) System and method for counting objects
CN108197532A (en) The method, apparatus and computer installation of recognition of face
CN111402294A (en) Target tracking method, target tracking device, computer-readable storage medium and computer equipment
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN112749666B (en) Training and action recognition method of action recognition model and related device
CN112818955A (en) Image segmentation method and device, computer equipment and storage medium
CN112802076A (en) Reflection image generation model and training method of reflection removal model
CN113807361A (en) Neural network, target detection method, neural network training method and related products
CN111027555A (en) License plate recognition method and device and electronic equipment
Li et al. Robust foreground segmentation based on two effective background models
Shi et al. Segmentation quality evaluation based on multi-scale convolutional neural networks
CN112906671B (en) Method and device for identifying false face-examination picture, electronic equipment and storage medium
Wang et al. Non-local attention association scheme for online multi-object tracking
CN111401335B (en) Key point detection method and device and storage medium
CN112348011A (en) Vehicle damage assessment method and device and storage medium
CN111739060B (en) Identification method, equipment and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN113177483B (en) Video object segmentation method, device, equipment and storage medium
CN114049608A (en) Track monitoring method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant