CN114387612A - Human body re-identification method and device based on bimodal feature fusion network - Google Patents

Human body re-identification method and device based on bimodal feature fusion network

Info

Publication number
CN114387612A
Authority
CN
China
Prior art keywords
human body
feature fusion
fusion network
bimodal
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111407271.5A
Other languages
Chinese (zh)
Inventor
Wang Wen (王文)
Hu Shunda (胡顺达)
Zhu Shiqiang (朱世强)
Song Wei (宋伟)
Lin Zheyuan (林哲远)
Jin Tianlei (金天磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202111407271.5A priority Critical patent/CN114387612A/en
Publication of CN114387612A publication Critical patent/CN114387612A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body re-identification method and device based on a bimodal feature fusion network. The method comprises the following steps: acquiring a color image of the human body to be identified and a corresponding image of another modality; inputting the color image and the corresponding other-modality image into a trained bimodal feature fusion network, and extracting the features of the human body to be identified; and comparing the features of the human body to be identified with the features of a human body image library to obtain the identification result. For the person re-identification problem, inputting both the color image of the human body to be identified and the corresponding other-modality image into the trained bimodal feature fusion network yields features carrying richer information than features extracted from a single-modality image, so the re-identification accuracy is higher than that of re-identification based on a single-modality image.

Description

Human body re-identification method and device based on bimodal feature fusion network
Technical Field
The present application relates to the field of person re-identification in computer vision, and in particular to a human body re-identification method and device based on a bimodal feature fusion network.
Background
Person re-identification is a key technology in the field of computer vision, with broad application prospects and high application value. It plays a key role in practical scenarios such as autonomous driving, intelligent surveillance, human-computer interaction, and intelligent robots. With person re-identification, an autonomous vehicle can predict a pedestrian's trajectory and take evasive action in advance; in intelligent surveillance, suspects, lost children, and the like can be quickly retrieved from large volumes of video; in human-computer interaction, more intelligent interaction can be provided; and an intelligent robot can follow a target person.
In recent years, with the popularization of deep learning, person re-identification technology has developed rapidly.
In the course of implementing the invention, the inventors found that the prior art has at least the following problems:
Existing person re-identification models rely on the color image alone and learn features such as color and texture from it. The information carried by these features is limited, and it cannot meet the accuracy requirements of complex scenes, for example a campus where students wear school uniforms of the same color. Furthermore, a fleeing suspect often changes clothes as a disguise, and existing re-identification models cannot recognize a target who has changed clothes. Therefore, beyond single color-image features, how to incorporate features from images of other modalities, so as to enrich the information content of the finally extracted features, is an urgent problem for person re-identification models.
Disclosure of Invention
The embodiments of the present application aim to provide a human body re-identification method and device based on a bimodal feature fusion network, so as to solve the technical problem in the related art that the extracted features carry only limited, single-modality information.
According to a first aspect of the embodiments of the present application, a human body re-identification method based on a bimodal feature fusion network is provided, comprising:
acquiring a color image of the human body to be identified and a corresponding image of another modality;
inputting the color image and the corresponding other-modality image into a trained bimodal feature fusion network, and extracting the features of the human body to be identified;
and comparing the features of the human body to be identified with the features of a human body image library to obtain the identification result of the human body to be identified.
Further, the bimodal feature fusion network comprises:
a color-image feature extraction backbone network, used to extract first features from the color image of the human body to be identified;
an other-modality feature extraction backbone network, used to extract second features from the other-modality image of the human body to be identified; and
a bimodal feature fusion module, used to fuse the first features and the second features into the features of the human body to be identified.
Further, the training process of the bimodal feature fusion network comprises:
acquiring a training set, wherein the training set is divided into a plurality of subsets, each subset comprising color images of a number of persons and the corresponding other-modality images;
inputting one subset into the bimodal feature fusion network and extracting the features of the subset;
classifying the persons according to the features of the subset to obtain the cross-entropy loss;
dividing the features of the subset into triplets to obtain the triplet loss;
performing a weighted summation of the cross-entropy loss and the triplet loss to obtain the loss of the subset;
updating the parameters of the bimodal feature fusion network according to the loss of the subset to obtain an updated bimodal feature fusion network;
and for the remaining subsets, sequentially inputting each subset into the bimodal feature fusion network, extracting its features, and updating the parameters of the network according to its loss, until the loss of the subsets converges.
Further, the features of the human body image library are obtained by inputting each pair of images in the library (a human body's color image and its corresponding other-modality image) into the bimodal feature fusion network, wherein the library comprises the color images and corresponding other-modality images of a plurality of human bodies.
Further, comparing the features of the human body to be identified with the features of the human body image library to obtain the identification result comprises:
calculating the feature distance between the features of the human body to be identified and the features of the human body image library;
and taking the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
According to a second aspect of the embodiments of the present application, a human body re-identification device based on a bimodal feature fusion network is provided, comprising:
an acquisition module, used to acquire a color image of the human body to be identified and a corresponding image of another modality;
a feature extraction module, used to input the color image and the corresponding other-modality image into a trained bimodal feature fusion network and extract the features of the human body to be identified;
and a comparison module, used to compare the features of the human body to be identified with the features of a human body image library to obtain the identification result of the human body to be identified.
Further, the training process of the bimodal feature fusion network comprises:
acquiring a training set, wherein the training set is divided into a plurality of subsets, each subset comprising color images of a number of persons and the corresponding other-modality images;
inputting one subset into the bimodal feature fusion network and extracting the features of the subset;
classifying the persons according to the features of the subset to obtain the cross-entropy loss;
dividing the features of the subset into triplets to obtain the triplet loss;
performing a weighted summation of the cross-entropy loss and the triplet loss to obtain the loss of the subset;
updating the parameters of the bimodal feature fusion network according to the loss of the subset to obtain an updated bimodal feature fusion network;
and for the remaining subsets, sequentially inputting each subset into the bimodal feature fusion network, extracting its features, and updating the parameters of the network according to its loss, until the loss of the subsets converges.
Further, the comparison module comprises:
a calculation submodule, used to calculate the feature distance between the features of the human body to be identified and the features of the human body image library;
and a setting submodule, used to take the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in the first aspect.
According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which computer instructions are stored, wherein the instructions, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the embodiment, aiming at the problem of human body weight recognition, the color image of the human body to be recognized and other corresponding modal images are input into the trained bimodal feature fusion network for feature extraction, and the extracted feature information quantity is richer than the features extracted according to a single modal image; and comparing the extracted features with the features of the human body image library to obtain the recognition result of the human body to be recognized, wherein the accuracy of the human body re-recognition is higher than that of the human body re-recognition performed according to the single-mode image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a human body re-identification method based on a bimodal feature fusion network according to an exemplary embodiment.
FIG. 2 is a schematic diagram illustrating a structure of a bimodal feature fusion network in accordance with an exemplary embodiment.
FIG. 3 is a flowchart illustrating a training process for a bimodal feature fusion network in accordance with an exemplary embodiment.
Fig. 4 is a flowchart illustrating step S13 according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating a human body re-identification device based on a bimodal feature fusion network according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with certain aspects of the application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
Fig. 1 is a flowchart illustrating a human body re-identification method based on a bimodal feature fusion network according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps:
Step S11: acquiring a color image of the human body to be identified and a corresponding image of another modality;
Step S12: inputting the color image and the corresponding other-modality image into a trained bimodal feature fusion network, and extracting the features of the human body to be identified;
Step S13: comparing the features of the human body to be identified with the features of the human body image library to obtain the identification result of the human body to be identified.
According to this embodiment, for the person re-identification problem, the color image of the human body to be identified and the corresponding other-modality image are input into the trained bimodal feature fusion network for feature extraction, and the extracted features carry richer information than features extracted from a single-modality image; the extracted features are then compared with the features of the human body image library to obtain the identification result, so the re-identification accuracy is higher than that of re-identification based on a single-modality image.
In step S11, a color image of the human body to be identified and a corresponding image of another modality are acquired.
Specifically, the color image of the human body to be identified is a color RGB image of that person; preferably, it may be captured by a mobile phone, an ordinary color camera, an industrial surveillance camera, or the like. The other-modality image is any image of a modality other than color; it can be acquired by hardware, or generated from the color image in software.
In a preferred example, a depth camera that also captures color can acquire the color image and a depth image simultaneously, with the depth image serving as the other-modality image; alternatively, a color camera with an infrared function can capture the color image and an infrared image simultaneously, with the infrared image serving as the other-modality image.
In another preferred example, a color image is acquired, and a human body contour map or a grayscale map is generated from it in software as the other-modality image.
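As an illustration of this software route, the following sketch derives a grayscale map or a rough contour map from the color image; OpenCV is assumed, and the Canny edge detector stands in for the unspecified contour-extraction step:

```python
import cv2

def derive_other_modality(bgr_image, mode="gray"):
    """Generate a second-modality image from a color frame in software.

    mode="gray" returns a grayscale map; mode="contour" returns an edge
    map as a stand-in for a human body contour map (the text does not
    name a specific contour algorithm, so Canny is an assumption).
    """
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    if mode == "gray":
        return gray
    return cv2.Canny(gray, threshold1=100, threshold2=200)
```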
In step S12, the color image and the corresponding other-modality image are input into the trained bimodal feature fusion network, and the features of the human body to be identified are extracted.
Specifically, the bimodal feature fusion network includes a color-image feature extraction backbone network, an other-modality feature extraction backbone network, and a bimodal feature fusion module. The color-image backbone extracts first features from the color image of the human body to be identified; the other-modality backbone extracts second features from the other-modality image; and the fusion module fuses the first and second features into the features of the human body to be identified.
Preferably, in the embodiment of the present application, as shown in Fig. 2, both backbones use the ResNet50 model, whose input is an image with a resolution of 256 × 128 pixels and whose output is a 2048-dimensional feature vector. The rectangular blocks in Fig. 2 represent the convolutional layers of the backbones. In a preferred example, the output of each convolutional layer of the color-image backbone and of the other-modality backbone can be fused by the bimodal feature fusion module, and the fused features fed into the next convolutional layer of the color-image backbone; semantic information at different levels of abstraction is thus fully exploited, giving the final human body features more expressive and discriminative power.
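A minimal PyTorch sketch of this architecture follows. The two ResNet50 backbones and the 256 × 128 input / 2048-dimensional output follow the text; the fusion operator itself is not specified, so a 1 × 1 convolution over the concatenated feature maps, applied once per residual stage and fed back into the color branch, is an assumption:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BimodalFusionNet(nn.Module):
    """Two ResNet50 backbones, one per modality; the stage outputs are
    fused and passed down the color branch. Single-channel modalities
    (depth, grayscale) are assumed replicated to 3 channels."""

    def __init__(self):
        super().__init__()
        self.rgb = resnet50()
        self.aux = resnet50()
        # One 1x1-conv fusion block per residual stage (assumed design).
        stage_channels = [256, 512, 1024, 2048]
        self.fuse = nn.ModuleList(
            nn.Conv2d(2 * c, c, kernel_size=1) for c in stage_channels
        )

    def _stem(self, net, x):
        return net.maxpool(net.relu(net.bn1(net.conv1(x))))

    def forward(self, rgb, aux):                  # each: (B, 3, 256, 128)
        r, a = self._stem(self.rgb, rgb), self._stem(self.aux, aux)
        for i, name in enumerate(["layer1", "layer2", "layer3", "layer4"]):
            r = getattr(self.rgb, name)(r)
            a = getattr(self.aux, name)(a)
            # Fuse the two modalities; the fused map continues down the
            # color branch, as described for Fig. 2.
            r = self.fuse[i](torch.cat([r, a], dim=1))
        return torch.flatten(self.rgb.avgpool(r), 1)  # (B, 2048) feature
```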
Specifically, as shown in Fig. 3, the training process of the bimodal feature fusion network includes:
Step S21: acquiring a training set, wherein the training set is divided into a plurality of subsets, each subset comprising color images of a number of persons and the corresponding other-modality images;
Specifically, the training set is acquired in the same way as the color image and corresponding other-modality image of the human body to be identified in step S11. Dividing the training set into subsets means evenly partitioning all color images and corresponding other-modality images into subsets of a certain size, where each subset consists of a number of anchor images, positive-sample images, and negative-sample images: an anchor image is an image randomly selected from the training set, a positive-sample image is an image of the same person as the anchor image, and a negative-sample image is an image of a different person. The subset size depends on circumstances. Generally, the larger the subset, the shorter the network training time but the higher the hardware requirements, especially on data storage; the smaller the subset, the longer the training but the lower the hardware requirements. The advantage of dividing the training set into subsets is that the subset size can be set flexibly according to the limitations of the available hardware.
Step S22: inputting one subset into the bimodal feature fusion network and extracting the features of the subset;
Specifically, before this step, data augmentation may be applied to the images in the subset, and the augmented subset input into the bimodal feature fusion network for feature extraction; this reduces overfitting to the training set and improves re-identification accuracy. Preferably, the image data are augmented with techniques such as random flipping and random cropping.
Step S23: classifying the persons according to the features of the subset to obtain the cross-entropy loss;
Specifically, the cross-entropy loss function $\mathcal{L}_{ce}$ is:

$$\mathcal{L}_{ce} = -\frac{1}{N}\sum_{i=1}^{N} g_i^{\top}\,\log\mathrm{softmax}\!\left(W f_i + b\right)$$

where $N$ is the number of image pairs (a color image and its corresponding other-modality image) in the subset, $g_i$ is the one-hot encoded label of the $i$-th image sample, $W$ and $b$ are the weight and bias parameters of the last fully connected layer of the bimodal feature fusion network, and $f_i$ is the feature vector extracted by the network.
Step S24: dividing the features of the subset into triplets to obtain the triplet loss;
Specifically, the triplet loss function $\mathcal{L}_{tri}$ is:

$$\mathcal{L}_{tri} = \sum_{(a,p,n)} \max\bigl(d(f_a, f_p) - d(f_a, f_n) + m,\; 0\bigr)$$

where $a$, $p$, and $n$ denote the anchor image, positive-sample image, and negative-sample image of a triplet; $f_a$, $f_p$, $f_n$ are their respective features; $d(\cdot,\cdot)$ computes the distance between two feature vectors; and $m$ is the distance-margin parameter of the triplet loss.
Step S25: performing a weighted summation of the cross-entropy loss and the triplet loss to obtain the loss of the subset;
Specifically, the loss $\mathcal{L}$ of the subset is:

$$\mathcal{L} = \lambda\,\mathcal{L}_{tri} + (1-\lambda)\,\mathcal{L}_{ce}$$

where $\lambda$ is the weighting coefficient of the summation, $0 \le \lambda \le 1$. The larger $\lambda$ is, the more the network attends to the triplet loss; conversely, the smaller it is, the more the network attends to the cross-entropy loss. Preferably, $\lambda = 0.5$.
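As an illustration of steps S23-S25 together, a PyTorch sketch of the subset loss; batch-hard triplet mining within the subset and the margin value 0.3 are assumptions the text does not fix:

```python
import torch
import torch.nn.functional as F

def subset_loss(features, logits, targets, margin=0.3, lam=0.5):
    """L = lam * L_tri + (1 - lam) * L_ce, as in the weighted sum above.

    `targets` are integer person labels (equivalent to the one-hot g);
    the hardest positive/negative per anchor is an assumed mining rule.
    """
    ce = F.cross_entropy(logits, targets)                    # L_ce
    dist = torch.cdist(features, features)                   # pairwise d(.,.)
    same = targets.unsqueeze(0) == targets.unsqueeze(1)
    hardest_pos = (dist * same).max(dim=1).values            # farthest positive
    hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values
    tri = F.relu(hardest_pos - hardest_neg + margin).mean()  # L_tri
    return lam * tri + (1 - lam) * ce
```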
Step S26: updating the parameters of the bimodal feature fusion network according to the loss of the subset to obtain an updated bimodal feature fusion network;
Step S27: for the remaining subsets, sequentially inputting each subset into the bimodal feature fusion network, extracting its features, and updating the parameters of the network according to its loss, until the loss of the subsets converges.
In the specific implementation of steps S26 and S27, the parameters of the bimodal feature fusion network may be updated with optimizers such as Adam or SGD. The model trained in step S27, i.e., the model whose loss has converged over the input subsets, may be accelerated by techniques such as model pruning and quantization, and finally deployed in a production environment.
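A sketch of the corresponding update loop, reusing the `BimodalFusionNet` and `subset_loss` sketches above; Adam, the learning rate, and a separate classifier head standing in for the last fully connected layer (W, b) are assumptions consistent with, but not fixed by, the text:

```python
import torch

def train(model, classifier, loader, epochs=60, lr=3e-4):
    """Assumed training loop: Adam updates the whole network from the
    subset loss until it converges. `loader` is assumed to yield
    (rgb, aux, targets) subsets of paired images."""
    params = list(model.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for rgb, aux, targets in loader:
            feats = model(rgb, aux)             # (B, 2048) fused features
            logits = classifier(feats)          # last FC layer (W, b)
            loss = subset_loss(feats, logits, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                    # step S26: update parameters
```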
In step S13, the features of the human body to be identified are compared with the features of the human body image library to obtain the identification result.
Specifically, the features of the human body image library are obtained by inputting each pair of images in the library (a human body's color image and its corresponding other-modality image) into the bimodal feature fusion network, where the library comprises the color images and corresponding other-modality images of a plurality of human bodies.
In a specific implementation, the human body image library consists of color images of human bodies and the corresponding other-modality images; the number of persons in the library is not limited and may be one or many. In a preferred example, such as screening for suspects in surveillance video, the library may consist of a single suspect or of multiple suspects, i.e., multiple suspects can be screened simultaneously.
Specifically, as shown in Fig. 4, this step includes the following sub-steps:
Step S31: calculating the feature distance between the features of the human body to be identified and the features of the human body image library;
Step S32: taking the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
In the specific implementation of steps S31 and S32, the feature distance expresses the similarity between two feature vectors, i.e., between two human bodies: the smaller the distance, the greater the similarity and the more likely the two images show the same person. Preferably, the feature distance may be computed as a Euclidean distance or a cosine distance.
Corresponding to the embodiments of the human body re-identification method based on a bimodal feature fusion network, the present application also provides embodiments of a human body re-identification device based on a bimodal feature fusion network.
Fig. 5 is a block diagram illustrating a human body re-identification device based on a bimodal feature fusion network according to an exemplary embodiment. Referring to Fig. 5, the device may include:
an acquisition module 21, configured to acquire a color image of the human body to be identified and a corresponding image of another modality;
a feature extraction module 22, configured to input the color image and the corresponding other-modality image into a trained bimodal feature fusion network and extract the features of the human body to be identified;
and a comparison module 23, configured to compare the features of the human body to be identified with the features of the human body image library to obtain the identification result of the human body to be identified.
Specifically, this module may include the following submodules:
a calculation submodule, configured to calculate the feature distance between the features of the human body to be identified and the features of the human body image library;
and a setting submodule, configured to take the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the human body re-identification method based on a bimodal feature fusion network as described above.
Correspondingly, the present application also provides a computer-readable storage medium on which computer instructions are stored, the instructions, when executed by a processor, implementing the human body re-identification method based on a bimodal feature fusion network as described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.

Claims (10)

1. A human body re-identification method based on a bimodal feature fusion network, characterized by comprising:
acquiring a color image of the human body to be identified and a corresponding image of another modality;
inputting the color image and the corresponding other-modality image into a trained bimodal feature fusion network, and extracting the features of the human body to be identified;
and comparing the features of the human body to be identified with the features of a human body image library to obtain the identification result of the human body to be identified.
2. The method of claim 1, wherein the bimodal feature fusion network comprises:
a color-image feature extraction backbone network, used to extract first features from the color image of the human body to be identified;
an other-modality feature extraction backbone network, used to extract second features from the other-modality image of the human body to be identified; and
a bimodal feature fusion module, used to fuse the first features and the second features into the features of the human body to be identified.
3. The method of claim 1, wherein the training process of the bimodal feature fusion network comprises:
acquiring a training set, wherein the training set is divided into a plurality of subsets, each subset comprising color images of a number of persons and the corresponding other-modality images;
inputting one subset into the bimodal feature fusion network and extracting the features of the subset;
classifying the persons according to the features of the subset to obtain the cross-entropy loss;
dividing the features of the subset into triplets to obtain the triplet loss;
performing a weighted summation of the cross-entropy loss and the triplet loss to obtain the loss of the subset;
updating the parameters of the bimodal feature fusion network according to the loss of the subset to obtain an updated bimodal feature fusion network;
and for the remaining subsets, sequentially inputting each subset into the bimodal feature fusion network, extracting its features, and updating the parameters of the network according to its loss, until the loss of the subsets converges.
4. The method according to claim 1, wherein the features of the human body image library are obtained by inputting each pair of images in the library (a human body's color image and its corresponding other-modality image) into the bimodal feature fusion network, and wherein the library comprises the color images and corresponding other-modality images of a plurality of human bodies.
5. The method according to claim 1, wherein comparing the features of the human body to be identified with the features of the human body image library to obtain the identification result comprises:
calculating the feature distance between the features of the human body to be identified and the features of the human body image library;
and taking the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
6. A human body re-identification device based on a bimodal feature fusion network, characterized by comprising:
an acquisition module, used to acquire a color image of the human body to be identified and a corresponding image of another modality;
a feature extraction module, used to input the color image and the corresponding other-modality image into a trained bimodal feature fusion network and extract the features of the human body to be identified;
and a comparison module, used to compare the features of the human body to be identified with the features of a human body image library to obtain the identification result of the human body to be identified.
7. The device of claim 6, wherein the training process of the bimodal feature fusion network comprises:
acquiring a training set, wherein the training set is divided into a plurality of subsets, each subset comprising color images of a number of persons and the corresponding other-modality images;
inputting one subset into the bimodal feature fusion network and extracting the features of the subset;
classifying the persons according to the features of the subset to obtain the cross-entropy loss;
dividing the features of the subset into triplets to obtain the triplet loss;
performing a weighted summation of the cross-entropy loss and the triplet loss to obtain the loss of the subset;
updating the parameters of the bimodal feature fusion network according to the loss of the subset to obtain an updated bimodal feature fusion network;
and for the remaining subsets, sequentially inputting each subset into the bimodal feature fusion network, extracting its features, and updating the parameters of the network according to its loss, until the loss of the subsets converges.
8. The device of claim 6, wherein the comparison module comprises:
a calculation submodule, used to calculate the feature distance between the features of the human body to be identified and the features of the human body image library;
and a setting submodule, used to take the human body image corresponding to the smallest feature distance as the identification result of the human body to be identified.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
10. A computer-readable storage medium having computer instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-5.
CN202111407271.5A 2021-11-24 2021-11-24 Human body re-identification method and device based on bimodal feature fusion network Pending CN114387612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111407271.5A CN114387612A (en) 2021-11-24 2021-11-24 Human body re-identification method and device based on bimodal feature fusion network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111407271.5A CN114387612A (en) 2021-11-24 2021-11-24 Human body re-identification method and device based on bimodal feature fusion network

Publications (1)

Publication Number Publication Date
CN114387612A (en) 2022-04-22

Family

ID=81195492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111407271.5A Pending CN114387612A (en) 2021-11-24 2021-11-24 Human body re-identification method and device based on bimodal feature fusion network

Country Status (1)

Country Link
CN (1) CN114387612A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115223018A * 2022-06-08 2022-10-21 Northeast Petroleum University (东北石油大学) Cooperative detection method and device for disguised object, electronic device and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination