CN110516603B - Information processing method and device - Google Patents

Information processing method and device

Publication number: CN110516603B
Application number: CN201910799755.5A
Authority: CN (China)
Prior art keywords: human body, result, component, sample, matrix
Legal status: Active (the listed status is an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN110516603A
Inventors: 王健, 王之港, 孙昊, 文石磊, 丁二锐
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910799755.5A
Publication of CN110516603A
Application granted; publication of CN110516603B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose an information processing method and device. One embodiment of the method comprises: acquiring a target human body image and inputting it into a human body detection network and a component segmentation network of a human body detection model trained in advance, where the human body detection model is used to extract human body features that include the component features of each component; and obtaining the human body features of the human body included in the target human body image as output from the human body detection model, where these features comprise a feature matrix of the human body output from the human body detection network and a component feature matrix, composed of the component features of each component of the human body, output from the component segmentation network. The method can fully acquire the features corresponding to each component and avoids neglecting the detailed features of small components, thereby improving the recall rate and accuracy of human body detection.

Description

Information processing method and device
Technical Field
The embodiments of the present application relate to the field of computer technology, in particular internet technology, and specifically to an information processing method and device.
Background
With the development of image detection technology, human body detection technology is more and more widely applied. In the process of detecting the human body, the head and the limbs can be comprehensively detected.
In the related art, a deep neural network is often used to detect an image. When predicting results with a deep neural network, detection precision can be improved by techniques such as horizontally cutting the image into sub-images.
Disclosure of Invention
The embodiment of the application provides an information processing method and device.
In a first aspect, an embodiment of the present application provides an information processing method, including: acquiring a target human body image, and inputting the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component; and obtaining human body characteristics of the human body contained in the target human body image output from the human body detection model, wherein the human body characteristics of the human body comprise a characteristic matrix of the human body contained in the target human body image output from the human body detection network and a component characteristic matrix consisting of component characteristics of each component of the human body output from the component segmentation network.
In some embodiments, obtaining the human body features of the human body included in the target human body image output from the human body detection model includes: a mask matrix of the human body output from the component segmentation network is obtained, wherein the mask matrix includes component features of individual components of the human body.
In some embodiments, the method further comprises performing the following feature processing steps: transposing the component feature matrix to obtain a transposed matrix; determining the result of matrix-multiplying the feature matrix with the transposed matrix as a first result; multiplying two copies of the feature matrix element-wise (bitwise) to obtain an element-wise product, and determining the result of matrix-multiplying that product with the transposed matrix as a second result; and identifying a human body based on the first result and the second result.
In some embodiments, identifying the human body based on the first result and the second result comprises: concatenating (splicing) the first result and the second result to obtain a concatenated result; and determining whether the human body contained in the target human body image and the human body contained in a specified human body image indicate the same person, based on the similarity between the concatenated result and the features of the human body contained in the specified human body image.
In some embodiments, the training step of the human detection model comprises: acquiring a human body image sample, and acquiring a characteristic matrix of the human body image sample by using a human body detection network; acquiring a component characteristic matrix of a human body image sample by utilizing a component segmentation network; executing a characteristic processing step on the characteristic matrix and the component characteristic matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample; and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
In some embodiments, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample includes the following prediction-result generation steps: inputting the first result corresponding to a human body image sample containing a human body sample, which indicates the component features of each component of that human body sample, into the fully connected layer and classification layer of the initial human body detection network for processing, to obtain a first prediction result for the human body sample; and inputting the second result corresponding to that sample, which likewise indicates the component features of each component, into the fully connected layer and classification layer of the initial component segmentation network for processing, to obtain a second prediction result for the human body sample.
In some embodiments, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample further includes performing the following operations on the initial human body detection network and the initial component segmentation network, respectively: determining a loss value for the human body sample according to a preset loss function, the prediction result of the human body sample, and a preset label, and training based on that loss value, where the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label, and a preset metric function, so that the trained human body detection model determines a higher similarity between features of the same human body than between features of different human bodies.
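The patent does not name a specific loss or metric function. The sketch below illustrates one common pairing under that assumption: a softmax cross-entropy loss as the "preset loss function" for the classification learning target, and a triplet-style "preset metric function" that pushes features of the same person closer together than features of different people. The function names and the margin value are hypothetical, not taken from the patent.

```python
import numpy as np

def classification_loss(logits, label):
    # Cross-entropy over softmax probabilities: one plausible "preset loss
    # function" associated with the classification learning target.
    shifted = logits - logits.max()                    # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def triplet_metric_loss(anchor, positive, negative, margin=0.3):
    # Metric learning: features of the same human body (anchor, positive)
    # should be closer than features of different bodies (anchor, negative)
    # by at least `margin`; the margin value here is an assumption.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)
```

In practice the two terms would be summed (possibly with a weighting factor) into one training objective for each of the two networks.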
In a second aspect, an embodiment of the present application provides an information processing apparatus, including: an acquisition unit configured to acquire a target human body image, input the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component; an output unit configured to obtain human body characteristics of a human body included in a target human body image output from the human body detection model, wherein the human body characteristics of the human body include a characteristic matrix of the human body included in the target human body image output from the human body detection network, and a component characteristic matrix composed of component characteristics of respective components of the human body output from the component segmentation network.
In some embodiments, the output unit is further configured to: a mask matrix of the human body output from the component segmentation network is obtained, wherein the mask matrix includes component features of individual components of the human body.
In some embodiments, the apparatus further comprises: a processing unit configured to perform the following characteristic processing steps: transposing the component feature matrix to obtain a transposed matrix; determining a result of matrix multiplication between the feature matrix and the transposed matrix as a first result; performing bitwise multiplication on the two feature matrixes to obtain a bitwise multiplication result, and determining a result of the bitwise multiplication result and the matrix multiplication of the transposed matrix as a second result; an identification unit configured to identify the human body based on the first result and the second result.
In some embodiments, the identification unit comprises: a concatenation module configured to concatenate (splice) the first result and the second result to obtain a concatenated result; and a determining module configured to determine whether the human body contained in the target human body image and the human body contained in a specified human body image indicate the same person, based on the similarity between the concatenated result and the features of the human body contained in the specified human body image.
In some embodiments, the training step of the human detection model comprises: acquiring a human body image sample, and acquiring a characteristic matrix of the human body image sample by using a human body detection network; acquiring a component characteristic matrix of a human body image sample by utilizing a component segmentation network; executing a characteristic processing step on the characteristic matrix and the component characteristic matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample; and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
In some embodiments, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample includes the following prediction-result generation steps: inputting the first result corresponding to a human body image sample containing a human body sample, which indicates the component features of each component of that human body sample, into the fully connected layer and classification layer of the initial human body detection network for processing, to obtain a first prediction result for the human body sample; and inputting the second result corresponding to that sample, which likewise indicates the component features of each component, into the fully connected layer and classification layer of the initial component segmentation network for processing, to obtain a second prediction result for the human body sample.
In some embodiments, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample further includes performing the following operations on the initial human body detection network and the initial component segmentation network, respectively: determining a loss value for the human body sample according to a preset loss function, the prediction result of the human body sample, and a preset label, and training based on that loss value, where the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label, and a preset metric function, so that the trained human body detection model determines a higher similarity between features of the same human body than between features of different human bodies.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by one or more processors, cause the one or more processors to implement a method as in any embodiment of the information processing method.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a method as in any one of the embodiments of the information processing method.
According to the information processing scheme provided by the embodiments of the present application, a target human body image is first obtained and input into the human body detection network and the component segmentation network of a human body detection model trained in advance, where the model is used to extract human body features including the component features of all components. The human body features of the human body contained in the target human body image, as output from the model, are then obtained; these comprise a feature matrix of the human body output from the human body detection network and a component feature matrix, formed from the component features of each component, output from the component segmentation network. This scheme can fully acquire the features corresponding to every component and avoids neglecting the detailed features of small components, improving both the recall rate and the accuracy of human body detection. In addition, because the human body detection network and the component segmentation network separately determine the overall features of the human body and the component features of each component, the relationships among the components can be captured while still attending to each component's individual detailed features, making the output features detailed and accurate.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an information processing method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an information processing method according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of an information processing method according to the present application;
FIG. 5 is a schematic block diagram of one embodiment of an information processing apparatus according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the information processing method or information processing apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an information processing application, a video application, a live application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on the received data such as the target human body image, and feed back a processing result (e.g., human body characteristics) to the terminal device.
It should be noted that the information processing method provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the information processing apparatus may be provided in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of an information processing method according to the present application is shown. The information processing method comprises the following steps:
step 201, obtaining a target human body image, inputting the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component.
In this embodiment, an execution subject of the information processing method (for example, a server or a terminal device shown in fig. 1) may acquire a target human body image from a local or other electronic device and input the image into the human body detection network and the component segmentation network in the human body detection model, respectively. Specifically, the human body features include component features of respective components of the human body included in the target human body image. The parts herein may refer to predetermined parts of the human body, such as arms, legs, face, head, etc. A component feature is a feature that characterizes a component. In some cases, the output human body characteristics may include not only component characteristics but also overall characteristics of the human body and the like.
The human body detection model may be a deep neural network, such as a convolutional neural network or a deep residual network (ResNet), and may include a human body detection network and a component segmentation network. Specifically, various parts of the human body can be detected with the human body detection model; for example, the model can output the positions of the various parts in the target human body image. The output human body features may come from a convolutional layer of the human body detection model, or from a fully connected layer of the model.
The human body detection network can be used to extract image features and may include convolutional layers and the like. The component segmentation network can be an instance segmentation algorithm (such as Mask R-CNN, a mask region-based convolutional neural network) or a semantic segmentation algorithm. The component segmentation network may extract the component features of each component of the human body included in the image, and may likewise include convolutional layers and the like.
Step 202, obtaining human body characteristics of a human body included in a target human body image output from the human body detection model, wherein the human body characteristics of the human body include a human body characteristic matrix included in the target human body image output from the human body detection network and a component characteristic matrix composed of component characteristics of each component of the human body output from the component segmentation network.
In this embodiment, the execution subject may obtain the human body feature output from the human body detection model. The resulting human features may be represented in the form of a matrix. Specifically, the feature matrix, i.e., the feature map, may be represented as a three-dimensional matrix (C, H × W), where C is the dimension of the feature and H, W are the height and width of the feature map, respectively. The component features of each of the components described above may be represented as a row of features or a column of features. Accordingly, the component characteristics of the individual components may constitute a component characteristic matrix.
In some optional implementations of this embodiment, step 202 may include: a mask matrix of the human body output from the component segmentation network is obtained, wherein the mask matrix includes component features of individual components of the human body.
In these alternative implementations, the execution subject may obtain the mask matrix output by the component segmentation network. That is, the component feature matrix may be represented as a mask matrix. In the mask matrix, the feature of a given component appears as a feature map in which only that component's feature is present, while the regions occupied by other components are masked out. Specifically, the mask matrix may be represented as (N, H × W), where N is the number of human body parts and H, W are the height and width of the mask matrix, respectively. Here H and W may be the same as the H and W of the feature matrix output by the human body detection network.
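As a rough illustration of the (N, H × W) mask matrix described above, the sketch below assumes the segmentation network's per-pixel output can be summarized as an integer part-label map; the helper name and the label convention (0 for background, 1..N for parts) are assumptions for illustration, not part of the patent.

```python
import numpy as np

def build_mask_matrix(part_labels, num_parts):
    """part_labels: (H, W) integer map, 0 = background, 1..num_parts = part id.

    Returns an (num_parts, H * W) mask matrix: row i is 1 where part i+1 is
    visible and 0 elsewhere, so each part's features can be isolated without
    interference from the other parts."""
    h, w = part_labels.shape
    masks = np.zeros((num_parts, h * w))
    flat = part_labels.reshape(-1)
    for part in range(1, num_parts + 1):
        masks[part - 1] = (flat == part).astype(float)
    return masks
```

Each row of the mask matrix then plays the role of one component's mask feature when combined with the detection network's feature matrix.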
The implementation modes can utilize the mask characteristics of each part to form a mask matrix, so that when the characteristics of each part are expressed, the influence of the characteristics of other parts on the expression of the characteristics of the part can be maximally reduced, and the characteristics of each part can be more accurately expressed.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the information processing method according to the present embodiment. In the application scenario of fig. 3, the execution subject 301 may obtain a target human body image 302, input it into the human body detection network 3031 and the component segmentation network 3032 of a human body detection model 303 trained in advance, and obtain the human body features 304 of the human body included in the target human body image 302 as output by the model 303. These features include a feature matrix of the human body output from the human body detection network 3031 and a component feature matrix, composed of the component features of the various components of the human body, output from the component segmentation network 3032; the human body detection model 303 is used to extract human body features including the component features of each component.
The method provided by the embodiment of the application can fully acquire the characteristics corresponding to each part, and avoids neglecting the detailed characteristics of the small parts, so that the recall rate and the accuracy rate of human body detection are improved. In addition, in the embodiment, the human body detection network and the component segmentation network can be utilized to respectively determine the overall characteristics of the human body and the component characteristics of each component, so that the relation among the components can be acquired while paying attention to the independent detailed characteristics of each component, and the output characteristics are very detailed and accurate.
With further reference to FIG. 4, a flow 400 of yet another embodiment of an information processing method is shown. The flow 400 of the information processing method includes the following steps:
step 401, obtaining a target human body image, and inputting the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component.
In this embodiment, an execution subject of the information processing method (for example, a server or a terminal device shown in fig. 1) may acquire a target human body image from a local or other electronic device and input the image into the human body detection network and the component segmentation network in the human body detection model, respectively. Specifically, the human body features include component features of respective components of the human body included in the target human body image. The parts herein may refer to predetermined parts of the human body, such as arms, legs, face, head, etc. A component feature is a feature that characterizes a component. In some cases, the output human body characteristics may include not only component characteristics but also overall characteristics of the human body and the like.
Step 402, obtaining human body characteristics of a human body included in a target human body image output from the human body detection model, wherein the human body characteristics of the human body include a human body characteristic matrix included in the target human body image output from the human body detection network and a component characteristic matrix composed of component characteristics of each component of the human body output from the component segmentation network.
In this embodiment, the execution subject may obtain the human body feature output from the human body detection model. The resulting human features may be represented in the form of a matrix. Specifically, the feature matrix, i.e., the feature map, may be represented as a three-dimensional matrix (C, H × W), where C is the dimension of the feature and H, W are the height and width of the feature map, respectively.
Step 403, performing the following feature processing steps: step 4031, transposing the component feature matrix to obtain a transposed matrix; step 4032, determining the result of matrix-multiplying the feature matrix with the transposed matrix as a first result; step 4033, multiplying two copies of the feature matrix element-wise (bitwise) to obtain an element-wise product, and determining the result of matrix-multiplying that product with the transposed matrix as a second result.
In this embodiment, the executing body may transpose the component feature matrix to obtain a transposed matrix. And determining a result of matrix multiplication of the characteristic matrix output by the human body detection network and the transposed matrix as a first result. In particular, each column of features in the first result may be a component feature of one component.
The execution body may multiply two copies of the feature matrix element-wise, matrix-multiply the resulting product with the transposed matrix, and take the result as the second result. In particular, each column of features in the second result may be the component features of one component.
In practice, both the first result and the second result may be expressed as (C, N), where C is the feature dimension and N is the number of components.
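As a minimal sketch of the feature processing steps above (using NumPy; the sizes C, H, W, N and the random matrices are hypothetical placeholders, not part of the original disclosure):

```python
import numpy as np

# Hypothetical sizes: C feature channels, an H x W feature map, N components.
C, H, W, N = 256, 24, 8, 6
HW = H * W

F = np.random.rand(C, HW)  # feature matrix from the human body detection network, (C, H*W)
P = np.random.rand(N, HW)  # component feature matrix from the component segmentation network, (N, H*W)

P_t = P.T               # step 4031: transposed matrix, (H*W, N)
first = F @ P_t         # step 4032: first result, (C, N); each column is one component's feature
second = (F * F) @ P_t  # step 4033: bitwise product of two copies of F, then matrix multiplication, (C, N)

assert first.shape == (C, N) and second.shape == (C, N)
```

Both results come out as (C, N) matrices, matching the shape stated in the text.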
Step 404, identifying the human body based on the first result and the second result.
In this embodiment, the execution subject may perform human body recognition based on the first result and the second result in various ways. For example, the execution subject may determine the similarity between the first result and the features of the human body contained in a designated human body image, and the similarity between the second result and those features. If both similarities are greater than a similarity threshold, it may be determined that the human body contained in the target human body image and the human body contained in the designated human body image indicate the same person.
In some optional implementations of this embodiment, step 404 may include: splicing the first result and the second result to obtain a splicing result; and determining whether the human body contained in the target human body image and the human body contained in the specified human body image indicate the same person or not based on the similarity of the splicing result and the features of the human body contained in the specified human body image.
In these alternative implementations, the first result and the second result are both matrices, so the execution subject may splice the first result and the second result to obtain a splicing result. The splicing result is then used as the feature of the target human body image and compared with the features of the human body contained in the designated human body image for recognition. Specifically, the execution subject may determine that the two human bodies indicate the same person when the similarity between the features of the human body contained in the target human body image and the features of the human body contained in the designated human body image is greater than a similarity threshold. Alternatively, the execution subject may determine that they indicate the same person when, among a set of human body images, the similarity between the features of the human body contained in the target human body image and the features of the human body contained in the designated human body image is greater than the similarity between the features of the human bodies contained in the other human body images in the set and the features of the human body contained in the designated human body image.
Through splicing, these implementations allow the features of the first result and the second result to be fully fused into accurate image features, thereby improving the accuracy of human body recognition.
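A hedged sketch of this recognition step (NumPy; the feature values, cosine similarity as the similarity measure, and the threshold are all illustrative assumptions, since the text does not fix a particular similarity function):

```python
import numpy as np

def cosine_similarity(a, b):
    # Flatten the spliced feature matrices and compare them as vectors.
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
C, N = 256, 6                                               # hypothetical sizes
first_t, second_t = rng.random((C, N)), rng.random((C, N))  # results for the target image
first_d, second_d = rng.random((C, N)), rng.random((C, N))  # results for the designated image

# Splice the first result and the second result into one feature per image.
target_feat = np.concatenate([first_t, second_t], axis=1)   # (C, 2N)
designated_feat = np.concatenate([first_d, second_d], axis=1)

SIM_THRESHOLD = 0.8  # hypothetical threshold
same_person = cosine_similarity(target_feat, designated_feat) > SIM_THRESHOLD
```

Splicing along the component axis keeps both the plain and the squared-feature views of each component in the final descriptor.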
In some optional implementations of this embodiment, the training step of the human detection model includes: acquiring a human body image sample, and acquiring a characteristic matrix of the human body image sample by using a human body detection network; acquiring a component characteristic matrix of a human body image sample by utilizing a component segmentation network; executing the characteristic processing step on the characteristic matrix and the component characteristic matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample; and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
In these optional implementation manners, the execution subject may obtain a human body image sample in the training sample set, and obtain a feature matrix and a component feature matrix of the human body image sample by respectively using the human body detection network and the component segmentation network in the human body detection model. Then, the execution subject may execute the feature processing steps to obtain a first result and a second result, and train the initial human body detection model. The initial human body detection model here refers to the human body detection model to be trained.
The execution subject may train the initial human body detection model based on the first result and the second result of the human body image sample in various ways. For example, the execution subject may splice the first result and the second result, and input the splicing result into a convolutional layer to further extract features. The execution subject may then input the further extracted features into a full connection layer, etc., to obtain the prediction result of the human body detection model. In this way, the execution subject may determine a loss value using the prediction result, and train the initial human body detection model using the loss value.
These implementations train with the first result and the second result, so that training uses the fully fused overall human body features together with the detailed component features, improving the recall rate and accuracy rate of the trained human body detection model.
In some optional application scenarios of these optional implementations, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample may include the following steps of generating a prediction result: inputting the component features respectively indicating each component of the human body sample, in the first result corresponding to the human body image sample containing the human body sample, into a full connection layer and a classification layer of the initial human body detection network for processing, to obtain a first prediction result of the human body sample; and inputting the component features respectively indicating each component of the human body sample, in the second result corresponding to the human body image sample, into a full connection layer and a classification layer of the initial component segmentation network for processing, to obtain a second prediction result of the human body sample.
In these optional application scenarios, the execution subject may input the component features of each component in the first result into the full connection layer and the classification layer for processing, and input the component features of each component in the second result into the full connection layer and the classification layer for processing (for example, the classification layer may include a softmax function), so as to train each component independently. In this way, the full connection layer can obtain a processing result corresponding to each component. The execution subject may generate the prediction result of each component based on the processing results of the full connection layer and the classification layer in various ways. The prediction result may specifically be the identity of a recognized person. Thereafter, the execution subject or another electronic device may train using the first prediction result and the second prediction result, respectively. For example, it may train the initial human body detection network using the first prediction result and train the initial component segmentation network using the second prediction result, and/or train the initial component segmentation network using the first prediction result and train the initial human body detection network using the second prediction result. The initial human body detection network and the initial component segmentation network are the human body detection network and the component segmentation network to be trained, respectively. Usually, the first prediction result and the second prediction result are used for training separately; in some cases, they may be combined for training.
In these application scenarios, the human body detection model can generate accurate prediction results from the component features of each component, so that the trained model can accurately extract the component features of each component.
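The per-component prediction head described above can be sketched roughly as follows (NumPy; a single shared full connection layer and a softmax classification layer are assumptions made for brevity, and all sizes and weights are made up):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over one component's class scores.
    e = np.exp(x - x.max())
    return e / e.sum()

C, N, num_ids = 256, 6, 1000  # hypothetical: feature dim, components, identity classes
rng = np.random.default_rng(1)
W_fc = rng.normal(0.0, 0.01, (num_ids, C))  # full connection layer weights (shared here for brevity)

result = rng.random((C, N))  # first (or second) result for one human body sample
# Process each component's feature column independently: full connection layer, then softmax.
predictions = [softmax(W_fc @ result[:, k]) for k in range(N)]
predicted_ids = [int(np.argmax(p)) for p in predictions]  # one identity prediction per component
```

Feeding each column through its own forward pass is what makes the training of the components independent of one another.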
Optionally, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample may further include performing the following operations on the initial human body detection network and the initial component segmentation network, respectively: determining a loss value of the human body sample according to a preset loss function, the prediction result of the human body sample and a preset label, and training based on the loss value, wherein the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label and a preset metric function, so that the similarity between features of the same human body determined by the learned human body detection model is greater than the similarity between features of different human bodies.
Specifically, the execution subject may determine the loss value of the human body sample by using the prediction result of each component and a preset label obtained by labeling the human body in advance, and train, for example by back propagation, in the initial human body detection model using the loss value, so as to train the human body detection network and the component segmentation network, respectively. In addition, the execution subject may also use the preset metric function to make the human body detection network and the component segmentation network each perform metric learning, so that after metric learning the similarity between features of the same human body extracted by the two networks is greater than the similarity between features of different human bodies.
These alternative approaches may combine training with loss values and metric learning to improve training efficiency and training accuracy.
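One common way to realize this combination (an assumption for illustration only: the text does not specify the preset loss or metric functions, so cross-entropy and a triplet margin loss stand in for them, and all values below are made up):

```python
import numpy as np

def cross_entropy(probs, label):
    # Classification loss against the preset label.
    return -np.log(probs[label] + 1e-12)

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Metric learning: push the same-person distance below the different-person distance.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

rng = np.random.default_rng(2)
probs = np.full(10, 0.1)  # toy prediction over 10 identities
label = 3                 # preset label of the human body sample
# anchor, same-person, and different-person feature vectors (made-up values)
fa, fp, fn = rng.random(256), rng.random(256), rng.random(256)

total_loss = cross_entropy(probs, label) + triplet_loss(fa, fp, fn)
```

Summing the two terms lets one back-propagation step optimize the classification target and the metric-learning target together.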
The embodiment can fuse the overall characteristics of the human body with the characteristics of the parts by calculating the first result and the second result, thereby obtaining more accurate human body characteristics and increasing the accuracy of human body identification.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an information processing apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the information processing apparatus 500 of the present embodiment includes: an acquisition unit 501 and an output unit 502. The acquiring unit 501 is configured to acquire a target human body image, and input the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, where the human body detection model is used to extract human body features including component features of each component. An output unit 502 configured to obtain human body characteristics of a human body included in a target human body image output from the human body detection model, wherein the human body characteristics of the human body include a characteristic matrix of the human body included in the target human body image output from the human body detection network and a component characteristic matrix composed of component characteristics of respective components of the human body output from the component segmentation network.
In some embodiments, the acquisition unit 501 of the information processing apparatus 500 may acquire a target human body image from a local or other electronic device and input the image into the human body detection network and the component segmentation network in the human body detection model, respectively. Specifically, the human body features include component features of respective components of the human body included in the target human body image.
In some embodiments, the output unit 502 may obtain the human body feature output from the human body detection model. The resulting human features may be represented in the form of a matrix. Specifically, the feature matrix, i.e., the feature map, may be represented as a three-dimensional matrix (C, H × W), where C is the dimension of the feature and H, W are the height and width of the feature map, respectively. The component features of each of the components described above may be represented as a row of features or a column of features. Accordingly, the component characteristics of the individual components may constitute a component characteristic matrix.
In some optional implementations of this embodiment, the output unit is further configured to: a mask matrix of the human body output from the component segmentation network is obtained, wherein the mask matrix includes component features of individual components of the human body.
In some optional implementations of this embodiment, the apparatus further includes: a processing unit configured to perform the following feature processing steps: transposing the component feature matrix to obtain a transposed matrix; determining the result of matrix multiplication between the feature matrix and the transposed matrix as a first result; performing bitwise multiplication on two copies of the feature matrix to obtain a bitwise multiplication result, and determining the result of matrix multiplication between the bitwise multiplication result and the transposed matrix as a second result; and an identification unit configured to identify the human body based on the first result and the second result.
In some optional implementations of this embodiment, the identifying unit includes: the splicing module is configured to splice the first result and the second result to obtain a splicing result; and the determining module is configured to determine whether the human body contained in the target human body image and the human body contained in the specified human body image indicate the same person or not based on the similarity of the splicing result and the features of the human body contained in the specified human body image.
In some optional implementations of this embodiment, the training step of the human detection model includes: acquiring a human body image sample, and acquiring a characteristic matrix of the human body image sample by using a human body detection network; acquiring a component characteristic matrix of a human body image sample by utilizing a component segmentation network; executing a characteristic processing step on the characteristic matrix and the component characteristic matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample; and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
In some optional implementation manners of this embodiment, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample includes the following steps of generating a prediction result: inputting the component features respectively indicating each component of the human body sample, in the first result corresponding to the human body image sample containing the human body sample, into a full connection layer and a classification layer of the initial human body detection network for processing, to obtain a first prediction result of the human body sample; and inputting the component features respectively indicating each component of the human body sample, in the second result corresponding to the human body image sample, into a full connection layer and a classification layer of the initial component segmentation network for processing, to obtain a second prediction result of the human body sample.
In some optional implementation manners of this embodiment, training the initial human body detection model based on the first result and the second result corresponding to the human body image sample further includes performing the following operations on the initial human body detection network and the initial component segmentation network, respectively: determining a loss value of the human body sample according to a preset loss function, the prediction result of the human body sample and a preset label, and training based on the loss value, wherein the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label and a preset metric function, so that the similarity between features of the same human body determined by the learned human body detection model is greater than the similarity between features of different human bodies.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and an output unit. The names of these units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as "a unit that acquires a target human body image, inputs the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a target human body image, and inputting the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component; and obtaining human body characteristics of the human body contained in the target human body image output from the human body detection model, wherein the human body characteristics of the human body comprise a characteristic matrix of the human body contained in the target human body image output from the human body detection network and a component characteristic matrix consisting of component characteristics of each component of the human body output from the component segmentation network.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. An information processing method comprising:
acquiring a target human body image, and inputting the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component;
obtaining human body features of a human body included in the target human body image output from the human body detection model, wherein the human body features of the human body include a feature matrix of the human body included in the target human body image output from the human body detection network and a component feature matrix composed of component features of each component of the human body output from the component segmentation network;
transposing the component feature matrix to obtain a transposed matrix; determining a result of matrix multiplication between a feature matrix output by the human body detection network and the transposed matrix as a first result; performing bitwise multiplication on two copies of the feature matrix output by the human body detection network to obtain a bitwise multiplication result, and determining a result of matrix multiplication between the bitwise multiplication result and the transposed matrix as a second result;
identifying the human body based on the first result and the second result.
2. The method according to claim 1, wherein the obtaining of the human body features of the human body included in the target human body image output from the human body detection model comprises:
obtaining a mask matrix of the human body output from the component segmentation network, wherein the mask matrix includes component features of individual components of the human body.
3. The method of claim 1, wherein the identifying the human body based on the first result and the second result comprises:
splicing the first result and the second result to obtain a splicing result;
and determining whether the human body contained in the target human body image and the human body contained in the specified human body image indicate the same person or not based on the similarity of the splicing result and the features of the human body contained in the specified human body image.
4. The method of claim 1, wherein the training of the human detection model comprises:
acquiring a human body image sample, and acquiring a characteristic matrix of the human body image sample by using the human body detection network;
acquiring a component feature matrix of the human body image sample by utilizing the component segmentation network;
executing the characteristic processing step on the characteristic matrix and the component characteristic matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample;
and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
5. The method according to claim 4, wherein the training of the initial human detection model based on the first result and the second result corresponding to the human image sample comprises the following steps of generating a prediction result:
respectively indicating the component characteristics of each component of the human body sample in a first result corresponding to the human body image sample containing the human body sample, and inputting the component characteristics into a full connection layer and a classification layer of an initial human body detection network for processing to obtain a first prediction result of the human body sample;
and respectively indicating the component characteristics of each component of the human body sample in a second result corresponding to the human body image sample containing the human body sample, inputting the component characteristics into a full connection layer and a classification layer of an initial component segmentation network, and processing to obtain a second prediction result of the human body sample.
6. The method of claim 5, wherein training an initial human detection model based on the first and second results corresponding to the human image sample further comprises:
respectively executing the following operations on the initial human body detection network and the initial component segmentation network:
determining a loss value of the human body sample according to a preset loss function, a prediction result of the human body sample and a preset label, and training based on the loss value, wherein the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label and a preset metric function, so that the similarity between features of the same human body determined by the human body detection model obtained through learning is greater than the similarity between features of different human bodies.
7. An information processing apparatus comprising:
an acquisition unit configured to acquire a target human body image, input the target human body image into a human body detection network and a component segmentation network in a human body detection model trained in advance, wherein the human body detection model is used for extracting human body features including component features of each component;
an output unit configured to obtain human body features of a human body included in the target human body image output from the human body detection model, wherein the human body features of the human body include a feature matrix of the human body included in the target human body image output from the human body detection network, and a component feature matrix composed of component features of respective components of the human body output from the component segmentation network;
a processing unit configured to perform the following feature processing steps: transposing the component feature matrix to obtain a transposed matrix; determining a result of matrix multiplication between a feature matrix output by the human body detection network and the transposed matrix as a first result; performing bitwise multiplication on two copies of the feature matrix output by the human body detection network to obtain a bitwise multiplication result, and determining a result of matrix multiplication between the bitwise multiplication result and the transposed matrix as a second result;
an identification unit configured to identify the human body based on the first result and the second result.
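The processing steps of claim 7 can be sketched in NumPy. The shapes below (8 feature channels, 4 global feature rows, 3 components) are illustrative assumptions, not values fixed by the claims, and "bitwise multiplication" is read as element-wise multiplication of the detection network's feature matrix with itself:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 8))  # feature matrix from the human body detection network
P = rng.standard_normal((3, 8))  # component feature matrix from the component segmentation network

P_T = P.T                        # transpose of the component feature matrix
first_result = F @ P_T           # matrix multiplication with the transposed matrix -> first result
bitwise = F * F                  # element-wise ("bitwise") multiplication of the feature matrix with itself
second_result = bitwise @ P_T    # matrix multiplication with the transposed matrix -> second result

print(first_result.shape, second_result.shape)  # (4, 3) (4, 3)
```

Both results project the global features onto the per-component features, so each has one column per component.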
8. The apparatus of claim 7, wherein the output unit is further configured to:
obtaining a mask matrix of the human body output from the component segmentation network, wherein the mask matrix includes component features of individual components of the human body.
9. The apparatus of claim 7, wherein the identification unit comprises:
a concatenation module configured to concatenate the first result and the second result to obtain a concatenation result;
and a determination module configured to determine, based on the similarity between the concatenation result and the features of a human body contained in a specified human body image, whether the human body contained in the target human body image and the human body contained in the specified human body image indicate the same person.
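The splicing (concatenation) and similarity comparison of claim 9 can be sketched as follows. Cosine similarity and the 0.5 threshold are illustrative choices; the claim only requires some similarity measure between the concatenated query features and the stored features:

```python
import numpy as np

def same_person(first_result, second_result, gallery_feature, threshold=0.5):
    """Concatenate the two results into one query feature and compare it
    with a stored gallery feature by cosine similarity."""
    query = np.concatenate([first_result.ravel(), second_result.ravel()])
    sim = np.dot(query, gallery_feature) / (
        np.linalg.norm(query) * np.linalg.norm(gallery_feature))
    return sim >= threshold

q1 = np.array([[1.0, 0.0], [0.0, 1.0]])
q2 = np.array([[0.5, 0.5], [0.5, 0.5]])
gallery = np.concatenate([q1.ravel(), q2.ravel()])  # identical stored features
print(same_person(q1, q2, gallery))  # True
```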
10. The apparatus of claim 7, wherein the training of the human detection model comprises:
acquiring a human body image sample, and obtaining a feature matrix of the human body image sample by using the human body detection network;
acquiring a component feature matrix of the human body image sample by using the component segmentation network;
performing the feature processing steps on the feature matrix and the component feature matrix of the human body image sample to obtain a first result and a second result corresponding to the human body image sample;
and training an initial human body detection model based on the first result and the second result corresponding to the human body image sample.
11. The apparatus according to claim 10, wherein the training of the initial human body detection model based on the first result and the second result corresponding to the human body image sample comprises the following steps of generating a prediction result:
inputting the component features of each component of the human body sample, as indicated in the first result corresponding to the human body image sample containing the human body sample, into a full connection layer and a classification layer of an initial human body detection network for processing, to obtain a first prediction result of the human body sample;
and inputting the component features of each component of the human body sample, as indicated in the second result corresponding to the human body image sample containing the human body sample, into a full connection layer and a classification layer of an initial component segmentation network for processing, to obtain a second prediction result of the human body sample.
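The full connection layer and classification layer of claim 11 can be sketched as a linear map followed by a softmax over identity classes. The dimensions and random weights below are illustrative assumptions:

```python
import numpy as np

def predict(component_features, W, b):
    """Full connection layer (x @ W + b) followed by a softmax
    classification layer producing class probabilities."""
    logits = component_features @ W + b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(8)       # component features of one component
W = rng.standard_normal((8, 5))  # 5 identity classes (illustrative)
b = np.zeros(5)
probs = predict(x, W, b)
print(probs.shape)  # (5,)
```

The resulting probability vector is the per-component prediction result that the loss of claim 12 is computed against.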
12. The apparatus of claim 11, wherein the training of the initial human body detection model based on the first result and the second result corresponding to the human body image sample further comprises:
performing the following operations on the initial human body detection network and the initial component segmentation network, respectively:
determining a loss value of the human body sample according to a preset loss function, the prediction result of the human body sample and a preset label, and training based on the loss value, wherein the preset loss function is associated with a classification learning target; and performing metric learning according to the prediction result of the human body sample, the preset label and a preset metric function, so that the similarity between features of the same human body determined by the trained human body detection model is greater than the similarity between features of different human bodies.
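One common choice of preset metric function for the metric learning of claim 12 is a triplet margin loss; the claim does not fix the function, so this is an illustrative sketch. It penalizes the network unless features of the same human body are closer together than features of different human bodies by at least a margin:

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.3):
    """Zero when the anchor is closer to the positive (same human body)
    than to the negative (different human body) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])  # same identity: close to the anchor
n = np.array([0.0, 1.0])  # different identity: far from the anchor
print(triplet_margin_loss(a, p, n))  # 0.0
```

In training, this metric loss would typically be combined with the classification loss associated with the preset labels.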
13. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201910799755.5A 2019-08-28 2019-08-28 Information processing method and device Active CN110516603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910799755.5A CN110516603B (en) 2019-08-28 2019-08-28 Information processing method and device


Publications (2)

Publication Number Publication Date
CN110516603A CN110516603A (en) 2019-11-29
CN110516603B true CN110516603B (en) 2022-03-18

Family

ID=68627492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910799755.5A Active CN110516603B (en) 2019-08-28 2019-08-28 Information processing method and device

Country Status (1)

Country Link
CN (1) CN110516603B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644209A (en) * 2017-09-21 2018-01-30 百度在线网络技术(北京)有限公司 Method for detecting human face and device
CN110046577A (en) * 2019-04-17 2019-07-23 北京迈格威科技有限公司 Pedestrian's attribute forecast method, apparatus, computer equipment and storage medium
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
CN110110689A (en) * 2019-05-15 2019-08-09 东北大学 A kind of pedestrian's recognition methods again
CN110175595A (en) * 2019-05-31 2019-08-27 北京金山云网络技术有限公司 Human body attribute recognition approach, identification model training method and device
CN110174892A (en) * 2019-04-08 2019-08-27 北京百度网讯科技有限公司 Processing method, device, equipment and the computer readable storage medium of vehicle direction


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
The Combination of Features Extracted from Different Parts for Person Re-Identification;Wenqing Huang et al.;《2018 2nd International Conference on Robotics and Automation Sciences (ICRAS)》;20180823;full text *
Person re-identification by region block segmentation and fusion;*** et al.;《Journal of Image and Graphics》;20190416;Vol. 24(No. 4);full text *

Also Published As

Publication number Publication date
CN110516603A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
CN109858445B (en) Method and apparatus for generating a model
CN108830235B (en) Method and apparatus for generating information
CN109308681B (en) Image processing method and device
CN108805091B (en) Method and apparatus for generating a model
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN109376267B (en) Method and apparatus for generating a model
CN109447156B (en) Method and apparatus for generating a model
CN109389072B (en) Data processing method and device
CN108960316B (en) Method and apparatus for generating a model
EP3893125A1 (en) Method and apparatus for searching video segment, device, medium and computer program product
CN109034069B (en) Method and apparatus for generating information
CN110516678B (en) Image processing method and device
CN109255767B (en) Image processing method and device
CN111523413A (en) Method and device for generating face image
CN107729928B (en) Information acquisition method and device
CN109377508B (en) Image processing method and device
CN108510084B (en) Method and apparatus for generating information
CN110070076B (en) Method and device for selecting training samples
CN109214501B (en) Method and apparatus for identifying information
CN109800730B (en) Method and device for generating head portrait generation model
CN111259663A (en) Information processing method and device
US20210264198A1 (en) Positioning method and apparatus
CN113505848A (en) Model training method and device
CN109829520B (en) Image processing method and device
CN110119722B (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant