WO2021051857A1 - Target object matching method and apparatus, electronic device and storage medium - Google Patents

Target object matching method and apparatus, electronic device and storage medium

Info

Publication number
WO2021051857A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
image
matching
feature
target
Prior art date
Application number
PCT/CN2020/092332
Other languages
French (fr)
Chinese (zh)
Inventor
颜鲲
杨昆霖
侯军
伊帅
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Priority to JP2022504597A priority Critical patent/JP7262659B2/en
Priority to SG11202110892SA priority patent/SG11202110892SA/en
Priority to KR1020227011057A priority patent/KR20220053670A/en
Publication of WO2021051857A1 publication Critical patent/WO2021051857A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a target object matching method and device, electronic equipment and storage medium.
  • Human face matching and human hand matching determine whether a human body in an image corresponds to a given human face or human hand. An image often contains many people, each with different actions and sizes, and people may even overlap one another; these factors make matching human faces and human hands to human bodies very challenging.
  • the present disclosure proposes a technical solution for target object matching.
  • a target object matching method, which includes: acquiring a first target object and a second target object to be matched in an input image, where the first target object includes a human body and the second target object includes at least one of a human hand and a human face; performing feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image; establishing a bipartite graph between the first target object and the second target object based on that degree of matching; and determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object. Based on the above configuration, the matching accuracy between target objects can be improved; the method is suitable for scenes where multiple people overlap and has better applicability.
  • performing feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes: performing feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image; and performing classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, the degree of matching between two target objects can be conveniently obtained, with high-precision features and an accurate matching degree obtained in the process.
  • performing the classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes: performing feature fusion processing on the connection feature of the first feature and the second feature to obtain a fusion feature; and inputting the fusion feature to a fully connected layer to execute the classification processing, obtaining the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, classification efficiency and classification accuracy can be improved through fusion processing.
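The fusion-and-classification step above can be illustrated with a minimal sketch. This is not the network disclosed here: the function name, the use of a single linear fusion layer, and all weight values are toy assumptions chosen only to show the shape of the computation (concatenate the two features, fuse, then map to a matching degree in [0, 1] with a sigmoid).

```python
import math

def match_score(feat_a, feat_b, fusion_w, fc_w):
    """Concatenate two feature vectors (the 'connection feature'),
    fuse them with a linear layer, then map the fused feature to a
    matching degree in [0, 1] via a fully connected layer + sigmoid."""
    concat = list(feat_a) + list(feat_b)
    fused = [sum(w * x for w, x in zip(row, concat)) for row in fusion_w]
    logit = sum(w * x for w, x in zip(fc_w, fused))
    return 1.0 / (1.0 + math.exp(-logit))

# Toy 2-dimensional features and hand-picked weights (illustrative only).
score = match_score([1.0, 0.0], [0.0, 1.0],
                    fusion_w=[[0.5, 0.5, 0.5, 0.5], [0.1, 0.1, 0.1, 0.1]],
                    fc_w=[1.0, 1.0])
print(round(score, 3))  # a matching degree between 0 and 1
```

In a real network the fusion and fully connected layers would have learned parameters; the sigmoid is one common way to produce a score in [0, 1].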
  • establishing the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image includes: in response to the second target object including only a human face, establishing a bipartite graph between the human body and the human face based on the degree of matching; in response to the second target object including only a human hand, establishing a bipartite graph between the human body and the human hand based on the degree of matching; and in response to the second target object including both a human face and a human hand, establishing, based on the degree of matching, both the bipartite graph between the human body and the human face and the bipartite graph between the human body and the human hand.
  • establishing the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image includes: establishing the bipartite graph based on the first target objects and second target objects whose matching degree is greater than a first threshold. Based on the above configuration, the bipartite graph structure can be simplified and the matching efficiency can be improved.
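The thresholding step can be sketched as follows. Function and variable names are illustrative assumptions, not from the disclosure: only pairs whose matching degree exceeds the first threshold become edges of the bipartite graph, with the degree kept as the connection weight.

```python
def build_bipartite_edges(scores, threshold):
    """scores[i][j]: matching degree between first target object i
    (a human body) and second target object j (a face or a hand).
    Keep only pairs above the first threshold as weighted edges."""
    edges = []
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            if s > threshold:
                edges.append((i, j, s))
    return edges

edges = build_bipartite_edges([[0.9, 0.2], [0.3, 0.8]], threshold=0.5)
print(edges)  # only the two high-confidence pairs survive
```

Dropping low-confidence pairs before matching is what simplifies the graph and speeds up the later greedy step.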
  • determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object includes: based on the bipartite graph, using a greedy algorithm to match the first target object with the second target objects in descending order of matching degree, and taking a preset number of second target objects that best match the first target object as the second target objects matched with the first target object. Based on the above configuration, the matching target objects can be determined conveniently and accurately.
  • determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object may further include: in response to the bipartite graph including the bipartite graph between the human body and the human hand, using the greedy algorithm to select at most two second target objects of the human hand type that best match the first target object; and in response to the bipartite graph including the bipartite graph between the human body and the human face, using the greedy algorithm to select at most one second target object of the human face type that best matches the first target object. Based on the above configuration, different matching quantity values can be set adaptively for different types of second target objects, giving better adaptability.
  • determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object further includes: in response to any first target object having determined a preset number of matched second target objects, no longer matching the remaining second target objects to that first target object; and in response to any second target object having determined a matched first target object, no longer matching that second target object with the remaining first target objects. Based on the above configuration, the probability that the same target object is matched to multiple target objects can be reduced, and the matching accuracy can be improved.
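The greedy matching described in the bullets above can be sketched as a short routine (a simplified illustration under assumed names, not the disclosed implementation): edges are visited in descending order of matching degree, each body accepts at most a preset number of parts (e.g. 1 for faces, 2 for hands), and each part is assigned to at most one body.

```python
def greedy_match(edges, capacity):
    """edges: (body, part, score) tuples from the bipartite graph.
    capacity: preset number of parts per body (1 for faces, 2 for hands)."""
    matched = {}        # body -> list of assigned parts
    used_parts = set()  # each part matches at most one body
    for body, part, score in sorted(edges, key=lambda e: -e[2]):
        if part in used_parts:
            continue
        parts = matched.setdefault(body, [])
        if len(parts) < capacity:
            parts.append(part)
            used_parts.add(part)
    return matched

hands = greedy_match([(0, 0, 0.9), (0, 1, 0.8), (1, 1, 0.7), (1, 2, 0.6)],
                     capacity=2)
print(hands)  # hand 1 goes to body 0 (higher score), so body 1 gets hand 2
```

Because each part is consumed as soon as it is assigned, the same hand or face cannot end up attached to two bodies, which is the constraint the bullet above describes.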
  • acquiring the first target object and the second target object to be matched in the input image includes at least one of the following ways: determining the first target object and the second target object in the input image based on a frame selection operation on the detected first target object and second target object in the input image; detecting the first target object and the second target object in the input image using a target detection neural network; or receiving location information of the first target object and the second target object in the input image, and determining the first target object and the second target object based on the location information. Based on the above configuration, the target objects to be matched can be determined in different ways, providing a better user experience.
  • before the feature processing is performed on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, the method further includes: adjusting the first image and the second image to preset specifications respectively; and performing the feature processing on the first image and the second image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image then includes: performing feature processing on the first image and the second image adjusted to the preset specifications to obtain the degree of matching. Based on the above configuration, the method can be adapted to images of different specifications.
  • the method further includes: displaying the matched first target object and the second target object in the input image. Based on the above configuration, the matching result can be displayed intuitively, and the user experience is better.
  • the method further includes: performing feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object, respectively, through a twin neural network, to obtain the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, the accuracy of feature processing can be improved, and the accuracy of the matching degree can be further improved.
  • the method further includes the step of training the twin neural network, which includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, where the first training image is a human body image and the second training image is a human face image or a human hand image; inputting the first training image and the second training image to the twin neural network to obtain a predicted matching result of the first training image and the second training image; and determining a network loss based on the predicted matching result, and adjusting the network parameters of the twin neural network according to the network loss until the training requirements are met. Based on the above configuration, the twin neural network can be optimized and the matching accuracy can be improved.
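The disclosure does not specify which network loss is used; as a hedged illustration, binary cross-entropy is a common choice when the predicted matching result is a degree in [0, 1] and the label is 1 (same person) or 0 (different people):

```python
import math

def bce_loss(predicted, label):
    """Binary cross-entropy between the predicted matching degree and
    the ground-truth label (1: same person, 0: different people)."""
    eps = 1e-7  # guard against log(0)
    p = min(max(predicted, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# A confident correct prediction is penalised less than an unsure one.
print(bce_loss(0.9, 1) < bce_loss(0.5, 1))
```

During training, this loss would be averaged over body/face and body/hand training pairs and backpropagated to adjust the twin network's parameters until the training requirement is met.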
  • a target object matching device including:
  • An obtaining module configured to obtain a first target object and a second target object to be matched in an input image, the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
  • the feature processing module is configured to perform feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image;
  • a bipartite graph module configured to establish a bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image;
  • the matching module is configured to determine the matched first target object and second target object based on the bipartite graph between the first target object and the second target object.
  • the feature processing module is further configured to perform feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image, respectively;
  • the feature processing module is further configured to perform feature fusion processing on the connection feature of the first feature and the second feature to obtain a fusion feature, and to input the fusion feature to the fully connected layer to perform the classification processing, obtaining the degree of matching between the first target object in the first image and the second target object in the second image.
  • the bipartite graph module is further configured to: when the second target object includes only a human face, establish a bipartite graph between the human body and the human face based on the degree of matching between the first target object in the first image and the second target object in the second image; when the second target object includes only a human hand, establish a bipartite graph between the human body and the human hand based on that degree of matching; and when the second target object includes both a human face and a human hand, establish both the bipartite graph between the human body and the human face and the bipartite graph between the human body and the human hand based on that degree of matching;
  • the matching degree between the human body and the human face is used as the connection weight between the human body and the human face in the bipartite graph between the human body and the human face, and the matching degree between the human body and the human hand is used as the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand.
  • the bipartite graph module is further configured to establish a bipartite graph between the first target object and the second target object based on the first target objects and second target objects whose matching degree is greater than a first threshold.
  • the matching module is further configured to use a greedy algorithm, based on the bipartite graph between the first target object and the second target object, to take, in descending order of the matching degree between the first target object and the second target object, a preset number of second target objects that best match the first target object as the second target objects matched with the first target object.
  • the matching module is further configured to use the greedy algorithm to select the second target object of the human face type that best matches the first target object when the bipartite graph between the first target object and the second target object includes a bipartite graph between a human body and a human face.
  • the matching module is further configured to: in the case that any first target object has determined a preset number of matched second target objects, no longer match the remaining second target objects to that first target object; and in the case that any second target object has determined a matched first target object, no longer match other first target objects to that second target object.
  • the acquiring module is configured to acquire the first target object and the second target object to be matched in the input image in at least one of the ways described for the method above.
  • the feature processing module is further configured to adjust the first image and the second image to preset specifications, respectively, before the feature processing is performed on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image; and the feature processing performed to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes: performing feature processing on the first image and the second image adjusted to the preset specifications.
  • the device further includes a display module configured to display the matched first target object and the second target object in the input image.
  • the feature processing module is further configured to perform the feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object, respectively, through a twin neural network, to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  • the device further includes a training module for training the twin neural network, wherein the step of training the twin neural network includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, where the first training image is a human body image and the second training image is a human face image or a human hand image; inputting the first training image and the second training image to the twin neural network to obtain a predicted matching result; and determining a network loss based on the predicted matching result, and adjusting the network parameters of the twin neural network according to the network loss until the training requirement is met.
  • an electronic device including:
  • a memory for storing processor-executable instructions;
  • the processor is configured to call instructions stored in the memory to execute the method described in any one of the first aspect.
  • a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the method described in any one of the first aspect is implemented.
  • a computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.
  • The first image of the first target object and the second image of the second target object to be matched may be acquired first, where the first target object may be a human body and the second target object may be a human face and/or a human hand; then, by performing feature processing on the first image and the second image, the degree of matching between the first target object in the first image and the second target object in the second image can be obtained, and the matching result of the first target object in the first image and the second target object in the second image can then be determined by establishing a bipartite graph.
  • The embodiment of the present disclosure first detects the matching degree between each first target object and each second target object, constrains the detected matching degrees by establishing a bipartite graph, and finally determines the second target object matching each first target object, making the final association matching result more accurate.
  • Fig. 1 shows a flowchart of a target object matching method according to an embodiment of the present disclosure;
  • Fig. 2 shows a schematic diagram of the location area of each target object in an input image obtained according to an embodiment of the present disclosure;
  • Fig. 3 shows a flowchart of obtaining the matching degree between the first target object and the second target object through a neural network according to an embodiment of the present disclosure;
  • Fig. 4 shows a schematic structural diagram of a twin neural network according to an embodiment of the present disclosure;
  • Fig. 5 shows a schematic diagram of a bipartite graph between a human body and a human hand and a matching result constructed according to an embodiment of the present disclosure;
  • Fig. 6 shows a flowchart of training a twin neural network according to an embodiment of the present disclosure;
  • Fig. 7 shows a block diagram of a target object matching device according to an embodiment of the present disclosure;
  • Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
  • Fig. 9 shows a block diagram of another electronic device according to an embodiment of the present disclosure.
  • The embodiment of the present disclosure provides a target object matching method that can conveniently determine whether the objects in two images match; for example, it can detect whether a face object matches a human body object, or whether a human hand object matches a human body object.
  • the method can be applied to any image processing equipment, for example, it can be applied to an electronic device or a server.
  • The electronic device can include terminal devices such as mobile phones, notebook computers, and PADs, and can also include wearable devices such as smart bracelets and smart watches, or other handheld devices.
  • the server may include a cloud server or a local server, etc. As long as image processing can be performed, it can be used as the execution subject of the target object matching method of the embodiment of the present disclosure.
  • Fig. 1 shows a flowchart of a target object matching method according to an embodiment of the present disclosure.
  • the target object matching method may include:
  • S10 Acquire a first target object and a second target object to be matched in an input image, the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
  • the embodiments of the present disclosure can realize the matching of human face and human body and the matching of human hand and human body, that is, whether the human face and human body in the input image correspond to the same person, and whether the human hand and human body correspond to the same person.
  • That is, the face, human hand, and human body of each person object in the image can be matched to the same person.
  • the image of the target object to be matched in the input image can be obtained first.
  • the target object may include a human body, and at least one of a human hand and a human face.
  • the first target object includes a human body
  • the second target object includes at least one of a human face and a human hand.
  • S20 Perform feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image;
  • The positions of the first target object and the second target object to be matched in the input image can be obtained.
  • The image areas corresponding to the first target object and the second target object in the input image can be determined, so as to determine the first image corresponding to the position of the first target object in the input image and the second image corresponding to the position of the second target object in the input image.
  • By performing feature processing on the first image and the second image respectively, it is possible to detect whether the first target object in the first image matches the second target object in the second image and to obtain the corresponding matching degree.
  • The acquisition of the degree of matching between the first target object and the second target object can be achieved through a neural network: the image features of the first image and the second image are obtained respectively, and the degree of matching between the first target object and the second target object is further determined according to the image features. The neural network may include a feature extraction module, a feature fusion module, and a fully connected module.
  • The feature extraction module can perform feature extraction processing on the input first image and second image, the feature fusion module can fuse the feature information of the first image and the second image, and the fully connected module can obtain the binary classification result for the first target object and the second target object, that is, the matching degree between them. The matching degree can be a value greater than or equal to 0 and less than or equal to 1, and the greater the matching degree, the more likely it is that the first target object and the second target object correspond to the same person object.
  • The neural network can be a twin neural network, where the feature extraction module can include two feature extraction branches whose processing operations and parameters are identical, and the two feature extraction branches can be used to extract the feature information of the first image and the second image respectively.
  • the matching degree detection is realized by the twin neural network, which can improve the accuracy of the detected matching degree.
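The weight sharing that makes the network a "twin" (siamese) network can be illustrated with a single toy linear layer. All names and weight values below are invented for illustration; the point is only that both branches apply the same parameters to their respective inputs.

```python
def extract(features_in, shared_w):
    """One linear feature-extraction step. Both branches call this with
    the SAME weight matrix, which is the defining property of a twin
    (siamese) network."""
    return [sum(w * x for w, x in zip(row, features_in)) for row in shared_w]

shared_w = [[0.5, -0.5], [0.25, 0.75]]     # toy shared parameters
body_feat = extract([1.0, 2.0], shared_w)  # branch 1: human body crop
face_feat = extract([2.0, 1.0], shared_w)  # branch 2: face/hand crop
print(body_feat, face_feat)
```

Because the two branches share parameters, the two crops are embedded into the same feature space, which is what makes their features directly comparable in the later fusion and classification steps.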
  • a bipartite graph between the first target object and the second target object may be established.
  • the input image may include at least one person object, which may include at least one first target object and at least one second target object.
  • A bipartite graph between each first target object and each second target object can be established, where the first target objects and the second target objects are respectively used as the two point sets of the bipartite graph, and the matching degree between a first target object and a second target object is used as the weight of the connection between them.
  • different bipartite graphs can be established according to the type of the second target object.
  • the obtained bipartite graph is the bipartite graph between the human body and the face.
  • the obtained bipartite graph is the bipartite graph between the human body and the human hand.
  • the obtained bipartite graph is the bipartite graph between the human body and the human face and the bipartite graph between the human body and the human hand.
  • The second target object that matches each first target object can be determined according to the bipartite graph.
  • The weight of the connection between the first target object and the second target object in the bipartite graph is the matching degree between them, and the embodiment of the present disclosure may determine, in descending order of matching degree, the second target object matched by each first target object.
  • the bipartite graph is a bipartite graph between a human body and a face
  • the most matching person can be determined for each human body (the first target object) Face (second target object).
  • the bipartite graph is a bipartite graph between the human body and the human body, based on the order of the matching degree from high to low, for each human body (first target object), at most two most matching human hands (second target) can be determined.
  • the embodiment of the present disclosure may use the greedy algorithm to obtain the second target object matched by the first target object.
  • once any first target object is matched with its corresponding second target objects, that first target object and those second target objects no longer participate in the matching of other objects.
  • the embodiments of the present disclosure can first predict the matching degree between each first target object and each second target object in the input image, and use the method of establishing a bipartite graph to determine the matching result of the first target object and the second target object, thereby obtaining higher-precision matching results.
  • the embodiments of the present disclosure may first obtain an input image, where the input image may be any image including a human object, and the method of obtaining the input image may include at least one of the following: collecting the input image through an image acquisition device, receiving the input image transmitted from another device, or reading the input image from a memory.
  • the image acquisition device may be any device with an image acquisition function, such as a camera, a video camera, a mobile phone, or a computer, etc., but the present disclosure does not specifically limit this.
  • the storage can be a local storage or a cloud storage.
  • the first target object and the second target object to be matched in the input image can be further obtained, such as obtaining the location area where the first target object and the second target object are located.
  • the embodiments of the present disclosure may input an input image into a neural network capable of realizing the detection of a target object, and the target object may include a human body, a human face, and a human hand.
  • the input image can be input into a neural network capable of detecting the target object, and after the detection of the neural network, the location area where the first target object is located and the location area where the second target object is located in the input image can be obtained.
  • the position area of each first target object and the second target object can be represented in the form of a detection frame in the input image.
  • the category information (human body, human face, or human hand) of the target object corresponding to each detection frame may be included.
  • the location area where the first target object and the second target object are located can be determined by the positions corresponding to the detection frame, and the types of the first target object and the second target object can be determined by the identification.
  • the neural network that performs the detection of the target object in the embodiment of the present disclosure may be a region proposal network (RPN), or may be a region-based convolutional neural network (R-CNN), but the present disclosure does not specifically limit this. In this way, all the first target objects and second target objects in the input image can be easily and accurately identified.
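As a non-authoritative sketch of how such detection output might be organized (the `Detection` class, field names, and category strings below are hypothetical, not taken from the disclosure), each detection frame can carry its location area and a category label, from which the first and second target objects are separated:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detection frame: a rectangular location area plus a category label."""
    x1: float  # left
    y1: float  # top
    x2: float  # right
    y2: float  # bottom
    category: str  # "body", "face", or "hand" (hypothetical labels)

def split_targets(detections):
    """Separate first target objects (human bodies) from second target objects (faces/hands)."""
    first = [d for d in detections if d.category == "body"]
    second = [d for d in detections if d.category in ("face", "hand")]
    return first, second

dets = [
    Detection(10, 10, 110, 310, "body"),
    Detection(40, 15, 80, 55, "face"),
    Detection(12, 160, 42, 200, "hand"),
]
bodies, others = split_targets(dets)
```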
  • the first target object and the second target object in the input image can also be determined according to a received frame selection operation for the input image. That is, the embodiment of the present disclosure can receive a frame selection operation input by the user, where the frame selection operation frames the first target object and the second target object to be matched from the input image, that is, it frames the location areas corresponding to the first target object and the second target object. The shape of the location area determined by the frame selection operation may be a rectangle, or may be another shape, which is not specifically limited in the present disclosure. When receiving the frame selection operation, the category of the object corresponding to each framed area, such as a human body, a human face, or a human hand, can also be received.
  • the first target object and the second target object to be matched can be determined based on the user's selection.
  • at least one first target object and at least one second target object in the input image can be used as the objects to be matched, so that the matching of the first target object and the second target object has better flexibility and applicability.
  • the position information of the first target object and the second target object may also be directly received; for example, the vertex coordinates and the height and width values of the corresponding location areas of the first target object and the second target object may be received, from which the corresponding location area can be determined.
  • the location information of the location area may also be expressed in other ways.
  • the first target object and the second target object to be matched can be determined based on the position information sent by the user; for example, at least one first target object and at least one second target object in the input image can be used as the objects to be matched, so that the matching of the first target object and the second target object has better flexibility and applicability.
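A minimal illustration of how received position information might determine a location area, assuming the information is a top-left vertex plus width and height values (the function name and format are hypothetical, since the disclosure allows other representations):

```python
def region_from_vertex(x, y, width, height):
    """Convert a top-left vertex plus width/height into corner coordinates.

    The exact wire format is not specified in the text; this assumes
    (x, y) is the top-left vertex of a rectangular location area.
    """
    return (x, y, x + width, y + height)

area = region_from_vertex(40, 15, 40, 40)
```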
  • Fig. 2 shows a schematic diagram of the location area of each target object in an input image obtained according to an embodiment of the present disclosure.
  • A1 and B1 respectively represent the location areas of the first target objects A and B, where the first target object is a human body.
  • A2 and B2 respectively represent the location areas of the second target objects whose type is human face, and
  • A3 and A4 represent the location areas of the second target objects whose type is human hand.
  • all human bodies, faces, and hands in the input image can be used as the first target objects and the second target objects to be matched; alternatively, only a part of the objects in the input image may be used as the first target objects and second target objects to be matched, which is not illustrated by examples here.
  • the embodiment of the present disclosure may execute the above-mentioned feature processing through a neural network, and obtain the matching degree between the corresponding first target object and the second target object.
  • Fig. 3 shows a flow chart of obtaining the matching degree between the first target object and the second target object through a neural network according to an embodiment of the present disclosure.
  • performing feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image, may include:
  • S21: Perform feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image respectively;
  • feature extraction processing can be performed on the image regions of the first target object and the second target object in the input image, where the image region corresponding to the position of the first target object is the first image, and the image region corresponding to the position of the second target object is the second image.
  • feature extraction processing of the first image and the second image may be performed.
  • the feature extraction process can be performed by the feature extraction module of the neural network.
  • the feature extraction module may include one feature extraction branch, which can be used to perform the feature extraction processing of the first image and the second image respectively; in the case that multiple first target objects and multiple second target objects are included, feature extraction processing can also be performed on multiple first images and second images.
  • the feature extraction module may also include two feature extraction branches.
  • the two feature extraction branches may have the same network structure or different network structures. As long as the feature extraction can be performed, they can be used as an embodiment of the present disclosure.
  • the first image and the second image can be input into the two feature extraction branches in a one-to-one correspondence; for example, feature extraction processing is performed on the first image through one feature extraction branch to obtain the first feature corresponding to the first image, and feature extraction processing is performed on the second image through the other feature extraction branch to obtain the second feature corresponding to the second image.
  • it may also include at least three feature extraction branches for performing feature extraction processing of the first image and the second image, which is not specifically limited in the present disclosure.
  • FIG. 4 shows a schematic structural diagram of the twin neural network according to an embodiment of the present disclosure.
  • the feature extraction module of the embodiment of the present disclosure may include two feature extraction branches, and the structures and parameters of the two feature extraction branches of the twin neural network are completely the same.
  • the feature extraction branch may include a residual network; that is, the feature extraction module of the embodiment of the present disclosure may be composed of a residual network, and the residual network performs feature extraction processing on the first image and the second image to extract the feature information in the images.
  • the residual network may be resnet18, but the present disclosure does not specifically limit this.
  • the feature extraction module may also be other network modules capable of performing feature extraction, and the present disclosure does not specifically limit this.
  • the first image I1 may be an image corresponding to a human body region
  • the second image I2 may be an image corresponding to a human face region or a second image of a human hand region.
  • each of the first and second images can be input into two feature extraction branches, respectively, and feature extraction processing can be performed.
  • the embodiment of the present disclosure may also input only one pair of images to the feature extraction branches at a time and perform feature extraction on the two images; after the matching degree of the target objects in the two images is obtained, the next required pair of first and second images is input to perform matching detection.
  • the implementation of the present disclosure can also assign an identifier to each image, and at the same time identify the type of target object included in the image. That is, in the embodiment of the present disclosure, each of the first image and the second image may carry an image identifier and a type identifier, which are used in subsequent processing to distinguish each image and the type of the target object in the image.
  • the first image and the second image may be adjusted to images of preset specifications.
  • the first image and the second image can be adjusted to a size of a preset specification, such as 224*224 (which is not a specific limitation of the present disclosure), through reduction processing, enlargement processing, up-sampling, or down-sampling processing; the first image and the second image adjusted to the preset specification are then input to the neural network to perform feature extraction, and the corresponding first feature and second feature are obtained.
  • S22: Perform classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  • the embodiments of the present disclosure may perform feature fusion processing on the connection feature of the first feature and the second feature to obtain the fusion feature, and input the fusion feature to the fully connected layer to perform the classification processing, so as to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  • the first feature and the second feature obtained by the embodiment of the present disclosure may be respectively expressed in the form of a matrix or a vector, and the scale of the first feature and the second feature may be the same.
  • the obtained first feature and the second feature can be connected, for example, in the channel direction to obtain a connection feature, where the connection can be performed by a connection function (concat function).
  • feature fusion processing can be performed on the connection feature, for example, at least one layer of convolution operation can be performed to realize the feature fusion processing.
  • residual error processing of connected features may be executed by a residual module (resnet_block) to perform feature fusion processing to obtain fused features.
  • the classification prediction of the matching degree is performed based on the fusion feature, in which the classification result of whether the first target object and the second target object are matched can be obtained, and the corresponding matching degree can be obtained.
  • the classification prediction of the matching can be realized by a fully connected layer (FC); that is, the fusion feature can be input to the fully connected layer, and through the processing of the fully connected layer the above prediction result can be output, namely the degree of matching between the first target object and the second target object, and the matching result determined based on that degree of matching.
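Purely as a toy illustration of this classification step (not the actual trained network), the sketch below concatenates two feature vectors, applies a single fully connected unit with a sigmoid to produce a matching degree in (0, 1), and thresholds it into the first/second identifier; all names, feature values, and parameters are assumptions:

```python
import math
import random

random.seed(0)

def fully_connected(features, weights, bias):
    """One fully connected unit: weighted sum + sigmoid, yielding a matching degree in (0, 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Toy first/second features (a real network would produce e.g. resnet18 embeddings).
first_feature = [0.2, 0.8, 0.5]
second_feature = [0.1, 0.9, 0.4]

# Connect (concatenate) the two features along one axis.
connected = first_feature + second_feature

# Hypothetical trained parameters of the classification layer.
weights = [random.uniform(-1, 1) for _ in connected]
bias = 0.0

matching_degree = fully_connected(connected, weights, bias)
FIRST_THRESHOLD = 0.5
match_id = 1 if matching_degree > FIRST_THRESHOLD else 0  # first/second identifier
```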
  • when the matching degree is higher than the first threshold, it can be determined that the first target object matches the second target object, and the matching result can be the first identifier, such as "1"; when the matching degree is less than the first threshold, the matching result may be the second identifier, such as "0".
  • the above-mentioned first identifier and second identifier may be different identifiers, which are respectively used to indicate the matching result of the first target object and the second target object belonging to the same person object and not belonging to the same person object.
  • after the matching degree between the first target object and the second target object to be matched in the input image is obtained, a bipartite graph between the first target object and the second target object can be established correspondingly according to the obtained matching degree.
  • the first target objects and the second target objects may be constructed as the vertex sets V and E in the bipartite graph, and the connection between vertices, that is, each edge in the bipartite graph, may carry the matching degree between the first target object and the second target object corresponding to its two vertices. A bipartite graph is an undirected graph in which the vertex set can be divided into two disjoint subsets, and the two vertices attached to each edge of the graph belong respectively to these two disjoint subsets.
  • the corresponding bipartite graph may be established according to the type of the second target object to be matched in the input image. For example, when the second target object to be matched in the input image only includes a human face, the relationship between the human body and the human face can be established based on the degree of matching between the first target object in the first image and the second target object in the second image. Bipartite graph between.
  • a bipartite graph between the human body and the human hand can be established based on the degree of matching between the first target object in the first image and the second target object in the second image ;
  • when the second target objects to be matched in the input image include both human faces and human hands, the bipartite graph between the human body and the human face and the bipartite graph between the human body and the human hand can be established based on the degree of matching between the first target object in the first image and the second target object in the second image; that is, the bipartite graph between the human body and the human hand can be established by using each first target object and the second target objects whose type is human hand, and the bipartite graph between the human body and the human face can be established by using each first target object and the second target objects whose type is human face.
  • the matching degree between the human body and the face can be used as the connection weight between the human body and the face in the bipartite graph between the human body and the face, and the matching degree between the human body and the human hand As the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand.
  • the embodiment of the present disclosure may regard the first target object and the second target object as the point set of each vertex in the bipartite graph, and the point set is divided into three categories: human body, human face, and human hand. Furthermore, a bipartite graph can be established for the human face and the human hand respectively, and the weight of the corresponding edge between the two vertices is the matching degree between the first target object and the second target object corresponding to the corresponding two vertices output by the neural network.
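A minimal sketch of how the bipartite graph's weighted edges might be assembled from the predicted matching degrees, keeping only edges above the first threshold and ordering them for the subsequent greedy matching (the function name, threshold value, and object labels are hypothetical):

```python
def build_bipartite_graph(match_degrees, threshold=0.6):
    """Keep only edges whose matching degree exceeds the first threshold.

    match_degrees: dict mapping (body_id, part_id) -> matching degree.
    Returns edges as (weight, body_id, part_id), sorted highest first.
    """
    edges = [(w, body, part) for (body, part), w in match_degrees.items() if w > threshold]
    edges.sort(reverse=True)
    return edges

degrees = {("P1", "H1"): 0.9, ("P1", "H2"): 0.3, ("P2", "H2"): 0.8}
edges = build_bipartite_graph(degrees)
```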
  • each first target object and second target object whose matching degree is higher than the first threshold can be selected, and the bipartite graph between the first target objects and the second target objects is determined based on the first target objects and second target objects whose matching degree is higher than the first threshold.
  • if the matching degree between a second target object and each first target object is not higher than the first threshold, the second target object is not used to form a bipartite graph; similarly, if the matching degree between a first target object and each second target object whose type is human face is not higher than the first threshold, the first target object is not used to form the bipartite graph between the human body and the face, and if the matching degree between a first target object and each second target object whose type is human hand is not higher than the first threshold, the first target object is not used to form the bipartite graph between the human body and the human hand.
  • in this way, the structure of the bipartite graph can be simplified, and the matching of the first target object and the second target object can be accelerated overall.
  • the greedy algorithm can be used to determine, for each first target object whose type is human body, at most a preset number of matched second target objects.
  • the preset number can take different values for different types of second target objects: when the type of the second target object is human hand, the preset number can be 2, and when the type is human face, the preset number can be 1.
  • different preset numbers of values can be selected according to the types of different target objects, which is not specifically limited in the present disclosure.
  • the embodiments of the present disclosure may use a greedy algorithm to determine the matching situation between the first target objects and the second target objects. That is, the second target objects are matched to the corresponding first target objects in the order of matching degree from high to low; if the number of second target objects matched by a first target object reaches the preset number, the matching procedure of that first target object is terminated, that is, no other second target object is matched for that first target object. In addition, if a second target object has been determined to be the second target object matched by any first target object, the matching procedure of that second target object is terminated, that is, the second target object is no longer matched with any other first target object.
  • in the process of determining the second target object matched by the first target object according to the order of the matching degree, if the iteration reaches a matching degree between a first target object and a second target object that is lower than the first threshold, the matching procedure can be terminated at that point. For example, taking the bipartite graph between human body and face as an example, suppose the order of matching degree from high to low is: the matching degree of X1 and Y1 is 90%, the matching degree of X2 and Y2 is 80%, the matching degree of X2 and Y1 is 50%, and the matching degree of X1 and Y2 is 30%, and the first threshold is 60%.
  • X1 and X2 respectively represent two first target objects
  • Y1 and Y2 respectively represent two second target objects.
  • the first target object X1 and the second target object Y1 with a matching degree of 90% can be matched.
  • the first target object X2 and the second target object Y2 with a matching degree of 80% are determined to be matched; then, since the next matching degree is 50%, which is less than the first threshold, the matching process can be terminated at this point.
  • it can be determined that the faces of the first target objects X1 and X2 are respectively matched to be Y1 and Y2.
  • in the above, the matching process is terminated by setting the first threshold, but this is not a specific limitation of the present disclosure. In other embodiments, matching may be performed only in the order of the matching degree from high to low between the first target objects and the second target objects, with at most a preset number of second target objects matched for each first target object.
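The greedy, threshold-terminated matching procedure described above can be sketched as follows; the code reproduces the X1/X2 vs. Y1/Y2 face example with a first threshold of 60% (the helper name is hypothetical, not from the disclosure):

```python
def greedy_match(degrees, threshold, max_per_first=1):
    """Greedy matching in descending order of matching degree.

    degrees: list of (degree, first_obj, second_obj). Matching stops once a
    degree falls below the threshold; each second object is used at most once,
    and each first object receives at most max_per_first second objects.
    """
    matched = {}          # first_obj -> list of matched second objects
    used_second = set()
    for degree, first, second in sorted(degrees, reverse=True):
        if degree < threshold:
            break  # terminate the matching procedure
        if second in used_second:
            continue
        if len(matched.get(first, [])) >= max_per_first:
            continue
        matched.setdefault(first, []).append(second)
        used_second.add(second)
    return matched

# The face example from the text: threshold 60%, one face per body.
degrees = [(0.9, "X1", "Y1"), (0.8, "X2", "Y2"), (0.5, "X2", "Y1"), (0.3, "X1", "Y2")]
result = greedy_match(degrees, threshold=0.6, max_per_first=1)
```

As in the worked example, X1 is matched to Y1 and X2 to Y2, and the iteration stops at the 50% edge because it falls below the first threshold.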
  • matching at most a preset number of second target objects here means that, when the second target object is a human hand, each person object can match two hands; however, due to the setting of the first threshold during the matching process and the number of second target objects in the input image, there may be a first target object that is matched with only one second target object of the human hand type.
  • the second target object is a human hand as an example.
  • FIG. 5 shows a schematic diagram of the bipartite graph between the human body and the human hand and the matching result constructed according to an embodiment of the present disclosure, where FIG. 5 shows the bipartite graph between the human body and the human hand constructed based on the matching degrees between the first target objects and the second target objects.
  • the human body and the human hand can be regarded as the set of two types of vertices of the bipartite graph respectively.
  • P1, P2, and P3 respectively represent three first target objects, that is, three human bodies.
  • H1, H2, H3, H4, and H5 respectively represent five second target objects whose types are human hands.
  • the connecting line between any two first target objects and the second target object can be expressed as the degree of matching between the first target object and the second target object.
  • each first target object can be assigned matching second target objects in the order of the matching degree from high to low, where each first target object is matched with at most two second target objects. When a second target object is confirmed as matching a first target object according to the order of matching degree, that second target object is no longer matched to the other first target objects; at the same time, it is judged whether the number of second target objects matched by the first target object reaches the preset number, and if so, the first target object is no longer matched with the remaining second target objects. If the preset number is not reached, then, in the order of the matching degree from high to low, when the second target object of the next matching degree is to be matched with the corresponding first target object, it can be determined whether that second target object has already been matched to another first target object and whether the number of second target objects matched by the first target object reaches the preset number. For example, if the second target object is not matched to any first target object and the number of second target objects matched by the first target object is smaller than the preset number, it is determined that the first target object matches the second target object.
  • the termination condition may include at least one of the following: a corresponding second target object has been matched for each first target object; the above matching process has been executed down to the first target object and second target object with the lowest matching degree; or the matching degree is less than the first threshold.
  • the location area of the matched first target object and the second target object may be displayed.
  • the embodiment of the present disclosure may use the same display state to display the bounding box of the location area where the matched first target object and the second target object are located, and the bounding box may be the detection frame of each location area obtained in step S10.
  • the matching bounding boxes of the location areas of the first target object and the second target object may be displayed in the same color, but this is not a specific limitation of the present disclosure.
  • the line width of the display frame can also be used to distinguish, for example, the human body frame, the hand frame, and the face frame corresponding to different person objects, so as to conveniently distinguish the matching results.
  • the second target object that best matches each first target object can be selected by establishing a bipartite graph, so as to improve the matching accuracy between the target objects.
  • the embodiment of the present disclosure can be applied to a neural network, for example, can be applied to a twin neural network.
  • the embodiment of the present disclosure can, through the twin neural network, perform feature processing on the first image corresponding to the location area of the first target object and the second image corresponding to the location area of the second target object, respectively, to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  • Fig. 6 shows a flowchart of training a twin neural network according to an embodiment of the present disclosure.
  • the steps of training the twin neural network can include:
  • S51: Acquire training samples, where the training samples include multiple first training images and multiple second training images, the first training images are human body images, and the second training images are human face images or human hand images;
  • the first training image and the second training image may be image regions captured from multiple images, or they may be the image regions of corresponding types of target objects identified from multiple images by means of target detection; they may also be any images including a human body, a human hand, or a human face, which is not specifically limited in the present disclosure.
  • S52: Input the first training image and the second training image to the twin neural network to obtain a predicted matching result of the first training image and the second training image;
  • the feature extraction of the first training image and the second training image, as well as the feature connection, feature fusion, and classification processing, are performed through the twin neural network, finally predicting the matching degree between the first training image and the second training image; the matching result between the first training image and the second training image can then be determined according to the matching degree.
  • the matching result can be expressed as a first identifier and a second identifier.
  • the first identifier is 1 and the second identifier is 0, which is used to indicate the matching result of the first training image and the second training image matching or not matching.
  • the matching result can be determined according to the comparison result of the matching degree and the first threshold: if the matching degree is greater than the first threshold, it is determined that the matching result of the corresponding first training image and second training image is a match, which can be expressed as the first identifier; otherwise it is the second identifier.
  • S53: Based on the predicted matching result between the first training image and the second training image, adjust the network parameters of the twin neural network until the training requirement is met.
  • the real matching result of the first training image and the second training image can be used as supervision, and the network loss can be determined according to the predicted matching result between the first training image and the second training image and the real matching result.
  • the network loss can be determined based on the difference between the two matching results.
  • the parameters of the twin neural network can be adjusted according to the network loss.
  • if the obtained network loss is less than the loss threshold, it is determined that the training requirements are met, and the training can be terminated at this time; if the obtained network loss is greater than or equal to the loss threshold, the network parameters are adjusted according to the network loss and the matching result between the first training images and the second training images is re-predicted, until the obtained network loss is less than the loss threshold.
  • the loss threshold may be a preset value, such as 1%, but it is not a specific limitation of the present disclosure, and may also be other values.
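The disclosure does not name the loss function, so as one plausible choice for a binary match/no-match classification, the sketch below computes a binary cross-entropy network loss from predicted matching degrees and real matching results, and compares it with the loss threshold (all function names, predictions, and labels are illustrative assumptions):

```python
import math

def binary_cross_entropy(predicted, actual):
    """Network loss from predicted matching degrees vs. real match labels (1 = match, 0 = no match)."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(predicted, actual):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predicted)

LOSS_THRESHOLD = 0.01  # e.g. 1%, as in the text
predictions = [0.999, 0.001, 0.999]  # hypothetical predicted matching degrees
labels = [1, 0, 1]                   # real matching results (supervision)
loss = binary_cross_entropy(predictions, labels)
training_done = loss < LOSS_THRESHOLD  # training requirement met when loss is under the threshold
```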
  • the following examples illustrate the specific process of the embodiments of the present disclosure.
  • the two branches of the network extract the features of the human body and the face or the hand respectively.
  • the extracted feature maps of the human body and the face or the hand are connected, and then enter the network for binary classification and scoring.
  • the score is between 0 and 1; if the human body matches the face or hand, the score is close to 1, otherwise close to 0.
  • the two branches of the network use resnet18 to extract features, the obtained feature maps are combined together and then pass through a resnet_block convolution layer, and finally through a fully connected layer for classification to obtain the matching degree.
  • the point set is divided into three categories-human body, human face, and human hand.
  • a fully connected bipartite graph is established for the human face and human hand respectively, and the weight of the corresponding edge is the score (matching degree) output by the network.
  • Rule constraints on the bipartite graph: a human body matches at most two human hands, and a human body matches at most one face.
  • Sort the scores and use the greedy algorithm to match from high to low, remove all the extra illegal edges, and iterate until the matching ends.
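The rule-constrained matching above — at most one face and at most two hands per human body, assigned greedily from the highest score down — can be sketched as follows (object names and scores are illustrative, not from the disclosure):

```python
def match_parts(edges, max_per_body):
    """Greedy assignment: edges are (score, body, part); each part binds once,
    and each body binds up to max_per_body parts."""
    matched, used = {}, set()
    for score, body, part in sorted(edges, reverse=True):
        if part in used or len(matched.get(body, [])) >= max_per_body:
            continue  # illegal edge under the rule constraints: skip it
        matched.setdefault(body, []).append(part)
        used.add(part)
    return matched

# Separate bipartite graphs for faces and hands, with the rule constraints applied:
face_edges = [(0.95, "P1", "F1"), (0.90, "P2", "F2"), (0.40, "P1", "F2")]
hand_edges = [(0.92, "P1", "H1"), (0.88, "P1", "H2"),
              (0.85, "P2", "H3"), (0.30, "P1", "H3")]

faces = match_parts(face_edges, max_per_body=1)   # at most one face per body
hands = match_parts(hand_edges, max_per_body=2)   # at most two hands per body
```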
  • the embodiments of the present disclosure can learn more association relationships in complex scenarios by using the twin network.
  • the embodiment of the present disclosure uses a bipartite graph to constrain the network output result during the final association, so that the accuracy of the final result is higher.
  • the first image of the first target object and the second image of the second target object to be matched may be acquired first, where the first target object may be a human body and the second target object may be a human face and/or a human hand; then, by performing feature processing on the first image and the second image, the degree of matching between the first target object in the first image and the second target object in the second image can be obtained, and the matching result of the first target object in the first image and the second target object in the second image is then determined by establishing a bipartite graph.
  • the embodiment of the present disclosure first detects the matching degree between each first target object and each second target object, constrains the detected matching degrees by establishing a bipartite graph, and finally determines the second target object matching the first target object, making the final association matching result more accurate.
  • the present disclosure also provides target object matching devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any target object matching method provided in the present disclosure.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and internal logic.
  • Fig. 7 shows a block diagram of a target object matching device according to an embodiment of the present disclosure.
  • the target object matching device includes:
  • the acquiring module 10 is configured to acquire a first target object and a second target object to be matched in an input image, the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
  • the feature processing module 20 is configured to perform feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, to obtain the degree of matching between the first target object in the first image and the second target object in the second image;
  • the bipartite graph module 30 is configured to establish the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image;
  • the matching module 40 is configured to determine the matched first target object and the second target object based on the bipartite graph between the first target object and the second target object.
  • the feature processing module is further configured to perform feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image, respectively;
  • the feature processing module is further configured to perform feature fusion processing on the connection feature of the first feature and the second feature to obtain a fusion feature; the fusion feature is input to the fully connected layer to perform the classification processing, obtaining the degree of matching between the first target object in the first image and the second target object in the second image.
  • the bipartite graph module is further configured to, when the second target object only includes a human face, establish a bipartite graph between the human body and the human face based on the degree of matching between the first target object in the first image and the second target object in the second image;
  • when the second target object only includes a human hand, establish a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image;
  • when the second target object includes a human face and a human hand, establish a bipartite graph between the human body and the human face and a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image;
  • the matching degree between the human body and the human face is used as the connection weight between the human body and the human face in the bipartite graph between the human body and the human face, and the matching degree between the human body and the human hand is used as the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand.
  • the bipartite graph module is further configured to establish the bipartite graph between the first target object and the second target object based on the first target objects and second target objects whose matching degree is greater than a first threshold.
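The threshold filtering described here (only pairs whose matching degree exceeds the first threshold become edges, with the matching degree as the edge weight) might be sketched as follows; the score values, identifiers, and the 0.5 threshold are hypothetical:

```python
def build_bipartite_graph(scores, threshold=0.5):
    """Keep only body-part pairs whose matching degree exceeds the
    first threshold; the matching degree becomes the edge weight.

    scores: dict mapping (body_id, part_id) -> matching degree."""
    return [(w, body, part)
            for (body, part), w in scores.items()
            if w > threshold]

# Toy scores: the weak B1-F2 pair (0.10) is filtered out.
scores = {("B1", "F1"): 0.92, ("B1", "F2"): 0.10, ("B2", "F2"): 0.77}
edges = build_bipartite_graph(scores, threshold=0.5)
print(sorted(edges))  # [(0.77, 'B2', 'F2'), (0.92, 'B1', 'F1')]
```

Discarding low-score pairs before matching is what simplifies the bipartite graph: the later greedy pass only has to consider plausible associations.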
  • the matching module is further configured to, based on the bipartite graph between the first target object and the second target object, use a greedy algorithm to select, in descending order of the matching degree between the first target object and the second target object, a preset number of second target objects that best match the first target object as the second target objects matching the first target object.
  • the matching module is further configured to, when the bipartite graph between the first target object and the second target object includes a bipartite graph between a human body and a human face, use the greedy algorithm to select the second target object of the face type that best matches the first target object.
  • the matching module is further configured to, in the case that any first target object has determined a preset number of matching second target objects, no longer match the remaining second target objects with that first target object; and, in the case that any second target object has determined a matching first target object, no longer match other first target objects with that second target object.
  • the acquiring module acquiring the first target object and the second target object to be matched in the input image includes at least one of the following methods:
  • the feature processing module is further configured to, before performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, adjust the first image and the second image to preset specifications respectively; and performing the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, to obtain the degree of matching between the first target object in the first image and the second target object in the second image, includes:
  • the device further includes a display module configured to display the matched first target object and the second target object in the input image.
  • the feature processing module is further configured to perform the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object through a twin neural network, to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  • the device further includes a training module for training the twin neural network, where the step of training the twin neural network includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, where the first training images are human body images and the second training images are human face images or human hand images;
  • the network loss is determined, and the network parameters of the twin neural network are adjusted according to the network loss until the training requirement is met.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
  • the computer-readable storage medium may be a volatile storage medium or a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the above method.
  • the electronic device can be provided as a terminal, server or other form of device.
  • the embodiment of the present disclosure also provides computer-readable code; when the computer-readable code runs on an electronic device, the processor in the electronic device executes the above-mentioned method.
  • Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
  • the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, and a sensor component 814 , And communication component 816.
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
  • the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
  • the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method operating on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable and Programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the power supply component 806 provides power for various components of the electronic device 800.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
  • the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
  • the audio component 810 further includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
  • the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and the keypad of the electronic device 800;
  • the sensor component 814 can also detect the position change of the electronic device 800 or a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and the temperature change of the electronic device 800.
  • the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, to implement the above methods.
  • a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
  • Fig. 9 shows a block diagram of another electronic device according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
  • the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
  • the present disclosure may be a system, method and/or computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, and a mechanical encoding device, such as a punch card with instructions stored thereon.
  • the computer-readable storage medium used here is not to be interpreted as transient signals themselves, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber-optic cables), or electrical signals transmitted through wires.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present disclosure.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine such that, when these instructions are executed by the processor of the computer or other programmable data processing apparatus, a device that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams is produced. These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function. The functions marked in the blocks may also occur in a different order from the order marked in the drawings; for example, two consecutive blocks can actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.


Abstract

The present disclosure relates to a target object matching method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a first target object and a second target object to be matched in an input image; processing features of a first image corresponding to the first target object and a second image corresponding to the second target object in the input image respectively to obtain a matching degree of the first target object in the first image and the second target object in the second image; establishing a bipartite graph between the first target object and the second target object on the basis of the matching degree of the first target object in the first image and the second target object in the second image; and on the basis of the bipartite graph between the first target object and the second target object, determining a first target object and a second target object that match. The embodiments of the present disclosure can improve the matching precision of target objects.

Description

Target object matching method and device, electronic equipment and storage medium
This disclosure claims priority to the Chinese patent application filed with the Chinese Patent Office on September 18, 2019, with application number 201910882691.5 and titled "Target object matching method and device, electronic equipment and storage medium", the entire content of which is incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of computer vision technology, and in particular to a target object matching method and device, electronic equipment and a storage medium.
Background
Human body-face matching or human body-hand matching is used to determine whether a human body in a picture matches a human face or a human hand. There are sometimes many people in an image, each of whom may differ in action and size, and people may even overlap one another; for these reasons, matching human bodies with human faces and human hands is very challenging.
Summary of the invention
The present disclosure proposes a technical solution for target object matching.
According to an aspect of the present disclosure, there is provided a target object matching method, which includes: acquiring a first target object and a second target object to be matched in an input image, where the first target object includes a human body and the second target object includes at least one of a human hand and a human face; performing feature processing respectively on a first image corresponding to the first target object and a second image corresponding to the second target object in the input image, to obtain the degree of matching between the first target object in the first image and the second target object in the second image; establishing a bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image; and determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object. Based on the above configuration, the matching accuracy between target objects can be improved; moreover, the method is suitable for scenes where multiple people overlap, and thus has better applicability.
In some possible implementations, performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, to obtain the degree of matching between the first target object in the first image and the second target object in the second image, includes: performing feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image, respectively; and performing classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, the degree of matching between two target objects can be obtained conveniently, and high-precision features and an accurate matching degree can be obtained in the process.
In some possible implementations, performing classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes: performing feature fusion processing on the connection feature of the first feature and the second feature to obtain a fusion feature; and inputting the fusion feature to a fully connected layer to perform the classification processing, obtaining the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, classification efficiency and classification accuracy can be improved through the fusion processing.
In some possible implementations, establishing the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image includes: in response to the second target object including only a human face, establishing a bipartite graph between the human body and the human face based on the degree of matching between the first target object in the first image and the second target object in the second image; in response to the second target object including only a human hand, establishing a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image; and in response to the second target object including a human face and a human hand, establishing a bipartite graph between the human body and the human face and a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image; where the matching degree between the human body and the human face is used as the connection weight between the human body and the human face in the bipartite graph between the human body and the human face, and the matching degree between the human body and the human hand is used as the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand. Based on the above configuration, the relationships between target objects can be conveniently constructed by establishing bipartite graphs.
在一些可能的实施方式中，所述基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，建立所述第一目标对象和所述第二目标对象之间的二分图，包括：基于匹配度大于第一阈值的第一目标对象和第二目标对象，建立所述第一目标对象和第二目标对象之间的二分图。基于上述配置，可以简化二分图结构，提高匹配效率。In some possible implementation manners, the establishing a bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image includes: establishing the bipartite graph between the first target object and the second target object based on first target objects and second target objects whose matching degree is greater than a first threshold. Based on the above configuration, the bipartite graph structure can be simplified and the matching efficiency can be improved.
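As a non-limiting sketch of the threshold filtering described above (the identifiers, matching degrees and threshold value below are hypothetical, not part of the disclosure), only object pairs whose matching degree exceeds the first threshold are kept as edges of the bipartite graph:

```python
def build_edges(scores, threshold):
    """Keep only (body, part) pairs whose matching degree exceeds the first threshold."""
    return [(body, part, degree)
            for (body, part), degree in scores.items()
            if degree > threshold]

# Hypothetical matching degrees between two bodies and two faces.
scores = {("body0", "face0"): 0.92, ("body0", "face1"): 0.15,
          ("body1", "face0"): 0.20, ("body1", "face1"): 0.88}
edges = build_edges(scores, threshold=0.5)  # only the two strong pairs remain
```

Discarding low-degree pairs up front shrinks the edge set, which is what simplifies the bipartite graph and speeds up the later matching step.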
在一些可能的实施方式中，所述基于所述第一目标对象和所述第二目标对象之间的二分图，确定匹配的第一目标对象和第二目标对象，包括：基于所述第一目标对象和所述第二目标对象之间的二分图，利用贪心算法，按照所述第一目标对象和所述第二目标对象的匹配度从高到低的顺序，将与所述第一目标对象最匹配的预设数量个所述第二目标对象作为与所述第一目标对象匹配的第二目标对象。基于上述配置，可以方便且精确的确定匹配的目标对象。In some possible implementation manners, the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object includes: based on the bipartite graph between the first target object and the second target object, using a greedy algorithm to take, in descending order of the matching degree between the first target object and the second target object, a preset number of second target objects that best match the first target object as the second target objects matched with the first target object. Based on the above configuration, the matched target objects can be determined conveniently and accurately.
在一些可能的实施方式中，所述基于所述第一目标对象和所述第二目标对象之间的二分图，确定匹配的第一目标对象和第二目标对象，还包括：响应于所述第一目标对象和所述第二目标对象之间的二分图包括人体和人手之间的二分图，利用贪心算法，选择出与所述第一目标对象最匹配的至多两个类型为人手的第二目标对象；响应于所述第一目标对象和所述第二目标对象之间的二分图包括人体和人脸之间的二分图，利用贪心算法，选择出与所述第一目标对象最匹配的类型为人脸的第二目标对象。基于上述配置，可以适应性的为不同类型的第二目标对象设定不同的匹配数量值，适应性更好。In some possible implementation manners, the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object further includes: in response to the bipartite graph between the first target object and the second target object including a bipartite graph between the human body and the human hand, using a greedy algorithm to select at most two second target objects of the human-hand type that best match the first target object; in response to the bipartite graph between the first target object and the second target object including a bipartite graph between the human body and the human face, using a greedy algorithm to select the second target object of the human-face type that best matches the first target object. Based on the above configuration, different matching quantity values can be set adaptively for different types of second target objects, providing better adaptability.
在一些可能的实施方式中，所述基于所述第一目标对象和所述第二目标对象之间的二分图，确定匹配的第一目标对象和第二目标对象，还包括：响应于任一第一目标对象确定出匹配的预设数量个第二目标对象，不再为所述第一目标对象匹配其余第二目标对象，以及响应于任一第二目标对象确定出匹配的第一目标对象，不再为所述第二目标对象匹配其余第一目标对象。基于上述配置，可以降低同一目标对象匹配给多个目标对象的概率，提高匹配精度。In some possible implementation manners, the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object further includes: in response to a preset number of matched second target objects having been determined for any first target object, no longer matching the remaining second target objects to the first target object; and in response to a matched first target object having been determined for any second target object, no longer matching the remaining first target objects to the second target object. Based on the above configuration, the probability that the same target object is matched to multiple target objects can be reduced, and the matching accuracy can be improved.
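The greedy resolution described in the preceding paragraphs can be sketched as follows (a non-limiting Python illustration; the edge list, identifiers and capacity value are hypothetical). Edges are processed in descending order of matching degree; a first target object accepts at most a preset number of second target objects (e.g. 2 for hands, 1 for a face), and a second target object, once matched, is not matched again:

```python
def greedy_match(edges, capacity):
    """Resolve a bipartite graph greedily.

    edges: (first_target, second_target, matching_degree) triples.
    capacity: max second targets per first target (e.g. 2 for hands, 1 for a face).
    Returns a dict mapping each second target to its matched first target.
    """
    matches = {}  # second target -> first target
    used = {}     # first target -> number of second targets already taken
    for first, second, degree in sorted(edges, key=lambda e: e[2], reverse=True):
        if second in matches:               # second target already matched: skip
            continue
        if used.get(first, 0) >= capacity:  # first target already full: skip
            continue
        matches[second] = first
        used[first] = used.get(first, 0) + 1
    return matches

# Hypothetical body-hand edges; each body may claim at most two hands.
hand_edges = [("body0", "hand0", 0.9), ("body0", "hand1", 0.8),
              ("body1", "hand1", 0.7), ("body1", "hand2", 0.6)]
hand_matches = greedy_match(hand_edges, capacity=2)
```

Calling `greedy_match(face_edges, capacity=1)` would instead enforce the one-face-per-body constraint, which is how different matching quantity values can be set for different second-target types.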
在一些可能的实施方式中，所述获取输入图像中待匹配的第一目标对象和第二目标对象，包括以下方式中的至少一种：基于检测到的针对输入图像中所述第一目标对象和所述第二目标对象的框选操作，确定所述输入图像中的所述第一目标对象和所述第二目标对象；利用目标检测神经网络检测所述输入图像中的所述第一目标对象和所述第二目标对象；接收输入图像中所述第一目标对象和第二目标对象所在的位置信息，基于所述位置信息确定所述输入图像中的所述第一目标对象和第二目标对象。基于上述配置可以通过不同的方式确定待匹配的目标对象，具有更好的用户体验。In some possible implementation manners, the acquiring the first target object and the second target object to be matched in the input image includes at least one of the following manners: determining the first target object and the second target object in the input image based on a detected frame-selection operation on the first target object and the second target object in the input image; detecting the first target object and the second target object in the input image by using a target detection neural network; receiving position information of the first target object and the second target object in the input image, and determining the first target object and the second target object in the input image based on the position information. Based on the above configuration, the target objects to be matched can be determined in different ways, providing a better user experience.
在一些可能的实施方式中，在对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理之前，所述方法还包括：将所述第一图像和所述第二图像分别调整为预设规格，并且，所述对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，包括：对所述调整为预设规格的所述第一图像和所述第二图像执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的第二目标对象的匹配度。基于上述配置，可以适应于不同规格的图像。In some possible implementation manners, before the feature processing is performed respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, the method further includes: adjusting the first image and the second image respectively to a preset specification; and the performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes: performing feature processing on the first image and the second image adjusted to the preset specification to obtain the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, images of different specifications can be accommodated.
在一些可能的实施方式中,所述方法还包括:在所述输入图像中显示匹配的所述第一目标对象和所述第二目标对象。基于上述配置,可以直观的显示出匹配结果,用户体验更好。In some possible implementation manners, the method further includes: displaying the matched first target object and the second target object in the input image. Based on the above configuration, the matching result can be displayed intuitively, and the user experience is better.
在一些可能的实施方式中，所述方法还包括，通过孪生神经网络执行所述对所述第一目标对象对应的第一图像和所述第二目标对象对应的第二图像分别执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度。基于上述配置，可以提高特征处理的精度，进一步提高匹配度。In some possible implementation manners, the method further includes performing, through a twin neural network, the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object to obtain the degree of matching between the first target object in the first image and the second target object in the second image. Based on the above configuration, the accuracy of the feature processing can be improved, further improving the matching degree.
在一些可能的实施方式中，所述方法还包括训练所述孪生神经网络的步骤，其包括：获得训练样本，所述训练样本包括多个第一训练图像和多个第二训练图像，所述第一训练图像为人体图像，所述第二训练图像为人脸图像或者人手图像；将所述第一训练图像和所述第二训练图像输入至所述孪生神经网络，得到所述第一训练图像和所述第二训练图像的预测匹配结果；基于所述第一训练图像和所述第二训练图像之间的预测匹配结果，确定网络损失，并根据所述网络损失调整所述孪生神经网络的网络参数，直至满足训练要求。基于上述配置，可以优化孪生神经网络，提高匹配精度。In some possible implementation manners, the method further includes a step of training the twin neural network, which includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, the first training images being human body images and the second training images being human face images or human hand images; inputting the first training images and the second training images to the twin neural network to obtain predicted matching results of the first training images and the second training images; and determining a network loss based on the predicted matching results between the first training images and the second training images, and adjusting network parameters of the twin neural network according to the network loss until a training requirement is met. Based on the above configuration, the twin neural network can be optimized and the matching accuracy can be improved.
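The disclosure does not fix a particular form for the network loss above. Purely as an illustrative assumption, a binary cross-entropy over the predicted matching results (label 1 = same person, 0 = different persons) could be computed as follows:

```python
import math

def bce_loss(predictions, labels):
    """Mean binary cross-entropy between predicted matching degrees in (0, 1)
    and ground-truth labels (1 = same person, 0 = different persons)."""
    eps = 1e-7  # clamp predictions to avoid log(0)
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(predictions)

loss = bce_loss([0.9, 0.1], [1, 0])  # confident, correct predictions -> small loss
```

Adjusting the network parameters would then amount to minimizing this loss over the training pairs (by gradient descent in practice) until the training requirement is met.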
根据本公开的第二方面,提供了一种目标对象匹配装置,包括:According to a second aspect of the present disclosure, there is provided a target object matching device, including:
获取模块,用于获取输入图像中待匹配的第一目标对象和第二目标对象,所述第一目标对象包括 人体,所述第二目标对象包括人手和人脸中的至少一种;An obtaining module, configured to obtain a first target object and a second target object to be matched in an input image, the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
特征处理模块,用于对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理,得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度;The feature processing module is configured to perform feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image to obtain the The degree of matching between the first target object and the second target object in the second image;
二分模块,用于基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度,建立所述第一目标对象和所述第二目标对象之间的二分图;A dichotomy module, configured to establish the first target object and the second target based on the degree of matching between the first target object in the first image and the second target object in the second image Bipartite graph between objects;
匹配模块,用于基于所述第一目标对象和所述第二目标对象之间的二分图,确定匹配的第一目标对象和第二目标对象。The matching module is configured to determine the matched first target object and second target object based on the bipartite graph between the first target object and the second target object.
在一些可能的实施方式中，所述特征处理模块还用于对所述第一图像和所述第二图像执行特征提取处理，分别得到所述第一图像的第一特征和所述第二图像的第二特征；In some possible implementation manners, the feature processing module is further configured to perform feature extraction processing on the first image and the second image to obtain a first feature of the first image and a second feature of the second image, respectively;
对所述第一特征和所述第二特征的连接特征执行分类处理,得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度。Perform classification processing on the connection feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image .
在一些可能的实施方式中,所述特征处理模块还用于对所述第一特征和所述第二特征的连接特征执行特征融合处理,得到融合特征;In some possible implementation manners, the feature processing module is further configured to perform feature fusion processing on the connection feature of the first feature and the second feature to obtain a fusion feature;
将所述融合特征输入至全连接层执行所述分类处理,得到所述第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度。The fusion feature is input to the fully connected layer to perform the classification process, and the degree of matching between the first target object in the first image and the second target object in the second image is obtained.
在一些可能的实施方式中，所述二分模块还用于在所述第二目标对象仅包括人脸的情况下，基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，建立人体和人脸之间的二分图；In some possible implementation manners, the dichotomy module is further configured to, in the case that the second target object only includes a human face, establish a bipartite graph between the human body and the human face based on the degree of matching between the first target object in the first image and the second target object in the second image;
在所述第二目标对象仅包括人手的情况下，基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，建立人体和人手之间的二分图；in the case that the second target object only includes a human hand, establish a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image;
在所述第二目标对象包括人脸和人手的情况下，基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，建立人体和人脸之间的二分图以及人体和人手之间的二分图；in the case that the second target object includes a human face and a human hand, establish a bipartite graph between the human body and the human face and a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image;
其中，将人体和人脸之间的匹配度作为所述人体和人脸之间的二分图中人体和人脸之间的连接权值，以及将人体和人手之间的匹配度作为所述人体和人手之间的二分图中人体和人手之间的连接权值。wherein the matching degree between the human body and the human face is used as the connection weight between the human body and the human face in the bipartite graph between the human body and the human face, and the matching degree between the human body and the human hand is used as the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand.
在一些可能的实施方式中,所述二分模块还用于基于匹配度大于第一阈值的第一目标对象和第二目标对象,建立所述第一目标对象和第二目标对象之间的二分图。In some possible implementation manners, the bipartite module is further configured to establish a bipartite graph between the first target object and the second target object based on the first target object and the second target object whose matching degree is greater than a first threshold. .
在一些可能的实施方式中，所述匹配模块还用于基于所述第一目标对象和所述第二目标对象之间的二分图，利用贪心算法，按照所述第一目标对象和所述第二目标对象的匹配度从高到低的顺序，将与所述第一目标对象最匹配的预设数量个所述第二目标对象作为与所述第一目标对象匹配的第二目标对象。In some possible implementation manners, the matching module is further configured to, based on the bipartite graph between the first target object and the second target object, use a greedy algorithm to take, in descending order of the matching degree between the first target object and the second target object, a preset number of second target objects that best match the first target object as the second target objects matched with the first target object.
在一些可能的实施方式中，所述匹配模块还用于在所述第一目标对象和所述第二目标对象之间的二分图包括人体和人脸之间的二分图的情况下，利用贪心算法，选择出与所述第一目标对象最匹配的类型为人脸的第二目标对象。In some possible implementation manners, the matching module is further configured to, in the case that the bipartite graph between the first target object and the second target object includes a bipartite graph between the human body and the human face, use a greedy algorithm to select the second target object of the human-face type that best matches the first target object.
在一些可能的实施方式中，所述匹配模块还用于在任一第一目标对象确定出匹配的预设数量个第二目标对象的情况下，不再为所述第一目标对象匹配其余第二目标对象，以及In some possible implementation manners, the matching module is further configured to, in the case that a preset number of matched second target objects have been determined for any first target object, no longer match the remaining second target objects to the first target object, and
在任一第二目标对象确定出匹配的第一目标对象的情况下,不再为所述第二目标对象匹配其余第一目标对象。In the case that any second target object determines a matching first target object, no other first target objects are matched for the second target object.
在一些可能的实施方式中,所述获取模块获取输入图像中待匹配的第一目标对象和第二目标对象,包括以下方式中的至少一种:In some possible implementation manners, the acquiring module acquiring the first target object and the second target object to be matched in the input image includes at least one of the following methods:
基于检测到的针对输入图像中所述第一目标对象和所述第二目标对象的框选操作,确定所述输入图像中的所述第一目标对象和所述第二目标对象;Determine the first target object and the second target object in the input image based on the detected frame selection operations on the first target object and the second target object in the input image;
利用目标检测神经网络检测所述输入图像中的所述第一目标对象和所述第二目标对象;Using a target detection neural network to detect the first target object and the second target object in the input image;
接收输入图像中所述第一目标对象和第二目标对象所在的位置信息,基于所述位置信息确定所述输入图像中的所述第一目标对象和第二目标对象。Receive location information where the first target object and the second target object in the input image are located, and determine the first target object and the second target object in the input image based on the location information.
在一些可能的实施方式中，所述特征处理模块还用于在对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理之前，将所述第一图像和所述第二图像分别调整为预设规格，并且，In some possible implementation manners, the feature processing module is further configured to, before performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, adjust the first image and the second image respectively to a preset specification, and
所述对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度，包括：the performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image to obtain the degree of matching between the first target object in the first image and the second target object in the second image includes:
对所述调整为预设规格的所述第一图像和所述第二图像执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的第二目标对象的匹配度。performing feature processing on the first image and the second image adjusted to the preset specification to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
在一些可能的实施方式中,所述装置还包括显示模块,用于在所述输入图像中显示匹配的所述第一目标对象和所述第二目标对象。In some possible implementation manners, the device further includes a display module configured to display the matched first target object and the second target object in the input image.
在一些可能的实施方式中，所述特征处理模块还用于通过孪生神经网络执行所述对所述第一目标对象对应的第一图像和所述第二目标对象对应的第二图像分别执行特征处理，得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度。In some possible implementation manners, the feature processing module is further configured to perform, through a twin neural network, the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
在一些可能的实施方式中，所述装置还包括训练模块，用于训练所述孪生神经网络，其中训练所述孪生神经网络的步骤包括：获得训练样本，所述训练样本包括多个第一训练图像和多个第二训练图像，所述第一训练图像为人体图像，所述第二训练图像为人脸图像或者人手图像；In some possible implementation manners, the device further includes a training module configured to train the twin neural network, wherein the step of training the twin neural network includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, the first training images being human body images and the second training images being human face images or human hand images;
将所述第一训练图像和所述第二训练图像输入至所述孪生神经网络,得到所述第一训练图像和所述第二训练图像的预测匹配结果;Inputting the first training image and the second training image to the twin neural network to obtain a predicted matching result of the first training image and the second training image;
基于所述第一训练图像和所述第二训练图像之间的预测匹配结果,确定网络损失,并根据所述网络损失调整所述孪生神经网络的网络参数,直至满足训练要求。Based on the predicted matching result between the first training image and the second training image, the network loss is determined, and the network parameters of the twin neural network are adjusted according to the network loss until the training requirement is met.
根据本公开的第三方面,提供了一种电子设备,其包括:According to a third aspect of the present disclosure, there is provided an electronic device including:
处理器;processor;
用于存储处理器可执行指令的存储器;A memory for storing processor executable instructions;
其中,所述处理器被配置为调用所述存储器存储的指令,以执行第一方面中任意一项所述的方法。Wherein, the processor is configured to call instructions stored in the memory to execute the method described in any one of the first aspect.
根据本公开的第四方面,提供了一种计算机可读存储介质,其上存储有计算机程序指令,所述计算机程序指令被处理器执行时实现第一方面中任意一项所述的方法。According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the method described in any one of the first aspect is implemented.
根据本公开的第五方面，提供了一种计算机可读代码，当所述计算机可读代码在电子设备中运行时，所述电子设备中的处理器执行第一方面中任意一项所述的方法。According to a fifth aspect of the present disclosure, there is provided computer-readable code, and when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method described in any one of the first aspect.
在本公开实施例中，可以首先获取待匹配的第一目标对象的第一图像和第二目标对象的第二图像，其中第一目标对象可以为人体，第二目标对象可以为人脸和/或人手，而后通过对第一图像和第二图像执行特征处理，可以得到第一图像中第一目标对象和第二图像中第二目标对象的匹配度，进而通过建立二分图的方式确定第一图像中的第一目标对象和第二图像中的第二目标对象的匹配结果。本公开实施例首先检测各第一目标对象和各第二目标对象之间的匹配度，并通过建立二分图的方式对上述检测到的匹配度进行约束，最终确定与第一目标对象匹配的第二目标对象，使得最终关联匹配的结果精度更高。In the embodiments of the present disclosure, a first image of a first target object and a second image of a second target object to be matched may first be acquired, where the first target object may be a human body and the second target object may be a human face and/or a human hand; then, by performing feature processing on the first image and the second image, the degree of matching between the first target object in the first image and the second target object in the second image can be obtained, and the matching result between the first target object in the first image and the second target object in the second image is then determined by establishing a bipartite graph. The embodiments of the present disclosure first detect the matching degree between each first target object and each second target object, constrain the detected matching degrees by establishing a bipartite graph, and finally determine the second target object matched with the first target object, so that the result of the final association matching is more accurate.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。It should be understood that the above general description and the following detailed description are only exemplary and explanatory, rather than limiting the present disclosure.
根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。According to the following detailed description of exemplary embodiments with reference to the accompanying drawings, other features and aspects of the present disclosure will become clear.
附图说明Description of the drawings
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。The drawings here are incorporated into the specification and constitute a part of the specification. These drawings illustrate embodiments that conform to the present disclosure, and are used together with the specification to explain the technical solutions of the present disclosure.
图1示出根据本公开实施例的一种目标对象匹配方法的流程图;Fig. 1 shows a flowchart of a target object matching method according to an embodiment of the present disclosure;
图2示出根据本公开实施例获得的输入图像中各目标对象的位置区域的示意图;FIG. 2 shows a schematic diagram of the location area of each target object in an input image obtained according to an embodiment of the present disclosure;
图3示出根据本公开实施例通过神经网络得到第一目标对象和第二目标对象的匹配度的流程图;FIG. 3 shows a flowchart of obtaining the matching degree between the first target object and the second target object through a neural network according to an embodiment of the present disclosure;
图4示出根据本公开实施例的孪生神经网络的结构示意图;Fig. 4 shows a schematic structural diagram of a twin neural network according to an embodiment of the present disclosure;
图5示出根据本公开实施例的构建的人体和人手之间的二分图以及匹配结果的示意图;FIG. 5 shows a schematic diagram of a bipartite graph between a human body and a human hand and a matching result constructed according to an embodiment of the present disclosure;
图6示出根据本公开实施例训练孪生神经网络的流程图;FIG. 6 shows a flowchart of training a twin neural network according to an embodiment of the present disclosure;
图7示出根据本公开实施例的一种目标对象匹配装置的框图;Fig. 7 shows a block diagram of a target object matching device according to an embodiment of the present disclosure;
图8示出根据本公开实施例的一种电子设备的框图;Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
图9示出根据本公开实施例的另一种电子设备的框图。Fig. 9 shows a block diagram of another electronic device according to an embodiment of the present disclosure.
具体实施方式Detailed Description
以下将参考附图详细说明本公开的各种示例性实施例、特征和方面。附图中相同的附图标记表示功能相同或相似的元件。尽管在附图中示出了实施例的各种方面,但是除非特别指出,不必按比例绘制附图。Hereinafter, various exemplary embodiments, features, and aspects of the present disclosure will be described in detail with reference to the drawings. The same reference numerals in the drawings indicate elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, unless otherwise noted, the drawings are not necessarily drawn to scale.
在这里专用的词“示例性”意为“用作例子、实施例或说明性”。这里作为“示例性”所说明的任何实施例不必解释为优于或好于其它实施例。The dedicated word "exemplary" here means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" need not be construed as being superior or better than other embodiments.
本文中术语“和/或”，仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合，例如，包括A、B、C中的至少一种，可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。The term "and/or" in this document merely describes an association relationship of associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, and B exists alone. In addition, the term "at least one" in this document indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B and C may indicate including any one or more elements selected from the set formed by A, B and C.
另外,为了更好地说明本公开,在下文的具体实施方式中给出了众多的具体细节。本领域技术人员应当理解,没有某些具体细节,本公开同样可以实施。在一些实例中,对于本领域技术人员熟知的方法、手段、元件和电路未作详细描述,以便于凸显本公开的主旨。In addition, in order to better explain the present disclosure, numerous specific details are given in the following specific embodiments. Those skilled in the art should understand that the present disclosure can also be implemented without certain specific details. In some instances, the methods, means, elements, and circuits that are well known to those skilled in the art have not been described in detail in order to highlight the gist of the present disclosure.
本公开实施例提供了一种目标对象匹配方法，该方法可以方便的得到两个图像中的对象是否匹配，例如可以检测出人脸对象和人体对象是否匹配，或者检测人手对象与人体对象是否匹配。其中，该方法可以应用在任意的图像处理设备中，例如可以应用在电子设备、或者服务器中，其中，电子设备可以包括手机、笔记本电脑、PAD等终端设备，也可以包括在智能手环、智能手表等可佩戴设备，或者也可以为其他的手持设备等。服务器可以包括云端服务器或者本地服务器等。只要能够执行图像处理，即可以作为本公开实施例的目标对象匹配方法的执行主体。The embodiments of the present disclosure provide a target object matching method, which can conveniently determine whether objects in two images match; for example, it can detect whether a human face object matches a human body object, or whether a human hand object matches a human body object. The method can be applied to any image processing device, for example, an electronic device or a server, where the electronic device may include terminal devices such as mobile phones, notebook computers and PADs, may include wearable devices such as smart bracelets and smart watches, or may be other handheld devices. The server may include a cloud server or a local server. Anything capable of performing image processing can serve as the execution subject of the target object matching method of the embodiments of the present disclosure.
图1示出根据本公开实施例的一种目标对象匹配方法的流程图,如图1所示,所述目标对象匹配方法可以包括:Fig. 1 shows a flowchart of a target object matching method according to an embodiment of the present disclosure. As shown in Fig. 1, the target object matching method may include:
S10:获取输入图像中待匹配的第一目标对象和第二目标对象,所述第一目标对象包括人体,所述第二目标对象包括人手和人脸中的至少一种;S10: Acquire a first target object and a second target object to be matched in an input image, the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
在一些可能的实施方式中，本公开实施例可以实现人脸和人体的匹配以及人手和人体的匹配，即确定输入图像中的人脸与人体是否对应于同一人，以及人手和人体是否对应于同一人，从而可以实现针对每个人物对象的人脸、人手以及人体的匹配。其中，可以首先获得输入图像中待匹配的目标对象的图像。目标对象可以包括人体，以及人手和人脸中的至少一种。例如，可以对输入图像执行目标检测处理，检测出输入图像中的各目标对象，即首先获得输入图像中待匹配的第一目标对象和第二目标对象，例如获得第一目标对象和第二目标对象在输入图像中的位置。进而可以确定第一目标对象对应的图像区域和第二目标对象对应的图像区域。其中，第一目标对象包括人体，第二目标对象包括人脸和人手中的至少一种。In some possible implementation manners, the embodiments of the present disclosure can realize matching between a human face and a human body and matching between a human hand and a human body, that is, determining whether a human face and a human body in the input image correspond to the same person, and whether a human hand and a human body correspond to the same person, so that matching of the face, hand and body of each person object can be realized. An image of the target objects to be matched in the input image may be obtained first. The target objects may include a human body, and at least one of a human hand and a human face. For example, target detection processing may be performed on the input image to detect each target object in the input image; that is, the first target object and the second target object to be matched in the input image are obtained first, for example, the positions of the first target object and the second target object in the input image are obtained. The image area corresponding to the first target object and the image area corresponding to the second target object can then be determined. The first target object includes a human body, and the second target object includes at least one of a human face and a human hand.
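As a hypothetical illustration of obtaining the first image and the second image from detected positions (assuming boxes given as (x1, y1, x2, y2) pixel coordinates, which the disclosure does not mandate), the detected regions can simply be cut out of the input image:

```python
import numpy as np

def crop_region(image, box):
    """Cut out the image area of one target object; box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]

# A hypothetical 100x100 RGB input image with a detected body box and face box.
input_image = np.zeros((100, 100, 3), dtype=np.uint8)
first_image = crop_region(input_image, (10, 5, 60, 95))    # first target object (body)
second_image = crop_region(input_image, (25, 10, 45, 30))  # second target object (face)
```

Each crop is a partial image area of the input image, which is exactly what the later feature processing step consumes.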
S20:对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理,得到所述第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度;S20: Perform feature processing on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, respectively, to obtain the first target object in the first image The degree of matching with the second target object in the second image;
在一些可能的实施方式中，在获得输入图像中待匹配的第一目标对象和第二目标对象的情况下，即可以获知待匹配的第一目标对象和第二目标对象分别在输入图像中的位置的情况下，可以确定第一目标对象以及第二目标对象在输入图像中所对应的图像区域，即可以确定输入图像中第一目标对象的位置对应的第一图像，以及输入图像中第二目标对象的位置对应的第二图像，其中第一图像和第二图像分别为输入图像中的一部分图像区域。In some possible implementation manners, when the first target object and the second target object to be matched in the input image are obtained, that is, when the positions of the first target object and the second target object to be matched in the input image are known, the image areas corresponding to the first target object and the second target object in the input image can be determined; that is, the first image corresponding to the position of the first target object in the input image and the second image corresponding to the position of the second target object in the input image can be determined, where the first image and the second image are each a partial image area of the input image.
在得到第一图像和第二图像的情况下，可以通过分别对第一图像和第二图像执行特征处理，检测第一图像中的第一目标对象和第二图像中的第二目标对象的匹配情况，得到相应的匹配度。In the case that the first image and the second image are obtained, the matching between the first target object in the first image and the second target object in the second image can be detected by performing feature processing on the first image and the second image respectively, obtaining the corresponding matching degree.
在一些可能的实施方式中，可以通过神经网络实现上述第一目标对象和第二目标对象的匹配度的获取，可以分别得到第一图像和第二图像的图像特征，进一步根据图像特征确定第一目标对象和第二目标对象之间的匹配度。在一个示例中，神经网络可以包括特征提取模块、特征融合模块以及全连接模块。通过特征提取模块可以对输入的第一图像和第二图像执行特征提取处理，特征融合模块可以实现第一图像和第二图像的特征信息的特征融合，以及全连接模块可以得到第一目标对象和第二目标对象的二分类结果，即可以得到第一目标对象和第二目标对象的匹配度，其中该匹配度可以为大于或者等于0且小于或者等于1的数值，匹配度越大，表示第一目标对象和第二目标对象对应于同一人物对象的可能性就越大。In some possible implementation manners, the above acquisition of the matching degree between the first target object and the second target object can be realized through a neural network: the image features of the first image and the second image can be obtained respectively, and the matching degree between the first target object and the second target object is further determined according to the image features. In an example, the neural network may include a feature extraction module, a feature fusion module and a fully connected module. The feature extraction module can perform feature extraction processing on the input first image and second image, the feature fusion module can realize feature fusion of the feature information of the first image and the second image, and the fully connected module can obtain the two-classification result of the first target object and the second target object, that is, the matching degree between the first target object and the second target object, where the matching degree can be a value greater than or equal to 0 and less than or equal to 1; the greater the matching degree, the more likely it is that the first target object and the second target object correspond to the same person object.
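The feature-extraction / feature-fusion / fully-connected data flow described above can be sketched with stand-in linear layers (random weights and hypothetical dimensions; this only illustrates the shape of the pipeline, not the disclosed network):

```python
import numpy as np

rng = np.random.default_rng(0)

def extract(vec, w):
    """Stand-in feature extraction branch: one linear map plus ReLU."""
    return np.maximum(vec @ w, 0.0)

def matching_degree(first_vec, second_vec, w_feat, w_fuse, w_fc):
    f1 = extract(first_vec, w_feat)    # feature of the first image
    f2 = extract(second_vec, w_feat)   # feature of the second image (shared weights)
    fused = np.maximum(np.concatenate([f1, f2]) @ w_fuse, 0.0)  # feature fusion
    logit = fused @ w_fc               # fully connected module
    return 1.0 / (1.0 + np.exp(-logit))  # two-classification score in [0, 1]

w_feat = rng.normal(size=(8, 4))
w_fuse = rng.normal(size=(8, 4))
w_fc = rng.normal(size=4)
score = matching_degree(rng.normal(size=8), rng.normal(size=8), w_feat, w_fuse, w_fc)
```

Because both inputs pass through `extract` with the same weights `w_feat`, the sketch also mirrors the shared-branch (twin network) structure described in the next paragraph.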
在一个示例中,神经网络可以为孪生神经网络,其中特征提取模块可以包括两个特征提取分支,两个特征提取分支上的处理操作以及参数全部相同,通过该两个特征提取分支可以分别提取第一图像和第二图像的特征信息。通过孪生神经网络实现匹配度的检测,可以提高检测到的匹配度的精确度。In one example, the neural network may be a Siamese neural network, in which the feature extraction module includes two feature extraction branches whose processing operations and parameters are identical; the two branches extract the feature information of the first image and the second image respectively. Performing matching-degree detection with a Siamese network can improve the accuracy of the detected matching degree.
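The parameter-sharing property of the Siamese branches described above can be sketched in a few lines. This is a toy illustration only: the linear "extractor", the weight values, and the flattened 3-pixel "images" are stand-ins for the real convolutional branches, not the patented network.

```python
def extract_features(image, weights):
    """Toy stand-in for one feature-extraction branch: a linear map.

    Both branches of a Siamese network call this SAME function with the
    SAME weights, which is the parameter-sharing property the text
    describes.
    """
    return [sum(w * px for w, px in zip(row, image)) for row in weights]

# Shared parameters used by both branches (hypothetical values).
shared_weights = [[0.5, -0.2, 0.1],
                  [0.3, 0.8, -0.4]]

body_image = [1.0, 0.0, 2.0]   # stand-in for the first (body) image
face_image = [0.0, 1.0, 1.0]   # stand-in for the second (face) image

first_feature = extract_features(body_image, shared_weights)
second_feature = extract_features(face_image, shared_weights)
print(first_feature, second_feature)
```

Because the same weights produce both features, the two outputs live in the same feature space and can be compared directly in the later fusion step.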
S30:基于所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度,建立所述第一目标对象和所述第二目标对象之间的二分图。S30: Based on the matching degree between the first target object in the first image and the second target object in the second image, establish a bipartite graph between the first target object and the second target object.
在一些可能的实施方式中,在得到第一目标对象和第二目标对象的匹配度的情况下,可以建立第一目标对象和第二目标对象之间的二分图。其中,在输入图像中可以包括至少一个人物对象,其中可以包括至少一个第一目标对象,以及至少一个第二目标对象。通过每个第一目标对象和每个第二目标对象之间的匹配度,可以建立各个第一目标对象和各第二目标对象之间的二分图,其中,第一目标对象和第二目标对象可以分别作为二分图中的两个点集,其中第一目标对象和第二目标对象之间的匹配度作为两个点集之间的各连接权重。In some possible implementations, once the matching degree between the first target object and the second target object is obtained, a bipartite graph between the first target objects and the second target objects can be established. The input image may include at least one person object, and accordingly at least one first target object and at least one second target object. Using the matching degree between each first target object and each second target object, a bipartite graph can be built in which the first target objects and the second target objects form the two vertex sets, and the matching degree between a first target object and a second target object serves as the weight of the edge connecting them.
例如,可以根据第二目标对象的类型,建立不同的二分图。在第二目标对象的类型为人脸时,得到的二分图即为人体和人脸之间的二分图,在第二目标对象的类型为人手时,得到的二分图即为人体和人手之间的二分图,在第二目标对象包括人脸和人手时,得到的二分图即为人体和人脸之间的二分图以及人体和人手之间的二分图。For example, different bipartite graphs can be established according to the type of the second target object. When the type of the second target object is a human face, the resulting bipartite graph is a body-face bipartite graph; when the type is a human hand, it is a body-hand bipartite graph; and when the second target objects include both faces and hands, both a body-face bipartite graph and a body-hand bipartite graph are obtained.
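The per-type graph construction above can be sketched as follows. A minimal sketch with assumed names: the matching degrees, the IDs such as `body_A`, and the `face`/`hand` labels are hypothetical; the edge weight of each graph is simply the predicted matching degree, as the text describes.

```python
def build_bipartite_graphs(match_scores):
    """Split (body, part) matching degrees into per-type bipartite graphs.

    match_scores: iterable of (body_id, part_id, part_type, score) where
    part_type is "face" or "hand" and score is the predicted matching
    degree in [0, 1]. Returns one edge dict per part type, keyed by
    (body_id, part_id) with the matching degree as the edge weight.
    """
    graphs = {"face": {}, "hand": {}}
    for body_id, part_id, part_type, score in match_scores:
        graphs[part_type][(body_id, part_id)] = score
    return graphs

# Hypothetical matching degrees for one input image.
scores = [
    ("body_A", "face_1", "face", 0.95),
    ("body_B", "face_1", "face", 0.10),
    ("body_A", "hand_1", "hand", 0.88),
    ("body_A", "hand_2", "hand", 0.80),
]
graphs = build_bipartite_graphs(scores)
print(graphs["face"])
```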
S40:基于所述第一目标对象和所述第二目标对象之间的二分图,确定匹配的第一目标对象和第二目标对象。S40: Based on the bipartite graph between the first target object and the second target object, determine the matched first target object and the second target object.
在一些可能的实施方式中,在得到第一目标对象和第二目标对象之间的二分图的情况下,即可以根据该二分图确定与第一目标对象匹配的第二目标对象,即确定出与第一目标对象对应于相同人物对象的第二目标对象。In some possible implementations, once the bipartite graph between the first target object and the second target object is obtained, the second target object that matches the first target object can be determined from the graph, i.e., the second target object that corresponds to the same person object as the first target object.
其中,如上所述,二分图中第一目标对象和第二目标对象之间的连接权重为第一目标对象和第二目标对象的匹配度,本公开实施例可以按照匹配度从高到低的顺序,确定第一目标对象所匹配的第二目标对象。As described above, the weight of the edge between a first target object and a second target object in the bipartite graph is their matching degree; embodiments of the present disclosure may determine the second target object matched to each first target object in descending order of matching degree.
在一个示例中,在二分图为人体和人脸之间的二分图的情况下,可以基于匹配度从高到低的顺序,为每个人体(第一目标对象)确定出一个最为匹配的人脸(第二目标对象)。在二分图为人体和人手之间的二分图的情况下,可以基于匹配度从高到低的顺序,为每个人体(第一目标对象)确定出至多两个最为匹配的人手(第二目标对象)。In one example, when the bipartite graph is a body-face bipartite graph, one best-matching face (second target object) can be determined for each human body (first target object) in descending order of matching degree. When the bipartite graph is a body-hand bipartite graph, at most two best-matching hands (second target objects) can be determined for each human body (first target object) in descending order of matching degree.
其中,本公开实施例可以利用贪心算法得到上述第一目标对象匹配的第二目标对象,其中,在任一第一目标对象匹配出对应的第二目标对象的情况下,则不再为该第一目标对象和第二目标对象执行其他对象的匹配。Embodiments of the present disclosure may use a greedy algorithm to obtain the second target object matched to each first target object; once any first target object has been matched to a corresponding second target object, no further matching with other objects is performed for that first target object and that second target object.
基于上述配置,本公开实施例可以首先预测输入图像中各第一目标对象和第二目标对象之间的匹配度,并利用建立二分图的方式确定第一目标对象和第二目标对象的匹配结果,得到精度更高的匹配结果。Based on the above configuration, embodiments of the present disclosure can first predict the matching degree between each first target object and each second target object in the input image, and then determine the matching result between them by establishing a bipartite graph, yielding a more accurate matching result.
下面结合附图对本公开实施例进行详细说明。本公开实施例可以首先获得输入图像,其中输入图像可以为任意包括人物对象的图像,其中获得输入图像的方式可以包括以下方式中的至少一种:通过图像采集设备采集输入图像、接收其他设备传输的输入图像、从存储器中读取输入图像。其中图像采集设备可以为任意具有图像采集功能的设备,如可以为照相机、摄像机、手机或者电脑等,但本公开对此不作具体限定。另外存储器可以为本地存储器或者云存储器。上述仅为示例性说明获得输入图像的方式,在其他实施例中也可以通过其他方式获得输入图像,本公开对此不作具体限定。The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. An input image may first be obtained, where the input image may be any image containing person objects. The input image may be obtained in at least one of the following ways: capturing it with an image acquisition device, receiving it from another device, or reading it from memory. The image acquisition device may be any device with an image capture function, such as a camera, a video camera, a mobile phone, or a computer, but the present disclosure is not limited thereto. The memory may be local memory or cloud storage. The above are merely exemplary ways of obtaining the input image; in other embodiments the input image may be obtained in other ways, which the present disclosure does not specifically limit.
在获得输入图像的情况下,即可以进一步获得输入图像中待匹配的第一目标对象和第二目标对象,如获得第一目标对象和第二目标对象所在的位置区域。本公开实施例可以将输入图像输入至能够实现目标对象的检测的神经网络中,该目标对象可以包括人体、人脸和人手。例如可以将输入图像输入至能够执行目标对象的检测的神经网络中,经过该神经网络的检测,可以得到输入图像中的第一目标对象所在的位置区域,以及第二目标对象所在的位置区域,其中,可以在输入图像中以检测框的形式表示各第一目标对象和第二目标对象的位置区域。另外,可以包括各检测框对应的目标对象的类别信息(人体、人脸或者人手)。通过上述检测框对应的位置即可以确定第一目标对象和第二目标对象所在的位置区域,通过标识可以确定第一目标对象和第二目标对象的类型。例如,本公开实施例执行目标对象的检测的神经网络可以为区域候选网络(RPN),或者也可以为目标识别卷积神经网络(RCNN),但本公开对此不作具体限定。通过该种方式可以方便且精确的识别出输入图像中所有的第一目标对象和第二目标对象。Given the input image, the first target object and the second target object to be matched can be further obtained from it, e.g., by obtaining the location areas where they are located. Embodiments of the present disclosure may feed the input image into a neural network capable of detecting target objects, where the target objects may include human bodies, human faces, and human hands. After detection by the neural network, the location area of the first target object and the location area of the second target object in the input image are obtained, and the location area of each first and second target object may be represented as a detection box in the input image. In addition, the category information (human body, human face, or human hand) of the target object corresponding to each detection box may be included. The positions of the detection boxes determine the location areas of the first and second target objects, and the labels determine their types. For example, the neural network performing target-object detection may be a region proposal network (RPN) or an object-recognition convolutional neural network (RCNN), but the present disclosure is not limited thereto. In this way, all first target objects and second target objects in the input image can be identified conveniently and accurately.
在一些可能的实施方式中,也可以根据接收的针对输入图像的框选操作确定输入图像中的第一目标对象和第二目标对象,即本公开实施例可以接收用户输入的框选操作,其中该框选操作是从输入图像中框选出待匹配的第一目标对象和第二目标对象,即框选出第一目标对象和第二目标对象对应的位置区域,框选操作确定的位置区域的形状可以为矩形,或者也可以为其他形状,本公开对此不作具体限定。其中,在接收框选操作时还可以接收每个框选区域对应的对象的类别,如人体、人脸或者人手。通过该种方式,可以基于用户的选择,确定待匹配的第一目标对象和第二目标对象,例如可以将输入图像中的至少一个第一目标对象和至少一个第二目标对象作为待匹配的第一目标对象和第二目标对象,具有更好的灵活性和适用性。In some possible implementations, the first target object and the second target object in the input image may also be determined according to a received box-selection operation on the input image. That is, embodiments of the present disclosure may receive a box-selection operation input by a user, which selects from the input image the first and second target objects to be matched, i.e., boxes the location areas corresponding to them. The location area determined by the box-selection operation may be rectangular or of another shape, which the present disclosure does not specifically limit. When receiving the box-selection operation, the category of the object in each selected box, such as human body, human face, or human hand, may also be received. In this way, the first and second target objects to be matched can be determined based on the user's selection; for example, at least one first target object and at least one second target object in the input image can be taken as the objects to be matched, providing better flexibility and applicability.
在一些可能的实施方式中,也可以直接接收针对第一目标对象和第二目标对象的位置信息,例如可以接收第一目标对象和第二目标对象的相应位置区域的顶点坐标,以及高度值,从而可以确定相应位置区域。或者也可以接收相应位置区域对应的两个顶角的坐标,即可以确定第一目标对象和第二目标对象在输入图像中的位置区域,即得到输入图像中的第一目标对象和第二目标对象。上述仅为示例性说明,在其他实施例中也可以通过其他方式表示位置区域的位置信息。通过该种方式,可以基于用户发送的位置信息,确定待匹配的第一目标对象和第二目标对象,例如可以将输入图像中的至少一个第一目标对象和至少一个第二目标对象作为待匹配的第一目标对象和第二目标对象,具有更好的灵活性和适用性。In some possible implementations, position information for the first target object and the second target object may also be received directly. For example, the vertex coordinates and height values of the corresponding location areas of the first and second target objects may be received, from which the location areas can be determined. Alternatively, the coordinates of two diagonal corners of each location area may be received, which likewise determine the location areas of the first and second target objects in the input image, thereby obtaining the first and second target objects. The above is merely exemplary; in other embodiments the position information of a location area may be expressed in other ways. In this way, the first and second target objects to be matched can be determined based on position information sent by the user; for example, at least one first target object and at least one second target object in the input image can be taken as the objects to be matched, providing better flexibility and applicability.
通过上述配置可以确定输入图像中目标对象所在的位置区域,可以根据该位置区域得到输入图像中各第一目标对象的第一图像,以及各第二目标对象的第二图像。图2示出根据本公开实施例获得的输入图像中各目标对象的位置区域的示意图。其中,A1和B1分别表示第一目标对象A和B的位置区域,其中第一目标对象为人体。A2和B2分别表示类型为人脸的第二目标对象的位置区域,A3和A4表示类型为人手的第二目标对象的位置区域。图2中可以将全部人体、人脸以及人手均作为待匹配的第一目标对象和第二目标对象,本公开实施例也可以仅将输入图像中的一部分第一目标对象和第二目标对象作为待匹配的第一目标对象和第二目标对象,在此不做举例说明。Through the above configuration, the location areas of the target objects in the input image can be determined, from which the first image of each first target object and the second image of each second target object can be obtained. Fig. 2 is a schematic diagram of the location areas of the target objects in an input image obtained according to an embodiment of the present disclosure. A1 and B1 denote the location areas of first target objects A and B, where the first target objects are human bodies; A2 and B2 denote the location areas of second target objects of the face type; and A3 and A4 denote the location areas of second target objects of the hand type. In Fig. 2, all human bodies, faces, and hands are taken as the first and second target objects to be matched; embodiments of the present disclosure may also take only some of the first and second target objects in the input image as the objects to be matched, which is not illustrated here.
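Obtaining the first/second images from the location areas amounts to cropping the detection boxes out of the input image. A minimal sketch under assumed conventions: the `(top, left, bottom, right)` pixel layout and the toy 2-D "image" are illustrative choices, not fixed by the patent.

```python
def crop_box(image, box):
    """Cut out the image region of one detection box.

    image: 2-D list of pixels; box: (top, left, bottom, right) in pixel
    coordinates (an assumed layout). The crop of a body box becomes a
    "first image", that of a face/hand box a "second image".
    """
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11]]
print(crop_box(img, (1, 1, 3, 3)))
```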
在得到待匹配的第一目标对象和第二目标对象的情况下,即可以通过对第一目标对象和第二目标对象对应的图像区域进行特征处理,预测第一目标对象和第二目标对象之间的匹配度。其中,本公开实施例可以通过神经网络执行上述特征处理,并得到相应的第一目标对象和第二目标对象之间的匹配度。图3示出根据本公开实施例通过神经网络得到第一目标对象和第二目标对象的匹配度的流程图。Once the first and second target objects to be matched are obtained, the matching degree between them can be predicted by performing feature processing on the image regions corresponding to the first target object and the second target object. Embodiments of the present disclosure may perform this feature processing through a neural network to obtain the corresponding matching degree between the first and second target objects. Fig. 3 is a flowchart of obtaining the matching degree between the first target object and the second target object through a neural network according to an embodiment of the present disclosure.
如图3所示,本公开实施例中的对所述输入图像中与所述第一目标对象对应的第一图像和与所述第二目标对象对应的第二图像分别执行特征处理,得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度,可以包括:As shown in Fig. 3, in the embodiment of the present disclosure, performing feature processing on the first image corresponding to the first target object and on the second image corresponding to the second target object in the input image, respectively, to obtain the matching degree between the first target object in the first image and the second target object in the second image, may include:
S21:对所述第一图像和所述第二图像执行特征提取处理,分别得到所述第一图像的第一特征和所述第二图像的第二特征;S21: Perform feature extraction processing on the first image and the second image to obtain the first feature of the first image and the second feature of the second image respectively;
在一些可能的实施方式中,可以对第一目标对象和第二目标对象在输入图像中的图像区域执行特征提取处理,其中第一目标对象的位置对应的图像区域即为第一图像,第二目标对象的位置对应的图像区域即为第二图像。在确定第一图像和第二图像的情况下,可以执行第一图像和第二图像的特征提取处理。其中,可以通过神经网络的特征提取模块执行特征提取处理。其中,特征提取模块可以包括一个特征提取分支,利用该特征提取分支可以分别执行第一图像和第二图像的特征提取处理,在包括多个第一目标对象以及多个第二目标对象的情况下,还可以对多个第一图像和第二图像执行特征提取处理。另外,特征提取模块也可以包括两个特征提取分支,该两个特征提取分支可以具有相同的网络结构,也可以为不同的网络结构,只要能够执行特征提取,即可以作为本公开实施例。在包括两个特征提取分支的情况下,可以分别将第一图像和第二图像一一对应的输入至两个特征提取分支中,例如通过一个特征提取分支对第一图像执行特征提取处理,得到第一图像对应的第一特征,通过另一个特征提取分支对第二图像执行特征提取处理,得到第二图像对应的第二特征。在其他实施例中,也可以包括至少三个特征提取分支,用于执行第一图像和第二图像的特征提取处理,本公开对此不作具体限定。通过上述方式可以精确的实现特征处理,以及匹配度的确定。In some possible implementations, feature extraction may be performed on the image regions of the first target object and the second target object in the input image, where the image region corresponding to the position of the first target object is the first image and the image region corresponding to the position of the second target object is the second image. Once the first image and the second image are determined, feature extraction on them can be performed, e.g., by the feature extraction module of a neural network. The feature extraction module may include a single feature extraction branch that performs feature extraction on the first image and the second image separately; when there are multiple first target objects and multiple second target objects, it may also perform feature extraction on the multiple first and second images. Alternatively, the feature extraction module may include two feature extraction branches, which may have the same network structure or different structures; any configuration capable of feature extraction may serve as an embodiment of the present disclosure. With two branches, the first image and the second image can be fed to the two branches in one-to-one correspondence: for example, one branch performs feature extraction on the first image to obtain the first feature, and the other branch performs feature extraction on the second image to obtain the second feature. In other embodiments, at least three feature extraction branches may also be included to perform feature extraction on the first and second images, which the present disclosure does not specifically limit. In this way, feature processing and matching-degree determination can be realized accurately.
下面以孪生神经网络为例进行说明,图4示出根据本公开实施例的孪生神经网络的结构示意图。本公开实施例的特征提取模块可以包括两个特征提取分支,孪生神经网络的两个特征提取分支的结构和参数完全相同。其中,特征提取分支可以包括残差网络,即本公开实施例的特征提取模块可以由残差网络构成,通过残差模块对第一图像和第二图像执行特征提取处理,提取图像中的特征信息。其中,残差网络可以为resnet18,但本公开对此不作具体限定,另外特征提取模块也可以为其他能够执行特征提取的网络模块,本公开对此也不作具体限定。如图4所示,第一图像I1可以为对应于人体区域的图像,第二图像I2可以为对应于人脸区域的图像,或者人手区域的第二图像。在存在多个第一图像和第二图像的情况下,可以分别将各第一图像和第二图像输入至两个特征提取分支中,执行特征提取处理。或者,本公开实施例也可以每次仅向特征提取分支分别输入一个图像,执行该两个图像的特征提取,并在得到两个图像中目标对象的匹配度的情况下,再输入下一次需要执行匹配对检测的第一图像和第二图像。The following description takes a Siamese neural network as an example. Fig. 4 is a schematic structural diagram of a Siamese neural network according to an embodiment of the present disclosure. The feature extraction module may include two feature extraction branches, and in a Siamese network the two branches have exactly the same structure and parameters. Each branch may include a residual network; that is, the feature extraction module may be composed of residual networks that perform feature extraction on the first image and the second image to extract the feature information in the images. The residual network may be resnet18, but the present disclosure is not limited thereto; the feature extraction module may also be any other network module capable of feature extraction. As shown in Fig. 4, the first image I1 may be an image corresponding to a human-body region, and the second image I2 may be an image corresponding to a face region or a hand region. When there are multiple first and second images, the first images and second images can be fed to the two feature extraction branches respectively to perform feature extraction. Alternatively, embodiments of the present disclosure may feed only one image to each branch at a time, perform feature extraction on the two images, and, after the matching degree of the target objects in the two images is obtained, feed the next pair of first and second images to be matched.
另外,本公开实施例还可以为每个图像分配标识,同时也可以对图像中包括的目标对象的类型进行标识,即本公开实施例中,每个第一图像以及第二图像都可以包括有图像标识以及类型标识,用以后续处理区分各图像,以及图像中的目标对象的类型。In addition, embodiments of the present disclosure may assign an identifier to each image and also label the type of the target object it contains; that is, each first image and each second image may carry an image identifier and a type identifier, so that subsequent processing can distinguish the images and the types of the target objects they contain.
另外,在一些可能的实施方式中,在得到各第一目标对象的第一图像以及各第二目标对象的第二图像时,可以将第一图像和第二图像调整为预设规格的图像。例如可以通过缩小处理、放大处理、升采样、或者降采样处理等,将第一图像和第二图像调整到预设规格的尺寸,比如224*224(但不作为本公开的具体限定),而后将调整为预设规格的第一图像和第二图像输入至神经网络执行特征提取,得到相应的第一特征和第二特征。In addition, in some possible implementations, when the first image of each first target object and the second image of each second target object are obtained, they may be adjusted to images of a preset size. For example, the first and second images may be adjusted to a preset size such as 224*224 (which is not a specific limitation of the present disclosure) through reduction, enlargement, up-sampling, or down-sampling; the resized first and second images are then fed into the neural network for feature extraction to obtain the corresponding first and second features.
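The resize-to-preset-size step can be sketched with a nearest-neighbour resize. This is one possible interpolation choice (the text only requires scaling to a preset size such as 224x224); the tiny 2x2 "crop" is an illustrative input.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list 'image' to (out_h, out_w).

    A minimal stand-in for the pre-processing that brings every first
    and second image to the preset size before feature extraction.
    """
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

crop = [[1, 2],
        [3, 4]]            # a tiny 2x2 crop standing in for a detection box
resized = resize_nearest(crop, 4, 4)
print(resized)
```

The same call with `out_h = out_w = 224` yields the 224x224 input described in the text.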
S22:对所述第一特征和所述第二特征的连接特征执行分类处理,得到所述第一图像中的所述第一目标对象和所述第二图像中的所述第二目标对象的匹配度。S22: Perform classification processing on the concatenated feature of the first feature and the second feature to obtain the matching degree between the first target object in the first image and the second target object in the second image.
在一些可能的实施方式中,本公开实施例可以对第一特征和第二特征的连接特征执行特征融合处理,得到融合特征;以及将所述融合特征输入至全连接层执行所述分类处理,得到所述第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度。In some possible implementations, embodiments of the present disclosure may perform feature fusion on the concatenated feature of the first feature and the second feature to obtain a fused feature, and feed the fused feature into a fully connected layer to perform the classification processing, obtaining the matching degree between the first target object in the first image and the second target object in the second image.
其中,本公开实施例得到的第一特征和第二特征可以分别表示为矩阵或者向量的形式,该第一特征和第二特征的尺度可以相同。而后可以将得到的第一特征和第二特征进行连接,例如在通道方向上连接得到连接特征,其中连接可以通过连接函数(concat函数)执行。在得到第一特征和第二特征的连接特征的情况下,可以对该连接特征执行特征融合处理,如可以执行至少一层的卷积操作实现该特征融合处理。例如本公开实施例可以通过残差模块(resnet_block)执行连接特征的残差处理,以执行特征融合处理得到融合特征。而后基于融合特征执行匹配度的分类预测,其中可以得到第一目标对象和第二目标对象是否匹配的分类结果,以及可以得到对应的匹配度。The first feature and the second feature obtained in embodiments of the present disclosure may each be expressed as a matrix or a vector, and the two features may have the same scale. The first and second features can then be concatenated, e.g., along the channel dimension, to obtain the concatenated feature; the concatenation may be performed with a concatenation function (a concat function). Once the concatenated feature is obtained, feature fusion can be performed on it, e.g., by at least one convolution layer. For example, embodiments of the present disclosure may perform residual processing of the concatenated feature through a residual module (resnet_block) to carry out the feature fusion and obtain the fused feature. Classification prediction of the matching degree is then performed based on the fused feature, yielding a classification result of whether the first and second target objects match, as well as the corresponding matching degree.
在一个示例中,其中执行匹配的分类预测可以通过全连接层(FC)实现,即可以将融合特征输入至全连接层,通过全连接层的处理可以输出得到上述预测结果,即第一目标对象和第二目标对象的匹配度,以及基于该匹配度确定的是否匹配的匹配结果。其中,可以在匹配度高于第一阈值的情况下,确定第一目标对象和第二目标对象匹配,此时匹配结果可以为第一标识,如"1",而在匹配度小于第一阈值的情况下,确定第一目标对象和第二目标对象不匹配,此时匹配结果可以为第二标识,如"0"。上述第一标识和第二标识可以为不同的标识,分别用于表示第一目标对象和第二目标对象属于同一人物对象和不属于同一人物对象的匹配结果。In one example, the matching classification prediction may be realized by a fully connected layer (FC): the fused feature is fed into the fully connected layer, which outputs the above prediction result, i.e., the matching degree between the first and second target objects, and the match result determined from that matching degree. When the matching degree is higher than a first threshold, the first and second target objects are determined to match, and the matching result may be a first label such as "1"; when the matching degree is less than the first threshold, they are determined not to match, and the matching result may be a second label such as "0". The first label and the second label may be different labels, respectively indicating the results that the first and second target objects belong to the same person object and that they do not.
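The FC-plus-threshold step above can be sketched as a linear layer followed by a sigmoid, which keeps the score in [0, 1] as the text requires. The weights, bias, input values, and the 0.5 threshold below are all illustrative assumptions, not the trained parameters of the patented network.

```python
import math

def match_head(fused_feature, fc_weights, fc_bias, threshold=0.5):
    """Toy stand-in for the fully connected classification head.

    Maps a fused feature to a matching degree in [0, 1] via a linear
    layer plus sigmoid, then applies the first threshold to produce the
    "1"/"0" match label described in the text.
    """
    logit = sum(w * x for w, x in zip(fc_weights, fused_feature)) + fc_bias
    degree = 1.0 / (1.0 + math.exp(-logit))       # matching degree in [0, 1]
    label = "1" if degree > threshold else "0"    # first / second label
    return degree, label

degree, label = match_head([0.7, -0.5, 1.2], [1.0, 0.5, 0.8], 0.1)
print(degree, label)
```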
在得到输入图像中待匹配的各第一目标对象和第二目标对象之间的匹配度的情况下,即可以根据该得到的匹配度对应的建立第一目标对象和第二目标对象之间的二分图。Once the matching degree between each first target object and each second target object to be matched in the input image is obtained, the bipartite graph between the first and second target objects can be established accordingly from these matching degrees.
其中,G=(V,E)是一个无向图,其中顶点集可分割为两个互不相交的子集,并且图中每条边依附的两个顶点都分属于这两个互不相交的子集。本公开实施例中,可以将第一目标对象和第二目标对象构造为二分图中两个互不相交的顶点子集,各顶点之间的连接即二分图中各边的权值可以为两个顶点对应的第一目标对象和第二目标对象之间的匹配度。Here G=(V,E) is an undirected graph whose vertex set can be partitioned into two disjoint subsets, such that the two endpoints of every edge belong to the two different subsets. In embodiments of the present disclosure, the first target objects and the second target objects may be constructed as the two disjoint vertex subsets of the bipartite graph, and each edge between two vertices may carry, as its weight, the matching degree between the first target object and the second target object corresponding to those vertices.
在一些可能的实施方式中,可以根据输入图像中执行待匹配处理的第二目标对象的类型来建立相应的二分图。例如,在输入图像中待匹配的第二目标对象仅包括人脸时,可以基于第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度,建立人体和人脸之间的二分图。在输入图像中待匹配的第二目标对象仅包括人手时,可以基于第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度,建立人体和人手之间的二分图;以及在输入图像中待匹配的第二目标对象包括人脸和人手时,可以基于第一图像中的第一目标对象和第二图像中的第二目标对象的匹配度,建立人体和人脸之间的二分图以及人体和人手之间的二分图,即可以利用各第一目标对象与类型为人手的第二目标对象建立人体和人手之间的二分图,利用各第一目标对象与类型为人脸的第二目标对象建立人体和人脸之间的二分图。其中在各二分图中,可以将人体和人脸之间的匹配度作为人体和人脸之间的二分图中人体和人脸之间的连接权值,以及将人体和人手之间的匹配度作为所述人体和人手之间的二分图中人体和人手之间的连接权值。In some possible implementations, the corresponding bipartite graph may be established according to the types of the second target objects to be matched in the input image. For example, when the second target objects to be matched include only faces, a body-face bipartite graph can be established based on the matching degrees between the first target objects in the first images and the second target objects in the second images; when they include only hands, a body-hand bipartite graph can be established on the same basis; and when they include both faces and hands, both a body-face bipartite graph and a body-hand bipartite graph can be established, i.e., the body-hand graph is built from the first target objects and the second target objects of the hand type, and the body-face graph from the first target objects and the second target objects of the face type. In each bipartite graph, the matching degree between a body and a face serves as the weight of the edge between them in the body-face graph, and the matching degree between a body and a hand serves as the weight of the edge between them in the body-hand graph.
也就是说,本公开实施例可以将第一目标对象和第二目标对象作为二分图中的各顶点的点集,该点集分为三类:人体、人脸和人手。进而可以对人体人脸、人体人手分别建立二分图,两个顶点之间相应边的权值为神经网络输出的相应两个顶点对应的第一目标对象和第二目标对象之间的匹配度。That is, embodiments of the present disclosure may treat the first and second target objects as the vertex set of the bipartite graph, divided into three categories: human bodies, human faces, and human hands. Bipartite graphs are then established for body-face pairs and body-hand pairs respectively, with the weight of the edge between two vertices being the matching degree, output by the neural network, between the first target object and the second target object corresponding to those vertices.
在此需要说明的是,本公开实施例在获得每个第一目标对象与每个第二目标对象之间的匹配度的情况下,可以选择出匹配度高于第一阈值的各第一目标对象和第二目标对象,并基于匹配度高于第一阈值的第一目标对象和第二目标对象确定第一目标对象和第二目标对象之间的二分图。It should be noted that, having obtained the matching degree between each first target object and each second target object, embodiments of the present disclosure may select those first and second target objects whose matching degree is higher than the first threshold, and determine the bipartite graph between the first and second target objects based on them.
其中,如果存在一第二目标对象与所有的第一目标对象之间的匹配度都低于第一阈值,则该第二目标对象不用于形成二分图。反之,如果存在一第一目标对象与所有的人脸类型的第二目标对象之间的匹配度都低于第一阈值,则该第一目标对象不用于形成人体和人脸之间的二分图,如果存在一第一目标对象与所有的人手类型的第二目标对象之间的匹配度都低于第一阈值,则该第一目标对象不用于形成人体和人手之间的二分图。If there is a second target object whose matching degrees with all first target objects are lower than the first threshold, that second target object is not used to form the bipartite graph. Conversely, if there is a first target object whose matching degrees with all second target objects of the face type are lower than the first threshold, that first target object is not used to form the body-face bipartite graph; and if there is a first target object whose matching degrees with all second target objects of the hand type are lower than the first threshold, that first target object is not used to form the body-hand bipartite graph.
通过第一阈值的设定,可以简化二分图的结构,通时可以加快第一目标对象和第二目标对象的匹配效率。Through the setting of the first threshold, the structure of the bipartite graph can be simplified, and the matching efficiency of the first target object and the second target object can be accelerated in general.
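The pruning described above can be sketched as keeping only the edges whose matching degree exceeds the first threshold; any body or part vertex left with no edge simply drops out of the graph. This is a minimal illustration with assumed names and data, not the disclosure's implementation:

```python
# Hypothetical sketch of pruning the bipartite graph with the first
# threshold. "degrees" maps (body, part) pairs to matching degrees in
# [0, 1]; only edges above the threshold survive. Names are illustrative.

def build_bipartite(degrees, threshold):
    """Keep only edges whose matching degree exceeds the threshold."""
    return {edge: d for edge, d in degrees.items() if d > threshold}

degrees = {
    ("P1", "F1"): 0.92, ("P1", "F2"): 0.10,
    ("P2", "F1"): 0.15, ("P2", "F2"): 0.05,  # F2 scores low against every body
}
graph = build_bipartite(degrees, threshold=0.6)
print(graph)  # only the (P1, F1) edge remains; F2 and P2 form no edges
```

Because F2's matching degree is below the threshold against both bodies, it contributes no edge and is excluded from the graph, exactly as the paragraph above describes.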
Once the bipartite graph between the first target objects and second target objects is obtained, a greedy algorithm may be applied to the bipartite graph to obtain, for each body-type first target object, at most a preset number of matching second target objects. The preset number may take different values for different types of second target object: for example, when the second target object is a human hand the preset number may be 2, and when the second target object is a human face the preset number may be 1. Different preset numbers may be chosen according to the target object type, and the present disclosure does not specifically limit this.
The at most preset number of second target objects matched to a first target object may be determined in descending order of matching degree. Embodiments of the present disclosure may use a greedy algorithm to determine the matches between first and second target objects: second target objects are assigned to their corresponding first target objects in descending order of matching degree. If the number of second target objects matched to a first target object reaches the preset number, the matching procedure for that first target object terminates, i.e., no further second target objects are matched to it. Likewise, once a second target object has been determined as the match of any first target object, the matching procedure for that second target object terminates, i.e., no further first target objects are matched to it.
In some possible implementations, while determining matched second target objects in descending order of matching degree, the matching procedure may terminate as soon as the iteration reaches a pair of first and second target objects whose matching degree is below the first threshold. For example, taking the body-face bipartite graph, suppose the matching degrees in descending order are: X1 and Y1 at 90%, X2 and Y2 at 80%, X2 and Y1 at 50%, and X1 and Y2 at 30%, with a first threshold of 60%, where X1 and X2 denote two first target objects and Y1 and Y2 denote two second target objects. Following the descending order, X1 and Y1 (90%) are determined to match, then X2 and Y2 (80%) are determined to match; since the next matching degree is 50%, which is below the first threshold, the matching process terminates. This determines that the faces matched to X1 and X2 are Y1 and Y2, respectively.
The above is merely an illustrative description of terminating the matching via the first threshold and does not specifically limit the present disclosure. In other embodiments, at most a preset number of second target objects may be matched to each first target object purely in descending order of the matching degrees between first and second target objects. "At most a preset number" here means that although each person object can match two hands when the second target object is a human hand, a first target object may end up matched with only one hand-type second target object, owing to the first threshold applied during matching and to the number of second target objects present in the input image.
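The greedy, threshold-terminated assignment described above can be sketched in a few lines. The function name and data structures below are assumptions for illustration; the edge list reproduces the X/Y face example from the text, with faces capped at a preset number of 1:

```python
# Illustrative sketch of the greedy matching described above (names assumed).
# Edges are processed in descending order of matching degree; iteration stops
# at the first edge below the threshold, each part is assigned at most once,
# and each body receives at most `capacity` parts.

def greedy_match(edges, threshold, capacity):
    """edges: list of (body, part, matching_degree) tuples."""
    matched = {}          # body -> list of assigned parts
    used_parts = set()
    for body, part, degree in sorted(edges, key=lambda e: -e[2]):
        if degree < threshold:        # all remaining degrees are lower: stop
            break
        if part in used_parts:        # part already matched to some body
            continue
        slots = matched.setdefault(body, [])
        if len(slots) < capacity:     # body still below its preset number
            slots.append(part)
            used_parts.add(part)
    return matched

# The face example from the text: threshold 60%, preset number 1 per body.
edges = [("X1", "Y1", 0.9), ("X2", "Y2", 0.8),
         ("X2", "Y1", 0.5), ("X1", "Y2", 0.3)]
print(greedy_match(edges, threshold=0.6, capacity=1))
# {'X1': ['Y1'], 'X2': ['Y2']}
```

For hands, the same routine would be called with `capacity=2`, matching the rule that one body is assigned at most two hands.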
Below, a human hand is taken as an example of the second target object. Fig. 5 shows a schematic diagram of a body-hand bipartite graph constructed according to an embodiment of the present disclosure, together with the matching result; Fig. 5 represents the body-hand bipartite graph constructed based on the matching degrees between the first target objects and second target objects. The human bodies and human hands serve as the two vertex sets of the bipartite graph. P1, P2, and P3 denote three first target objects, i.e., three human bodies, and H1, H2, H3, H4, and H5 denote five hand-type second target objects. The edge connecting any first target object and second target object may represent the matching degree between them.
Based on this body-hand bipartite graph, matching second target objects may be assigned to each first target object in descending order of matching degree, with at most two second target objects matched to each first target object. When, in this order, a second target object is confirmed as matching a first target object, that second target object is no longer matched to the remaining first target objects, and it is checked whether the number of second target objects matched to that first target object has reached the preset number; if so, no further second target objects are matched to it. If the preset number has not been reached, then when the pair with the next-highest matching degree is processed, it is determined whether that second target object has already been matched to another first target object and whether the number of second target objects matched to that first target object has reached the preset number. If the second target object has not been matched to any first target object and the first target object has fewer than the preset number of matched second target objects, the pair is determined to match. By analogy, the above process is iterated for the first and second target objects corresponding to each matching degree until a termination condition is met. The termination condition may include at least one of the following: a corresponding second target object has been matched for every first target object; the matching process has been completed for the pair with the lowest matching degree; or the matching degree falls below the first threshold.
The process of determining, from the body-face bipartite graph, the second target object matched to each first target object is similar to the above and is not repeated here.
In addition, in embodiments of the present disclosure, once the second target object matching each first target object has been obtained, the location areas of the matched first and second target objects may be displayed. For example, the bounding boxes of the location areas of a matched first target object and second target object may be displayed with the same display state, where the bounding boxes may be the detection boxes of the location areas obtained in step S10. In one example, the bounding boxes of the location areas of matched first and second target objects may be displayed in the same color, although this is not a specific limitation of the present disclosure. As shown in Fig. 2, for each person object, the line width of the display box may be used to distinguish the body box, hand box, and face box corresponding to different person objects, so that the matching results can be conveniently distinguished.
Based on the above configuration of the embodiments of the present disclosure, the second target object that best matches each first target object can be selected by establishing a bipartite graph, improving the matching accuracy between target objects.
As described above, the embodiments of the present disclosure may be applied in a neural network, for example a Siamese neural network. For instance, the Siamese neural network may perform feature processing separately on the first image corresponding to the location area of the first target object and the second image corresponding to the location area of the second target object, to obtain the matching degree between the first target object in the first image and the second target object in the second image.
Fig. 6 shows a flowchart of training a Siamese neural network according to an embodiment of the present disclosure. The steps of training the Siamese neural network may include:
S51: Obtain training samples, where the training samples include multiple first training images and multiple second training images, the first training images being human body images and the second training images being human face images or human hand images.
In some possible implementations, the first training images and second training images may be image regions cropped from multiple images, image regions of target objects of the corresponding types identified from multiple images by means of target detection, or arbitrary images that include a human body, human hand, or human face; the present disclosure does not specifically limit this.
S52: Input the first training images and the second training images into the Siamese neural network to obtain predicted matching results for the first training images and the second training images.
In some possible implementations, the Siamese neural network performs feature extraction on the first and second training images, followed by feature concatenation, feature fusion, and classification processing, finally predicting the matching degree between the first and second training images; the matching result between them may then be determined from this matching degree. The matching result may be expressed as a first identifier or a second identifier, e.g., the first identifier is 1 and the second identifier is 0, indicating that the first and second training images match or do not match, respectively. Specifically, the matching result may be determined by comparing the matching degree with the first threshold: if the matching degree is greater than the first threshold, the corresponding first and second training images are determined to match, which may be expressed as the first identifier; otherwise the result is expressed as the second identifier.
S53: Based on the predicted matching results between the first training images and the second training images, adjust the network parameters of the Siamese neural network until the training requirement is met.
In the embodiments of the present disclosure, the true matching results of the first and second training images may serve as supervision, so that the network loss may be determined from the predicted matching results between the first and second training images together with the true matching results; the network loss may be determined from the difference between the two matching results.
Once the network loss is obtained, parameters of the Siamese neural network, such as convolution parameters, may be adjusted according to it. If the obtained network loss is less than a loss threshold, the training requirement is determined to be met and training may terminate; if the obtained network loss is greater than or equal to the loss threshold, the network parameters are adjusted according to the loss and the matching results between the first and second training images are predicted again, until the obtained network loss is less than the loss threshold. The loss threshold may be a preset value, e.g., 1%, although this is not a specific limitation of the present disclosure and other values may be used. In this way the Siamese neural network can be optimized, improving the accuracy of feature processing and matching.
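The supervision step above can be sketched as a standard binary cross-entropy loss between the predicted matching degree and the 0/1 ground-truth label, with the loss threshold deciding when to stop. The disclosure does not name a specific loss function, so this is an assumed, minimal sketch:

```python
# Hypothetical sketch of the supervision described above: the true match
# label (1 = same person, 0 = different person) supervises the predicted
# matching degree through a binary cross-entropy loss, and training stops
# once the mean loss falls below a preset loss threshold.
import math

def bce_loss(predicted, labels):
    """Mean binary cross-entropy between predicted degrees and 0/1 labels."""
    eps = 1e-7
    total = 0.0
    for p, y in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predicted)

preds, labels = [0.95, 0.10, 0.80], [1, 0, 1]
loss = bce_loss(preds, labels)
loss_threshold = 0.25                      # assumed preset value
print("stop training" if loss < loss_threshold else "keep training")
```

In an actual training loop, the network parameters (e.g., the convolution weights) would be updated by backpropagating this loss before re-predicting the matching results.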
To present the embodiments of the present disclosure more clearly, a specific process is illustrated below with an example. First, the body images and face/hand images cropped from the input image are all resized to a fixed size, e.g., 224*224, and each image is fed into one of the two feature-extraction branches of the Siamese network. The two branches of the network extract the features of the human body and of the face or hand, respectively; at the end of the two branches, the extracted feature maps of the body and of the face or hand are concatenated and then passed into the network for binary classification scoring. The score lies between 0 and 1: if the body matches the face or hand, the score is close to 1, otherwise close to 0. Taking Fig. 4 as an example, the two branches each use resnet18 for feature extraction; the resulting feature maps are concatenated, passed through a resnet_block convolutional layer, and finally classified by a fully connected layer to obtain the matching degree. The vertex set is then divided into three categories: human body, human face, and human hand. Fully connected bipartite graphs are established for body-face and body-hand pairs respectively, with the weight of each edge being the score (matching degree) output by the network.
Rule constraints are imposed on the bipartite graphs: one body matches at most two hands, and one body matches at most one face. The scores are sorted and matched from high to low using a greedy algorithm, all redundant illegal edges are removed, and the process iterates until matching ends. By using a Siamese network, the embodiments of the present disclosure can learn association relationships in more complex scenes. In addition, the embodiments of the present disclosure use bipartite graphs to constrain the network output during the final association, making the final result more accurate.
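The scoring head at the end of the pipeline above can be illustrated without any deep-learning framework: two branch feature vectors are concatenated, fused by a weighted sum, and mapped through a sigmoid to a score in (0, 1). The tiny vectors and weights below are assumptions standing in for the resnet18 features and fully connected layer of the real embodiment:

```python
# Framework-free sketch (assumed values) of the concatenate-then-classify
# scoring head: fuse the two branch features, apply one fully connected
# layer, and squash the logit to a matching degree in (0, 1) with a sigmoid.
import math

def score(body_feat, part_feat, weights, bias):
    fused = body_feat + part_feat                  # concatenation of branches
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))          # sigmoid -> (0, 1)

body_feat, part_feat = [0.4, 0.9], [0.7, 0.2]      # toy 2-d branch features
weights, bias = [1.5, -0.5, 2.0, 0.3], -1.0        # toy FC-layer parameters
s = score(body_feat, part_feat, weights, bias)
print(0.0 < s < 1.0)   # True: the output can serve as a matching degree
```

A score near 1 would indicate a matching body-part pair and near 0 a non-matching one, mirroring the binary classification scoring described above.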
In summary, in the embodiments of the present disclosure, the first image of the first target object and the second image of the second target object to be matched may first be acquired, where the first target object may be a human body and the second target object may be a human face and/or a human hand. Then, by performing feature processing on the first and second images, the matching degree between the first target object in the first image and the second target object in the second image can be obtained, and the matching result between them can be determined by establishing a bipartite graph. The embodiments of the present disclosure first detect the matching degree between each first target object and each second target object, constrain the detected matching degrees by establishing a bipartite graph, and finally determine the second target object matching each first target object, making the final association result more accurate.
It can be understood that the method embodiments mentioned in the present disclosure may be combined with one another to form combined embodiments without violating principle or logic; owing to space limitations, details are not repeated in this disclosure.
In addition, the present disclosure also provides a target object matching apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any target object matching method provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Those skilled in the art can understand that, in the above methods of the specific implementations, the order in which the steps are written does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
Fig. 7 shows a block diagram of a target object matching apparatus according to an embodiment of the present disclosure. As shown in Fig. 7, the target object matching apparatus includes:
an acquisition module 10, configured to acquire a first target object and a second target object to be matched in an input image, the first target object including a human body and the second target object including at least one of a human hand and a human face;
a feature processing module 20, configured to perform feature processing separately on a first image corresponding to the first target object and a second image corresponding to the second target object in the input image, to obtain the matching degree between the first target object in the first image and the second target object in the second image;
a bipartite module 30, configured to establish a bipartite graph between the first target object and the second target object based on the matching degree between the first target object in the first image and the second target object in the second image; and
a matching module 40, configured to determine a matched first target object and second target object based on the bipartite graph between the first target object and the second target object.
In some possible implementations, the feature processing module is further configured to perform feature extraction processing on the first image and the second image to obtain a first feature of the first image and a second feature of the second image, respectively;
and to perform classification processing on the concatenated feature of the first feature and the second feature, obtaining the matching degree between the first target object in the first image and the second target object in the second image.
In some possible implementations, the feature processing module is further configured to perform feature fusion processing on the concatenated feature of the first feature and the second feature to obtain a fused feature;
and to input the fused feature into a fully connected layer to perform the classification processing, obtaining the matching degree between the first target object in the first image and the second target object in the second image.
In some possible implementations, the bipartite module is further configured to: when the second target object includes only a human face, establish a bipartite graph between human bodies and human faces based on the matching degree between the first target object in the first image and the second target object in the second image;
when the second target object includes only a human hand, establish a bipartite graph between human bodies and human hands based on the matching degree between the first target object in the first image and the second target object in the second image; and
when the second target object includes a human face and a human hand, establish both a bipartite graph between human bodies and human faces and a bipartite graph between human bodies and human hands based on the matching degree between the first target object in the first image and the second target object in the second image;
where the matching degree between a human body and a human face serves as the connection weight between the body and the face in the body-face bipartite graph, and the matching degree between a human body and a human hand serves as the connection weight between the body and the hand in the body-hand bipartite graph.
In some possible implementations, the bipartite module is further configured to establish the bipartite graph between the first target object and the second target object based on the first target objects and second target objects whose matching degree is greater than the first threshold.
In some possible implementations, the matching module is further configured to, based on the bipartite graph between the first target object and the second target object, use a greedy algorithm to take, in descending order of the matching degree between the first target object and the second target objects, the preset number of second target objects that best match the first target object as the second target objects matched to the first target object.
In some possible implementations, the matching module is further configured to, when the bipartite graph between the first target object and the second target object includes a bipartite graph between human bodies and human faces, use a greedy algorithm to select the face-type second target object that best matches the first target object.
In some possible implementations, the matching module is further configured to: when any first target object has been matched with the preset number of second target objects, match no further second target objects to that first target object; and
when any second target object has been matched with a first target object, match no further first target objects to that second target object.
In some possible implementations, the acquisition module acquiring the first target object and the second target object to be matched in the input image includes at least one of the following methods:
determining the first target object and the second target object in the input image based on detected frame-selection operations on the first target object and the second target object in the input image;
detecting the first target object and the second target object in the input image using a target detection neural network; and
receiving location information of the first target object and the second target object in the input image, and determining the first target object and the second target object in the input image based on the location information.
In some possible implementations, the feature processing module is further configured to adjust the first image and the second image to preset specifications, respectively, before performing feature processing separately on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, and
performing feature processing separately on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image to obtain the matching degree between the first target object in the first image and the second target object in the second image includes:
performing feature processing on the first image and the second image adjusted to the preset specifications to obtain the matching degree between the first target object in the first image and the second target object in the second image.
In some possible implementations, the apparatus further includes a display module configured to display the matched first target object and second target object in the input image.
In some possible implementations, the feature processing module is further configured to perform, through a Siamese neural network, the feature processing separately on the first image corresponding to the first target object and the second image corresponding to the second target object, to obtain the matching degree between the first target object in the first image and the second target object in the second image.
In some possible implementations, the apparatus further includes a training module configured to train the siamese neural network, where training the siamese neural network includes: obtaining training samples, the training samples including a plurality of first training images and a plurality of second training images, where the first training images are human-body images and the second training images are human-face images or human-hand images;
inputting the first training images and the second training images into the siamese neural network to obtain predicted matching results for the first training images and the second training images; and
determining a network loss based on the predicted matching results between the first training images and the second training images, and adjusting network parameters of the siamese neural network according to the network loss until a training requirement is met.
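The disclosure only says that a "network loss" is computed from the predicted matching results; as a hedged illustration, a binary cross-entropy loss over (prediction, label) pairs is one conventional choice for such a match/no-match objective. The loss form and the sample values below are assumptions:

```python
import math

def bce_loss(predicted_degree, label):
    # Binary cross-entropy between the predicted matching degree and the
    # pair label (1 = body and part belong to the same person, 0 = not).
    eps = 1e-7
    p = min(max(predicted_degree, eps), 1.0 - eps)  # clamp for log stability
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Confidently correct pairs cost little; confidently wrong pairs cost a lot,
# which is the signal used to adjust the shared network parameters.
assert bce_loss(0.95, 1) < bce_loss(0.05, 1)

batch = [(0.9, 1), (0.2, 0), (0.6, 1)]          # (prediction, label) pairs
network_loss = sum(bce_loss(p, y) for p, y in batch) / len(batch)
assert network_loss > 0.0
```

Training would repeat this loss computation and a parameter update until the training requirement (e.g. a loss or accuracy criterion) is met.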
In some embodiments, the functions or modules of the apparatus provided in the embodiments of the present disclosure may be used to perform the methods described in the method embodiments above. For specific implementations, reference may be made to the descriptions of those method embodiments, which are not repeated here for brevity.
The embodiments of the present disclosure further provide a computer-readable storage medium having computer program instructions stored thereon, where the computer program instructions, when executed by a processor, implement the above method. The computer-readable storage medium may be a volatile or a non-volatile computer-readable storage medium.
The embodiments of the present disclosure further provide an electronic device, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to perform the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.
The embodiments of the present disclosure further provide computer-readable code; when the computer-readable code runs on an electronic device, a processor in the electronic device performs the above method.
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to Fig. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions, so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phone-book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system, or may have focusing and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC); when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components (for example, the display and keypad of the electronic device 800); the sensor component 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example, the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.
Fig. 9 shows a block diagram of another electronic device according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 9, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, for example, the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, or a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used here, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to respective computing/processing devices, or downloaded to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical-fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In a scenario involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, for example, a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
Various aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, so that the computer-readable medium having the instructions stored thereon includes an article of manufacture including instructions that implement various aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other devices to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or part of an instruction, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may also occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. The above description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used here were chosen to best explain the principles of the embodiments, their practical applications, or technical improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.

Claims (29)

  1. A target object matching method, characterized by comprising:
    acquiring a first target object and a second target object to be matched in an input image, where the first target object includes a human body, and the second target object includes at least one of a human hand and a human face;
    performing feature processing respectively on a first image corresponding to the first target object and a second image corresponding to the second target object in the input image, to obtain a degree of matching between the first target object in the first image and the second target object in the second image;
    establishing a bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image; and
    determining a matched first target object and second target object based on the bipartite graph between the first target object and the second target object.
  2. The method according to claim 1, wherein the performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, to obtain the degree of matching between the first target object in the first image and the second target object in the second image, comprises:
    performing feature extraction processing on the first image and the second image to obtain a first feature of the first image and a second feature of the second image, respectively; and
    performing classification processing on a concatenated feature of the first feature and the second feature, to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  3. The method according to claim 2, wherein the performing classification processing on the concatenated feature of the first feature and the second feature, to obtain the degree of matching between the first target object in the first image and the second target object in the second image, comprises:
    performing feature fusion processing on the concatenated feature of the first feature and the second feature to obtain a fused feature; and
    inputting the fused feature into a fully connected layer to perform the classification processing, to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  4. The method according to any one of claims 1-3, wherein the establishing the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image comprises:
    in response to the second target object including only a human face, establishing a bipartite graph between human bodies and human faces based on the degree of matching between the first target object in the first image and the second target object in the second image;
    in response to the second target object including only a human hand, establishing a bipartite graph between human bodies and human hands based on the degree of matching between the first target object in the first image and the second target object in the second image; and
    in response to the second target object including both human faces and human hands, establishing a bipartite graph between human bodies and human faces and a bipartite graph between human bodies and human hands based on the degree of matching between the first target object in the first image and the second target object in the second image;
    where the degree of matching between a human body and a human face is used as the connection weight between the human body and the human face in the bipartite graph between human bodies and human faces, and the degree of matching between a human body and a human hand is used as the connection weight between the human body and the human hand in the bipartite graph between human bodies and human hands.
  5. The method according to any one of claims 1-4, wherein the establishing the bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image comprises:
    establishing the bipartite graph between the first target object and the second target object based on first target objects and second target objects whose degree of matching is greater than a first threshold.
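For illustration only (the claim specifies a "first threshold" but no value), the thresholded bipartite graph of claim 5 can be sketched as keeping only body-part edges whose matching degree exceeds the threshold, with the degree itself as the edge weight; the scores, the 0.5 threshold, and the dict-of-edges representation below are assumptions:

```python
# Matching degrees between 2 bodies and 3 parts (rows: bodies, cols: parts).
scores = [
    [0.92, 0.15, 0.60],
    [0.10, 0.88, 0.55],
]
FIRST_THRESHOLD = 0.5  # hypothetical value of the claimed "first threshold"

def build_bipartite_edges(scores, threshold):
    # Keep only (body, part) pairs whose matching degree exceeds the
    # threshold; the matching degree becomes the connection weight.
    return {(b, p): s
            for b, row in enumerate(scores)
            for p, s in enumerate(row)
            if s > threshold}

edges = build_bipartite_edges(scores, FIRST_THRESHOLD)
assert (0, 0) in edges and (1, 0) not in edges
```

Dropping low-confidence edges before matching keeps obviously wrong body-part pairs out of the graph.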
  6. The method according to any one of claims 1-5, wherein the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object comprises:
    based on the bipartite graph between the first target object and the second target object, using a greedy algorithm, in descending order of the degree of matching between the first target object and the second target object, taking a preset number of second target objects that best match the first target object as the second target objects matched with the first target object.
  7. The method according to claim 6, wherein the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object further comprises:
    in response to the bipartite graph between the first target object and the second target object including a bipartite graph between human bodies and human hands, using a greedy algorithm to select at most two second target objects of the human-hand type that best match the first target object; and
    in response to the bipartite graph between the first target object and the second target object including a bipartite graph between human bodies and human faces, using a greedy algorithm to select the second target object of the human-face type that best matches the first target object.
  8. The method according to claim 6 or 7, wherein the determining the matched first target object and second target object based on the bipartite graph between the first target object and the second target object further comprises:
    in response to a preset number of matched second target objects being determined for any first target object, no longer matching the remaining second target objects with the first target object; and
    in response to a matched first target object being determined for any second target object, no longer matching the remaining first target objects with the second target object.
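As an illustrative sketch of the greedy matching in claims 6-8 (not the claimed implementation itself), edges are visited from highest to lowest matching degree; each body may take up to a preset number of parts (two for hands, one for faces, per claim 7), each part joins at most one body, and saturated nodes drop out. The data-structure choices below are assumptions:

```python
def greedy_match(edges, body_capacity=2):
    # `edges` maps (body, part) -> matching degree. Visit edges in
    # descending order of matching degree; a body may take up to
    # `body_capacity` parts (2 for hands, 1 for faces), and each part
    # may be assigned to at most one body.
    assigned = {}           # part -> body
    load = {}               # body -> number of parts taken so far
    for (body, part), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        if part in assigned or load.get(body, 0) >= body_capacity:
            continue        # saturated nodes no longer participate
        assigned[part] = body
        load[body] = load.get(body, 0) + 1
    return assigned

edges = {(0, 0): 0.92, (0, 2): 0.60, (1, 1): 0.88, (1, 2): 0.55}
pairs = greedy_match(edges, body_capacity=2)
```

Here body 0 takes parts 0 and 2 (its two strongest edges) and body 1 takes part 1; the lower-scoring edge (1, 2) is skipped because part 2 is already assigned.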
  9. The method according to any one of claims 1-8, wherein the acquiring the first target object and the second target object to be matched in the input image includes at least one of the following:
    determining the first target object and the second target object in the input image based on detected frame-selection operations on the first target object and the second target object in the input image;
    detecting the first target object and the second target object in the input image by using a target detection neural network; and
    receiving position information of the first target object and the second target object in the input image, and determining the first target object and the second target object in the input image based on the position information.
  10. The method according to any one of claims 1-9, wherein before the feature processing is performed respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, the method further comprises:
    adjusting the first image and the second image respectively to a preset specification; and
    the performing feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image to obtain the degree of matching between the first target object in the first image and the second target object in the second image comprises:
    performing feature processing on the first image and the second image adjusted to the preset specification to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
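Claim 10's precondition — bringing the body crop and the face/hand crop to a common preset specification before feature processing — can be illustrated with a toy nearest-neighbour resize. A real pipeline would use a library resize with bilinear or bicubic interpolation; the 2×2 "preset specification" below is an arbitrary assumption:

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image (a list of rows) to out_h x out_w by
    nearest-neighbour sampling, so crops of different sizes can be
    fed to the same feature-processing network."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]

body_crop = [[1, 2], [3, 4], [5, 6]]   # 3x2 body crop
face_crop = [[7]]                      # 1x1 face crop
# Both crops now share the assumed 2x2 preset specification.
print(resize_nearest(body_crop, 2, 2))  # [[1, 2], [3, 4]]
print(resize_nearest(face_crop, 2, 2))  # [[7, 7], [7, 7]]
```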
  11. The method according to any one of claims 1-10, further comprising:
    displaying the matched first target object and second target object in the input image.
  12. The method according to any one of claims 1-11, further comprising: performing, by a Siamese (twin) neural network, the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  13. The method according to claim 12, further comprising a step of training the Siamese neural network, which comprises:
    obtaining training samples, the training samples comprising a plurality of first training images and a plurality of second training images, the first training images being human body images and the second training images being human face images or human hand images;
    inputting the first training images and the second training images into the Siamese neural network to obtain predicted matching results for the first training images and the second training images; and
    determining a network loss based on the predicted matching results between the first training images and the second training images, and adjusting network parameters of the Siamese neural network according to the network loss until a training requirement is met.
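The training loop of claim 13 — predict a matching result for each (body image, face/hand image) pair, compute a network loss, and update parameters until a training requirement is met — can be sketched with a toy logistic matching head standing in for the Siamese network. The features, labels, learning rate and stopping budget below are all invented for illustration:

```python
import math

def train_match_head(pairs, labels, epochs=500, lr=0.5):
    """Toy stand-in for the claim-13 loop: `pairs` are 1-D similarity
    features for (body, face/hand) image pairs, `labels` are 1 for a
    true pair and 0 otherwise.  A logistic head predicts the matching
    degree; weights are updated by the gradient of the binary
    cross-entropy (the "network loss") until the epoch budget (the
    "training requirement") is exhausted."""
    dim = len(pairs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(pairs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted matching degree
            g = p - y                       # dLoss/dz for cross-entropy
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# High similarity -> matching pair (label 1), low -> non-matching.
pairs = [[0.9], [0.8], [0.2], [0.1]]
labels = [1, 1, 0, 0]
w, b = train_match_head(pairs, labels)
print(predict(w, b, [0.9]) > 0.5)  # True: predicted as a match
print(predict(w, b, [0.1]) < 0.5)  # True: predicted as a non-match
```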
  14. A target object matching apparatus, comprising:
    an acquisition module, configured to acquire a first target object and a second target object to be matched in an input image, the first target object comprising a human body and the second target object comprising at least one of a human hand and a human face;
    a feature processing module, configured to perform feature processing respectively on a first image corresponding to the first target object and a second image corresponding to the second target object in the input image to obtain a degree of matching between the first target object in the first image and the second target object in the second image;
    a bipartition module, configured to establish a bipartite graph between the first target object and the second target object based on the degree of matching between the first target object in the first image and the second target object in the second image; and
    a matching module, configured to determine a matched first target object and second target object based on the bipartite graph between the first target object and the second target object.
  15. The apparatus according to claim 14, wherein the feature processing module is further configured to perform feature extraction processing on the first image and the second image to obtain a first feature of the first image and a second feature of the second image respectively; and
    to perform classification processing on a concatenated feature of the first feature and the second feature to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  16. The apparatus according to claim 15, wherein the feature processing module is further configured to perform feature fusion processing on the concatenated feature of the first feature and the second feature to obtain a fused feature; and
    to input the fused feature into a fully connected layer to perform the classification processing, obtaining the degree of matching between the first target object in the first image and the second target object in the second image.
  17. The apparatus according to any one of claims 14-16, wherein the bipartition module is further configured to: in the case that the second target object includes only a human face, establish a bipartite graph between the human body and the human face based on the degree of matching between the first target object in the first image and the second target object in the second image;
    in the case that the second target object includes only a human hand, establish a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image; and
    in the case that the second target object includes both a human face and a human hand, establish a bipartite graph between the human body and the human face as well as a bipartite graph between the human body and the human hand based on the degree of matching between the first target object in the first image and the second target object in the second image;
    wherein the degree of matching between the human body and the human face is used as the connection weight between the human body and the human face in the bipartite graph between the human body and the human face, and the degree of matching between the human body and the human hand is used as the connection weight between the human body and the human hand in the bipartite graph between the human body and the human hand.
  18. The apparatus according to any one of claims 14-17, wherein the bipartition module is further configured to establish the bipartite graph between the first target object and the second target object based on first target objects and second target objects whose degree of matching is greater than a first threshold.
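The graph construction recited in claims 17-18 stores each predicted matching degree as an edge weight and drops pairs whose degree does not exceed the first threshold. A minimal sketch, with illustrative identifiers and scores:

```python
def build_bipartite_graph(scores, threshold):
    """Build the weighted bipartite graph of claims 17-18.

    scores:    dict mapping (body_id, part_id) pairs to the predicted
               matching degree, which becomes the connection weight.
    threshold: the claim-18 "first threshold"; pairs at or below it
               get no edge, pruning unlikely body/part combinations
               before the matching step.
    Returns a dict mapping body_id -> {part_id: weight}.
    """
    graph = {}
    for (body, part), degree in scores.items():
        if degree > threshold:
            graph.setdefault(body, {})[part] = degree
    return graph

scores = {("B0", "F0"): 0.92, ("B0", "F1"): 0.10,
          ("B1", "F0"): 0.30, ("B1", "F1"): 0.85}
print(build_bipartite_graph(scores, threshold=0.5))
# {'B0': {'F0': 0.92}, 'B1': {'F1': 0.85}}
```

The same routine builds the body-face and body-hand graphs separately; only the `scores` input changes.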
  19. The apparatus according to any one of claims 14-18, wherein the matching module is further configured to, based on the bipartite graph between the first target object and the second target object and by using a greedy algorithm, take, in descending order of the degree of matching between the first target object and the second target object, the preset number of second target objects that best match the first target object as the second target objects matched with the first target object.
  20. The apparatus according to claim 19, wherein the matching module is further configured to, in the case that the bipartite graph between the first target object and the second target object includes a bipartite graph between a human body and a human face, select, by using a greedy algorithm, the second target object of the human-face type that best matches the first target object.
  21. The apparatus according to claim 19 or 20, wherein the matching module is further configured to, in the case that a preset number of matched second target objects have been determined for any first target object, no longer match the remaining second target objects against the first target object; and
    in the case that a matched first target object has been determined for any second target object, no longer match the remaining first target objects against the second target object.
  22. The apparatus according to any one of claims 14-21, wherein the acquisition module acquires the first target object and the second target object to be matched in the input image in at least one of the following manners:
    determining the first target object and the second target object in the input image based on detected frame-selection operations on the first target object and the second target object in the input image;
    detecting the first target object and the second target object in the input image by using a target detection neural network; and
    receiving position information of the first target object and the second target object in the input image, and determining the first target object and the second target object in the input image based on the position information.
  23. The apparatus according to any one of claims 14-22, wherein the feature processing module is further configured to, before the feature processing is performed respectively on the first image corresponding to the first target object and the second image corresponding to the second target object in the input image, adjust the first image and the second image respectively to a preset specification; and
    to perform feature processing on the first image and the second image adjusted to the preset specification to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  24. The apparatus according to any one of claims 14-23, further comprising a display module configured to display the matched first target object and second target object in the input image.
  25. The apparatus according to any one of claims 14-24, wherein the feature processing module is further configured to perform, by a Siamese neural network, the feature processing respectively on the first image corresponding to the first target object and the second image corresponding to the second target object to obtain the degree of matching between the first target object in the first image and the second target object in the second image.
  26. The apparatus according to claim 25, further comprising a training module configured to train the Siamese neural network, wherein training the Siamese neural network comprises: obtaining training samples, the training samples comprising a plurality of first training images and a plurality of second training images, the first training images being human body images and the second training images being human face images or human hand images;
    inputting the first training images and the second training images into the Siamese neural network to obtain predicted matching results for the first training images and the second training images; and
    determining a network loss based on the predicted matching results between the first training images and the second training images, and adjusting network parameters of the Siamese neural network according to the network loss until a training requirement is met.
  27. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to invoke the instructions stored in the memory to execute the method according to any one of claims 1 to 13.
  28. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 13.
  29. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the method according to any one of claims 1 to 13.
PCT/CN2020/092332 2019-09-18 2020-05-26 Target object matching method and apparatus, electronic device and storage medium WO2021051857A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022504597A JP7262659B2 (en) 2019-09-18 2020-05-26 Target object matching method and device, electronic device and storage medium
SG11202110892SA SG11202110892SA (en) 2019-09-18 2020-05-26 Target object matching method and apparatus, electronic device and storage medium
KR1020227011057A KR20220053670A (en) 2019-09-18 2020-05-26 Target-object matching method and apparatus, electronic device and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910882691.5 2019-09-18
CN201910882691.5A CN110674719B (en) 2019-09-18 2019-09-18 Target object matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021051857A1 true WO2021051857A1 (en) 2021-03-25

Family

ID=69076784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092332 WO2021051857A1 (en) 2019-09-18 2020-05-26 Target object matching method and apparatus, electronic device and storage medium

Country Status (6)

Country Link
JP (1) JP7262659B2 (en)
KR (1) KR20220053670A (en)
CN (1) CN110674719B (en)
SG (1) SG11202110892SA (en)
TW (1) TWI747325B (en)
WO (1) WO2021051857A1 (en)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674719B (en) * 2019-09-18 2022-07-26 北京市商汤科技开发有限公司 Target object matching method and device, electronic equipment and storage medium
CN111476214A (en) * 2020-05-21 2020-07-31 北京爱笔科技有限公司 Image area matching method and related device
CN111680646B (en) * 2020-06-11 2023-09-22 北京市商汤科技开发有限公司 Action detection method and device, electronic equipment and storage medium
KR20220098309A (en) * 2020-12-29 2022-07-12 센스타임 인터내셔널 피티이. 리미티드. Object detection method, apparatus and electronic device
JP2023511242A (en) * 2020-12-31 2023-03-17 商▲湯▼国▲際▼私人有限公司 METHOD, APPARATUS, DEVICE AND RECORDING MEDIUM FOR RELATED OBJECT DETECTION IN IMAGE
CN112801141B (en) * 2021-01-08 2022-12-06 吉林大学 Heterogeneous image matching method based on template matching and twin neural network optimization
WO2022195338A1 (en) * 2021-03-17 2022-09-22 Sensetime International Pte. Ltd. Methods, apparatuses, devices and storage media for detecting correlated objects involved in image
AU2021204584A1 (en) * 2021-03-17 2022-10-06 Sensetime International Pte. Ltd. Methods, apparatuses, devices and storage media for detecting correlated objects involved in image
CN114051632A (en) 2021-06-22 2022-02-15 商汤国际私人有限公司 Human body and human hand association method, device, equipment and storage medium
WO2022096957A1 (en) * 2021-06-22 2022-05-12 Sensetime International Pte. Ltd. Body and hand association method and apparatus, device, and storage medium
CN115827925A (en) * 2023-02-21 2023-03-21 中国第一汽车股份有限公司 Target association method and device, electronic equipment and storage medium
CN116309449B (en) * 2023-03-14 2024-04-09 浙江医准智能科技有限公司 Image processing method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
KR20100103221A (en) * 2009-03-13 2010-09-27 노틸러스효성 주식회사 Automatic teller machine for preventing illegal finance transaction and method of controlling the same
CN108509896A (en) * 2018-03-28 2018-09-07 腾讯科技(深圳)有限公司 A kind of trace tracking method, device and storage medium
CN109657524A (en) * 2017-10-11 2019-04-19 高德信息技术有限公司 A kind of image matching method and device
CN110674719A (en) * 2019-09-18 2020-01-10 北京市商汤科技开发有限公司 Target object matching method and device, electronic equipment and storage medium

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
JP2011070629A (en) 2009-08-25 2011-04-07 Dainippon Printing Co Ltd Advertising effect measurement system and advertising effect measurement device
US8564534B2 (en) * 2009-10-07 2013-10-22 Microsoft Corporation Human tracking system
US8543598B2 (en) 2010-03-01 2013-09-24 Microsoft Corporation Semantic object characterization and search
CN104143076B (en) * 2013-05-09 2016-08-03 腾讯科技(深圳)有限公司 The matching process of face shape and system
US20190213797A1 (en) * 2018-01-07 2019-07-11 Unchartedvr Inc. Hybrid hand tracking of participants to create believable digital avatars
JP7094702B2 (en) 2018-01-12 2022-07-04 キヤノン株式会社 Image processing device and its method, program
CN110110189A (en) * 2018-02-01 2019-08-09 北京京东尚科信息技术有限公司 Method and apparatus for generating information
CN108388888B (en) * 2018-03-23 2022-04-05 腾讯科技(深圳)有限公司 Vehicle identification method and device and storage medium
CN109190454A (en) 2018-07-17 2019-01-11 北京新唐思创教育科技有限公司 The method, apparatus, equipment and medium of target person in video for identification
CN109740516B (en) * 2018-12-29 2021-05-14 深圳市商汤科技有限公司 User identification method and device, electronic equipment and storage medium
CN110070005A (en) * 2019-04-02 2019-07-30 腾讯科技(深圳)有限公司 Images steganalysis method, apparatus, storage medium and electronic equipment
CN110427908A (en) 2019-08-08 2019-11-08 北京百度网讯科技有限公司 A kind of method, apparatus and computer readable storage medium of person detecting
CN111275002A (en) 2020-02-18 2020-06-12 上海商汤临港智能科技有限公司 Image processing method and device and electronic equipment


Cited By (5)

Publication number Priority date Publication date Assignee Title
US20210406614A1 (en) * 2020-06-30 2021-12-30 The Nielsen Company (Us), Llc Methods, systems, articles of manufacture, and apparatus to classify labels based on images using artificial intelligence
US11544509B2 (en) * 2020-06-30 2023-01-03 Nielsen Consumer Llc Methods, systems, articles of manufacture, and apparatus to classify labels based on images using artificial intelligence
CN113205138A (en) * 2021-04-30 2021-08-03 四川云从天府人工智能科技有限公司 Human face and human body matching method, equipment and storage medium
CN115731436A (en) * 2022-09-21 2023-03-03 东南大学 Highway vehicle image retrieval method based on deep learning fusion model
CN115731436B (en) * 2022-09-21 2023-09-26 东南大学 Highway vehicle image retrieval method based on deep learning fusion model

Also Published As

Publication number Publication date
CN110674719A (en) 2020-01-10
KR20220053670A (en) 2022-04-29
JP2022542668A (en) 2022-10-06
CN110674719B (en) 2022-07-26
TWI747325B (en) 2021-11-21
SG11202110892SA (en) 2021-10-28
JP7262659B2 (en) 2023-04-21
TW202113757A (en) 2021-04-01

Similar Documents

Publication Publication Date Title
WO2021051857A1 (en) Target object matching method and apparatus, electronic device and storage medium
JP7238141B2 (en) METHOD AND APPARATUS, ELECTRONIC DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM FOR RECOGNIZING FACE AND HANDS
TWI724736B (en) Image processing method and device, electronic equipment, storage medium and computer program
CN110287874B (en) Target tracking method and device, electronic equipment and storage medium
WO2021031609A1 (en) Living body detection method and device, electronic apparatus and storage medium
JP6134446B2 (en) Image division method, image division apparatus, image division device, program, and recording medium
KR20210102180A (en) Image processing method and apparatus, electronic device and storage medium
CN107944447B (en) Image classification method and device
CN111340766A (en) Target object detection method, device, equipment and storage medium
CN109934275B (en) Image processing method and device, electronic equipment and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
WO2021035833A1 (en) Posture prediction method, model training method and device
CN111241887B (en) Target object key point identification method and device, electronic equipment and storage medium
CN109840917B (en) Image processing method and device and network training method and device
CN111243011A (en) Key point detection method and device, electronic equipment and storage medium
CN110659690B (en) Neural network construction method and device, electronic equipment and storage medium
CN112219224B (en) Image processing method and device, electronic equipment and storage medium
WO2019205605A1 (en) Facial feature point location method and device
CN109522937B (en) Image processing method and device, electronic equipment and storage medium
CN113065591B (en) Target detection method and device, electronic equipment and storage medium
CN111242303A (en) Network training method and device, and image processing method and device
CN111523485A (en) Pose recognition method and device, electronic equipment and storage medium
CN111259967A (en) Image classification and neural network training method, device, equipment and storage medium
CN113486830A (en) Image processing method and device, electronic equipment and storage medium
CN110633715B (en) Image processing method, network training method and device and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20864997; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022504597; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20227011057; Country of ref document: KR; Kind code of ref document: A)
122 Ep: pct application non-entry in european phase (Ref document number: 20864997; Country of ref document: EP; Kind code of ref document: A1)