CN112184787A - Image registration method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112184787A
Authority
CN
China
Prior art keywords
image
scene
registration
network
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011163832.7A
Other languages
Chinese (zh)
Inventor
王谦
杨帆
吴立威
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202011163832.7A
Publication of CN112184787A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T 7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an image registration method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a first image and a second image synchronously captured of the same scene; extracting image features of the first image and of the second image respectively; obtaining a conversion relation between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image; and determining second key points corresponding to the first key points in the second image based on the conversion relation and the image coordinates of the first key points of the first image. The embodiments of the present disclosure can improve the robustness and efficiency of image registration.

Description

Image registration method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to an image registration method and apparatus, an electronic device, and a storage medium.
Background
Image registration for binocular cameras has important applications in many fields. In the field of face recognition, for example, as the security requirements of face recognition grow, performing face recognition on images captured by a binocular camera is gradually becoming standard practice. Drawing on the bionics principle, a binocular camera captures images synchronously through two cameras and can derive depth information from the differences between the two images, making it suitable for more application scenes. With the increasingly widespread use of binocular cameras, more efficient and faster image registration techniques are urgently needed.
Disclosure of Invention
The present disclosure proposes an image registration technical solution.
According to an aspect of the present disclosure, there is provided an image registration method including:
acquiring a first image and a second image synchronously captured of the same scene;
respectively extracting image features of the first image and the second image;
obtaining a conversion relation between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image;
and determining second key points corresponding to the first key points in the second image based on the conversion relation and the image coordinates of the first key points of the first image.
In one or more possible implementations, the first image is an RGB image and the second image is a thermographic image, the method further comprising: and determining a target image area of the thermal imaging image according to a second key point corresponding to the first key point in the thermal imaging image.
In one or more possible implementations, the determining a target image region of the thermal imaging image according to a second keypoint of the thermal imaging image corresponding to the first keypoint includes: and determining a face area in the thermal imaging image according to the second key points in the thermal imaging image, which correspond to the face key points of the RGB image.
In one or more possible implementations, the method further includes: acquiring two videos captured of the scene; and extracting, according to the image quality of the image frames in the two videos, one image from each video to obtain the first image and the second image, wherein the first image and the second image have the same acquisition time.
In one or more possible implementations, the method further includes: performing region-of-interest detection on the first image to determine a region of interest in the first image; and determining a first key point of the first image according to the region of interest in the first image.
In one or more possible implementations, the obtaining a conversion relationship between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image includes: inputting the image features of the first image and the second image into a registration layer of a trained image registration network to obtain a conversion relation between the image coordinates of the first image and the image coordinates of the second image, wherein the image registration network is obtained based on back propagation of network loss, and the network loss is determined based on output results obtained by registering sample images acquired from a plurality of scenes and pre-acquired annotation information.
In one or more possible implementations, the method further includes: acquiring sample images acquired in a plurality of scenes, wherein the sample images comprise a first sample and a second sample acquired simultaneously in the same scene; inputting the first sample and the second sample into an image registration network to obtain an output result of the image registration network; determining the network loss of the image registration network according to a comparison result between the output result and the annotation information, wherein the annotation information comprises a conversion relation between the image coordinate of the first sample and the image coordinate of the second sample; carrying out back propagation on the network loss to obtain an image registration network after one round of training; and performing multiple rounds of training on the image registration network to obtain the trained image registration network.
In one or more possible implementations, the plurality of scenes includes a first scene and does not include a second scene, the method further including: under the condition that the scene is converted from a first scene to a second scene, acquiring a sample image of the second scene; updating the image registration network based on the sample image of the second scene to obtain an updated image registration network; and carrying out image registration on the two images synchronously acquired in the second scene through the updated image registration network.
According to an aspect of the present disclosure, there is provided an image registration apparatus including:
an acquisition module configured to acquire a first image and a second image synchronously captured of the same scene;
an extraction module configured to extract the image features of the first image and the image features of the second image respectively;
a first determining module configured to obtain a conversion relation between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image;
a registration module for determining second keypoints in the second image corresponding to the first keypoints based on the transformation relationship and image coordinates of the first keypoints in the first image.
In one or more possible implementations, the first image is an RGB image, the second image is a thermographic image, and the apparatus further comprises: and the second determining module is used for determining a target image area of the thermal imaging image according to a second key point corresponding to the first key point in the thermal imaging image.
In one or more possible implementation manners, the first key point includes a face key point, the target image region includes a face region, and the second determining module is configured to determine the face region in the thermal imaging image according to the second key point, corresponding to the face key point of the RGB image, in the thermal imaging image.
In one or more possible implementations, the apparatus further includes: a frame extraction module configured to acquire two videos captured of the scene, and to extract, according to the image quality of the image frames in the two videos, one image from each video to obtain the first image and the second image, wherein the first image and the second image have the same acquisition time.
In one or more possible implementations, the apparatus further includes: a detection module configured to perform region-of-interest detection on the first image to determine a region of interest in the first image, and to determine a first key point of the first image according to the region of interest in the first image.
In one or more possible implementations, the first determining module is configured to input the image features of the first image and the image features of the second image into a registration layer of a trained image registration network, so as to obtain a transformation relationship between the image coordinates of the first image and the image coordinates of the second image, where the image registration network is obtained based on back propagation of network loss, and the network loss is determined based on output results obtained by registering sample images acquired from a plurality of scenes and pre-acquired annotation information.
In one or more possible implementations, the apparatus further includes: a training module configured to acquire sample images captured in multiple scenes, where the sample images include a first sample and a second sample captured in the same scene at the same time; input the first sample and the second sample into an image registration network to obtain an output result of the image registration network; determine the network loss of the image registration network according to a comparison result between the output result and the annotation information, wherein the annotation information includes a conversion relation between the image coordinates of the first sample and the image coordinates of the second sample; back-propagate the network loss to obtain an image registration network after one round of training; and perform multiple rounds of training on the image registration network to obtain the trained image registration network.
In one or more possible implementations, the plurality of scenes includes a first scene and does not include a second scene, the apparatus further comprising:
the updating module is used for acquiring a sample image of a second scene under the condition that the scene is converted from a first scene to the second scene; updating the image registration network based on the sample image of the second scene to obtain an updated image registration network; and carrying out image registration on the two images synchronously acquired in the second scene through the updated image registration network.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiments of the present disclosure, a first image and a second image synchronously captured of the same scene may be acquired, and the image features of each may be extracted. From these features, a conversion relation between the image coordinates of the first image and the image coordinates of the second image can be obtained, and based on this conversion relation and the image coordinates of a first key point in the first image, a second key point corresponding to the first key point can be determined in the second image, thereby registering the first image and the second image. The conversion relation between pixel points in the two images can thus be obtained without camera calibration, so the method is applicable to image registration in different scenes and improves the robustness and efficiency of image registration.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of an image registration method according to an embodiment of the present disclosure.
Fig. 2 shows a flowchart of an example of acquiring a first image and a second image according to an embodiment of the present disclosure.
Fig. 3 shows a flowchart of an example of an image registration method according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an image registration apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an example of an electronic device according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of an example of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the set consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
The image registration scheme provided by the embodiments of the present disclosure can be applied to indoor and outdoor scenes such as image registration, face recognition, and security. In a face recognition scene, for example, the face position in one image captured by a binocular camera can be used to locate the face position in the other image, so face capture is not limited to a single face at a time. As another example, in a face temperature-measurement scene, the face position in a thermal imaging image can be located from the face position in the RGB image, which reduces the face detection that must be performed on the thermal imaging image and improves its face detection efficiency.
The image registration method provided by the embodiments of the present disclosure may be executed by a terminal device, a server, or another type of electronic device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the image registration method may be implemented by a processor invoking computer-readable instructions stored in a memory; alternatively, the method may be performed by a server. For example, when the image registration method is applied to a terminal device, it can meet an embedded device's need to register two images; when applied to a server, it can meet the need to register a large number of images.
The image registration method according to the embodiment of the present disclosure is described below by taking an electronic device as an execution subject.
Fig. 1 shows a flowchart of an image registration method according to an embodiment of the present disclosure, as shown in fig. 1, the image registration method includes:
step S11, a first image and a second image synchronously acquired for the same scene are acquired.
In the embodiments of the present disclosure, the electronic device may be equipped with a binocular camera and may photograph or capture images of a target object in the scene where the device is located, thereby obtaining a first image and a second image synchronously captured of the target object in the same scene. In some implementations, the electronic device may instead obtain a first image and a second image that another device captured synchronously of the target object in the scene, for example two images captured by a binocular camera, or a first image and a second image captured synchronously of the target object by two image capture devices installed in the same scene. For example, in an access-control scene, a binocular camera can capture images of passing pedestrians, and the electronic device can obtain the first image and the second image captured synchronously of a pedestrian. The first image and the second image may include a pedestrian's face, and may be, respectively, an RGB image and an infrared image, or an RGB image and a thermal infrared image.
Here, the capture angles of the first image and the second image may differ, so the two images may not be completely identical; that is, the first image may include content that is absent from the second image, and vice versa.
Step S12, extracting image features of the first image and the second image, respectively.
In the embodiments of the present disclosure, after the first image and the second image are acquired, the image features of each may be extracted, for example by feature extraction algorithms, operators, and the like.
In some implementations, the trained image registration network may be further used to extract image features of the first image and the second image, for example, the first image and the second image may be input into the trained image registration network together, and the feature extraction layer of the image registration network is used to extract image features of the first image and the second image, respectively. The image registration network may be a deep neural network and may include a plurality of network layers, wherein a feature extraction layer may be used for image feature extraction. The feature extraction layer may include one or more network layers such as convolutional layers, pooling layers, sampling layers, etc., and the present disclosure is not limited to a specific neural network structure.
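To make the feature extraction step concrete, the following is a minimal sketch in NumPy (an illustration only: the patent does not fix an architecture, a real system would use a trained deep network, and the edge kernel and image sizes here are assumptions). It applies one convolution + ReLU + 2 × 2 max-pooling block, the kind of operations a feature extraction layer is composed of, identically to both images:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution of a single-channel image with a small kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling (truncates odd borders)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def extract_features(image: np.ndarray) -> np.ndarray:
    """One conv + ReLU + pool block, applied identically to both images."""
    edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # illustrative edge filter
    return max_pool2x2(np.maximum(conv2d(image, edge_kernel), 0.0))

first = np.random.rand(32, 32)   # placeholder first image (single channel)
second = np.random.rand(32, 32)  # placeholder second image
f1, f2 = extract_features(first), extract_features(second)
```

A practical feature extraction layer would stack many such blocks with learned kernels; the sketch only shows the shape of the computation a convolutional, pooling, or sampling layer performs.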
Here, to extract image features more effectively, the first image and the second image may be preprocessed before feature extraction. For example, the two images may each be cropped or scaled to a fixed, preset image size, and image features may then be extracted from the cropped or scaled images. The preprocessing may include one or more operations such as image cropping, image scaling, sharpening, smoothing, denoising, gray-scale adjustment, and brightness adjustment; the present disclosure is not limited to specific preprocessing operations.
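A minimal sketch of the scaling step of such preprocessing (NumPy, nearest-neighbour scaling with pixel normalisation; the fixed size of 64 × 64 is an arbitrary assumption, not something the patent specifies):

```python
import numpy as np

def preprocess(image: np.ndarray, size: tuple = (64, 64)) -> np.ndarray:
    """Scale an H x W x C image to a fixed size (nearest-neighbour, an
    illustrative stand-in for the cropping/scaling described in the text)
    and normalise pixel values to [0, 1]."""
    h, w = image.shape[:2]
    rows = np.arange(size[0]) * h // size[0]  # source row for each output row
    cols = np.arange(size[1]) * w // size[1]  # source column for each output column
    resized = image[rows][:, cols]
    return resized.astype(np.float32) / 255.0

img = (np.random.rand(120, 90, 3) * 255).astype(np.uint8)  # placeholder image
out = preprocess(img)
```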
Step S13, obtaining a conversion relationship between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image.
In the embodiments of the present disclosure, the image features of the first image and of the second image may be further processed, for example through feature fusion and feature mapping, to finally obtain a conversion relation between the image coordinates of the first image and the image coordinates of the second image; pixel points representing the same object in the two images can then be matched through this conversion relation. The conversion relation may be represented by a matrix, which expresses it compactly.
In some implementations, the image features of the first image and the image features of the second image may be input into the registration layer of the trained image registration network, and the registration layer may further process them, for example through normalization, feature fusion, and full connection, to obtain the conversion relation between the image coordinates of the first image and the image coordinates of the second image. Here, the trained image registration network may be obtained based on back propagation of network loss, where the network loss is determined based on output results obtained by registering sample images acquired in a plurality of scenes and on pre-acquired annotation information. The trained network is therefore suitable for multiple scenes and does not require camera calibration for each individual scene, which improves the efficiency of image registration.
Here, when training the image registration network, pairs of sample images acquired in multiple scenes may be obtained, where a pair of sample images includes a first sample and a second sample acquired at the same time in the same scene. A pair of sample images, that is, the first sample and the second sample, is then input into the image registration network to obtain an output result. The network loss of the image registration network is determined according to a comparison result between the output result and pre-acquired annotation information, where the annotation information includes the conversion relation between the image coordinates of the first sample and the image coordinates of the second sample. For example, the output result of the network may be compared with the annotated conversion relation for the sample pair, for instance through a loss function, and the network loss may then be determined from the comparison results of multiple sample pairs, for example by summing them or taking a weighted average. The network loss is then back-propagated, and the network parameters of the image registration network are adjusted on that basis, yielding the network after one round of training. After multiple rounds of training, the trained image registration network is finally obtained.
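The training round described above can be sketched numerically. The toy model below is an illustration only: a single linear "registration head" stands in for the network, and all shapes, the learning rate, and the use of squared error are assumptions rather than the patent's specification. It predicts the 9 coefficients of the conversion matrix from fused sample features, compares the output result with the annotated conversion relation, and back-propagates the loss over several rounds:

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample pair and its annotation: fused features of the first and second
# sample, and the annotated 3 x 3 conversion relation (flattened to 9 values).
features = rng.standard_normal(8)   # placeholder fused image features
annotated = np.eye(3).flatten()     # placeholder annotated conversion matrix

# Linear "registration head": maps features to the 9 matrix coefficients.
W = rng.standard_normal((9, 8)) * 0.1
b = np.zeros(9)
lr = 0.1

for _ in range(300):                       # several rounds of training
    output = W @ features + b              # output result of the network
    residual = output - annotated          # comparison with annotation info
    loss = np.mean(residual ** 2)          # network loss
    # Back propagation of the loss through the linear head.
    grad_out = 2.0 * residual / residual.size
    W -= lr * np.outer(grad_out, features)
    b -= lr * grad_out

predicted = (W @ features + b).reshape(3, 3)
```

After training, `predicted` approaches the annotated matrix, mirroring how back propagation drives the network's output toward the pre-acquired conversion relation.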
By back-propagating the network loss of the image registration network, the output result of the network gradually approaches the pre-acquired conversion relation, so that the trained image registration network can accurately predict the conversion relation for coordinate conversion between two images.
Step S14, determining a second keypoint in the second image corresponding to the first keypoint based on the transformation relation and the image coordinates of the first keypoint of the first image.
In the embodiment of the present disclosure, after determining the conversion relationship between the image coordinates of the first image and the image coordinates of the second image, the image coordinates of the first keypoint in the first image may be subjected to coordinate transformation by the conversion relationship, and the second keypoint in the second image representing the same object as the first keypoint is determined. The keypoints can be pixel points representing the target object in the image, for example, a first keypoint can represent a pixel point of the contour of the target object in the first image, and correspondingly, a second keypoint can represent a pixel point of the contour of the target object in the second image. Here, the first keypoints may include one or more keypoints, and accordingly, the second keypoints may also include one or more keypoints.
Here, the conversion relation may be expressed as a matrix of the following form:

Mat = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}

where Mat represents the conversion relation and a_{ij} represents a conversion coefficient, with i and j positive integers less than or equal to 3. The image registration network may output a 3 × 3 matrix of the form of Mat, and the second key points in the second image corresponding to the first key points in the first image may be determined through the conversion relation represented by Mat, so that the first image and the second image are registered.
For example, let the coordinates of a first key point of the first image be (x_1, y_1) and the coordinates of the corresponding second key point of the second image be (x_2, y_2). The coordinates of the second key point can be obtained from the coordinates of the first key point by the following formulas (1) and (2):

x_2 = \frac{a_{11} x_1 + a_{12} y_1 + a_{13}}{a_{31} x_1 + a_{32} y_1 + a_{33}}    (1)

y_2 = \frac{a_{21} x_1 + a_{22} y_1 + a_{23}}{a_{31} x_1 + a_{32} y_1 + a_{33}}    (2)
Through formulas (1) and (2), the coordinates of the second key point representing the same object as the first key point can be determined via the conversion relation represented by the matrix Mat, thereby registering the target object across the two images. In the related art, determining the conversion relation between pixel points in two images requires finding at least four pairs of corresponding key points in the image pair and solving for the conversion relation from them, and the conversion relation must be re-determined whenever the shooting angle, distance, or position of the image capture device changes. In the embodiments of the present disclosure, the conversion relation between the two images is obtained through a neural network, which reduces the influence of the shooting angle, distance, and position of the image capture device and improves the accuracy of image registration.
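The coordinate mapping of formulas (1) and (2) can be checked numerically. The sketch below (NumPy; the matrix values are illustrative, chosen as a pure translation, and the projective form of the mapping is the standard reading of a full 3 × 3 conversion matrix) maps a first key point to its second key point:

```python
import numpy as np

def map_keypoint(mat: np.ndarray, x1: float, y1: float) -> tuple:
    """Apply the 3 x 3 conversion matrix to a first-image key point,
    i.e. formulas (1) and (2): a projective coordinate transform."""
    denom = mat[2, 0] * x1 + mat[2, 1] * y1 + mat[2, 2]
    x2 = (mat[0, 0] * x1 + mat[0, 1] * y1 + mat[0, 2]) / denom
    y2 = (mat[1, 0] * x1 + mat[1, 1] * y1 + mat[1, 2]) / denom
    return x2, y2

# Example: a pure translation by (5, -3) expressed as a conversion matrix.
mat = np.array([[1.0, 0.0, 5.0],
                [0.0, 1.0, -3.0],
                [0.0, 0.0, 1.0]])
x2, y2 = map_keypoint(mat, 10.0, 20.0)  # -> (15.0, 17.0)
```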
The embodiment of the present disclosure can obtain the conversion relationship between the image coordinates of the first image and the image coordinates of the second image by using the trained image registration network, thereby reducing the influence of the position, shooting angle, and the like of the image acquisition device on image registration. The trained image registration network can be obtained by training on sample images acquired in a plurality of scenes, so that it can be applied to image registration in a plurality of scenes and adaptively outputs the corresponding conversion relationship for each of a plurality of different scenes.
In some implementations, the plurality of scenes may include a first scene but not a second scene; that is, the second scene is excluded from the plurality of scenes to which the sample images used for training the image registration network belong. In a case where the scene of image registration is converted from the first scene to the second scene, the conversion relationship between two images synchronously acquired in the second scene, obtained by using the trained image registration network, may not be accurate enough. Therefore, sample images of the second scene may be acquired, and the network parameters of the image registration network may be updated based on a plurality of pairs of sample images synchronously acquired in the second scene; that is, the image registration network may be retrained to obtain an updated image registration network. The conversion relationship between two images synchronously acquired in the second scene can then be obtained through the updated image registration network, so that image registration of the two images synchronously acquired in the second scene is performed by the updated network and the requirement of applying image registration to various scenes is further met.
Generally, once the conversion relationship is determined by calibration, it is difficult to update: adapting to a new scene requires calibrating the image capturing device in the new scene again, and as the number of scenes grows, each scene needs to be processed separately, which makes maintenance difficult. According to the image registration scheme provided by the present disclosure, when adapting to a new scene, the image registration network can be updated with a certain number of sample images collected in the new scene, and once the update is completed the network is adapted to the new scene, so that the maintenance cost can be greatly reduced.
In some implementations, the first image and the second image may differ in imaging modality, so that image registration can be performed on two images of different imaging modalities. Here, the imaging modality may include at least one of red-green-blue (RGB) imaging, thermal imaging, near-infrared imaging, ultrasonic imaging, laser radar imaging, millimeter wave radar imaging, and X-ray imaging.
In one example, the first image may be an RGB image and the second image may be a thermal imaging image. After the conversion relationship between the image coordinates of the first image and the image coordinates of the second image is obtained, coordinate conversion may be performed on a first keypoint in the RGB image through the obtained conversion relationship to determine the second keypoint in the thermal imaging image corresponding to the first keypoint, so that the target image region where the target object is located in the thermal imaging image can be determined according to that second keypoint. For example, in an industrial temperature measurement scene, the target image region where the target object is located in the thermal imaging image can be located directly through the image region where the target object is located in the RGB image, which can improve the accuracy of target location in the thermal imaging image.
In one example, the first keypoints may include face keypoints and the target image region may include a face region. In the case where the target image region of the thermal imaging image is determined based on the second keypoints in the thermal imaging image corresponding to the first keypoints, the face region of the thermal imaging image may be determined based on the second keypoints in the thermal imaging image corresponding to the face keypoints of the RGB image. For example, in a face temperature measurement scene, the face position in the thermal imaging image can be located directly through the face position in the RGB image, which can reduce the difficulty of face detection on the thermal imaging image and improve the accuracy and efficiency of face detection on the thermal imaging image.
In step S11 described above, the first image and the second image synchronously acquired for the same scene may be acquired, so that image registration can be performed on the first image and the second image. The first image and the second image may be images captured for the same scene; in some implementations, they may also be image frames in videos taken of the same scene.
Based on this, the image registration method may further include: acquiring two videos collected for the scene, and then extracting one image from each of the two videos to obtain the first image and the second image, wherein the acquisition time of the first image is the same as that of the second image.
In this implementation, when two videos synchronously shot for one scene are acquired, one image may be extracted from each of the two videos according to the capture time of the image frames; that is, one image frame may be selected from each video as the first image or the second image, so that a first image and a second image with the same capture time are obtained through a frame selection operation. In some implementations, more than two videos may be shot for the scene; in that case, the videos may be combined pairwise, and one image frame extracted from each video of a pair according to the capture time of the image frames, so that a first image and a second image with the same capture time are obtained. Because the first image and the second image are obtained by extracting image frames from the shot videos, the inaccurate registration that may occur when keypoints are difficult to determine in independently collected images can be reduced; therefore, extracting image frames from video can improve the accuracy of registering the first image and the second image.
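A minimal sketch of the frame selection operation by capture time follows; the (timestamp, frame) tuple layout and the tolerance parameter are assumptions for illustration, not part of the disclosure:

```python
def pair_frames_by_time(frames_a, frames_b, tolerance=0.0):
    """Given two lists of (timestamp, frame) tuples from two videos of the
    same scene, return (frame_a, frame_b) pairs whose capture times match
    within `tolerance` seconds."""
    if not frames_b:
        return []
    pairs = []
    for t_a, frame_a in frames_a:
        # Pick the frame from the second video closest in time to frame_a.
        t_b, frame_b = min(frames_b, key=lambda tf: abs(tf[0] - t_a))
        if abs(t_b - t_a) <= tolerance:
            pairs.append((frame_a, frame_b))
    return pairs
```

With tolerance 0 only exactly synchronous frames pair up; in practice a small tolerance absorbs clock jitter between the two capture devices.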
Here, after the two videos collected for one scene are acquired, they may further be converted into a fixed video type, for example, the video type required by a unified data interface, such as converting from a video stream into an image type, which facilitates extraction of image frames and improves the efficiency of frame selection.
In one example, one image may be extracted from each of the two videos according to the quality of the image frames in the two videos to obtain the first image and the second image; that is, for each of the two videos, one image frame may be extracted according to the image quality of the image frames in that video to obtain the first image or the second image. Here, image quality may be measured by any one or more of the following criteria: integrity of the region of interest, image sharpness, and exposure. When extracting an image frame from a video according to image quality, an image frame with image quality greater than a quality threshold may be extracted; that is, an image frame with higher image quality may be selected as the first image or the second image. Here, image quality greater than the quality threshold may include one or more of the following: the region of interest being completely within the image frame, the image sharpness being greater than a sharpness threshold, and the exposure being within a certain exposure value range. In this way, the region of interest of the selected first image or second image is completely within the image, the image is sharp, and the exposure is moderate, which reduces the situation that the region of interest is too dark or too bright.
Fig. 2 shows a flowchart of an example of acquiring a first image and a second image according to an embodiment of the present disclosure, which may include the steps of:
Step S201, acquiring data collected by a binocular camera;

Step S202, judging whether the acquired data is of an image type;

Step S203, in a case where the acquired data is a video, performing a frame selection operation on the video;

Step S204, acquiring a first image and a second image of the image type.
Through the above steps, the synchronously acquired first image and second image can be obtained, so that image registration can be performed on the first image and the second image and the corresponding keypoints representing the same object in the two images can be determined.
In the above step S14, the second keypoint in the second image corresponding to the first keypoint may be determined based on the conversion relationship and the image coordinates of the first keypoint of the first image, so that the first image and the second image can be registered. Here, before the second keypoint is determined, the first keypoint of the first image may be acquired; the process of acquiring the first keypoint is described below through one or more implementations.
In some implementations, region-of-interest detection may be performed on the first image to determine a region of interest in the first image, and then the first keypoint of the first image may be determined according to that region of interest. For example, in a face keypoint detection scene, the region of interest where a face is located in the first image may be determined, and then first keypoints such as the eyebrows, eyes, nose, and mouth corners may be detected in that region, so that the image coordinates of the first keypoints can be obtained. Here, the region of interest may be the image region where the target object is located; for example, if the target object is a pedestrian, the region of interest may be the image region of the pedestrian in the first image, and if the target object is a face, the region of interest may be the image region of the face in the first image. When determining the region of interest in the first image, the first image may be input to a target detection network, the region of interest where the target object is located may be determined using the target detection network, and then points on the frame of the region of interest may be taken as first keypoints; for example, if the region of interest is a rectangular region, its four vertices may be taken as first keypoints. In this way, the first keypoints indicating the target object in the first image can be accurately determined, providing accurate image coordinates for image registration and improving the accuracy of the determined second keypoints.
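For the rectangular region-of-interest case, taking the four vertices of the detected frame as first keypoints can be sketched as follows, assuming an (x_min, y_min, x_max, y_max) box convention (the convention is an illustrative assumption):

```python
def bbox_corner_keypoints(box):
    """Return the four vertices of a rectangular region of interest
    (x_min, y_min, x_max, y_max) as first keypoints, ordered clockwise
    from the top-left corner."""
    x_min, y_min, x_max, y_max = box
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
```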
In some implementations, when determining the first keypoint of the first image, the first image may be input to a keypoint detection network, and the first keypoint in the region of interest may be determined directly and its image coordinates obtained using the keypoint detection network. Here, the keypoint detection network may be a deep neural network with strong feature learning capability, so that the first keypoint in the first image can be detected directly using the keypoint detection network.
In some implementations, in order to determine the first keypoint in the first image more quickly and conveniently, for example in simple scenarios where the position of the image capturing device is fixed, the target object may be considered to generally lie within a fixed image area, and the first keypoint of the first image may be determined directly from a preset position point; that is, the preset position point may be determined as the first keypoint. For example, a central region of the first image may be used as the region of interest, and the central point or the vertices of that central region may be used as the first keypoints of the first image, so that the first keypoints and their image coordinates can be obtained quickly.
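A sketch of the preset-position approach, assuming a central region whose side length is a fixed fraction of the image (the fraction and the function interface are illustrative assumptions):

```python
def center_region_keypoints(width, height, frac=0.5):
    """For fixed-camera scenes, take a central region covering `frac` of the
    image in each dimension as the region of interest; return its center
    point and its four vertices as preset first keypoints."""
    w, h = width * frac, height * frac
    x0, y0 = (width - w) / 2, (height - h) / 2
    center = (width / 2, height / 2)
    vertices = [(x0, y0), (x0 + w, y0), (x0 + w, y0 + h), (x0, y0 + h)]
    return center, vertices
```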
The image registration method provided by the embodiment of the present disclosure is explained by an example. Fig. 3 shows a flowchart of an example of an image registration method according to an embodiment of the present disclosure.
S301, acquiring a first image and a second image which are synchronously acquired aiming at the same scene.
S302, inputting the first image into a key point detection network to obtain an interest area in the first image and a group of first key points of the interest area.
S303, inputting the first image and the second image into an image registration network to obtain a conversion relation between the image coordinates of the first image and the image coordinates of the second image.
S304, determining second key points corresponding to the first key points in the second image based on the obtained conversion relation and the image coordinates of the group of first key points.
S305, determining a target image area of the second image according to a second key point corresponding to the first key point in the second image.
It should be noted that the present disclosure does not limit the execution sequence of the steps S302 and S303, and the step S302 may be executed first and then the step S303 is executed, or the step S303 may be executed first and then the step S302 is executed, or the step S302 and the step S303 may be executed at the same time.
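Steps S301 to S305 above can be sketched end to end as follows; the keypoint detection network and the image registration network are stood in for by callables with assumed interfaces, since their internals are described elsewhere:

```python
def apply_h(mat, x, y):
    """Map (x, y) through a 3x3 conversion matrix (equations (1) and (2))."""
    d = mat[2][0] * x + mat[2][1] * y + mat[2][2]
    return ((mat[0][0] * x + mat[0][1] * y + mat[0][2]) / d,
            (mat[1][0] * x + mat[1][1] * y + mat[1][2]) / d)

def register_images(first_image, second_image, detect_keypoints, registration_net):
    """Sketch of steps S301-S305. Returns the second-image keypoints and
    their axis-aligned bounding box as the target image region."""
    first_keypoints = detect_keypoints(first_image)                       # S302
    mat = registration_net(first_image, second_image)                     # S303
    second_keypoints = [apply_h(mat, x, y) for x, y in first_keypoints]   # S304
    xs = [p[0] for p in second_keypoints]
    ys = [p[1] for p in second_keypoints]
    target_region = (min(xs), min(ys), max(xs), max(ys))                  # S305
    return second_keypoints, target_region
```

As the note above says, S302 and S303 are independent and could run in either order or concurrently; the sequential order here is only one valid choice.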
The image registration scheme provided by the embodiment of the disclosure can realize registration of two images by utilizing deep learning, and can perform iterative update on an image registration network in time, so that the image registration scheme can be adapted to various scenes, and the image registration is more efficient and convenient.
It is understood that the above method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic; details are omitted here for brevity. Those skilled in the art will appreciate that in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an image registration apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the image registration methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the methods section, which are not repeated here.
Fig. 4 shows a block diagram of an image registration apparatus according to an embodiment of the present disclosure, as shown in fig. 4, the apparatus including:
an obtaining module 41, configured to obtain a first image and a second image that are synchronously acquired for a same scene;
an extraction module 42, configured to extract image features of the first image and image features of the second image, respectively;
a first determining module 43, configured to obtain a conversion relationship between the image coordinate of the first image and the image coordinate of the second image according to the image feature of the first image and the image feature of the second image;
a registration module 44, configured to determine second keypoints in the second image corresponding to the first keypoints based on the transformation relation and the image coordinates of the first keypoints in the first image.
In one or more possible implementations, the first image is an RGB image, the second image is a thermographic image, and the apparatus further comprises: and the second determining module is used for determining a target image area of the thermal imaging image according to a second key point corresponding to the first key point in the thermal imaging image.
In one or more possible implementation manners, the first key point includes a face key point, the target image region includes a face region, and the second determining module is configured to determine the face region in the thermal imaging image according to the second key point, corresponding to the face key point of the RGB image, in the thermal imaging image.
In one or more possible implementations, the apparatus further includes: the extraction module is used for acquiring two videos collected aiming at the scene; according to the image quality of the image frames in the two videos, respectively extracting an image from the two videos to obtain the first image and the second image, wherein the acquisition time of the first image is the same as that of the second image.
In one or more possible implementations, the apparatus further includes: the detection module is used for detecting the interest region of the first image and determining the interest region in the first image; and determining a first key point of the first image according to the interest region in the first image.
In one or more possible implementations, the first determining module is configured to input the image features of the first image and the image features of the second image into a registration layer of a trained image registration network to obtain the conversion relationship between the image coordinates of the first image and the image coordinates of the second image, where the image registration network is trained through back propagation of a network loss, and the network loss is determined based on the output results obtained by registering sample images acquired from a plurality of scenes and on pre-acquired annotation information.
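As an illustration of how such a network loss might be computed, the following sketch uses mean squared error between the registered keypoint positions and the pre-acquired annotations; the disclosure does not specify the exact loss, so this choice is an assumption:

```python
def registration_loss(predicted_points, annotated_points):
    """Mean squared error between the network's registered keypoint
    positions on sample images and the annotated ground-truth positions,
    usable as the network loss for back propagation."""
    n = len(predicted_points)
    return sum((px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(predicted_points, annotated_points)) / n
```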
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code which, when run on a device, a processor in the device executes instructions for implementing the image registration method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed cause a computer to perform the operations of the image registration method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 5 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 5, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 6 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 6, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the Apple graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. An image registration method, comprising:
acquiring a first image and a second image that are synchronously acquired for the same scene;
extracting image features of the first image and image features of the second image, respectively;
obtaining a conversion relation between image coordinates of the first image and image coordinates of the second image according to the image features of the first image and the image features of the second image;
and determining, based on the conversion relation and the image coordinates of first key points of the first image, second key points in the second image that correspond to the first key points.
2. The method of claim 1, wherein the first image is an RGB image and the second image is a thermal imaging image, the method further comprising:
determining a target image region of the thermal imaging image according to second key points in the thermal imaging image that correspond to the first key points.
3. The method of claim 2, wherein the first key points comprise face key points and the target image region comprises a face region, and wherein determining the target image region of the thermal imaging image according to the second key points in the thermal imaging image that correspond to the first key points comprises:
determining the face region in the thermal imaging image according to the second key points in the thermal imaging image that correspond to the face key points of the RGB image.
4. The method according to any one of claims 1 to 3, further comprising:
acquiring two videos captured for the scene;
and extracting one image from each of the two videos according to the image quality of the image frames in the two videos, to obtain the first image and the second image, wherein the acquisition time of the first image is the same as the acquisition time of the second image.
5. The method of any one of claims 1 to 4, further comprising:
performing region-of-interest detection on the first image to determine a region of interest in the first image;
and determining the first key points of the first image according to the region of interest in the first image.
6. The method according to any one of claims 1 to 5, wherein obtaining the conversion relation between the image coordinates of the first image and the image coordinates of the second image according to the image features of the first image and the image features of the second image comprises:
inputting the image features of the first image and of the second image into a registration layer of a trained image registration network to obtain the conversion relation between the image coordinates of the first image and the image coordinates of the second image, wherein the image registration network is trained by back-propagating a network loss, and the network loss is determined based on output results obtained by registering sample images acquired from a plurality of scenes and on pre-acquired annotation information.
7. The method of claim 6, further comprising:
acquiring sample images collected in a plurality of scenes, wherein the sample images comprise a first sample and a second sample acquired simultaneously in the same scene;
inputting the first sample and the second sample into an image registration network to obtain an output result of the image registration network;
determining a network loss of the image registration network according to a comparison result between the output result and the annotation information, wherein the annotation information comprises a conversion relation between image coordinates of the first sample and image coordinates of the second sample;
back-propagating the network loss to obtain the image registration network after one round of training;
and performing multiple rounds of training on the image registration network to obtain the trained image registration network.
8. The method of claim 6 or 7, wherein the plurality of scenes includes a first scene and does not include a second scene, the method further comprising:
in a case where the scene changes from the first scene to the second scene, acquiring a sample image of the second scene;
updating the image registration network based on the sample image of the second scene to obtain an updated image registration network;
and performing image registration, through the updated image registration network, on two images synchronously acquired in the second scene.
9. An image registration apparatus, comprising:
an acquisition module configured to acquire a first image and a second image that are synchronously acquired for the same scene;
an extraction module configured to extract image features of the first image and image features of the second image, respectively;
a first determining module configured to obtain a conversion relation between image coordinates of the first image and image coordinates of the second image according to the image features of the first image and the image features of the second image;
and a registration module configured to determine, based on the conversion relation and the image coordinates of first key points of the first image, second key points in the second image that correspond to the first key points.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 9.
11. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 9.
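As a hedged illustration of the final step of claim 1: once a conversion relation between the two images' coordinate systems has been obtained, first-image key points can be mapped into the second image. The claims do not fix the form of the conversion relation; a 3x3 planar homography is assumed here as one common choice, and the function name `map_keypoints` is illustrative, not from the patent.

```python
import numpy as np

def map_keypoints(H, pts):
    """Map (N, 2) pixel coordinates through a 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    proj = homo @ H.T                                # apply the conversion relation
    return proj[:, :2] / proj[:, 2:3]                # back to pixel coordinates

# A pure translation of (+5, -3) standing in for the conversion relation:
H = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0,  1.0]])
second_pts = map_keypoints(H, [[10.0, 20.0]])
# second_pts -> [[15., 17.]]
```

With an identity homography the points are returned unchanged, which gives a quick sanity check on the mapping.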
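Claims 2 and 3 determine a target (face) region of the thermal imaging image from the mapped second key points. One simple realization, sketched below, is an axis-aligned bounding box around the key points; the `margin` parameter is a hypothetical design choice not taken from the claims.

```python
import numpy as np

def face_region(keypoints, margin=0.2):
    """Axis-aligned bounding box (x0, y0, x1, y1) around mapped face
    key points, expanded by a relative margin (illustrative choice)."""
    pts = np.asarray(keypoints, dtype=float)
    (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    return (x0 - mx, y0 - my, x1 + mx, y1 + my)

box = face_region([[40, 60], [80, 60], [60, 100]], margin=0.0)
# box -> (40.0, 60.0, 80.0, 100.0)
```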
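Claim 4 selects one frame from each video according to image quality, but does not name a metric. A common no-reference proxy is the variance of a discrete Laplacian (higher for sharper frames); the sketch below assumes grayscale NumPy arrays and is not the patent's method.

```python
import numpy as np

def sharpness(frame):
    # Variance of a 4-neighbour discrete Laplacian over the interior pixels:
    # a common no-reference image-quality proxy (assumption, not from the claim).
    f = np.asarray(frame, dtype=float)
    lap = (-4.0 * f[1:-1, 1:-1] + f[:-2, 1:-1] + f[2:, 1:-1]
           + f[1:-1, :-2] + f[1:-1, 2:])
    return float(lap.var())

def pick_frame(frames):
    # Index of the frame with the highest quality score.
    return int(np.argmax([sharpness(f) for f in frames]))

flat = np.zeros((8, 8))                   # featureless, low-quality frame
i, j = np.indices((8, 8))
checker = ((i + j) % 2).astype(float)     # high-contrast frame
# pick_frame([flat, checker]) -> 1
```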
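Claim 7's training round (forward pass, loss against the annotated conversion relation, back-propagation, repeat) can be caricatured with a single linear "registration layer" trained by gradient descent. The real network's architecture, features, and loss are not specified in this document, so everything below is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in: paired-image features -> 8 parameters of a conversion relation.
X = rng.normal(size=(64, 16))        # features for 64 (first, second) sample pairs
W_true = rng.normal(size=(16, 8))    # generates the "annotation information"
Y = X @ W_true                       # annotated conversion-relation parameters

W = np.zeros((16, 8))                # weights of a single linear registration layer
for _ in range(300):                 # multiple rounds of training
    pred = X @ W                                # output result of the network
    grad = 2.0 * X.T @ (pred - Y) / len(X)      # gradient of the squared loss
    W -= 0.1 * grad                             # back-propagation update
loss = float(np.mean((X @ W - Y) ** 2))         # final network loss
```

After enough rounds the loss approaches zero and the layer recovers the annotated mapping, mirroring the claim's loop of computing a loss against annotations and back-propagating it.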
CN202011163832.7A (filed 2020-10-27, priority 2020-10-27): Image registration method and device, electronic equipment and storage medium. Status: pending. Publication: CN112184787A.

Priority Applications (1)

CN202011163832.7A (priority and filing date 2020-10-27): Image registration method and device, electronic equipment and storage medium

Publications (1)

CN112184787A (published 2021-01-05)

Family

ID=73922866

Family Applications (1)

CN202011163832.7A (priority and filing date 2020-10-27), pending: Image registration method and device, electronic equipment and storage medium

Country Status (1)

CN: CN112184787A

Citations (6)

* Cited by examiner, † Cited by third party
CN110458895A * (Tencent Technology (Shenzhen) Co., Ltd.; priority 2019-07-31, published 2019-11-15): Conversion method, device, equipment and storage medium for an image coordinate system
CN110599526A * (Shanghai United Imaging Intelligence Co., Ltd.; priority 2019-08-06, published 2019-12-20): Image registration method, computer device, and storage medium
CN110782421A * (Ping An Technology (Shenzhen) Co., Ltd.; priority 2019-09-19, published 2020-02-11): Image processing method, image processing device, computer equipment and storage medium
CN110942032A * (Shenzhen SenseTime Technology Co., Ltd.; priority 2019-11-27, published 2020-03-31): Living body detection method and device, and storage medium
CN111209870A * (Hangzhou Tuya Information Technology Co., Ltd.; priority 2020-01-09, published 2020-05-29): Binocular living body camera rapid registration method, system and device
CN111382654A * (Beijing SenseTime Technology Development Co., Ltd.; priority 2018-12-29, published 2020-07-07): Image processing method and apparatus, and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
WO2021259394A3 * (Beijing Ande Yizhi Technology Co., Ltd.; priority 2021-02-22, published 2022-02-17): Image processing method and apparatus, and electronic device and storage medium
IL281554B * (Emza Visual Sense Ltd; priority 2021-03-16, published 2022-10-01): A device and method for identifying and outputting 3D objects
IL281554B2 * (Emza Visual Sense Ltd; priority 2021-03-16, published 2023-02-01): A device and method for identifying and outputting 3D objects
CN113724393A * (Beijing Dajia Internet Information Technology Co., Ltd.; priority 2021-08-12, published 2021-11-30): Three-dimensional reconstruction method, device, equipment and storage medium
CN113724393B * (Beijing Dajia Internet Information Technology Co., Ltd.; priority 2021-08-12, published 2024-03-19): Three-dimensional reconstruction method, device, equipment and storage medium
CN113822916A * (Peking University; priority 2021-08-17, published 2021-12-21): Image matching method, device, equipment and readable storage medium
CN113822916B * (Peking University; priority 2021-08-17, published 2023-09-15): Image matching method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11532180B2 (en) Image processing method and device and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
US20210097715A1 (en) Image generation method and device, electronic device and storage medium
US11321575B2 (en) Method, apparatus and system for liveness detection, electronic device, and storage medium
CN107692997B (en) Heart rate detection method and device
WO2021031609A1 (en) Living body detection method and device, electronic apparatus and storage medium
CN111553864B (en) Image restoration method and device, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN110287671B (en) Verification method and device, electronic equipment and storage medium
CN112465843A (en) Image segmentation method and device, electronic equipment and storage medium
CN111241887B (en) Target object key point identification method and device, electronic equipment and storage medium
CN111243011A (en) Key point detection method and device, electronic equipment and storage medium
CN111860373B (en) Target detection method and device, electronic equipment and storage medium
CN113139471A (en) Target detection method and device, electronic equipment and storage medium
CN112967264A (en) Defect detection method and device, electronic equipment and storage medium
CN114187498A (en) Occlusion detection method and device, electronic equipment and storage medium
CN111523346A (en) Image recognition method and device, electronic equipment and storage medium
CN113261011A (en) Image processing method and device, electronic equipment and storage medium
CN114581525A (en) Attitude determination method and apparatus, electronic device, and storage medium
CN113538310A (en) Image processing method and device, electronic equipment and storage medium
CN111582381B (en) Method and device for determining performance parameters, electronic equipment and storage medium
CN113345000A (en) Depth detection method and device, electronic equipment and storage medium
CN114973359A (en) Expression recognition method and device, electronic equipment and storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20210105)