CN112785687A - Image processing method, image processing device, electronic equipment and readable storage medium - Google Patents

Image processing method, image processing device, electronic equipment and readable storage medium

Info

Publication number
CN112785687A
CN112785687A (application number CN202110099198.3A)
Authority
CN
China
Prior art keywords
image data
image
data
feature
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110099198.3A
Other languages
Chinese (zh)
Inventor
王锡浩
黄晗
郭彦东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110099198.3A priority Critical patent/CN112785687A/en
Publication of CN112785687A publication Critical patent/CN112785687A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 7/00 Image analysis
    • G06T 7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T 7/33 Determination of transform parameters for the alignment of images using feature-based methods
    • G06T 9/00 Image coding
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing device, electronic equipment and a readable storage medium, and belongs to the technical field of three-dimensional imaging. The method comprises the following steps: acquiring first image data and second image data, wherein the first image data and the second image data are image data at different moments; inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network; acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data; and executing a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. By acquiring the matching feature points and the feature transformation data between the first image data and the second image data, the method and the device can improve the precision of three-dimensional reconstruction.

Description

Image processing method, image processing device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of three-dimensional imaging technologies, and in particular, to an image processing method and apparatus, an electronic device, and a readable storage medium.
Background
The three-dimensional reconstruction technology is a hotspot and a difficulty in frontier fields such as computer vision, artificial intelligence and virtual reality, and is one of the major challenges faced in both basic and applied research. It is widely applied in fields such as cultural relic digitization, biomedical imaging, animation production, industrial measurement and immersive virtual interaction. Three-dimensional reconstruction means establishing a 3D model from input data. Therefore, how to improve the accuracy of three-dimensional reconstruction is a problem to be solved urgently.
Disclosure of Invention
The application provides an image processing method, an image processing device, an electronic device and a readable storage medium, so as to overcome the above drawbacks.
In a first aspect, an embodiment of the present application provides an image processing method, where the method includes: acquiring first image data and second image data, wherein the first image data and the second image data are image data at different moments; inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network; acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data; and executing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
In a second aspect, an embodiment of the present application further provides an image processing apparatus, including: the device comprises a first acquisition module, a second acquisition module, a fusion module and a reconstruction module. The first acquisition module is used for acquiring first image data and second image data, wherein the first image data and the second image data are image data at different moments. And the second acquisition module is used for inputting the first image data and the second image data into a feature matching network and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network. And the fusion module is used for acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data. And the reconstruction module is used for executing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
In a third aspect, an embodiment of the present application further provides an electronic device, including one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the above-described method.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where a program code is stored in the computer-readable storage medium, and the program code can be called by a processor to execute the above method.
The image processing method, the image processing device, the electronic device and the readable storage medium provided by the embodiments of the present application can improve the precision of three-dimensional reconstruction by acquiring the matching feature points between first image data and second image data and determining feature transformation data based on the acquired matching feature points. Specifically, the present application may acquire the first image data and the second image data, where the first image data and the second image data are image data at different moments; input the first image data and the second image data into a feature matching network, and acquire at least one matching feature point between the first image data and the second image data by using the feature matching network; acquire the feature transformation data based on the matching feature points, and fuse the feature transformation data and the first image data to obtain target data; and finally execute a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. By obtaining more accurate matched feature points with the feature matching network, the method and the device can improve the three-dimensional reconstruction precision to a certain extent.
Additional features and advantages of embodiments of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of embodiments of the present application. The objectives and other advantages of the embodiments of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 shows a flowchart of an image processing method provided by one embodiment of the present application;
Fig. 2 shows a flowchart of step S101 in an image processing method provided by one embodiment of the present application;
Fig. 3 shows a flowchart of an image processing method provided by another embodiment of the present application;
Fig. 4 shows an example diagram of the process of acquiring matched feature points in an image processing method provided by another embodiment of the present application;
Fig. 5 shows an example diagram of the overall framework of three-dimensional reconstruction in an image processing method provided by another embodiment of the present application;
Fig. 6 shows a flowchart of an image processing method provided by yet another embodiment of the present application;
Fig. 7 shows a flowchart of step S603 in an image processing method provided by still another embodiment of the present application;
Fig. 8 shows a block diagram of an image processing apparatus provided in an embodiment of the present application;
Fig. 9 shows a block diagram of an electronic device provided in an embodiment of the present application;
Fig. 10 shows a storage unit provided in an embodiment of the present application for storing or carrying program code for implementing an image processing method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Existing three-dimensional reconstruction schemes mainly fall into two categories. The first is a three-dimensional reconstruction scheme based on multi-frame scanning depth fusion without prior information; the second combines a deep learning template and a pre-scanned template and performs three-dimensional reconstruction from a single color image or video frame. For the first scheme, because a multi-frame depth map fusion strategy is adopted, local areas on the surface of the reconstructed three-dimensional model may become over-smooth, making the three-dimensional reconstruction model inaccurate. Meanwhile, in the first scheme, whether the key points of two adjacent image frames are matched accurately during three-dimensional reconstruction greatly influences the reconstruction effect, and the errors in matching the key points of adjacent frames gradually accumulate. For the second scheme, the kinds of objects that can be three-dimensionally reconstructed are greatly limited by the introduced pre-scanned template information. For example, the SMPL (Skinned Multi-Person Linear) human template model is only suitable for reconstructing a human model and cannot be used to reconstruct models of other things. In addition, the second scheme mainly relies on a neural network for three-dimensional reconstruction, but training the neural network depends on a large amount of data, so the generalization of the three-dimensional reconstruction result is poor.
Therefore, in order to solve the above problems, an embodiment of the present application provides an image processing method. Referring to fig. 1, which shows the image processing method provided by an embodiment of the present application, the method includes steps S101 to S104.
Step S101: first image data and second image data are acquired.
The image processing method provided by the embodiment of the present application can be applied to an electronic device, and the electronic device may be a smart phone, a tablet computer, an electronic book, or another electronic device capable of running an application program. When image processing is performed, the data required for image processing may be acquired first, and the data may include first image data and second image data. The first image data and the second image data may be image data of different moments; the first image data may be referred to as marked data, the second image data may be referred to as data to be marked, and the first image data and the second image data may be image data of the same thing acquired at different moments. For example, the electronic device may acquire an image of person A at time T1 when both hands of person A are crossed, and this image may be the first image data; the electronic device may acquire an image of person A at time T2 when both hands of person A are lifted, and this image may be referred to as the second image data. Therefore, the first image data and the second image data have the same semantic information, which means they depict the same thing; for example, the first image data and the second image data contain the same person, the same animal, or another identical thing.
In other embodiments, the first image data and the second image data may each include a color image and a depth image; specifically, the first image data may include a first color image and a first depth image, and the second image data may include a second color image and a second depth image. In the embodiment of the invention, the depth image is acquired by a depth camera and the color image is acquired by a color camera, and the depth image and the color image are obtained by shooting the same object with the two cameras. The difference between them is that the depth image contains depth information while the color image contains pixel information.
When three-dimensional reconstruction is performed, in order to make the reconstructed three-dimensional image better conform to the actual object, a color image needs to be acquired so that a more realistic three-dimensional image can be generated from it. However, an image conforming to reality cannot be constructed from a color image alone: a model built only from a color image is essentially two-dimensional and cannot truly represent the actual object. Therefore three-dimensional reconstruction requires not only a color image but also a depth image, and the presence of the depth image makes the constructed three-dimensional image more stereoscopic.
In summary, in order to more accurately realize three-dimensional reconstruction, the present application may acquire a color image and a depth image to form different image data, and realize reconstruction of a three-dimensional image by using the different image data. In addition, after the image data is acquired, the acquired image data may also be preprocessed according to an embodiment of the present invention, so as to facilitate better implementation of the three-dimensional reconstruction operation, specifically, referring to fig. 2, step S101 may include steps S1011 to S1012.
Step S1011: and acquiring a first color image and a first depth image, and performing image alignment operation on the first color image and the first depth image to obtain first image data.
As one way, the electronic device may acquire a plurality of first image data and a plurality of second image data, each of which may include a color image and a depth image, i.e., the first image data may include a first color image and a first depth image, and the second image data may include a second color image and a second depth image, the first color image and the second color image may be captured by a color camera, and the first depth image and the second depth image may be captured by a depth camera.
In some embodiments, the first image data and the second image data may be previously acquired by the color camera and the depth camera, or may be acquired by the color camera and the depth camera in real time. Specifically, the color camera and the depth camera may be used to capture the same thing at the same time and transmit the images captured by them to the electronic device. It should be noted that the color camera and the depth camera may be mounted on the electronic device, that is, the electronic device may capture the color image and the depth image with the color camera and the depth camera, respectively.
In addition, the color camera and the depth camera may not be disposed on the electronic device, and when the electronic device needs to perform three-dimensional reconstruction, the electronic device may send an image acquisition instruction to the color camera and the depth camera to instruct the color camera and the depth camera to acquire a color image and a depth image of the same object, respectively.
In other embodiments, when performing three-dimensional reconstruction, the electronic device may also determine the priority of the image according to the three-dimensional reconstruction requirement, and when the three-dimensional image has a higher requirement on the depth data, it may first send an image acquisition command to the depth camera and instruct the depth camera to acquire the depth image. In addition, when the image acquisition command is sent to the depth camera, the embodiment of the present invention may also determine the level of the image to be acquired, and if the requirement of the three-dimensional reconstruction on the depth image is high, send the first image level to the depth camera to instruct the depth camera to acquire the depth image of the first image level, and similarly, when the requirement of the three-dimensional reconstruction on the depth image is not high, send the second image level to the depth camera to instruct the depth camera to acquire the depth image of the second image level. The information levels between the first image level and the second image level are different, for example, the depth image corresponding to the first image level contains more accurate depth information than the depth image corresponding to the second image level. How the first image level and the second image level are specifically divided is not specifically limited herein, and may be selected according to actual situations.
In other embodiments, when the electronic device performs three-dimensional reconstruction, it may also determine the priority of the image according to the three-dimensional reconstruction requirement, and when the three-dimensional image has a higher requirement on the pixel data, it may first send an image acquisition command to the color camera and instruct the color camera to acquire a color image. In addition, when the image acquisition command is sent to the color camera, the embodiment of the present invention may also determine the pixel level of the image to be acquired, and if the requirement of the three-dimensional reconstruction on the color image is high, send the first pixel level to the color camera to instruct the color camera to acquire the color image at the first pixel level, and similarly, when the requirement of the three-dimensional reconstruction on the color image is not high, send the second pixel level to the color camera to instruct the color camera to acquire the color image at the second pixel level. The pixel levels are different between the first pixel level and the second pixel level, such as the color image corresponding to the first pixel level contains more accurate pixel information than the color image corresponding to the second pixel level, or the resolution of the color image at the first pixel level is higher than the resolution of the color image at the second pixel level. How the first pixel level and the second pixel level are specifically divided is not specifically limited herein, and may be selected according to actual situations.
In other embodiments, after acquiring the first color image and the first depth image, the electronic device may perform an alignment operation on the first color image and the first depth image. Because there is a difference in viewing angle between the two cameras, the embodiment of the present invention may perform an image alignment operation on the obtained first depth image and first color image according to the internal and external parameters of the depth camera and the color camera. Specifically, the first depth image and the first color image may be transformed into the same view angle, so that they have the same resolution, the same pixels, the same semantic information, and the like. The depth image and the color image obtained through the alignment operation may be referred to as the first image data, and the alignment operation performed on the first color image and the first depth image can improve the accuracy of three-dimensional reconstruction.
Step S1012: and acquiring a second color image and a second depth image, and performing image alignment operation on the second color image and the second depth image to obtain second image data.
In one approach, the second image data may include a second color image captured by the color camera and a second depth image captured by the depth camera. Similar to the acquisition process of the first image data, after acquiring the second color image and the second depth image, the electronic device may perform an alignment operation on them. Because there is a difference in viewing angle between the two cameras, the embodiment of the present invention may perform an image alignment operation on the obtained second depth image and second color image according to the internal and external parameters of the depth camera and the color camera. Specifically, the embodiment of the present invention may convert the second depth image and the second color image into the same view angle, so that they have the same resolution, the same pixels, the same semantic information, and the like. The depth image and the color image obtained through the alignment operation may be referred to as the second image data, and performing the alignment operation on the second color image and the second depth image can improve the accuracy of three-dimensional reconstruction.
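The alignment described in steps S1011 and S1012 can be pictured with the following minimal sketch, which reprojects a depth map into the color camera's view using pinhole intrinsics and a known depth-to-color extrinsic transform. The function name, argument layout and the simple splatting strategy are assumptions made only for illustration, not the actual implementation of the application.

```python
import numpy as np

def align_depth_to_color(depth, K_d, K_c, R, t, color_shape):
    """Reproject a depth map from the depth camera's view into the color camera's
    view (illustrative sketch; real pipelines also handle lens distortion,
    occlusion and hole filling)."""
    h_d, w_d = depth.shape
    # Back-project every depth pixel to a 3D point in depth-camera coordinates.
    u, v = np.meshgrid(np.arange(w_d), np.arange(h_d))
    z = depth.astype(np.float32)
    x = (u - K_d[0, 2]) * z / K_d[0, 0]
    y = (v - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)

    # Transform the 3D points into the color camera's coordinate frame (extrinsics R, t).
    pts_c = pts @ R.T + t

    # Project the points onto the color image plane with the color intrinsics.
    z_c = np.clip(pts_c[:, 2], 1e-6, None)
    u_c = np.round(K_c[0, 0] * pts_c[:, 0] / z_c + K_c[0, 2]).astype(int)
    v_c = np.round(K_c[1, 1] * pts_c[:, 1] / z_c + K_c[1, 2]).astype(int)

    # Splat the depth values into a map that shares the color image's resolution.
    aligned = np.zeros(color_shape[:2], dtype=np.float32)
    valid = ((u_c >= 0) & (u_c < color_shape[1]) &
             (v_c >= 0) & (v_c < color_shape[0]) & (z.reshape(-1) > 0))
    aligned[v_c[valid], u_c[valid]] = z_c[valid]
    return aligned
```

After this step the color image and the aligned depth map can be stacked into a single RGB-D frame with matching pixels, which is what the later steps treat as one piece of image data.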
Step S102: and inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network.
In some embodiments, after the first image data and the second image data are acquired, the embodiment of the present invention may input the first image data and the second image data to a feature matching network, and obtain at least one matching feature point between the first image data and the second image data by using the feature matching network. It is known from the above description that the first image data and the second image data may comprise the same object, i.e. the first image data and the second image data have the same semantic information. Therefore, a plurality of matching feature points may exist in the first image data and the second image data, for example, the first image data and the second image data both include the person a, and at this time, the hand of the person a in the first image data and the hand of the person a in the second image data are the matching feature points, and for example, the face of the person a in the first image data and the face of the person a in the second image data are the matching feature points.
In other embodiments, matching feature points may be preset, and specifically, before the first image data and the second image data are input to the feature matching network, the present application may determine whether feature mark information is included in the first image data, where the feature mark information may be feature mark point information input by a user based on the first image data, and the feature mark information may be obtained through a color image in the first image data. For example, after the user acquires the first color image, he or she may mark the position of his or her hand and the position of his or her face in the first color image, and the face and the hand may be used as matching feature points. In the embodiment of the invention, when the user inputs the feature marking point information based on the first color image, the feature marking point information can be marked in the form of a rectangular frame, and the coordinates of the upper left corner and the lower right corner of the rectangular frame can be used for representing the position of the feature marking point. In addition, the feature marker information may also be obtained by the electronic device according to analysis of the use data of the user, and how to obtain the feature marker point information is not specifically limited here, and may be selected according to actual requirements.
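The rectangular-frame marking described above can be captured by a small data structure such as the following toy sketch. The field names and the example coordinates are purely illustrative assumptions, not the application's actual format.

```python
from dataclasses import dataclass

@dataclass
class FeatureMark:
    # Illustrative structure for one user-marked feature point region.
    serial_number: int        # e.g. 13 for a finger
    top_left: tuple           # (x, y) of the rectangle's upper-left corner
    bottom_right: tuple       # (x, y) of the rectangle's lower-right corner

    def center(self):
        # The rectangle center can stand in for the marked point's position.
        (x0, y0), (x1, y1) = self.top_left, self.bottom_right
        return ((x0 + x1) / 2, (y0 + y1) / 2)

finger_mark = FeatureMark(serial_number=13, top_left=(12, 13), bottom_right=(18, 19))
print(finger_mark.center())  # -> (15.0, 16.0)
```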
In some embodiments, the feature matching network is utilized, and the embodiment of the present invention can more accurately acquire the at least one matching feature point between the second image data and the first image data. As can be known from the above description, the first image data may include at least one piece of feature label information, and after the first image data and the second image data are input to the feature matching network, the feature matching network may obtain, based on the feature label information in the first image data, a matching feature point corresponding to the feature label information in the second image data, and unique identification and location information and the like corresponding to the matching feature point.
In the embodiment of the present invention, the feature matching network may also be referred to as a twin network, and the feature matching network is configured to obtain the feature points corresponding to different images and the position information of the feature points in the images. After the first image data and the second image data are input to the feature matching network, the feature matching network may compare the first image data and the second image data and determine at least one matching feature point between the second image data and the first image data. For example, suppose the user marks the finger (serial number 13) of person A in the first image data, and the position information of the finger 13 in the first color image is (15, 16), where 15 and 16 may be the x-coordinate value and the y-coordinate value of the finger in the first color image. After the first image data and the second image data are input to the feature matching network, the embodiment of the present invention may determine the position information (18, 20) of the finger 13 in the second color image; it can be seen that the finger has moved between the first color image and the second color image.
In summary, the matching feature points in the embodiments of the present invention may be pixel coordinates of the same object on the image at different time points with the same semantic meaning. For example, when a hand of a person moves, images at two moments are acquired, and then pixel points representing the small finger in the images form a matching feature point. When a person moves, the coordinates of the matching feature points are usually all different, and the pixel values may also be different.
It should be noted that the second image data may also include feature label information. After the second image data and the first image data are input to the feature matching network, the feature matching network may determine, based on the feature label information of the second image data, the matching feature point in the first image data that corresponds to that feature label information.
Step S103: and acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data.
In other embodiments, after obtaining a plurality of matching feature points, embodiments of the present invention may obtain feature transformation data based on the matching feature points, where the feature transformation data may also be referred to as a transformation matrix, and the transformation matrix may include translation data, rotation data, and other transformation data, and the other transformation data may include pixel transformation values, resolution transformation values, and luminance transformation values, and the like.
As one mode, after the first image data and the second image data are input to the feature matching network, the feature matching network may determine the matching feature points corresponding to the feature mark information in the second image data based on the feature mark information in the first image data, and then determine the feature transformation data based on the matching feature points in the second image data and the matching feature points in the first image data. As in the above example, the finger 13 may be used as a matching feature point. Based on the first position information of the finger 13 in the first image data and the second position information of the finger 13 in the second image data, the embodiment of the present invention may obtain feature transformation data (18-15, 20-16), so the feature transformation data may finally be (3, 4); it can be seen that, from the first image data to the second image data, the finger 13 has shifted by 3 to the right and by 4 upward. Optionally, the translation transformation data may also include depth information; for example, if the finger 13 is at a distance of 5 from the camera in the first image data and at a distance of 8 in the second image data, the finger has moved farther from the camera, and the feature transformation data finally obtained may be (3, 4, 8). Since, as introduced above, there is more than one matched feature point, the feature transformation data may form a transformation matrix.
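A minimal sketch of how per-point offsets could be assembled from matched positions is shown below; the function name and the array layout are assumptions made for illustration, and the example simply reproduces the finger offset computed above.

```python
import numpy as np

def feature_transform_data(src_points, dst_points):
    """Per-point offsets between matched feature points.

    src_points / dst_points: (N, 2) or (N, 3) arrays of matched positions,
    e.g. (x, y) pixel coordinates, optionally with a depth component.
    Returns an (N, D) matrix of offsets, one row per matched feature point.
    """
    src = np.asarray(src_points, dtype=np.float32)
    dst = np.asarray(dst_points, dtype=np.float32)
    return dst - src

# Finger (serial number 13): position (15, 16) in the first color image and
# (18, 20) in the second -> offset (3, 4), i.e. shifted 3 to the right and 4 upward.
print(feature_transform_data([[15, 16]], [[18, 20]]))  # [[3. 4.]]
```

With many matched feature points the rows of this matrix together play the role of the transformation matrix referred to in the text.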
In other embodiments, after the feature transformation data is acquired, the electronic device may fuse the feature transformation data with the first image data to obtain the target data, may fuse the feature transformation data with the second image data to obtain the target data, or may fuse the feature transformation data, the first image data and the second image data to obtain the target data. Which data the feature transformation data is fused with to obtain the target data is not limited here and may be selected according to the actual situation.
Step S104: and performing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
The image processing method provided by the embodiment of the application acquires matching feature points between the first image data and the second image data and determines feature transformation data based on the obtained matching feature points, so as to improve the precision of three-dimensional reconstruction. Specifically, the present application may first obtain the first image data and the second image data, which are image data at different times; then input the first image data and the second image data into a feature matching network and acquire at least one matching feature point between the first image data and the second image data by using the feature matching network; acquire feature transformation data based on the matching feature points; fuse the feature transformation data and the first image data to obtain target data; and finally perform a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. By obtaining more accurate matched feature points with the feature matching network, the method and the device can improve the three-dimensional reconstruction precision to a certain extent.
Referring to fig. 3, the image processing method according to another embodiment of the present application may include steps S301 to S306.
Step S301: first image data and second image data are acquired.
As one manner, the feature matching network may include a first image encoder and a second image encoder, and after the first image data and the second image data are acquired, the present application may input the first image data to the first image encoder and input the second image data to the second image encoder, respectively, to implement encoding of the first image data and the second image data, that is, enter steps S302 and S303.
Step S302: the first image data is input to a first image encoder, and the first image data is encoded by the first image encoder to obtain first encoded data.
It is known from the above description that the feature matching network may be a twin network, and thus the weights are shared between the first image encoder and the second image encoder, i.e. the weight of the first image encoder and the weight of the second image encoder are the same. As one mode, the embodiment of the present invention may input the first image data to a first image encoder, where the first image data may also be referred to as a source RGB-D image, and the first image encoder is mainly configured to encode the first image data to obtain the first encoded data.
Step S303: and inputting the second image data to a second image encoder, and encoding the first image data by using the second image encoder to obtain second encoded data.
As another way, the embodiment of the present invention may input the second image data to a second image encoder, where the second image data may also be referred to as a target RGB-D image, and the second image encoder is mainly configured to encode the second image data to obtain the second encoded data. Since the first encoder and the second encoder share the weight, the first encoded data acquired by inputting the first image data to the first encoder is in the same form as the second encoded data acquired by inputting the second image data to the second encoder.
In a specific embodiment, the first image data is input to the first encoder and first encoded data with a size of 7 × 7 × 96 is output, and the second image data is input to the second encoder and second encoded data with a size of 7 × 7 × 96 is output, where the first image data corresponds to an image size of 224 × 224 × 6 and the second image data also corresponds to an image size of 224 × 224 × 6. As another way, after the first encoded data and the second encoded data are acquired, the embodiment of the present invention may combine the output features of the two image encoders; the combined first encoded data and second encoded data therefore have a size of 7 × 7 × 192.
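A minimal PyTorch-style sketch of the weight sharing described above is shown below, assuming 224 × 224 × 6 RGB-D inputs downsampled to 7 × 7 × 96 features and assuming the two encoder outputs are combined by channel-wise concatenation (which matches the 7 × 7 × 192 size). The layer configuration is an illustrative assumption, not the actual network of the application.

```python
import torch
import torch.nn as nn

class RGBDEncoder(nn.Module):
    """Toy encoder: 224x224x6 RGB-D input -> 7x7x96 feature map (overall stride 32)."""
    def __init__(self):
        super().__init__()
        chans = [6, 16, 32, 48, 64, 96]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1), nn.ReLU(inplace=True)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

encoder = RGBDEncoder()          # a single module ...
source = torch.randn(1, 6, 224, 224)   # first image data (source RGB-D image)
target = torch.randn(1, 6, 224, 224)   # second image data (target RGB-D image)
f_src = encoder(source)          # ... applied to both inputs, so the two
f_tgt = encoder(target)          # "encoders" share exactly the same weights
fused = torch.cat([f_src, f_tgt], dim=1)   # 1 x 192 x 7 x 7 combined feature
print(f_src.shape, fused.shape)  # torch.Size([1, 96, 7, 7]) torch.Size([1, 192, 7, 7])
```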
Step S304: and comparing the first image encoder data with the second encoded data to obtain at least one matching feature point between the first image data and the second image data.
As one way, the feature matching network may include a bottleneck layer, where the bottleneck layer is used to greatly reduce the amount of computation; that is, introducing the bottleneck layer can increase the speed at which the matched feature points are acquired and thereby increase the rate of three-dimensional reconstruction. As in the above example, the combined output feature size of the first encoder and the second encoder is 7 × 7 × 192, and 7 × 7 × 96 output features may be obtained through the bottleneck layer. In the embodiment of the present invention, the bottleneck layer may use a 1 × 1 convolution.
In some embodiments, when the number of matching feature points exceeds a feature point threshold, the embodiment of the present invention may prune the matching feature points by using the bottleneck layer and use the remaining matching feature points as target feature points. Conversely, when the first image data or the second image data contains fewer feature points, or when the structure of the feature matching network is simpler, the embodiment of the present invention may omit the bottleneck layer. Even when the number of the matched feature points does not exceed the feature point threshold, the bottleneck layer may still be used to screen the matched feature points in real time so as to reduce the amount of computation. Whether a bottleneck layer is introduced into the feature matching network is not specifically limited here and may be selected according to the actual situation.
As another mode, after the matching feature points are pruned by using the bottleneck layer, the embodiment of the present application may also decode the target feature points by using an image decoder to obtain a probability heat map, and compare the probability heat map with the color images in the first image data and the second image data to obtain at least one matching feature point. In a specific embodiment, after obtaining the 7 × 7 × 96 features through the bottleneck layer, the electronic device may input the 7 × 7 × 96 features into the image decoder, obtain the probability heat map corresponding to the input matching points, and then determine the positions of the corresponding matching points by combining the color images, thereby obtaining the non-rigid matching feature points corresponding to the first image data (source RGB-D image) and the second image data (target RGB-D image).
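Continuing the sketch above, the following illustrates a 1 × 1 bottleneck convolution followed by a toy decoder that upsamples to a probability heat map, from which the matched point's pixel position is read off at the probability maximum. All layer shapes and the decoder structure are illustrative assumptions rather than the actual modules of the application.

```python
import torch
import torch.nn as nn

bottleneck = nn.Conv2d(192, 96, kernel_size=1)    # 1x1 conv: 7x7x192 -> 7x7x96

decoder = nn.Sequential(                           # toy decoder: 7x7 -> 224x224 heat map
    nn.ConvTranspose2d(96, 48, 4, stride=4), nn.ReLU(inplace=True),   # 7 -> 28
    nn.ConvTranspose2d(48, 16, 4, stride=4), nn.ReLU(inplace=True),   # 28 -> 112
    nn.ConvTranspose2d(16, 1, 2, stride=2),                           # 112 -> 224
)

fused = torch.randn(1, 192, 7, 7)                  # concatenated encoder features
logits = decoder(bottleneck(fused))                # 1 x 1 x 224 x 224
heatmap = torch.sigmoid(logits)[0, 0]              # probability heat map

# The pixel with the highest probability is taken as the matching point's
# position in the target (second) image.
idx = torch.argmax(heatmap)
y, x = divmod(idx.item(), heatmap.shape[1])
print((x, y))
```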
In other embodiments, the feature matching network may further include a Softmax classifier and a Sigmoid classifier for determining matching feature points between the first image data and the second image data from different perspectives. Therefore, the BCE loss function and the NLL loss function are adopted in the embodiment of the invention when the matching feature points are determined.
For a clearer understanding of the process of acquiring the matching feature points, the embodiment of the present invention provides the example diagram shown in fig. 4, which may be a working example of the feature matching network. The source RGB-D image in fig. 4 may be the first image data, which may include a first color image and a first depth image; similarly, the target RGB-D image may be the second image data, which may include a second color image and a second depth image. The first image data is input to the first image encoder and the second image data is input to the second image encoder to obtain the first encoded data and the second encoded data, which may be referred to as encoding features. The obtained encoded data is then screened by the bottleneck layer, and the data output by the bottleneck layer is input to the image decoder to decode the features and obtain a probability heat map.
In the embodiment of the invention, the probability heat map represents a set of probabilities of corresponding points between the first image data and the second image data, which can be used to calculate the BCE loss and the NLL loss, where the BCE loss refers to the binary cross-entropy loss function and the NLL loss is the negative log-likelihood function. The Sigmoid function and the Softmax (normalized exponential) function in fig. 4, together with these losses, are used to measure the degree of difference between the predicted value and the true value of the feature matching network; the smaller the loss value, the better the robustness of the model. In addition, the pixel coordinate at the probability maximum on the probability heat map is the position of the corresponding matching point in the second image data.
Step S305: and acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data.
Step S306: and performing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
In order to understand the process of three-dimensional modeling more clearly, the present invention provides an overall framework diagram of three-dimensional reconstruction, as shown in fig. 5. The depth map and the color map in fig. 5 refer to the first image data and the second image data; as explained above, the first image data may include a first depth image and a first color image, and the second image data may include a second depth image and a second color image. After the first image data and the second image data (the depth map and the color map) are acquired, they may be input to the second module of fig. 5, a non-rigid feature matching module based on a neural network; this module is mainly used for inputting the first image data and the second image data into the feature matching network to obtain at least one matching feature point. After the matching feature points corresponding to the first image data and the second image data are obtained, the embodiment of the invention may input the obtained matching feature points to the non-rigid deformation field estimation, through which the feature transformation data may be obtained; the feature transformation data may include translation data, rotation data and the like. The feature transformation data and the first image data are then fused, that is, they enter the depth map fusion module, to obtain the target data. Finally, three-dimensional reconstruction is performed using the target data to obtain a three-dimensional image, and the three-dimensional image is output.
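The overall flow of fig. 5 can be summarized by the following high-level sketch; every function name here is a placeholder standing in for the corresponding module, not an actual API of the application.

```python
def reconstruct(first_rgbd, second_rgbd,
                match_features, estimate_deformation_field,
                fuse_depth, reconstruct_3d):
    """High-level flow of fig. 5 (all callables are placeholder modules)."""
    # 1. Non-rigid feature matching (neural network based): at least one
    #    matching feature point between the two RGB-D frames.
    matches = match_features(first_rgbd, second_rgbd)
    # 2. Non-rigid deformation field estimation -> feature transformation data
    #    (translation, rotation, ...).
    transform = estimate_deformation_field(matches)
    # 3. Depth map fusion of the transformation data with the first image data.
    target_data = fuse_depth(transform, first_rgbd)
    # 4. Three-dimensional reconstruction from the fused target data.
    return reconstruct_3d(target_data)
```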
The image processing method provided by this embodiment of the application obtains matching feature points between the first image data and the second image data and determines feature transformation data based on the obtained matching feature points, so as to improve the precision of three-dimensional reconstruction. Specifically, the present application may first obtain the first image data and the second image data, which are image data at different times; then input the first image data and the second image data into a feature matching network and acquire at least one matching feature point between the first image data and the second image data by using the feature matching network; acquire feature transformation data based on the matching feature points; fuse the feature transformation data and the first image data to obtain target data; and finally perform a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. By obtaining more accurate matched feature points with the feature matching network, the method and the device can improve the three-dimensional reconstruction precision to a certain extent. In addition, the encoders, the bottleneck layer and the decoder of the feature matching network in the embodiment of the invention make it possible to determine the matching feature points corresponding to the first image data and the second image data more accurately, and can accelerate the three-dimensional modeling rate to a certain extent.
Referring to fig. 6, the image processing method according to another embodiment of the present application may include steps S601 to S605.
Step S601: first image data and second image data are acquired.
Step S602: and inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network.
As is known from the above description, the feature matching network may include a Softmax classifier and a Sigmoid classifier for determining matching feature points between the first image data and the second image data from different angles. The loss function corresponding to the Softmax classifier may be a BCE loss function, whose calculation expression is: L_BCE = -(y·log f(x) + (1 - y)·log(1 - f(x))), where y is the true label of the matching feature point (usually represented by an n-dimensional vector) and f(x) is the predicted probability of the matching feature point (also represented by a vector). The BCE loss measures the difference between the predicted matching feature and the true label.
In addition, the loss function corresponding to the Sigmoid classifier may be an NLL loss function, whose calculation formula is as follows:
L_NLL = -Σ_i label(i) · log(predict(i))
where i refers to the serial number corresponding to the feature matching point, label(i) indicates the true label, and predict(i) indicates the predicted probability of the corresponding feature matching point, i.e. the probability of the feature matching point in the second image data.
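A small numeric sketch of the two losses as written above is given below, assuming per-point binary labels and predicted match probabilities; it is only meant to make the formulas concrete and is not the application's training code.

```python
import numpy as np

def bce_loss(y, p, eps=1e-12):
    # L_BCE = -(y*log p + (1-y)*log(1-p)), averaged over the candidate points
    y, p = np.asarray(y, float), np.clip(np.asarray(p, float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def nll_loss(labels, probs, eps=1e-12):
    # L_NLL = -sum_i label(i) * log(predict(i)) over the feature matching points
    labels, probs = np.asarray(labels, float), np.clip(np.asarray(probs, float), eps, None)
    return float(-np.sum(labels * np.log(probs)))

y = [1, 0, 1]          # true labels of candidate matching points
p = [0.9, 0.2, 0.7]    # predicted match probabilities
print(bce_loss(y, p), nll_loss(y, p))
```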
Step S603: and determining the position information corresponding to each matched feature point, and acquiring feature change data based on the position information, wherein the feature change data comprises rotation information and translation information.
Referring to fig. 7, step S603 may include steps S6031 to S6032.
Step S6031: and determining first position information corresponding to each first matching characteristic point, and determining second position information corresponding to each second matching characteristic point.
As is known from the above description, the first image data may include a plurality of feature label information, and the matching feature point corresponding to each feature label information may be obtained through a feature matching network, where the feature point corresponding to the feature label information may be referred to as a first matching feature point, and the matching feature point corresponding to the feature label information in the second image data may be referred to as a second matching feature point. The first matching feature point and the second matching feature point may include a serial number corresponding to the feature point, and a position coordinate of the matching feature point in the color image.
Step S6032: feature transformation data is determined from the first location information and the second location information.
Step S604: and fusing the characteristic change data and the first image data to obtain target data.
Step S605: and performing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
The image processing method provided by the embodiment of the application obtains matching feature points between the first image data and the second image data and determines feature transformation data based on the obtained matching feature points, so as to improve the precision of three-dimensional reconstruction. Specifically, the present application may first obtain the first image data and the second image data, which are image data at different times; then input the first image data and the second image data into a feature matching network and acquire at least one matching feature point between the first image data and the second image data by using the feature matching network; acquire feature transformation data based on the matching feature points; fuse the feature transformation data and the first image data to obtain target data; and finally perform a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. By obtaining more accurate matched feature points with the feature matching network, the method and the device can improve the three-dimensional reconstruction precision to a certain extent. In addition, the embodiment of the invention provides a three-dimensional reconstruction scheme that is lightweight and has high reconstruction precision, is suitable for various application scenarios on a mobile terminal, and improves the flexibility of three-dimensional reconstruction.
Referring to fig. 8, an embodiment of the present application provides an image processing apparatus 800. In a specific embodiment, the image processing apparatus 800 includes: a first acquisition module 801, a second acquisition module 802, a fusion module 803, and a reconstruction module 804.
The first obtaining module 801 is configured to obtain first image data and second image data, where the first image data and the second image data are image data at different times.
Further, the first obtaining module 801 is further configured to obtain a first color image and a first depth image, perform an image alignment operation on the first color image and the first depth image to obtain first image data, obtain a second color image and a second depth image, and perform an image alignment operation on the second color image and the second depth image to obtain second image data.
A second obtaining module 802, configured to input the first image data and the second image data into a feature matching network, and obtain at least one matching feature point between the first image data and the second image data by using the feature matching network.
Further, the feature matching network includes a first image encoder and a second image encoder, and the second obtaining module 802 is further configured to input the first image data to the first image encoder, encode the first image data by using the first image encoder to obtain first encoded data, input the second image data to the second image encoder, encode the second image data by using the second image encoder to obtain second encoded data, and compare the first encoded data with the second encoded data to obtain at least one matching feature point between the first image data and the second image data.
Further, the feature matching network includes a bottleneck layer, and the second obtaining module 802 is further configured to prune the matched feature points by using the bottleneck layer when the number of the matched feature points exceeds the feature point threshold, and use the remaining matched feature points as target feature points.
Further, the second obtaining module 802 is further configured to decode the target feature points by using the image decoder to obtain a probability heat map, and compare the probability heat map with the color images in the first image data and the second image data to obtain at least one matching feature point.
And a fusion module 803, configured to obtain feature transformation data based on the matching feature points, and fuse the feature transformation data and the first image data to obtain target data.
Further, the fusion module 803 is further configured to determine location information corresponding to each matching feature point, and obtain feature change data based on the location information, where the feature change data includes rotation information and translation information.
Further, the fusion module 803 is further configured to determine first location information corresponding to each first matching feature point, determine second location information corresponding to each second matching feature point, and determine the feature transformation data according to the first location information and the second location information.
Further, the feature matching network includes a Softmax classifier and a Sigmoid classifier for determining matching feature points between the first image data and the second image data from different perspectives.
And the reconstruction module 804 is configured to perform a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
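As an illustration of the fuse-then-reconstruct idea, the sketch below applies the estimated rotation and translation to the first frame's points, merges them with the second frame's points, and voxel-downsamples the merged cloud as a crude stand-in for the reconstruction. The voxel-grid merge is an assumption made for the example; a production pipeline would more typically use TSDF fusion or a similar surface reconstruction.

```python
# Minimal sketch: bring the first frame's points into the second frame's
# coordinate system with (R, t), merge the two clouds, and keep one
# representative point per occupied voxel.
import numpy as np

def fuse_and_reconstruct(first_points, second_points, R, t, voxel_size=0.01):
    """first_points, second_points: (N, 3) point clouds from the two frames."""
    first_in_second = first_points @ R.T + t
    merged = np.vstack([first_in_second, second_points])

    # Voxel-grid downsample as a simple proxy for the reconstructed geometry.
    voxel_idx = np.floor(merged / voxel_size).astype(np.int64)
    _, unique_idx = np.unique(voxel_idx, axis=0, return_index=True)
    return merged[unique_idx]
```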
It is clear to those skilled in the art that, for convenience and brevity of description, for the specific working processes of the apparatus and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated herein.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical, or of another type.
The image processing device provided by the embodiments of the application likewise obtains matching feature points between the first image data and the second image data through the feature matching network, determines feature transformation data based on those matching feature points, fuses the feature transformation data with the first image data to obtain target data, and finally performs a three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image. Because the feature matching network yields more accurate matching feature points, the precision of three-dimensional reconstruction can be improved to a certain extent.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module.
Referring to fig. 9, a block diagram of an electronic device 900 according to an embodiment of the present application is shown. The electronic device 900 may be a smart phone, a tablet computer, an electronic book reader, or another electronic device capable of running an application. The electronic device 900 in the present application may include one or more of the following components: a processor 910, a memory 920, and one or more applications, where the one or more applications may be stored in the memory 920 and configured to be executed by the one or more processors 910, the one or more applications being configured to perform the methods described in the foregoing method embodiments.
The processor 910 may include one or more processing cores. The processor 910 interfaces with various components throughout the electronic device 900 using various interfaces and circuitry, and performs various functions of the electronic device 900 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 920 and by invoking data stored in the memory 920. Optionally, the processor 910 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 910 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem is used to handle wireless communication. It can be understood that the modem may also not be integrated into the processor 910 and may instead be implemented by a separate communication chip.
The memory 920 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 920 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, and an image playing function), instructions for implementing the foregoing method embodiments, and the like. The data storage area may store data created by the electronic device 900 during use (such as a phone book, audio and video data, and chat log data), and the like.
Referring to fig. 10, a block diagram of a computer-readable storage medium 1000 according to an embodiment of the present application is shown. The computer-readable storage medium 1000 has stored therein program code that can be called by a processor to execute the methods described in the above-described method embodiments.
The computer-readable storage medium 1000 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read-Only Memory), an EPROM, a hard disk, or a ROM. Optionally, the computer-readable storage medium 1000 includes a non-volatile computer-readable storage medium. The computer-readable storage medium 1000 has storage space for program code 1010 for performing any of the method steps described above. The program code can be read from or written into one or more computer program products. The program code 1010 may, for example, be compressed in a suitable form.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
acquiring first image data and second image data, wherein the first image data and the second image data are image data at different moments;
inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network;
acquiring feature transformation data based on the matched feature points, and fusing the feature transformation data and the first image data to obtain target data;
and executing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
2. The method of claim 1, wherein the acquiring first image data and second image data comprises:
acquiring a first color image and a first depth image, and performing image alignment operation on the first color image and the first depth image to obtain first image data;
and acquiring a second color image and a second depth image, and performing image alignment operation on the second color image and the second depth image to obtain second image data.
3. The method of claim 1, wherein the feature matching network comprises a first image encoder and a second image encoder, the first image encoder and the second image encoder being weight-shared;
the inputting the first image data and the second image data into a feature matching network, and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network, includes:
inputting the first image data to the first image encoder, and encoding the first image data by using the first image encoder to obtain first encoded data;
inputting the second image data to the second image encoder, and encoding the second image data by using the second image encoder to obtain second encoded data;
and comparing the first encoded data with the second encoded data to obtain at least one matching feature point between the first image data and the second image data.
4. The method of claim 3, wherein the feature matching network comprises a bottleneck layer, and wherein comparing the first encoded data and the second encoded data to obtain at least one matching feature point between the first image data and the second image data comprises:
when the number of the matched feature points exceeds a feature point threshold, pruning the matched feature points by using the bottleneck layer, and taking the remaining matched feature points as target feature points.
5. The method of claim 4, wherein the feature matching network further comprises an image decoder, and wherein after the pruning of the matched feature points by using the bottleneck layer, the method further comprises:
decoding the target feature points by using the image decoder to obtain a probability heat map;
and comparing the probability heat map with the color images in the first image data and the second image data to obtain at least one matching feature point.
6. The method of claim 1, wherein said obtaining feature transformation data based on said matched feature points comprises:
determining position information corresponding to each matched feature point, and acquiring the feature transformation data based on the position information, wherein the feature transformation data comprises rotation information and translation information.
7. The method of claim 6, wherein the first image data comprises a plurality of first matched feature points and the second image data comprises a plurality of second matched feature points corresponding to the first matched feature points;
the determining the position information corresponding to each matching feature point and acquiring feature change data based on the position information includes:
determining first position information corresponding to each first matched feature point, and determining second position information corresponding to each second matched feature point;
and determining the feature transformation data according to the first position information and the second position information.
8. The method of any one of claims 1 to 7, wherein the feature matching network comprises a Softmax classifier and a Sigmoid classifier for determining matching feature points between the first image data and the second image data from different perspectives.
9. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises a first acquisition module, a second acquisition module and a processing module, wherein the first acquisition module is used for acquiring first image data and second image data, and the first image data and the second image data are image data at different moments;
the second acquisition module is used for inputting the first image data and the second image data into a feature matching network and acquiring at least one matching feature point between the first image data and the second image data by using the feature matching network;
the fusion module is used for acquiring feature transformation data based on the matched feature points and fusing the feature transformation data and the first image data to obtain target data;
and the reconstruction module is used for executing three-dimensional reconstruction operation according to the target data to obtain a three-dimensional image.
10. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1 to 8.
11. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 8.
CN202110099198.3A 2021-01-25 2021-01-25 Image processing method, image processing device, electronic equipment and readable storage medium Pending CN112785687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110099198.3A CN112785687A (en) 2021-01-25 2021-01-25 Image processing method, image processing device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112785687A true CN112785687A (en) 2021-05-11

Family

ID=75759065

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803267A (en) * 2017-01-10 2017-06-06 西安电子科技大学 Indoor scene three-dimensional rebuilding method based on Kinect
CN110009722A (en) * 2019-04-16 2019-07-12 成都四方伟业软件股份有限公司 Three-dimensional rebuilding method and device
CN110222573A (en) * 2019-05-07 2019-09-10 平安科技(深圳)有限公司 Face identification method, device, computer equipment and storage medium
WO2020108336A1 (en) * 2018-11-30 2020-06-04 腾讯科技(深圳)有限公司 Image processing method and apparatus, device, and storage medium
CN111260794A (en) * 2020-01-14 2020-06-09 厦门大学 Outdoor augmented reality application method based on cross-source image matching
CN111968238A (en) * 2020-08-22 2020-11-20 晋江市博感电子科技有限公司 Human body color three-dimensional reconstruction method based on dynamic fusion algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination