WO2021244161A1 - Model generation method and device based on multi-view panoramic images - Google Patents

Model generation method and device based on multi-view panoramic images

Info

Publication number
WO2021244161A1
Authority
WO
WIPO (PCT)
Prior art keywords
reference image
image
source
phase difference
level
Prior art date
Application number
PCT/CN2021/088002
Other languages
English (en)
French (fr)
Inventor
陈丹
张誉耀
谭志刚
Original Assignee
深圳看到科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳看到科技有限公司 filed Critical 深圳看到科技有限公司
Priority to US18/041,413 priority Critical patent/US20230237683A1/en
Publication of WO2021244161A1 publication Critical patent/WO2021244161A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • the present invention relates to the technical field of image processing, in particular to a model generation method and device based on multi-view panoramic images.
  • Traditional multi-view stereo vision usually uses monocular cameras to sample images at multiple pre-adjusted spatial positions.
  • a picture generated from a certain reference angle of view is a reference picture
  • all pictures generated from a view angle that overlaps with the angle of view are source pictures.
  • the traditional MVS (Multi-view Stereo, multi-view 3D reconstruction) algorithm usually determines the correspondence between points by finding, in all the source pictures, the matching points of the pixel points/feature points in the reference picture, the correspondence usually being marked by the phase difference. Based on this correspondence, the depth value of each pixel in the reference picture can be further calculated through triangulation. The depth values are fused by a traditional multi-view fusion algorithm, finally producing a stereo vision model of the scene.
  • the operation of finding the matching point in the above method consumes a lot of computing resources, and if the accuracy of the matching point is low, the accuracy of the subsequently generated stereo vision model may be poor.
  • the embodiments of the present invention provide a model generation method and model generation device that consume fewer computing resources and generate models with higher accuracy, to solve the technical problem that the existing model generation methods and model generation devices consume considerable computing resources and generate models with poor accuracy.
  • the embodiment of the present invention provides a model generation method based on a multi-view panoramic image, which includes:
  • a depth map of the reference image is generated based on the final phase difference of the reference image, and a corresponding stereo vision model is constructed according to the depth map.
  • the embodiment of the present invention also provides a model generation method based on multi-view panoramic images, which includes:
  • the final cost body of the lower-level reference image of the corresponding set level is obtained, and based on the final cost body, the phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution is calculated;
  • a depth map of the reference image is generated based on the final phase difference of the first-level reference image, and a corresponding stereo vision model is constructed according to the depth map.
  • the embodiment of the present invention also provides a model generation device based on multi-view panoramic images.
  • An embodiment of the present invention also provides a computer-readable storage medium, which stores processor-executable instructions, the instructions being loaded by one or more processors to execute any of the above-mentioned model generation methods based on multi-view panoramic images.
  • the multi-view panoramic image-based model generation method and model generation device of the present invention improve the accuracy of the estimated phase difference at the set resolution by calculating and fusing the cost volumes of multiple source images and the reference image, thereby effectively improving the accuracy of the generated model; at the same time, the calculation and fusion of the cost volumes consume fewer computing resources, so the computing resource consumption of the entire model generation process can be reduced; this effectively solves the technical problem that existing model generation methods and model generation devices consume considerable computing resources and generate models with poor accuracy.
  • FIG. 1 is a flowchart of a first embodiment of a method for generating a model based on a multi-view panoramic image of the present invention
  • FIG. 2 is a flowchart of a second embodiment of the method for generating a model based on a multi-view panoramic image of the present invention
  • FIG. 3 is a schematic diagram of the operation of folding and reducing the dimensionality of a first-level reference image into four second-level reference images;
  • FIG. 4 is a schematic diagram of the operation of tiling and upgrading four third-level reference images into a second-level reference image
  • FIG. 5 is a schematic structural diagram of a first embodiment of a model generating device based on a multi-view panoramic image of the present invention
  • Fig. 6 is a schematic structural diagram of a second embodiment of a device for generating a model based on a multi-view panoramic image of the present invention
  • FIG. 7 is a schematic flowchart of a specific embodiment of a model generation method and a model generation device based on a multi-view panoramic image of the present invention.
  • the multi-view panoramic image-based model generation method and model generation device of the present invention are used in electronic equipment that generates a corresponding stereo vision model based on the final cost volume of a reference image and source images having overlapping perspectives.
  • This electronic device includes but is not limited to wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multi-processor systems, consumer electronic devices, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.
  • the electronic device is preferably a model creation terminal or a model creation server that creates a stereo vision model based on a reference image and a source image, so as to provide a stereo vision model with higher accuracy.
  • FIG. 1 is a flowchart of a first embodiment of a method for generating a model based on a multi-view panoramic image of the present invention.
  • the model generation method of this embodiment can be implemented using the above-mentioned electronic equipment.
  • the model generation method based on the multi-view panoramic image of this embodiment includes:
  • Step S101 Obtain a reference image and a plurality of corresponding source images, where the source image and the reference image have overlapping perspectives;
  • Step S102 Obtain the source camera parameters of the source image and the reference camera parameters of the reference image, and calculate the image correction rotation matrix of the source image and the reference image based on the source camera parameters and the reference camera parameters;
  • Step S103 extracting the reference image feature of the reference image and the source image feature of the source image, and calculating the cost volume of the reference image and the source image based on the reference image feature and the source image feature;
  • Step S104 Use the image correction rotation matrix to convert the coordinate system of the cost volume to obtain the corrected cost volume of the source image and the reference image;
  • Step S105 Perform a fusion operation on the corrected cost bodies of multiple source images corresponding to the reference image to obtain a final cost body;
  • Step S106 Based on the final cost volume, calculate the phase difference distribution estimate of the reference image at the set resolution, and calculate the estimated phase difference at the set resolution;
  • Step S107 Fuse the estimated phase differences of the reference image at each resolution level to obtain the final phase difference of the reference image;
  • Step S108 Generate a depth map of the reference image based on the final phase difference of the reference image, and construct a corresponding stereo vision model according to the depth map.
  • a model generation device (such as a model creation server, etc.) obtains a reference image and a plurality of corresponding source images, where the source image and the reference image have overlapping perspectives.
  • the reference image is a standard image that needs to generate a stereo vision model
  • the source image is a reference image for generating a stereo vision model
  • the reference image and the source image can be images of the same item that are photographed from different angles.
  • step S102 the model generating device needs to calculate the relative position relationship between the reference image and each source image, and obtain the corresponding image correction rotation matrix.
  • here the projection matrix corresponding to the reference image is set as P_0 = K_0 · [R_0 t_0], where K_0 is the intrinsic matrix of the reference image, [R_0 t_0] is the extrinsic matrix of the reference image, R_0 is the rotation matrix of the reference image, and t_0 is the translation vector of the reference image
  • P_1, P_2, ... P_n are the projection matrices of the n source images; similarly, P_n = K_n · [R_n t_n].
  • the x-axis, y-axis, and z-axis of the image correction coordinate system can then be set from the camera optical centers (the detailed construction is given in the Description below), yielding the image correction rotation matrix.
  • based on this image correction rotation matrix, the relative position relationship between the reference image and the corresponding source image is determined through the projection matrices, and a corrected reference image is generated, so that the corrected reference image has only a leftward horizontal displacement relative to the source image.
  • step S103 the model generation device uses the preset neural network to perform feature extraction on the reference image to obtain the reference image feature, and at the same time uses the preset neural network to perform feature extraction on the source image to obtain the source image feature.
  • the model generation device calculates the cost volume of the reference image and the source image based on the reference image feature and the source image feature.
  • the cost volume represents the depth probability value of the reference image in the stereo space.
  • the cost body of the reference image and the source image can be calculated based on the following formula:
  • C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j), with F_0 ∈ R^(c×h×w)
  • where c represents the number of feature channels of the feature map, h represents the width of the feature map, w represents the height of the feature map, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, C(q,i,j) is the cost body of the reference image and the source image, i is the row position in the cost body, j is the column position in the cost body, q is a set phase difference value, and g(F_1, q) represents the feature map F_1 shifted as a whole by q pixels along the w direction.
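As a minimal illustration of this formula, the following Python/NumPy sketch computes the cost body by shifting the source feature map and taking channel-wise inner products; the rightward shift direction and zero padding for g(F_1, q) are assumptions, since the text does not specify them:

```python
import numpy as np

def cost_volume(f0: np.ndarray, f1: np.ndarray, num_disp: int) -> np.ndarray:
    """C(q, i, j) = F0(:, i, j)' . g(F1, q)(:, i, j).

    f0, f1: (c, h, w) feature maps of the reference and source images;
    returns a cost volume of shape (num_disp, h, w)."""
    cost = np.zeros((num_disp,) + f0.shape[1:], dtype=f0.dtype)
    for q in range(num_disp):
        if q == 0:
            shifted = f1.copy()
        else:
            shifted = np.zeros_like(f1)        # g(F1, q): shift along w, zero fill
            shifted[:, :, q:] = f1[:, :, :-q]
        cost[q] = np.einsum('chw,chw->hw', f0, shifted)  # channel inner product
    return cost
```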
  • in step S104, the model generation device uses the image correction rotation matrix obtained in step S102 to convert the coordinate system of the cost volume obtained in step S103, obtaining the corrected cost volume of the source image and the reference image (the cost volume under the corrected viewing angle), so that the cost volumes of multiple different source images and the reference image can subsequently be fused.
  • the correction cost body of the source image and the reference image can be calculated by the following formula:
  • C'(m,n,p) = R_0 · R' · C(q,i,j)
  • where R_0 is the rotation matrix of the reference image, R is the image correction rotation matrix of the source image and the reference image, and C'(m,n,p) is the correction cost body of the source image and the reference image.
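A possible rendering of this coordinate-system conversion is sketched below; treating it as a resampling of the volume about its centre (with linear interpolation and zero fill) is our assumption, as the text leaves the resampling details unspecified:

```python
import numpy as np
from scipy.ndimage import affine_transform

def rectify_cost_volume(cost: np.ndarray, R0: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Resample C into the corrected frame per C'(m,n,p) = R0 . R' . C(q,i,j).

    A = R0 @ R.T rotates original (q, i, j) coordinates into corrected ones;
    affine_transform needs the inverse map (output -> input), which for an
    orthonormal A is simply A.T."""
    A = R0 @ R.T
    centre = (np.asarray(cost.shape) - 1) / 2.0
    offset = centre - A.T @ centre              # keep the rotation centred
    return affine_transform(cost, A.T, offset=offset, order=1, cval=0.0)
```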
  • step S105 the model generation device performs a fusion operation on the corrected cost bodies of the multiple source images corresponding to the reference images obtained in step S104 to obtain the final cost bodies.
  • the model generation device may use the element-wise maximum pooling operation to perform a fusion operation on the corrected cost bodies of multiple source images corresponding to the reference image to obtain the final cost body.
  • suppose the reference image has corresponding source image A, source image B, and source image C
  • the correction cost body of source image A has elements A1, A2, and A3
  • the correction cost body of source image B has elements B1, B2, and B3
  • the correction cost body of source image C has elements C1, C2, and C3.
  • if A1 is the largest of A1, B1, C1, B2 is the largest of A2, B2, C2, and C3 is the largest of A3, B3, C3, then the final cost body after fusion has elements A1, B2, C3.
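The element-wise maximum pooling fusion reduces to a single reduction over the corrected cost bodies, as in this minimal sketch:

```python
import numpy as np

def fuse_cost_volumes(corrected_volumes) -> np.ndarray:
    """Element-wise maximum pooling over the corrected cost volumes of all
    source images (e.g. picks max(A1, B1, C1) at each element position),
    yielding the final cost volume."""
    return np.maximum.reduce([np.asarray(v) for v in corrected_volumes])
```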
  • step S106 the model generation device calculates the estimated phase difference distribution of the reference image at the set resolution based on the final cost volume obtained in step S105, and calculates the estimated phase difference at the set resolution.
  • the model generation device uses a preset neural network to calculate the phase difference distribution estimation of the reference image at the set resolution based on the final cost volume. That is, under the set resolution, the final cost body calculated by the preset neural network will correspond to the phase difference distribution estimation, and then the estimated phase difference at the resolution can be calculated through the phase difference distribution estimation.
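The patent leaves the preset neural network unspecified; as a stand-in, a common way to turn a cost volume into a phase difference distribution and an estimated phase difference is a softmax over the disparity axis followed by an expectation (soft-argmin), sketched here:

```python
import numpy as np

def estimate_disparity(final_cost: np.ndarray) -> np.ndarray:
    """Phase-difference distribution and estimated phase difference from a
    (num_disp, h, w) final cost volume. The softmax/expectation form is an
    assumption standing in for the preset neural network."""
    shifted = final_cost - final_cost.max(axis=0, keepdims=True)
    prob = np.exp(shifted)
    prob /= prob.sum(axis=0, keepdims=True)        # distribution over q per pixel
    q = np.arange(final_cost.shape[0]).reshape(-1, 1, 1)
    return (prob * q).sum(axis=0)                  # expected phase difference
```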
  • the preset neural network can be obtained through model training of positive and negative samples.
  • at this set resolution, the size of the detected object in the reference image is 0.3% to 10% of the reference image size. If the detected object is larger than 10% of the reference image size, the detection of the object's motion may become insensitive; if the detected object is smaller than 0.3% of the reference image size, the motion of the corresponding detected object may not be detectable at all. At a smaller resolution, the reference image attends to finer-grained object motion, while at a larger resolution the reference image attends to more macroscopic object motion.
  • since the size of detected objects to which human eyes are highly sensitive and which are comfortable to observe is 0.3% to 10% of the entire image size, traditional stereo vision models spend considerable computing resources calculating matching points at this resolution; this embodiment instead uses the final cost volume to calculate the estimated phase difference for the reference image and the corresponding multiple source images at this resolution, which can greatly reduce the computational cost of matching points between the reference image and the source images at this resolution.
  • in step S107, since the final phase difference of the reference image is synthesized from the estimated phase differences of the reference image at the various resolutions, and the size of the detected objects of interest in the reference image differs across resolutions, the model generation device fuses the estimated phase differences of the reference image at each resolution level to obtain the final phase difference of the reference image.
  • step S108 the model generating device generates a depth map of the reference image based on the final phase difference obtained in step S107, and constructs a corresponding stereo vision model according to the depth map.
  • the model generating device may generate the depth map of the reference image by the following formula.
  • depth = f · b / d
  • where f is the focal length of the camera corresponding to the reference image, b is the baseline length in the multi-view panoramic stereo system, and d is the estimated phase difference.
  • the final phase difference can be converted into a depth map, and then the multi-view depth map is subjected to mutual inspection to eliminate abnormal points, which can be used to generate a 3D point cloud, and finally generate a corresponding stereo vision model.
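The depth conversion is a single line of arithmetic; a minimal sketch (the eps clamp against zero disparity is an added safeguard, not part of the stated formula):

```python
import numpy as np

def disparity_to_depth(d: np.ndarray, f: float, b: float, eps: float = 1e-6) -> np.ndarray:
    """depth = f * b / d with focal length f, baseline b and estimated
    phase difference d."""
    return f * b / np.maximum(d, eps)
```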
  • the model generation method based on the multi-view panoramic image of this embodiment improves the accuracy of the estimated phase difference at the set resolution by calculating and fusing the cost volumes of multiple source images and the reference image, thereby effectively improving the accuracy of the generated model; at the same time, the calculation and fusion of the cost volumes consume fewer computing resources, so the computing resource consumption of the entire model generation process can be reduced.
  • FIG. 2 is a flowchart of a second embodiment of a method for generating a model based on a multi-view panoramic image of the present invention.
  • the model generation method of this embodiment can be implemented using the above-mentioned electronic equipment.
  • the model generation method based on the multi-view panoramic image of this embodiment includes:
  • Step S201 Obtain a reference image and a plurality of corresponding source images, where the source image and the reference image have overlapping perspectives;
  • Step S202 Perform a folding and dimensionality reduction operation on the first-level reference image to obtain at least one lower-level reference image corresponding to the first-level reference image; perform a folding and dimensionality reduction operation on the first-level source image to obtain at least one lower-level source image corresponding to the first-level source image;
  • Step S203 Use the first preset residual convolution network to perform feature extraction on the lower-level reference images to obtain lower-level reference image features; use the first preset residual convolution network to perform feature extraction on the lower-level source images to obtain lower-level source image features;
  • Step S204 Based on the lower-level reference image features of the set level and the source image features of the set level, obtain the final cost body of the lower-level reference image of the corresponding set level, and based on the final cost body, calculate the phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution;
  • Step S205 based on the lower-level reference image features of other levels and the source image features of other levels, obtain the lower-level reference image phase difference distribution estimation features of other levels of the reference image;
  • Step S206 using the second preset residual convolutional network, perform feature extraction on the phase difference distribution estimation feature of the lower-level reference image to obtain the difference feature of the lower-level reference image;
  • Step S207 Obtain an estimated phase difference of the lower-level reference image based on the difference feature of the lower-level reference image;
  • Step S208 Perform a tiling and dimension upgrade operation on the difference feature to obtain the corrected difference feature of the first-level reference image; perform a tiling and dimension upgrade operation on the estimated phase difference to obtain the corrected phase difference of the first-level reference image;
  • Step S209 Obtain the final phase difference of the first-level reference image according to the corrected difference characteristics of the reference image, the source image, the first-level reference image, and the corrected phase difference of the first-level reference image;
  • Step S210 Generate a depth map of the reference image based on the final phase difference of the first-level reference image, and construct a corresponding stereo vision model according to the depth map.
  • in step S201, the model generating device acquires a reference image taken by a multi-view camera and a plurality of corresponding source images, where the source images and the reference image have overlapping angles of view.
  • in step S202, the model generation device performs a folding and dimensionality reduction operation on the first-level reference image and obtains multiple lower-level reference images corresponding to the first-level reference image, such as four second-level reference images; if the folding and dimensionality reduction operation is continued on the second-level reference images, four third-level reference images can be obtained.
  • FIG. 3 is a schematic diagram of the operation of folding and reducing the dimensionality of a first-level reference image into four second-level reference images.
  • the resolution of the first-level reference image is 4*4; the resolution of the second-level reference image is 2*2.
  • the model generation device also performs a folding and dimensionality reduction operation on the first-level source image to obtain multiple lower-level source images corresponding to the first-level source image, such as four second-level source images; if the folding and dimensionality reduction operation is continued on the second-level source images, four third-level source images can be obtained (a sketch of the folding operation follows below).
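A minimal sketch of the folding and dimensionality reduction operation of FIG. 3; the interleaved sampling pattern (sub-image k = a·factor + b holds img[a::factor, b::factor]) is an assumption about the exact fold:

```python
import numpy as np

def fold_reduce(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Fold one (h, w) image into factor**2 images of size (h/factor, w/factor),
    as in FIG. 3 (one 4x4 first-level image -> four 2x2 second-level images).
    Returns shape (factor**2, h // factor, w // factor)."""
    h, w = img.shape
    sub = img.reshape(h // factor, factor, w // factor, factor)
    return sub.transpose(1, 3, 0, 2).reshape(factor * factor, h // factor, w // factor)
```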
  • setting reference images at different levels or resolutions can better accommodate scene objects of different sizes in the scene.
  • in step S203, the model generation device uses the first preset residual convolutional network to perform feature extraction on the multiple lower-level reference images (such as the second-level reference images and third-level reference images) obtained in step S202 to obtain multiple lower-level reference image features at different levels.
  • the model generation device uses the first preset residual convolutional network to perform feature extraction on the multiple lower-level source images acquired in step S202 to obtain multiple lower-level source image features at different levels.
  • in step S204, the model generation device obtains the final cost volume of the corresponding lower-level reference image of the set level based on the lower-level reference image features of the set level and the source image features of the set level.
  • the model generation device calculates the image phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution based on the final cost volume.
  • the model generating device may use a preset neural network to calculate the image phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution. That is, under the set resolution, the final cost body calculated by the preset neural network will correspond to the phase difference distribution estimation, and then the estimated phase difference at the resolution can be calculated through the phase difference estimation.
  • the preset neural network can be obtained through model training of positive and negative samples.
  • since the size of detected objects to which human eyes are highly sensitive and which are comfortable to observe is 0.3% to 10% of the entire image size, traditional stereo vision models spend considerable computing resources calculating matching points at this resolution; this embodiment instead uses the final cost volume to calculate the estimated phase difference for the reference image and the corresponding multiple source images at this resolution, which can greatly reduce the computational cost of matching points between the reference image and the source images at this resolution.
  • step S205 the model generation device obtains the lower-level reference image phase difference distribution estimation features of other levels of the reference image based on the lower-level reference image features of other levels and the source image features of other levels. Due to the low consumption of computing resources at other resolutions, the existing feature point matching algorithm can be used to calculate the phase difference distribution estimation features of the lower-level reference images of other levels of the reference image.
  • step S206 the model generation device uses the second preset residual convolutional network to perform feature extraction on the phase difference distribution estimation features of the lower-level reference images acquired in step S204 and step S205 to obtain the difference features of the lower-level reference images.
  • in step S207, the model generation device obtains the estimated phase difference of the lower-level reference image based on the acquired difference features of the lower-level reference image. That is, the estimated phase difference of the corresponding lower-level reference image is determined based on the preset estimated phase difference corresponding to the difference feature of the lower-level reference image. If the difference feature of the lower-level reference image corresponds to a large preset estimated phase difference, the estimated phase difference of the corresponding lower-level reference image is relatively large; if the difference feature corresponds to a small preset estimated phase difference, the estimated phase difference of the corresponding lower-level reference image is also small.
  • the preset estimated phase difference can be obtained through model training of positive and negative samples.
  • in step S208, the model generation device performs a tiling and dimension-upgrading operation on the difference features of the lower-level reference images acquired in step S206 to obtain the corrected difference feature of the first-level reference image;
  • the estimated phase difference is likewise tiled and dimension-upgraded to obtain the corrected phase difference of the first-level reference image.
  • specifically, the model generation device can perform a tiling and dimension-upgrading operation on the difference feature of the third-level reference image to obtain the corrected difference feature of the second-level reference image, and the corrected difference feature of the second-level reference image can be used to calculate the difference feature of the second-level reference image; the model generation device can then perform a tiling and dimension-upgrading operation on the difference feature of the second-level reference image to obtain the corrected difference feature of the first-level reference image.
  • FIG. 4 is a schematic diagram of the operation of tiling and upgrading four third-level reference images into a second-level reference image.
  • the resolution of the image corresponding to the difference feature of the third-level reference image is 2*2; the resolution of the image corresponding to the corrected difference feature of the second-level reference image is 4*4.
  • similarly, the model generation device can perform a tiling and dimension-upgrading operation on the estimated phase difference of the third-level reference image to obtain the corrected phase difference of the second-level reference image; the corrected phase difference of the second-level reference image can be used to calculate the estimated phase difference of the second-level reference image. A sketch of the tiling operation follows below.
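A minimal sketch of the tiling and dimension-upgrading operation of FIG. 4, written as the exact inverse of the folding sketch above:

```python
import numpy as np

def tile_upgrade(subs: np.ndarray, factor: int = 2) -> np.ndarray:
    """Tile factor**2 lower-level maps of size (h, w) back into one
    (h*factor, w*factor) upper-level map, as in FIG. 4 (four 2x2
    third-level images -> one 4x4 second-level image)."""
    _, h, w = subs.shape
    grid = subs.reshape(factor, factor, h, w).transpose(2, 0, 3, 1)
    return grid.reshape(h * factor, w * factor)
```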
  • in step S209, the model generation device performs feature fusion on the reference image and source images obtained in step S201 together with the corrected difference feature and corrected phase difference of the first-level reference image obtained in step S208, and obtains the final phase difference of the corresponding first-level reference image from the fused features.
  • the corresponding relationship between the fused feature and the final phase difference of the first-level reference image can be obtained through model training of positive and negative samples.
  • in step S210, the model generating device generates a depth map of the reference image based on the final phase difference obtained in step S209, and constructs a corresponding stereo vision model according to the depth map.
  • the final phase difference can be converted into a depth map, and then the multi-view depth map is subjected to mutual inspection to eliminate abnormal points, which can be used to generate a 3D point cloud, and finally generate a corresponding stereo vision model.
  • the multi-view panoramic image-based model generation method of this embodiment uses the final cost volume at the set resolution to calculate the phase difference distribution estimation feature of the lower-level reference image, and directly uses the image features at the other resolutions to calculate the phase difference distribution estimation features of the lower-level reference images at those levels.
  • this further improves the accuracy of the generated model and reduces the computational resource consumption of the model generation process.
  • the present invention also provides a model generating device based on a multi-view panoramic image.
  • FIG. 5 is a schematic structural diagram of a first embodiment of the model generating device based on a multi-view panoramic image of the present invention.
  • the model generation device of this embodiment can be implemented using the first embodiment of the above-mentioned model generation method.
  • the model generation device 50 of this embodiment includes an image acquisition module 51, an image correction rotation matrix calculation module 52, a cost volume calculation module 53, a cost volume conversion module 54, a cost volume fusion module 55, a set estimated phase difference calculation module 56, a phase difference fusion module 57, and a model building module 58.
  • the image acquisition module 51 is used to acquire a reference image and a plurality of corresponding source images, where the source images and the reference image have overlapping angles of view;
  • the image correction rotation matrix calculation module 52 is used to acquire the source camera parameters of the source image and the reference camera parameters of the reference image, and to calculate the image correction rotation matrix of the source image and the reference image based on the source camera parameters and the reference camera parameters;
  • the cost volume calculation module 53 is used to extract the reference image features of the reference image and the source image features of the source image, and to calculate the cost volume of the reference image and the source image based on the reference image features and the source image features;
  • the cost volume conversion module 54 is used to convert the coordinate system of the cost volume using the image correction rotation matrix to obtain the corrected cost volume of the source image and the reference image;
  • the cost volume fusion module 55 is used to perform a fusion operation on the corrected cost volumes of the multiple source images corresponding to the reference image to obtain the final cost volume;
  • the set estimated phase difference calculation module 56 is used to calculate, based on the final cost volume, the phase difference distribution estimate of the reference image at the set resolution and the estimated phase difference at the set resolution; the phase difference fusion module 57 is used to fuse the estimated phase differences of the reference image at each resolution level to obtain the final phase difference of the reference image; and the model building module 58 is used to generate a depth map of the reference image based on the final phase difference and to construct a corresponding stereo vision model according to the depth map.
  • the image acquisition module 51 first acquires a reference image and a plurality of corresponding source images, where the source image and the reference image have overlapping perspectives.
  • the reference image is a standard image that needs to generate a stereo vision model
  • the source image is a reference image for generating a stereo vision model
  • the reference image and the source image can be images of the same object taken from different angles.
  • the image correction rotation matrix calculation module 52 needs to calculate the relative position relationship between the reference image and each source image, and obtain the corresponding image correction rotation matrix.
  • the relative position relationship between the reference image and the corresponding source image is determined through the projection matrix, and a corrected reference image is generated, so that the corrected reference image is only horizontally displaced to the left from the source image.
  • the cost volume calculation module 53 uses the preset neural network to perform feature extraction on the reference image to obtain the reference image feature, and at the same time uses the preset neural network to perform feature extraction on the source image to obtain the source image feature.
  • the cost volume calculation module 53 then calculates the cost volume of the reference image and the source image based on the reference image features and the source image features.
  • the cost volume represents the depth probability value of the reference image in the stereo space.
  • the cost body of the reference image and the source image can be calculated based on the following formula:
  • C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j), with F_0 ∈ R^(c×h×w)
  • where c represents the number of feature channels of the feature map, h represents the width of the feature map, w represents the height of the feature map, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, C(q,i,j) is the cost body of the reference image and the source image, i is the row position in the cost body, j is the column position in the cost body, q is a set phase difference value, and g(F_1, q) represents the feature map F_1 shifted as a whole by q pixels along the w direction.
  • the cost volume conversion module 54 uses the image correction rotation matrix to convert the coordinate system of the cost volume to obtain the corrected cost volume of the source image and the reference image (the cost volume under the corrected viewing angle), so that the corrected cost volumes of multiple different source images and the reference image can subsequently be fused.
  • the correction cost body of the source image and the reference image can be calculated by the following formula:
  • C'(m,n,p) = R_0 · R' · C(q,i,j)
  • where R_0 is the rotation matrix of the reference image, R is the image correction rotation matrix of the source image and the reference image, and C'(m,n,p) is the correction cost body of the source image and the reference image.
  • the cost body fusion module 55 performs a fusion operation on the corrected cost bodies of the multiple source images corresponding to the obtained reference images to obtain the final cost body.
  • the cost body fusion module 55 may use the element-wise maximum pooling operation to perform a fusion operation on the corrected cost bodies of multiple source images corresponding to the reference image to obtain the final cost body.
  • the set estimated phase difference calculation module 56 calculates the estimated phase difference distribution of the reference image at the set resolution based on the obtained final cost volume, and calculates the estimated phase difference at the set resolution.
  • the set estimated phase difference calculation module 56 uses a preset neural network to calculate the estimated phase difference distribution of the reference image at the set resolution based on the final cost body. That is, under the set resolution, the final cost body calculated by the preset neural network will correspond to the phase difference distribution estimation, and then the estimated phase difference at the resolution can be calculated through the phase difference distribution estimation.
  • the preset neural network can be obtained through model training of positive and negative samples.
  • at this set resolution, the size of the detected object in the reference image is 0.3% to 10% of the reference image size. If the detected object is larger than 10% of the reference image size, the detection of the object's motion may become insensitive; if the detected object is smaller than 0.3% of the reference image size, the motion of the corresponding detected object may not be detectable at all. At a smaller resolution, the reference image attends to finer-grained object motion, while at a larger resolution the reference image attends to more macroscopic object motion.
  • since the size of detected objects to which human eyes are highly sensitive and which are comfortable to observe is 0.3% to 10% of the entire image size, traditional stereo vision models spend considerable computing resources calculating matching points at this resolution; this embodiment instead uses the final cost volume to calculate the estimated phase difference for the reference image and the corresponding multiple source images at this resolution, which can greatly reduce the computational cost of matching points between the reference image and the source images at this resolution.
  • the phase difference fusion module 57 fuses the estimated phase differences of the reference image at each resolution level, so as to obtain the final phase difference of the reference image.
  • model construction module 58 generates a depth map of the reference image based on the final phase difference, and constructs a corresponding stereo vision model according to the depth map.
  • model construction module 58 may generate the depth map of the reference image by the following formula.
  • depth = f · b / d
  • where f is the focal length of the camera corresponding to the reference image, b is the baseline length in the multi-view panoramic stereo system, and d is the estimated phase difference.
  • the final phase difference can be converted into a depth map, and then the multi-view depth map is subjected to mutual inspection to eliminate abnormal points, which can be used to generate a 3D point cloud, and finally generate a corresponding stereo vision model.
  • the model generation device based on the multi-view panoramic image of this embodiment improves the accuracy of the estimated phase difference at the set resolution by calculating and fusing the cost volumes of multiple source images and the reference image, thereby effectively improving the accuracy of the generated model; at the same time, the calculation and fusion of the cost volumes consume fewer computing resources, so the computing resource consumption of the entire model generation process can be reduced.
  • FIG. 6 is a schematic structural diagram of a second embodiment of a model generating device based on a multi-view panoramic image of the present invention.
  • the model generation device of this embodiment can be implemented using the second embodiment of the above-mentioned model generation method.
  • the model generation device 60 of this embodiment includes an image acquisition module 61, a folding dimension reduction module 62, a feature extraction module 63, a first phase difference distribution estimation feature calculation module 64, a second phase difference distribution estimation feature calculation module 65, and a difference feature acquisition module 66 , Estimated phase difference calculation module 67, tiled dimension upgrade module 68, final phase difference acquisition module 69, and model construction module 6A.
  • the image acquisition module 61 is used to acquire a reference image and multiple corresponding source images, where the source image and the reference image have overlapping perspectives;
  • the folding dimension reduction module 62 is used to perform a folding and dimensionality reduction operation on the first-level reference image to acquire at least one lower-level reference image corresponding to the first-level reference image, and to perform a folding and dimensionality reduction operation on the first-level source image to obtain at least one lower-level source image corresponding to the first-level source image;
  • the feature extraction module 63 is configured to use the first preset residual convolutional network to extract features from the lower-level reference images to obtain lower-level reference image features, and to use the first preset residual convolutional network to extract features from the lower-level source images to obtain lower-level source image features;
  • the first phase difference distribution estimation feature calculation module 64 is configured to obtain the final cost volume of the lower-level reference image of the corresponding set level based on the lower-level reference image features of the set level and the source image features of the set level, and to calculate, based on the final cost volume, the phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution.
  • the image acquisition module 61 first acquires a reference image taken by a multi-view camera and a plurality of corresponding source images, wherein the source image and the reference image have overlapping angles of view.
  • the folding dimension reduction module 62 performs a folding and dimensionality reduction operation on the first-level reference image and obtains multiple lower-level reference images corresponding to the first-level reference image, such as four second-level reference images; if the folding and dimensionality reduction operation is continued on the second-level reference images, four third-level reference images can be obtained.
  • the folding dimension reduction module 62 also performs a folding and dimensionality reduction operation on the first-level source image and obtains multiple lower-level source images corresponding to the first-level source image, such as four second-level source images; if the folding and dimensionality reduction operation is continued on the second-level source images, four third-level source images can be obtained.
  • setting reference images at different levels or resolutions can better accommodate scene objects of different sizes in the scene.
  • the feature extraction module 63 uses the first preset residual convolutional network to perform feature extraction on the multiple lower-level reference images (such as the second-level reference images and third-level reference images) to obtain multiple lower-level reference image features at different levels.
  • the feature extraction module 63 uses the first preset residual convolutional network to perform feature extraction on multiple lower-level source images to obtain multiple lower-level source image features at different levels.
  • the first phase difference distribution estimation feature calculation module 64 obtains the final cost volume of the corresponding lower-level reference image of the set level based on the lower-level reference image characteristics of the set level and the source image features of the set level.
  • the first phase difference distribution estimation feature calculation module 64 calculates the image phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution based on the final cost volume.
  • the first phase difference distribution estimation feature calculation module 64 may use a preset neural network to calculate the image phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution. That is, under the set resolution, the final cost body calculated by the preset neural network will correspond to the phase difference distribution estimation, and then the estimated phase difference at the resolution can be calculated through the phase difference estimation.
  • the preset neural network can be obtained through model training of positive and negative samples.
  • the second phase difference distribution estimation feature calculation module 65 obtains the phase difference distribution estimation features of other levels of the reference image based on the lower-level reference image features of other levels and the source image features of other levels. Due to the low consumption of computing resources at other resolutions, the existing feature point matching algorithm can be used to calculate the phase difference distribution estimation features of the lower-level reference images of other levels of the reference image.
  • the difference feature acquisition module 66 uses the second preset residual convolution network to perform feature extraction on the phase difference distribution estimation feature of the lower-level reference image to obtain the difference feature of the lower-level reference image.
  • the estimated phase difference calculation module 67 obtains the estimated phase difference of the lower-level reference image based on the acquired difference features of the lower-level reference image. That is, the estimated phase difference of the corresponding lower-level reference image is determined based on the preset estimated phase difference corresponding to the difference feature of the lower-level reference image. If the difference feature of the lower-level reference image corresponds to a large preset estimated phase difference, the estimated phase difference of the corresponding lower-level reference image is relatively large; if the difference feature corresponds to a small preset estimated phase difference, the estimated phase difference of the corresponding lower-level reference image is also small.
  • the preset estimated phase difference can be obtained through model training of positive and negative samples.
  • the tiled dimension upgrade module 68 performs a tiling and dimension-upgrading operation on the difference features of the lower-level reference images to obtain the corrected difference feature of the first-level reference image; the module likewise tiles and dimension-upgrades the estimated phase difference of the lower-level reference image to obtain the corrected phase difference of the first-level reference image.
  • specifically, the tiled dimension upgrade module 68 can perform a tiling and dimension-upgrading operation on the difference feature of the third-level reference image to obtain the corrected difference feature of the second-level reference image.
  • the corrected difference feature of the second-level reference image can be used to calculate the difference feature of the second-level reference image.
  • the tiled dimension upgrade module 68 can likewise perform a tiling and dimension-upgrading operation on the estimated phase difference of the third-level reference image to obtain the corrected phase difference of the second-level reference image.
  • the corrected phase difference of the second-level reference image can be used to calculate the estimated phase difference of the second-level reference image.
  • the final phase difference acquisition module 69 performs feature fusion on the reference image, the source image, the corrected difference feature of the first-level reference image, and the corrected phase difference of the first-level reference image, and obtains the final phase difference of the corresponding first-level reference image according to the fused features.
  • the corresponding relationship between the fused feature and the final phase difference of the first-level reference image can be obtained through model training of positive and negative samples.
  • model construction module 6A generates a depth map of the reference image based on the final phase difference, and constructs a corresponding stereo vision model according to the depth map.
  • the final phase difference can be converted into a depth map, and then the multi-view depth map is subjected to mutual inspection to eliminate abnormal points, which can be used to generate a 3D point cloud, and finally generate a corresponding stereo vision model.
  • the multi-view panoramic image-based model generation device of this embodiment uses the final cost volume at the set resolution to calculate the phase difference distribution estimation feature of the lower-level reference image, and directly uses the image features at the other resolutions to calculate the phase difference distribution estimation features of the lower-level reference images at those levels.
  • this further improves the accuracy of the generated model and reduces the computational resource consumption of the model generation process.
  • FIG. 7 is a schematic flowchart of a specific embodiment of a model generation method and a model generation device based on a multi-view panoramic image of the present invention.
  • the model generation method and generation device of this specific embodiment generate a multi-resolution feature map by performing multiple folding and dimensionality reduction on the first-level reference image and the corresponding first-level source image.
  • the resolution level can be adjusted according to the actual reference image size to ensure that the minimum resolution difference evaluation can include the maximum difference between the reference image and the source image.
  • at each resolution, the actual phase difference value is predicted based on the phase difference distribution generated from the left- and right-eye image feature maps, together with the feature map of the image at that resolution.
  • at the set resolution, the final cost volume of the reference image and the corresponding multiple source images is used to calculate the estimated phase difference, thereby greatly reducing the computational cost of matching points between the reference image and the source images at this resolution.
  • the predicted phase difference and the feature map used to generate it are passed to the upper-level reference image for fusion processing through the tiling and dimension-upgrading operation; after multiple tiling and dimension-upgrading operations, a dense phase difference map at the original resolution is generated, based on which the depth map and the corresponding stereo vision model are further constructed.
  • the multi-view panoramic image-based model generation method and model generation device of the present invention improve the accuracy of the estimated phase difference at the set resolution by calculating and fusing the cost volumes of multiple source images and the reference image, thereby effectively improving the accuracy of the generated model; at the same time, the calculation and fusion of the cost volumes consume fewer computing resources, so the computing resource consumption of the entire model generation process can be reduced; this effectively solves the technical problem that existing model generation methods and model generation devices consume considerable computing resources and generate models with poor accuracy.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

A model generation method based on multi-view panoramic images, comprising: calculating the image correction rotation matrices of the source images and the reference image; extracting the reference image features of the reference image and the source image features of the source images; performing a fusion operation on the corrected cost volumes of the multiple source images corresponding to the reference image to obtain a final cost volume; calculating the estimated phase difference at a set resolution; obtaining the final phase difference of the reference image; and generating a depth map of the reference image and constructing a corresponding stereo vision model.

Description

Model generation method and device based on multi-view panoramic images

Technical Field

The present invention relates to the technical field of image processing, and in particular to a model generation method and device based on multi-view panoramic images.

Background Art

Traditional multi-view stereo vision usually uses a monocular camera to sample images at multiple pre-calibrated spatial positions. Let the picture generated from a certain reference viewing angle be the reference picture, and let all pictures generated from viewing angles overlapping that angle be source pictures. The traditional MVS (Multi-view Stereo, multi-view 3D reconstruction) algorithm usually determines the correspondence between points by finding, in all the source pictures, the matching points of the pixel points/feature points in the reference picture, the correspondence usually being marked by the phase difference. Based on this correspondence, the depth value of each pixel in the reference picture can be further calculated through triangulation. The depth values are fused by a traditional multi-view fusion algorithm, finally producing a stereo vision model of the scene.

However, the operation of finding matching points in the above method consumes considerable computing resources, and if the accuracy of the matching points is low, the accuracy of the subsequently generated stereo vision model may be poor.

Therefore, it is necessary to provide a model generation method and device based on multi-view panoramic images to solve the problems existing in the prior art.
Summary of the Invention

Embodiments of the present invention provide a model generation method and a model generation device that consume fewer computing resources and generate models with higher accuracy, to solve the technical problem that existing model generation methods and model generation devices consume considerable computing resources and generate models with poor accuracy.

An embodiment of the present invention provides a model generation method based on multi-view panoramic images, which includes:

acquiring a reference image and a plurality of corresponding source images, where the source images and the reference image have overlapping viewing angles;

acquiring the source camera parameters of the source images and the reference camera parameters of the reference image, and calculating the image correction rotation matrices of the source images and the reference image based on the source camera parameters and the reference camera parameters;

extracting the reference image features of the reference image and the source image features of the source images, and calculating the cost volumes of the reference image and the source images based on the reference image features and the source image features;

converting the coordinate system of each cost volume using the image correction rotation matrix to obtain the corrected cost volumes of the source images and the reference image;

performing a fusion operation on the corrected cost volumes of the multiple source images corresponding to the reference image to obtain a final cost volume;

based on the final cost volume, calculating the phase difference distribution estimate of the reference image at a set resolution, and calculating the estimated phase difference at the set resolution;

fusing the estimated phase differences of the reference image at each resolution level to obtain the final phase difference of the reference image;

generating a depth map of the reference image based on the final phase difference of the reference image, and constructing a corresponding stereo vision model according to the depth map.
An embodiment of the present invention further provides a model generation method based on multi-view panoramic images, which includes:

acquiring a reference image and a plurality of corresponding source images, where the source images and the reference image have overlapping viewing angles;

performing a folding and dimensionality reduction operation on the first-level reference image to acquire at least one lower-level reference image corresponding to the first-level reference image; performing a folding and dimensionality reduction operation on the first-level source image to acquire at least one lower-level source image corresponding to the first-level source image;

using a first preset residual convolutional network to perform feature extraction on the lower-level reference images to obtain lower-level reference image features; using the first preset residual convolutional network to perform feature extraction on the lower-level source images to obtain lower-level source image features;

based on the lower-level reference image features of a set level and the source image features of the set level, obtaining the final cost volume of the lower-level reference image of the corresponding set level, and based on the final cost volume, calculating the phase difference distribution estimation feature of the lower-level reference image of the reference image at the set resolution;

based on the lower-level reference image features of other levels and the source image features of the other levels, obtaining the phase difference distribution estimation features of the lower-level reference images of the other levels of the reference image;

using a second preset residual convolutional network to perform feature extraction on the phase difference distribution estimation features of the lower-level reference images to obtain the difference features of the lower-level reference images;

obtaining the estimated phase differences of the lower-level reference images based on the difference features of the lower-level reference images;

performing a tiling and dimension-upgrading operation on the difference features to obtain the corrected difference feature of the first-level reference image; performing a tiling and dimension-upgrading operation on the estimated phase differences to obtain the corrected phase difference of the first-level reference image;

obtaining the final phase difference of the first-level reference image according to the reference image, the source images, the corrected difference feature of the first-level reference image, and the corrected phase difference of the first-level reference image;

generating a depth map of the reference image based on the final phase difference of the first-level reference image, and constructing a corresponding stereo vision model according to the depth map.
An embodiment of the present invention further provides a model generation device based on multi-view panoramic images.

An embodiment of the present invention further provides a computer-readable storage medium storing processor-executable instructions, the instructions being loaded by one or more processors to execute any of the above model generation methods based on multi-view panoramic images.

Compared with the model generation methods of the prior art, the model generation method and model generation device based on multi-view panoramic images of the present invention improve the accuracy of the estimated phase difference at the set resolution by calculating and fusing the cost volumes of multiple source images and the reference image, thereby effectively improving the accuracy of the generated model; at the same time, the calculation and fusion of the cost volumes consume fewer computing resources, so the computing resource consumption of the entire model generation process can be reduced; this effectively solves the technical problem that existing model generation methods and model generation devices consume considerable computing resources and generate models with poor accuracy.
Brief Description of the Drawings

FIG. 1 is a flowchart of a first embodiment of the model generation method based on multi-view panoramic images of the present invention;

FIG. 2 is a flowchart of a second embodiment of the model generation method based on multi-view panoramic images of the present invention;

FIG. 3 is a schematic diagram of the operation of folding and dimension-reducing one first-level reference image into four second-level reference images;

FIG. 4 is a schematic diagram of the operation of tiling and dimension-upgrading four third-level reference images into one second-level reference image;

FIG. 5 is a schematic structural diagram of a first embodiment of the model generation device based on multi-view panoramic images of the present invention;

FIG. 6 is a schematic structural diagram of a second embodiment of the model generation device based on multi-view panoramic images of the present invention;

FIG. 7 is a schematic flowchart of a specific embodiment of the model generation method and model generation device based on multi-view panoramic images of the present invention.
Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative work fall within the protection scope of the present invention.

The model generation method and model generation device based on multi-view panoramic images of the present invention are used in electronic equipment that generates a corresponding stereo vision model based on the final cost volumes of a reference image and source images with overlapping viewing angles. The electronic equipment includes, but is not limited to, wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, etc.), multi-processor systems, consumer electronic devices, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.

The electronic equipment is preferably a model creation terminal or a model creation server that creates a stereo vision model based on a reference image and source images, so as to provide a stereo vision model with higher accuracy.
Please refer to FIG. 1, which is a flowchart of a first embodiment of the model generation method based on multi-view panoramic images of the present invention. The model generation method of this embodiment can be implemented using the above-mentioned electronic equipment, and includes:

Step S101: Obtain a reference image and a plurality of corresponding source images, where the source images and the reference image have overlapping viewing angles;

Step S102: Obtain the source camera parameters of the source images and the reference camera parameters of the reference image, and calculate the image correction rotation matrices of the source images and the reference image based on the source camera parameters and the reference camera parameters;

Step S103: Extract the reference image features of the reference image and the source image features of the source images, and calculate the cost volumes of the reference image and the source images based on the reference image features and the source image features;

Step S104: Convert the coordinate system of each cost volume using the image correction rotation matrix to obtain the corrected cost volumes of the source images and the reference image;

Step S105: Perform a fusion operation on the corrected cost volumes of the multiple source images corresponding to the reference image to obtain a final cost volume;

Step S106: Based on the final cost volume, calculate the phase difference distribution estimate of the reference image at a set resolution, and calculate the estimated phase difference at the set resolution;

Step S107: Fuse the estimated phase differences of the reference image at each resolution level to obtain the final phase difference of the reference image;

Step S108: Generate a depth map of the reference image based on the final phase difference of the reference image, and construct a corresponding stereo vision model according to the depth map.
The specific flow of each step of the model generation method based on multi-view panoramic images of this embodiment is described in detail below.

In step S101, the model generation device (such as a model creation server) obtains a reference image and a plurality of corresponding source images, where the source images and the reference image have overlapping viewing angles. The reference image is the standard image for which a stereo vision model needs to be generated, the source images are the reference material for generating the stereo vision model, and the reference image and the source images can be images of the same object photographed from different angles.

In step S102, the model generation device needs to calculate the relative positional relationship between the reference image and each source image and obtain the corresponding image correction rotation matrix.
Here the projection matrix corresponding to the reference image is set as

P_0 = K_0 · [R_0 t_0]

where K_0 is the intrinsic matrix of the reference image, [R_0 t_0] is the extrinsic matrix of the reference image, R_0 is the rotation matrix of the reference image, and t_0 is the translation vector of the reference image. P_1, P_2, ... P_n are the projection matrices of the n source images; similarly, P_n = K_n · [R_n t_n]. In the world coordinate system, the optical centre of the camera corresponding to the reference image is c_0 = -R_0' · t_0, and the optical centre of the camera corresponding to a source image is c_1 = -R_1' · t_1.
The x-axis of the image rectification coordinate system can thus be set as v_x = (c_1 - c_0) · sign(R_0(1,:) · R_0' · t_1), where R_0(1,:) denotes all elements of the first row of the rotation matrix R_0, and sign(R_0(1,:) · R_0' · t_1) determines whether c_1 lies to the right of c_0, i.e. whether the source image is translated rightward relative to the reference image. If c_1 lies to the right of c_0, the positive direction of v_x points from the reference image position to the source image position represented by P_1; this guarantees that after projection the source image position is shifted to the right of the reference image position.
The y-axis of the image rectification coordinate system is set as v_y = cross(R_1(3,:), v_x), where the cross function computes the vector (cross) product; the result is a vector perpendicular to both operands.
The z-axis of the image rectification coordinate system is set as v_z = cross(v_x, v_y).
The image rectification rotation matrix of the source image and the reference image is then R = [v_x/||v_x||_2, v_y/||v_y||_2, v_z/||v_z||_2]', where ||·||_2 is the L2 norm.
Based on this image rectification rotation matrix, the relative positional relationship between the reference image and the corresponding source image is determined through the projection matrices, and a rectified reference image is produced such that the rectified reference image is displaced from the source image only leftward in the horizontal direction.
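For illustration, the rectification-rotation computation above can be sketched in a few lines of NumPy. This is a minimal sketch under the stated formulas, not code from the patent; the function name rectification_rotation is our own choice.

```python
import numpy as np

def rectification_rotation(R0, t0, R1, t1):
    """Image rectification rotation matrix R from the reference
    extrinsics (R0, t0) and one source image's extrinsics (R1, t1)."""
    c0 = -R0.T @ t0                                   # optical centre, reference camera
    c1 = -R1.T @ t1                                   # optical centre, source camera
    # x-axis: baseline direction, signed so the source lies to the right
    vx = (c1 - c0) * np.sign(R0[0, :] @ R0.T @ t1)
    vy = np.cross(R1[2, :], vx)                       # y-axis: cross(R1(3,:), vx)
    vz = np.cross(vx, vy)                             # z-axis completes the frame
    rows = [v / np.linalg.norm(v) for v in (vx, vy, vz)]
    return np.stack(rows)                             # each normalised axis is one row of R
```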
In step S103, the model generation apparatus extracts features from the reference image using a preset neural network to obtain reference image features, and likewise extracts features from the source image using the preset neural network to obtain source image features.
The model generation apparatus then computes the cost volume of the reference image and the source image based on the reference image features and the source image features. The cost volume represents depth probability values of the reference image in stereo space. Specifically, the cost volume of the reference image and the source image can be computed by the following formula:
C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j);
F_0 ∈ P^(c×h×w);
where c is the number of feature channels of the feature map, h is the feature map height, w is the feature map width, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, and C(q,i,j) is the cost volume of the reference image and the source image, in which i is the row position and j is the column position in the cost volume, q is a set disparity value, and g(F_1,q) denotes the feature map F_1 shifted as a whole by q pixels along the w direction.
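A sketch of this correlation-style cost volume in NumPy follows; the maximum disparity max_disp and the direction of the q-pixel shift along w are assumptions, since the text fixes neither.

```python
import numpy as np

def cost_volume(F0, F1, max_disp):
    """C(q, i, j) = <F0[:, i, j], g(F1, q)[:, i, j]> for q = 0 .. max_disp-1.
    F0, F1 have shape (c, h, w); the result has shape (max_disp, h, w)."""
    _, h, w = F0.shape
    C = np.zeros((max_disp, h, w), dtype=F0.dtype)
    for q in range(max_disp):
        g = np.zeros_like(F1)
        g[:, :, q:] = F1[:, :, :w - q] if q else F1   # shift F1 by q pixels along w
        C[q] = (F0 * g).sum(axis=0)                   # channel-wise inner product
    return C
```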
In step S104, the model generation apparatus transforms the coordinate system of the cost volume obtained in step S103 using the image rectification rotation matrix obtained in step S102, obtaining the rectified cost volume of the source image and the reference image (the cost volume under the rectified viewing angle), so that the cost volumes of multiple different source images and the reference image can subsequently be fused.
Specifically, the rectified cost volume of the source image and the reference image can be computed by the following formula:
C'(m,n,p) = R_0 · R' · C(q,i,j);
where R_0 is the rotation matrix of the reference image, R is the image rectification rotation matrix of the source image and the reference image, and C'(m,n,p) is the rectified cost volume of the source image and the reference image.
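The patent states C' = R_0 · R' · C without spelling out how a 3×3 rotation acts on a discrete volume; one plausible reading, sketched below, is that the combined rotation R_0·R' is applied to the (q, i, j) coordinate grid and the volume is resampled there. The nearest-neighbour resampling and the function name are our own assumptions.

```python
import numpy as np

def rectify_cost_volume(C, R0, R):
    """Resample cost volume C at coordinates rotated by M = R0 @ R',
    one possible reading of C'(m, n, p) = R0 · R' · C(q, i, j)."""
    M = R0 @ R.T
    shape = np.array(C.shape)
    grid = np.indices(C.shape).reshape(3, -1).T       # all (m, n, p) positions
    src = np.rint(grid @ M.T).astype(int)             # rotated source coordinates
    ok = np.all((src >= 0) & (src < shape), axis=1)   # keep in-bounds samples only
    out = np.zeros_like(C)
    out[tuple(grid[ok].T)] = C[tuple(src[ok].T)]
    return out
```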
In step S105, the model generation apparatus fuses the rectified cost volumes of the plurality of source images corresponding to the reference image obtained in step S104, obtaining the final cost volume.
Specifically, the model generation apparatus may fuse the rectified cost volumes of the plurality of source images corresponding to the reference image using an element-wise max pooling operation to obtain the final cost volume.
The element-wise max pooling operation is illustrated by a concrete example. Suppose the reference image has corresponding source images A, B, and C; the rectified cost volume of source image A has elements A1, A2, A3; that of source image B has elements B1, B2, B3; and that of source image C has elements C1, C2, C3.
If A1 is the largest of A1, B1, C1, B2 the largest of A2, B2, C2, and C3 the largest of A3, B3, C3, then the fused final cost volume has elements A1, B2, C3.
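Element-wise max pooling over the rectified volumes is a one-liner in NumPy; a minimal sketch, with the three-element example above reproduced for concreteness:

```python
import numpy as np

def fuse_cost_volumes(volumes):
    """Final cost volume: element-wise maximum over all rectified
    cost volumes of the source images paired with one reference image."""
    return np.maximum.reduce(volumes)

# The A/B/C example above, with one element per position:
A, B, C = np.array([9, 1, 2]), np.array([3, 8, 1]), np.array([2, 2, 7])
print(fuse_cost_volumes([A, B, C]))   # -> [9 8 7], i.e. (A1, B2, C3)
```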
In step S106, the model generation apparatus computes, based on the final cost volume obtained in step S105, the estimated disparity distribution of the reference image at the set resolution, and computes the estimated disparity at the set resolution.
Specifically, the model generation apparatus uses a preset neural network to compute the estimated disparity distribution of the reference image at the set resolution from the final cost volume. That is, at the set resolution, passing the final cost volume through the preset neural network yields the corresponding disparity distribution estimate, from which the estimated disparity at that resolution can be computed. The preset neural network can be obtained by model training on positive and negative samples.
At the set resolution, the size of a detected object in the reference image is 0.3% to 10% of the size of the reference image. If a detected object is larger than 10% of the reference image size, detection of its motion may become insensitive; if it is smaller than 0.3% of the reference image size, its motion may not be detectable at all. At smaller resolutions, the reference image focuses more on finer object motion; at larger resolutions, it focuses more on coarser, macroscopic object motion.
Since the detected objects to which the human eye is most sensitive, and which are most comfortable to observe, occupy 0.3% to 10% of the whole image size, traditional stereo vision models spend considerable computing resources on matching-point computation at this resolution. This embodiment instead computes the estimated disparity from the final cost volume of the reference image and its corresponding source images at this resolution, which greatly reduces the cost of matching-point computation between the reference image and the source images at this resolution.
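The patent does not specify how the preset neural network maps the final cost volume to a disparity distribution and an estimated disparity. A common construction, shown below purely as an assumption, is a softmax over the disparity axis followed by a soft-argmax expectation:

```python
import numpy as np

def estimate_disparity(final_cost_volume):
    """From a (max_disp, h, w) cost volume, return a per-pixel disparity
    distribution and its expectation (the estimated disparity)."""
    c = final_cost_volume - final_cost_volume.max(axis=0, keepdims=True)
    e = np.exp(c)
    prob = e / e.sum(axis=0, keepdims=True)           # disparity distribution estimate
    q = np.arange(prob.shape[0]).reshape(-1, 1, 1)
    return prob, (prob * q).sum(axis=0)               # soft-argmax estimated disparity
```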
In step S107, because the final disparity of the reference image is synthesised from the estimated disparities of the reference image at the various resolutions, and the size of the detected objects of interest in the reference image differs across resolutions, the model generation apparatus fuses the estimated disparities of the reference image at every resolution level to obtain the final disparity of the reference image.
In step S108, the model generation apparatus generates the depth map of the reference image based on the final disparity obtained in step S107, and constructs the corresponding stereo vision model from the depth map.
Specifically, the model generation apparatus may generate the depth map of the reference image by the following formula:
depth = f · b / d;
where f is the focal length of the camera corresponding to the reference image, b is the baseline length of the multi-view panoramic stereo system, and d is the estimated disparity.
Through the above formula, the final disparity can be converted into a depth map; the multi-view depth maps are then cross-checked against each other to remove outliers, after which they can be used to generate a 3D point cloud and finally the corresponding stereo vision model.
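The disparity-to-depth conversion is then a per-pixel application of depth = f·b/d; a small sketch, where the zero-disparity guard eps is our addition:

```python
import numpy as np

def disparity_to_depth(disparity, f, b, eps=1e-6):
    """depth = f * b / d, applied per pixel to the final disparity map;
    f: focal length, b: baseline length of the panoramic stereo system."""
    return (f * b) / np.maximum(disparity, eps)
```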
This completes the stereo-vision-model generation process of the model generation method based on multi-view panoramic images of this embodiment.
By computing and fusing the cost volumes of a plurality of source images and the reference image, the model generation method based on multi-view panoramic images of this embodiment improves the accuracy of the estimated disparity at the set resolution and thus effectively improves the accuracy of the generated model; at the same time, the computation and fusion of cost volumes consume few computing resources, so the computational cost of the whole model generation pipeline is reduced.
Referring to Fig. 2, Fig. 2 is a flowchart of the second embodiment of the model generation method based on multi-view panoramic images of the present invention. The model generation method of this embodiment may be implemented with the above electronic device, and comprises:
Step S201: acquiring a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles;
Step S202: performing a folding dimension-reduction operation on the first-level reference image (i.e., the reference image) to obtain at least one lower-level reference image corresponding to the first-level reference image; and performing a folding dimension-reduction operation on the first-level source image to obtain at least one lower-level source image corresponding to the first-level source image;
Step S203: extracting features from the lower-level reference image using a first preset residual convolutional network to obtain lower-level reference image features; and extracting features from the lower-level source image using the first preset residual convolutional network to obtain lower-level source image features;
Step S204: obtaining, based on the lower-level reference image features of a set level and the source image features of the set level, a final cost volume of the lower-level reference image of the set level, and computing, based on the final cost volume, a disparity-distribution-estimate feature of the lower-level reference image of the reference image at a set resolution;
Step S205: obtaining, based on the lower-level reference image features of the other levels and the source image features of the other levels, disparity-distribution-estimate features of the lower-level reference images of the other levels of the reference image;
Step S206: extracting features from the disparity-distribution-estimate features of the lower-level reference image using a second preset residual convolutional network to obtain difference features of the lower-level reference image;
Step S207: obtaining an estimated disparity of the lower-level reference image based on the difference features of the lower-level reference image;
Step S208: performing a tiling dimension-raising operation on the difference features to obtain corrected difference features of the first-level reference image; and performing a tiling dimension-raising operation on the estimated disparity to obtain a corrected disparity of the first-level reference image;
Step S209: obtaining a final disparity of the first-level reference image from the reference image, the source image, the corrected difference features of the first-level reference image, and the corrected disparity of the first-level reference image;
Step S210: generating a depth map of the reference image based on the final disparity of the first-level reference image, and constructing a corresponding stereo vision model from the depth map.
The specific flow of each step of the model generation method based on multi-view panoramic images of this embodiment is described in detail below.
In step S201, the model generation apparatus acquires the reference image captured by a multi-view camera and the plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles.
In step S202, the model generation apparatus performs a folding dimension-reduction operation on the first-level reference image to obtain multiple lower-level reference images corresponding to the first-level reference image, e.g. four second-level reference images; applying the folding dimension-reduction operation again to a second-level reference image yields four third-level reference images.
See Fig. 3, which is a schematic diagram of folding one first-level reference image into four second-level reference images. The resolution of the first-level reference image is 4*4; the resolution of each second-level reference image is 2*2.
Meanwhile, the model generation apparatus also performs a folding dimension-reduction operation on the first-level source image to obtain multiple lower-level source images corresponding to the first-level source image, e.g. four second-level source images; applying the folding operation again to a second-level source image yields four third-level source images.
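One way to realise the folding operation of Fig. 3 is a space-to-depth rearrangement that splits an (h, w) image into four (h/2, w/2) sub-images, one per 2×2 sampling phase. Whether the patent intends exactly this interleaving is an assumption; a sketch:

```python
import numpy as np

def fold(image):
    """Fold one level-k image of shape (h, w), h and w even, into four
    level-(k+1) images of shape (h // 2, w // 2), one per sampling phase."""
    return [image[0::2, 0::2], image[0::2, 1::2],
            image[1::2, 0::2], image[1::2, 1::2]]
```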
Providing reference images at different levels or resolutions satisfies well the receptive-field requirements of the different objects in the scene.
In step S203, the model generation apparatus extracts features, using the first preset residual convolutional network, from the multiple lower-level reference images obtained in step S202 (such as the second-level and third-level reference images), obtaining lower-level reference image features at multiple levels.
Meanwhile, the model generation apparatus extracts features, using the first preset residual convolutional network, from the multiple lower-level source images obtained in step S202, obtaining lower-level source image features at multiple levels.
In step S204, the model generation apparatus obtains, based on the lower-level reference image features of the set level and the source image features of the set level, the final cost volume of the lower-level reference image of the set level. For the specific computation of the final cost volume, refer to steps S101 to S105 of the first embodiment of the model generation method based on multi-view panoramic images.
The model generation apparatus then computes, based on the final cost volume, the disparity-distribution-estimate feature of the lower-level reference image of the reference image at the set resolution.
Specifically, the model generation apparatus may use a preset neural network to compute the disparity-distribution-estimate feature of the lower-level reference image at the set resolution. That is, at the set resolution, passing the final cost volume through the preset neural network yields the corresponding disparity distribution estimate, from which the estimated disparity at that resolution can be computed. The preset neural network can be obtained by model training on positive and negative samples.
Since the detected objects to which the human eye is most sensitive, and which are most comfortable to observe, occupy 0.3% to 10% of the whole image size, traditional stereo vision models spend considerable computing resources on matching-point computation at this resolution. This embodiment instead computes the estimated disparity from the final cost volume of the reference image and its corresponding source images at this resolution, which greatly reduces the cost of matching-point computation between the reference image and the source images at this resolution.
In step S205, the model generation apparatus obtains, based on the lower-level reference image features of the other levels and the source image features of the other levels, the disparity-distribution-estimate features of the lower-level reference images of the other levels of the reference image. Since the computational cost at these other resolutions is low, an existing feature-point matching algorithm can be used here to compute these features.
In step S206, the model generation apparatus extracts features, using the second preset residual convolutional network, from the disparity-distribution-estimate features of the lower-level reference images obtained in steps S204 and S205, obtaining the difference features of the lower-level reference images.
In step S207, the model generation apparatus obtains the estimated disparity of the lower-level reference image based on the obtained difference features of the lower-level reference image; that is, the estimated disparity of the lower-level reference image is determined from the preset estimated disparity corresponding to its difference features. If the preset estimated disparity corresponding to the difference features of the lower-level reference image is large, the resulting estimated disparity of the lower-level reference image is also large; if it is small, the resulting estimated disparity is also small. The preset estimated disparity can be obtained by model training on positive and negative samples.
In step S208, the model generation apparatus performs a tiling dimension-raising operation on the difference features of the lower-level reference image obtained in step S206, obtaining the corrected difference features of the first-level reference image; and performs a tiling dimension-raising operation on the estimated disparity of the lower-level reference image obtained in step S207, obtaining the corrected disparity of the first-level reference image.
For example, the model generation apparatus may tile the difference features of the third-level reference image to obtain the corrected difference features of the second-level reference image, which can be used to compute the difference features of the second-level reference image; the apparatus may then tile the difference features of the second-level reference image to obtain the corrected difference features of the first-level reference image.
See Fig. 4, which is a schematic diagram of tiling four third-level reference images into one second-level reference image. The image resolution corresponding to the difference features of a third-level reference image is 2*2; the image resolution corresponding to the corrected difference features of the second-level reference image is 4*4.
Similarly, the model generation apparatus may tile the estimated disparity of the third-level reference image to obtain the corrected disparity of the second-level reference image, which can be used to compute the estimated disparity of the second-level reference image; the apparatus then tiles the estimated disparity of the second-level reference image to obtain the corrected disparity of the first-level reference image.
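The tiling dimension-raising operation of Fig. 4 is then the inverse rearrangement, interleaving four lower-level maps back into one upper-level map; a sketch under the same phase-interleaving assumption as fold above:

```python
import numpy as np

def unfold(subs):
    """Tile four level-(k+1) maps of shape (h, w) back into one
    level-k map of shape (2h, 2w), inverting fold()."""
    a, b, c, d = subs
    h, w = a.shape
    out = np.empty((2 * h, 2 * w), dtype=a.dtype)
    out[0::2, 0::2], out[0::2, 1::2] = a, b
    out[1::2, 0::2], out[1::2, 1::2] = c, d
    return out
```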
In step S209, the model generation apparatus fuses the features of the reference image and source images acquired in step S201 with the corrected difference features and the corrected disparity of the first-level reference image obtained in step S208, and obtains the final disparity of the corresponding first-level reference image from the fused features. The correspondence between the fused features and the final disparity of the first-level reference image can be obtained by model training on positive and negative samples.
In step S210, the model generation apparatus generates the depth map of the reference image based on the final disparity obtained in step S209, and constructs the corresponding stereo vision model from the depth map.
The final disparity can be converted into a depth map; the multi-view depth maps are then cross-checked against each other to remove outliers, after which they can be used to generate a 3D point cloud and finally the corresponding stereo vision model.
This completes the stereo-vision-model generation process of the model generation method based on multi-view panoramic images of this embodiment.
Building on the first embodiment, the model generation method based on multi-view panoramic images of this embodiment uses the final cost volume at the set resolution to compute the disparity-distribution-estimate features of the lower-level reference images, and directly uses image features at the other resolutions to compute those features. On top of simplifying the whole model generation pipeline, this further improves the accuracy of the generated model and reduces the computational cost of the pipeline.
The present invention further provides a model generation apparatus based on multi-view panoramic images. Referring to Fig. 5, Fig. 5 is a structural diagram of the first embodiment of the model generation apparatus based on multi-view panoramic images of the present invention. The model generation apparatus of this embodiment may be implemented using the first embodiment of the above model generation method. The model generation apparatus 50 of this embodiment comprises an image acquisition module 51, an image rectification rotation matrix computation module 52, a cost volume computation module 53, a cost volume transformation module 54, a cost volume fusion module 55, a set estimated disparity computation module 56, a disparity fusion module 57, and a model construction module 58.
The image acquisition module 51 is configured to acquire a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles; the image rectification rotation matrix computation module 52 is configured to acquire source camera parameters of the source image and reference camera parameters of the reference image, and to compute the image rectification rotation matrix of the source image and the reference image based on the source camera parameters and the reference camera parameters; the cost volume computation module 53 is configured to extract reference image features of the reference image and source image features of the source image, and to compute the cost volume of the reference image and the source image based on the reference image features and the source image features; the cost volume transformation module 54 is configured to transform the coordinate system of the cost volume using the image rectification rotation matrix to obtain the rectified cost volume of the source image and the reference image; the cost volume fusion module 55 is configured to fuse the rectified cost volumes of the plurality of source images corresponding to the reference image to obtain the final cost volume; the set estimated disparity computation module 56 is configured to compute, based on the final cost volume, the estimated disparity distribution of the reference image at the set resolution, and to compute the estimated disparity at the set resolution; the disparity fusion module 57 is configured to fuse the estimated disparities of the reference image at every resolution level to obtain the final disparity of the reference image; and the model construction module 58 is configured to generate the depth map of the reference image based on the final disparity of the reference image and to construct the corresponding stereo vision model from the depth map.
When the model generation apparatus 50 based on multi-view panoramic images of this embodiment is used, the image acquisition module 51 first acquires the reference image and the plurality of corresponding source images with overlapping viewing angles. The reference image is the standard image for which the stereo vision model is to be generated, and the source images are the reference pictures used for generating the stereo vision model; the reference image and the source images may be photographs of the same object taken from different angles.
The image rectification rotation matrix computation module 52 then computes the relative positional relationship between the reference image and each source image, obtaining the corresponding image rectification rotation matrix.
Based on this image rectification rotation matrix, the relative positional relationship between the reference image and the corresponding source image is determined through the projection matrices, and a rectified reference image is produced such that the rectified reference image is displaced from the source image only leftward in the horizontal direction.
The cost volume computation module 53 then extracts features from the reference image using a preset neural network to obtain reference image features, and likewise extracts features from the source image using the preset neural network to obtain source image features.
The cost volume computation module 53 then computes the cost volume of the reference image and the source image based on the reference image features and the source image features. The cost volume represents depth probability values of the reference image in stereo space. Specifically, the cost volume of the reference image and the source image can be computed by the following formula:
C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j);
F_0 ∈ P^(c×h×w);
where c is the number of feature channels of the feature map, h is the feature map height, w is the feature map width, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, and C(q,i,j) is the cost volume of the reference image and the source image, in which i is the row position and j is the column position in the cost volume, q is a set disparity value, and g(F_1,q) denotes the feature map F_1 shifted as a whole by q pixels along the w direction.
The cost volume transformation module 54 then transforms the coordinate system of the cost volume using the image rectification rotation matrix, obtaining the rectified cost volume of the source image and the reference image (the cost volume under the rectified viewing angle), so that the cost volumes of multiple different source images and the reference image can subsequently be fused.
Specifically, the rectified cost volume of the source image and the reference image can be computed by the following formula:
C'(m,n,p) = R_0 · R' · C(q,i,j);
where R_0 is the rotation matrix of the reference image, R is the image rectification rotation matrix of the source image and the reference image, and C'(m,n,p) is the rectified cost volume of the source image and the reference image.
The cost volume fusion module 55 then fuses the rectified cost volumes of the plurality of source images corresponding to the reference image, obtaining the final cost volume.
Specifically, the cost volume fusion module 55 may fuse the rectified cost volumes of the plurality of source images corresponding to the reference image using an element-wise max pooling operation to obtain the final cost volume.
The set estimated disparity computation module 56 then computes, based on the obtained final cost volume, the estimated disparity distribution of the reference image at the set resolution, and computes the estimated disparity at the set resolution.
Specifically, the set estimated disparity computation module 56 uses a preset neural network to compute the estimated disparity distribution of the reference image at the set resolution from the final cost volume. That is, at the set resolution, passing the final cost volume through the preset neural network yields the corresponding disparity distribution estimate, from which the estimated disparity at that resolution can be computed. The preset neural network can be obtained by model training on positive and negative samples.
At the set resolution, the size of a detected object in the reference image is 0.3% to 10% of the size of the reference image. If a detected object is larger than 10% of the reference image size, detection of its motion may become insensitive; if it is smaller than 0.3% of the reference image size, its motion may not be detectable at all. At smaller resolutions, the reference image focuses more on finer object motion; at larger resolutions, it focuses more on coarser, macroscopic object motion.
Since the detected objects to which the human eye is most sensitive, and which are most comfortable to observe, occupy 0.3% to 10% of the whole image size, traditional stereo vision models spend considerable computing resources on matching-point computation at this resolution. This embodiment instead computes the estimated disparity from the final cost volume of the reference image and its corresponding source images at this resolution, which greatly reduces the cost of matching-point computation between the reference image and the source images at this resolution.
Because the final disparity of the reference image is synthesised from the estimated disparities of the reference image at the various resolutions, and the size of the detected objects of interest in the reference image differs across resolutions, the disparity fusion module 57 fuses the estimated disparities of the reference image at every resolution level, obtaining the final disparity of the reference image.
Finally, the model construction module 58 generates the depth map of the reference image based on the final disparity, and constructs the corresponding stereo vision model from the depth map.
Specifically, the model construction module 58 may generate the depth map of the reference image by the following formula:
depth = f · b / d;
where f is the focal length of the camera corresponding to the reference image, b is the baseline length of the multi-view panoramic stereo system, and d is the estimated disparity.
Through the above formula, the final disparity can be converted into a depth map; the multi-view depth maps are then cross-checked against each other to remove outliers, after which they can be used to generate a 3D point cloud and finally the corresponding stereo vision model.
This completes the stereo-vision-model generation process of the model generation apparatus 50 based on multi-view panoramic images of this embodiment.
By computing and fusing the cost volumes of a plurality of source images and the reference image, the model generation apparatus based on multi-view panoramic images of this embodiment improves the accuracy of the estimated disparity at the set resolution and thus effectively improves the accuracy of the generated model; at the same time, the computation and fusion of cost volumes consume few computing resources, so the computational cost of the whole model generation pipeline is reduced.
Referring to Fig. 6, Fig. 6 is a structural diagram of the second embodiment of the model generation apparatus based on multi-view panoramic images of the present invention. The model generation apparatus of this embodiment may be implemented using the second embodiment of the above model generation method. The model generation apparatus 60 of this embodiment comprises an image acquisition module 61, a folding dimension-reduction module 62, a feature extraction module 63, a first disparity-distribution-estimate feature computation module 64, a second disparity-distribution-estimate feature computation module 65, a difference feature acquisition module 66, an estimated disparity computation module 67, a tiling dimension-raising module 68, a final disparity acquisition module 69, and a model construction module 6A.
The image acquisition module 61 is configured to acquire a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles; the folding dimension-reduction module 62 is configured to perform a folding dimension-reduction operation on the first-level reference image to obtain at least one lower-level reference image corresponding to the first-level reference image, and to perform a folding dimension-reduction operation on the first-level source image to obtain at least one lower-level source image corresponding to the first-level source image; the feature extraction module 63 is configured to extract features from the lower-level reference image using a first preset residual convolutional network to obtain lower-level reference image features, and to extract features from the lower-level source image using the first preset residual convolutional network to obtain lower-level source image features; the first disparity-distribution-estimate feature computation module 64 is configured to obtain, based on the lower-level reference image features of a set level and the source image features of the set level, a final cost volume of the lower-level reference image of the set level, and to compute, based on the final cost volume, a disparity-distribution-estimate feature of the lower-level reference image of the reference image at a set resolution; the second disparity-distribution-estimate feature computation module 65 is configured to obtain, based on the lower-level reference image features of the other levels and the source image features of the other levels, disparity-distribution-estimate features of the lower-level reference images of the other levels of the reference image; the difference feature acquisition module 66 is configured to extract features from the disparity-distribution-estimate features of the lower-level reference image using a second preset residual convolutional network to obtain difference features of the lower-level reference image; the estimated disparity computation module 67 is configured to obtain an estimated disparity of the lower-level reference image based on the difference features of the lower-level reference image; the tiling dimension-raising module 68 is configured to perform a tiling dimension-raising operation on the difference features to obtain corrected difference features of the first-level reference image, and to perform a tiling dimension-raising operation on the estimated disparity to obtain a corrected disparity of the first-level reference image; the final disparity acquisition module 69 is configured to obtain a final disparity of the first-level reference image from the reference image, the source image, the corrected difference features of the first-level reference image, and the corrected disparity of the first-level reference image; and the model construction module 6A is configured to generate a depth map of the reference image based on the final disparity of the first-level reference image and to construct a corresponding stereo vision model from the depth map.
When the model generation apparatus 60 based on multi-view panoramic images of this embodiment is used, the image acquisition module 61 first acquires the reference image captured by a multi-view camera and the plurality of corresponding source images with overlapping viewing angles.
The folding dimension-reduction module 62 then performs a folding dimension-reduction operation on the first-level reference image, obtaining multiple lower-level reference images corresponding to the first-level reference image, e.g. four second-level reference images; applying the folding dimension-reduction operation again to a second-level reference image yields four third-level reference images.
Meanwhile, the folding dimension-reduction module 62 also performs a folding dimension-reduction operation on the first-level source image, obtaining multiple lower-level source images corresponding to the first-level source image, e.g. four second-level source images; applying the folding operation again to a second-level source image yields four third-level source images.
Providing reference images at different levels or resolutions satisfies well the receptive-field requirements of the different objects in the scene.
The feature extraction module 63 then extracts features, using the first preset residual convolutional network, from the multiple lower-level reference images (such as the second-level and third-level reference images), obtaining lower-level reference image features at multiple levels.
Meanwhile, the feature extraction module 63 extracts features, using the first preset residual convolutional network, from the multiple lower-level source images, obtaining lower-level source image features at multiple levels.
The first disparity-distribution-estimate feature computation module 64 then obtains, based on the lower-level reference image features of the set level and the source image features of the set level, the final cost volume of the lower-level reference image of the set level.
The first disparity-distribution-estimate feature computation module 64 then computes, based on the final cost volume, the disparity-distribution-estimate feature of the lower-level reference image of the reference image at the set resolution.
Specifically, the first disparity-distribution-estimate feature computation module 64 may use a preset neural network to compute the disparity-distribution-estimate feature of the lower-level reference image at the set resolution. That is, at the set resolution, passing the final cost volume through the preset neural network yields the corresponding disparity distribution estimate, from which the estimated disparity at that resolution can be computed. The preset neural network can be obtained by model training on positive and negative samples.
The second disparity-distribution-estimate feature computation module 65 then obtains, based on the lower-level reference image features of the other levels and the source image features of the other levels, the disparity-distribution-estimate features of the lower-level reference images of the other levels of the reference image. Since the computational cost at these other resolutions is low, an existing feature-point matching algorithm can be used here to compute these features.
The difference feature acquisition module 66 then extracts features, using the second preset residual convolutional network, from the disparity-distribution-estimate features of the lower-level reference images, obtaining the difference features of the lower-level reference images.
The estimated disparity computation module 67 then obtains the estimated disparity of the lower-level reference image based on the obtained difference features of the lower-level reference image; that is, the estimated disparity of the lower-level reference image is determined from the preset estimated disparity corresponding to its difference features. If the preset estimated disparity corresponding to the difference features of the lower-level reference image is large, the resulting estimated disparity of the lower-level reference image is also large; if it is small, the resulting estimated disparity is also small. The preset estimated disparity can be obtained by model training on positive and negative samples.
The tiling dimension-raising module 68 then performs a tiling dimension-raising operation on the difference features of the lower-level reference image, obtaining the corrected difference features of the first-level reference image; and performs a tiling dimension-raising operation on the estimated disparity of the lower-level reference image, obtaining the corrected disparity of the first-level reference image.
For example, the tiling dimension-raising module 68 may tile the difference features of the third-level reference image to obtain the corrected difference features of the second-level reference image, which can be used to compute the difference features of the second-level reference image; the module may then tile the difference features of the second-level reference image to obtain the corrected difference features of the first-level reference image.
Similarly, the tiling dimension-raising module 68 may tile the estimated disparity of the third-level reference image to obtain the corrected disparity of the second-level reference image, which can be used to compute the estimated disparity of the second-level reference image; the module then tiles the estimated disparity of the second-level reference image to obtain the corrected disparity of the first-level reference image.
The final disparity acquisition module 69 then fuses the features of the reference image and the source images with the corrected difference features and the corrected disparity of the first-level reference image, and obtains the final disparity of the corresponding first-level reference image from the fused features. The correspondence between the fused features and the final disparity of the first-level reference image can be obtained by model training on positive and negative samples.
Finally, the model construction module 6A generates the depth map of the reference image based on the final disparity, and constructs the corresponding stereo vision model from the depth map.
The final disparity can be converted into a depth map; the multi-view depth maps are then cross-checked against each other to remove outliers, after which they can be used to generate a 3D point cloud and finally the corresponding stereo vision model.
This completes the stereo-vision-model generation process of the model generation apparatus based on multi-view panoramic images of this embodiment.
Building on the first embodiment, the model generation apparatus based on multi-view panoramic images of this embodiment uses the final cost volume at the set resolution to compute the disparity-distribution-estimate features of the lower-level reference images, and directly uses image features at the other resolutions to compute those features. On top of simplifying the whole model generation pipeline, this further improves the accuracy of the generated model and reduces the computational cost of the pipeline.
Referring to Fig. 7, Fig. 7 is a schematic flow diagram of a specific embodiment of the model generation method and apparatus based on multi-view panoramic images of the present invention. In this specific embodiment, the first-level reference image and the corresponding first-level source images are folded repeatedly to produce feature maps at multiple resolutions. The number of resolution levels can be adjusted according to the actual size of the reference image, so that the disparity evaluation at the lowest resolution can cover the maximum disparity between the reference image and the source images. At each resolution, the actual disparity value is predicted from the disparity distribution produced by the left-eye and right-eye image feature maps and from the image's feature map at that resolution. Moreover, for the resolution level to which the human eye is most sensitive, the final cost volume of the reference image and the corresponding source images is used to compute the estimated disparity, which greatly reduces the cost of matching-point computation between the reference image and the source images at that resolution.
The predicted disparity and the feature maps used to produce the prediction are passed, through the tiling dimension-raising operation, to the upper-level reference image for fusion; after several tiling operations a dense disparity map at the original resolution is generated, from which the corresponding depth map and the corresponding stereo vision model are further generated.
By computing and fusing the cost volumes of a plurality of source images and a reference image, the model generation method and apparatus based on multi-view panoramic images of the present invention improve the accuracy of the estimated disparity at the set resolution and thus effectively improve the accuracy of the generated model; at the same time, the computation and fusion of cost volumes consume few computing resources, so the computational cost of the whole model generation pipeline is reduced. This effectively solves the technical problems of existing model generation methods and apparatuses, namely high computational cost and poor accuracy of the generated model.
In summary, although the present invention has been disclosed above by way of embodiments, the ordinal numbers preceding the embodiments are used merely for convenience of description and impose no limitation on the order of the embodiments. The above embodiments are not intended to limit the present invention; a person of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and therefore the scope of protection of the present invention is defined by the claims.

Claims (15)

  1. A model generation method based on multi-view panoramic images, comprising:
    acquiring a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles;
    acquiring source camera parameters of the source image and reference camera parameters of the reference image, and computing an image rectification rotation matrix of the source image and the reference image based on the source camera parameters and the reference camera parameters;
    extracting reference image features of the reference image and source image features of the source image, and computing a cost volume of the reference image and the source image based on the reference image features and the source image features;
    transforming the coordinate system of the cost volume using the image rectification rotation matrix to obtain a rectified cost volume of the source image and the reference image;
    fusing the rectified cost volumes of the plurality of source images corresponding to the reference image to obtain a final cost volume;
    computing, based on the final cost volume, an estimated disparity distribution of the reference image at a set resolution, and computing an estimated disparity at the set resolution;
    fusing the estimated disparities of the reference image at every resolution level to obtain a final disparity of the reference image;
    generating a depth map of the reference image based on the final disparity of the reference image, and constructing a corresponding stereo vision model from the depth map.
  2. The model generation method based on multi-view panoramic images according to claim 1, wherein the image rectification rotation matrix of the source image and the reference image is computed by the following formulas:
    v_x = (c_1 - c_0) · sign(R_0(1,:) · R_0' · t_1);
    v_y = cross(R_1(3,:), v_x);
    v_z = cross(v_x, v_y);
    R = [v_x/||v_x||_2, v_y/||v_y||_2, v_z/||v_z||_2]';
    c_0 = -R_0' · t_0;
    c_1 = -R_1' · t_1;
    where R_0 is the rotation matrix of the reference image, t_0 is the translation vector of the reference image, R_1 is the rotation matrix of the corresponding source image, t_1 is the translation vector of the corresponding source image, and R is the image rectification rotation matrix of the source image and the reference image.
  3. The model generation method based on multi-view panoramic images according to claim 1, wherein the cost volume of the reference image and the source image is computed by the following formulas:
    C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j);
    F_0 ∈ P^(c×h×w);
    where c is the number of feature channels of the feature map, h is the feature map height, w is the feature map width, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, and C(q,i,j) is the cost volume of the reference image and the source image, in which i is the row position and j is the column position in the cost volume, q is a set disparity value, and g(F_1,q) denotes the feature map F_1 shifted as a whole by q pixels along the w direction.
  4. The model generation method based on multi-view panoramic images according to claim 1, wherein the rectified cost volume of the source image and the reference image is computed by the following formula:
    C'(m,n,p) = R_0 · R' · C(q,i,j);
    where R_0 is the rotation matrix of the reference image, R is the image rectification rotation matrix of the source image and the reference image, and C'(m,n,p) is the rectified cost volume of the source image and the reference image.
  5. The model generation method based on multi-view panoramic images according to claim 1, wherein the step of fusing the rectified cost volumes of the plurality of source images corresponding to the reference image to obtain the final cost volume comprises:
    fusing the rectified cost volumes of the plurality of source images corresponding to the reference image using an element-wise max pooling operation to obtain the final cost volume.
  6. The model generation method based on multi-view panoramic images according to claim 1, wherein the step of computing, based on the final cost volume, the estimated disparity distribution of the reference image at the set resolution and computing the estimated disparity at the set resolution comprises:
    computing, based on the final cost volume and using a preset neural network, the estimated disparity distribution of the reference image at the set resolution, and computing the estimated disparity at the set resolution;
    wherein, at the set resolution, the size of a detected object in the reference image is 0.3% to 10% of the size of the reference image.
  7. The model generation method based on multi-view panoramic images according to claim 1, wherein the depth map of the reference image is generated by the following formula:
    depth = f · b / d;
    where f is the focal length of the camera corresponding to the reference image, b is the baseline length of the multi-view panoramic stereo system, and d is the estimated disparity.
  8. A model generation method based on multi-view panoramic images, comprising:
    acquiring a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles;
    performing a folding dimension-reduction operation on the first-level reference image (i.e., the reference image) to obtain at least one lower-level reference image corresponding to the first-level reference image; and performing a folding dimension-reduction operation on the first-level source image to obtain at least one lower-level source image corresponding to the first-level source image;
    extracting features from the lower-level reference image using a first preset residual convolutional network to obtain lower-level reference image features; and extracting features from the lower-level source image using the first preset residual convolutional network to obtain lower-level source image features;
    obtaining, based on the lower-level reference image features of a set level and the source image features of the set level, a final cost volume of the lower-level reference image of the set level, and computing, based on the final cost volume, a disparity-distribution-estimate feature of the lower-level reference image of the reference image at a set resolution;
    obtaining, based on the lower-level reference image features of the other levels and the source image features of the other levels, disparity-distribution-estimate features of the lower-level reference images of the other levels of the reference image;
    extracting features from the disparity-distribution-estimate features of the lower-level reference image using a second preset residual convolutional network to obtain difference features of the lower-level reference image;
    obtaining an estimated disparity of the lower-level reference image based on the difference features of the lower-level reference image;
    performing a tiling dimension-raising operation on the difference features to obtain corrected difference features of the first-level reference image; and performing a tiling dimension-raising operation on the estimated disparity to obtain a corrected disparity of the first-level reference image;
    obtaining a final disparity of the first-level reference image from the reference image, the source image, the corrected difference features of the first-level reference image, and the corrected disparity of the first-level reference image;
    generating a depth map of the reference image based on the final disparity of the first-level reference image, and constructing a corresponding stereo vision model from the depth map.
  9. A model generation apparatus based on multi-view panoramic images, comprising:
    an image acquisition module, configured to acquire a reference image and a plurality of corresponding source images, wherein the source images and the reference image have overlapping viewing angles;
    an image rectification rotation matrix computation module, configured to acquire source camera parameters of the source image and reference camera parameters of the reference image, and to compute an image rectification rotation matrix of the source image and the reference image based on the source camera parameters and the reference camera parameters;
    a cost volume computation module, configured to extract reference image features of the reference image and source image features of the source image, and to compute a cost volume of the reference image and the source image based on the reference image features and the source image features;
    a cost volume transformation module, configured to transform the coordinate system of the cost volume using the image rectification rotation matrix to obtain a rectified cost volume of the source image and the reference image;
    a cost volume fusion module, configured to fuse the rectified cost volumes of the plurality of source images corresponding to the reference image to obtain a final cost volume;
    a set estimated disparity computation module, configured to compute, based on the final cost volume, an estimated disparity distribution of the reference image at a set resolution, and to compute an estimated disparity at the set resolution;
    a disparity fusion module, configured to fuse the estimated disparities of the reference image at every resolution level to obtain a final disparity of the reference image;
    a model construction module, configured to generate a depth map of the reference image based on the final disparity of the reference image, and to construct a corresponding stereo vision model from the depth map.
  10. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the image rectification rotation matrix computation module computes the image rectification rotation matrix of the source image and the reference image by the following formulas:
    v_x = (c_1 - c_0) · sign(R_0(1,:) · R_0' · t_1);
    v_y = cross(R_1(3,:), v_x);
    v_z = cross(v_x, v_y);
    R = [v_x/||v_x||_2, v_y/||v_y||_2, v_z/||v_z||_2]';
    c_0 = -R_0' · t_0;
    c_1 = -R_1' · t_1;
    where R_0 is the rotation matrix of the reference image, t_0 is the translation vector of the reference image, R_1 is the rotation matrix of the corresponding source image, t_1 is the translation vector of the corresponding source image, and R is the image rectification rotation matrix of the source image and the reference image.
  11. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the cost volume computation module computes the cost volume of the reference image and the source image by the following formulas:
    C(q,i,j) = F_0(:,i,j)' · g(F_1,q)(:,i,j);
    F_0 ∈ P^(c×h×w);
    where c is the number of feature channels of the feature map, h is the feature map height, w is the feature map width, F_0 is the feature map of the reference image, F_1 is the feature map of the source image, and C(q,i,j) is the cost volume of the reference image and the source image, in which i is the row position and j is the column position in the cost volume, q is a set disparity value, and g(F_1,q) denotes the feature map F_1 shifted as a whole by q pixels along the w direction.
  12. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the cost volume transformation module computes the rectified cost volume of the source image and the reference image by the following formula:
    C'(m,n,p) = R_0 · R' · C(q,i,j);
    where R_0 is the rotation matrix of the reference image, R is the image rectification rotation matrix of the source image and the reference image, and C'(m,n,p) is the rectified cost volume of the source image and the reference image.
  13. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the cost volume fusion module is configured to fuse the rectified cost volumes of the plurality of source images corresponding to the reference image using an element-wise max pooling operation to obtain the final cost volume.
  14. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the set estimated disparity computation module is configured to compute, based on the final cost volume and using a preset neural network, the estimated disparity distribution of the reference image at the set resolution, and to compute the estimated disparity at the set resolution;
    wherein, at the set resolution, the size of a detected object in the reference image is 0.3% to 10% of the size of the reference image.
  15. The model generation apparatus based on multi-view panoramic images according to claim 9, wherein the model construction module generates the depth map of the reference image by the following formula:
    depth = f · b / d;
    where f is the focal length of the camera corresponding to the reference image, b is the baseline length of the multi-view panoramic stereo system, and d is the estimated disparity.
PCT/CN2021/088002 2020-06-04 2021-04-19 Model generation method and apparatus based on multi-view panoramic images WO2021244161A1
