CN116958381A - Automatic generation method for building facade replacement texture - Google Patents

Automatic generation method for building facade replacement texture

Info

Publication number
CN116958381A
Authority
CN
China
Prior art keywords
window
building
texture
network
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310735232.0A
Other languages
Chinese (zh)
Inventor
赵婷婷
李志林
朱军
慎利
遆鹏
谢亚坤
沈星宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Jiaotong University
Original Assignee
Southwest Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Jiaotong University
Priority to CN202310735232.0A
Publication of CN116958381A


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/04 - Texture mapping
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0475 - Generative networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/10 - Terrestrial scenes
    • G06V20/176 - Urban or other man-made structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method for automatically generating replacement textures for building facades, which comprises the following steps: extracting windows with a Mask R-CNN network to obtain the spatial position information and mask data of the windows on the building facade; generating a building facade wall texture image with a Pix2PixHD network; regularizing the window mask data, retrieving window textures of consistent structure and style from a material library, generating window textures that preserve the spatial layout and geometric structure, attaching them to the generated wall texture, and replacing the original texture data of the building model through texture mapping. The invention solves the problem that building facades are prone to distortion and holes in three-dimensional modeling of urban areas, while preserving the real spatial position, size, layout and other information of the facade windows.

Description

Automatic generation method for building facade replacement texture
Technical Field
The invention relates to the technical field of computer vision image generation, and in particular to a method for automatically generating building facade replacement textures.
Background
Analysis and three-dimensional modeling of real urban scenes are fundamental problems in computer vision and computer graphics. With the wide use of three-dimensional city building models in fields such as housing surveys, stereo measurement and urban planning, the construction of three-dimensional building models has attracted great attention. Owing to its low modeling cost, rich texture detail, short modeling time and high speed, oblique photography modeling has become the mainstream method for large-scale reconstruction.
Currently, oblique photography modeling mostly employs multi-view stereo (MVS) methods. The workflow comprises three steps: first, camera parameters and sparse three-dimensional scene information are recovered by structure from motion (SfM) from aerial images captured by an unmanned aerial vehicle; then dense matching with an MVS method yields a dense three-dimensional point cloud; finally, the three-dimensional building model is obtained by Poisson surface reconstruction of the point cloud followed by texture mapping. However, this process still has problems, such as distortion during model reconstruction; in addition, stretching, clipping, transformation and other operations during mapping can distort the texture, so that the mapped texture fits poorly and looks warped. Moreover, when the UAV images of a building are insufficient in number, or the building is occluded by vegetation, the resulting three-dimensional model exhibits facade distortion, holes and similar defects. Texture replacement of the building facade is therefore required.
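For concreteness, the Poisson-reconstruction step of this pipeline can be sketched with Open3D; the patent describes the pipeline only generically, so the library choice, the file names and the depth parameter below are illustrative assumptions.

```python
# Sketch of the Poisson surface reconstruction step of the MVS pipeline
# described above, using Open3D as one common implementation choice.
# "dense_cloud.ply", "building_mesh.ply" and depth=9 are assumptions.
import open3d as o3d

pcd = o3d.io.read_point_cloud("dense_cloud.ply")  # dense point cloud from MVS
pcd.estimate_normals()                            # Poisson needs oriented normals
mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)                                 # higher depth = finer mesh
o3d.io.write_triangle_mesh("building_mesh.ply", mesh)
```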
In addition, for image texture generation, GAN networks are currently the common tool for synthesizing picture textures. A building facade image, however, contains not only wall textures but also objects such as windows; if the windows on the facade were also synthesized by image generation, the position, size and layout information of the actual windows would be destroyed, which fails the requirement of expressing the real scene of a three-dimensional building.
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a method for automatically generating building facade replacement textures, which solves the problem that building facades are prone to distortion and holes in three-dimensional modeling of urban areas, while preserving the real spatial position, size, layout and other information of the facade windows.
In order to achieve the above purpose, the invention adopts the following technical scheme: a method for automatically generating building facade replacement textures, comprising the following steps:
step 1, extracting windows with a Mask R-CNN network to obtain the spatial position information and mask data of the windows on the building facade;
step 2, generating a building facade wall texture image with a Pix2PixHD network;
step 3, regularizing the window mask data, retrieving window textures of consistent structure and style from a material library, generating window textures that preserve the spatial layout and geometric structure, attaching them to the generated wall texture, and replacing the original texture data of the building model through texture mapping.
As a further improvement of the present invention, step 1 is specifically as follows:
produce the building facade window extraction training samples: prepare a building facade dataset, manually annotate the windows with labeling software, store the annotations in Json format, and augment the data with random rotations by multiples of 90 degrees, horizontal flipping and scaling transforms, so as to improve the network's segmentation results and the model's generalization;
fine-tune the parameters of and train the Mask R-CNN network on the window training dataset, test the trained model, and extract the position information and mask files of the building facade windows.
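For illustration, this extraction step can be sketched in a few lines of Python; torchvision's Mask R-CNN implementation stands in for the network (the patent does not name a specific implementation), and the checkpoint name and 0.7 score threshold are assumptions.

```python
# Minimal sketch of window extraction with a fine-tuned Mask R-CNN.
# torchvision's implementation is used as a stand-in; "window_maskrcnn.pth"
# and the 0.7 score threshold are assumptions, not values from the patent.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=2)  # background + window
model.load_state_dict(torch.load("window_maskrcnn.pth"))
model.eval()

image = to_tensor(Image.open("facade.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([image])[0]

keep = pred["scores"] > 0.7                # keep confident detections only
boxes = pred["boxes"][keep]                # (N, 4) window positions in pixels
masks = pred["masks"][keep, 0] > 0.5       # (N, H, W) boolean window masks
```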
As a further improvement of the present invention, step 2 is specifically as follows:
prepare a building facade wall dataset: the input is a noise image and the label is a real wall texture image; augment the image pairs with scaling transforms;
fine-tune the parameters of and train the Pix2PixHD network: after training, test the model to obtain the generated building facade wall texture image.
As a further improvement of the present invention, step 3 is specifically as follows:
regularize the window mask files and complete any undetected windows, then attach the windows to the generated facade wall texture image according to the windows' real layout and size information recorded in the mask files; finally, map the generated facade texture image onto the three-dimensional model through texture mapping.
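A minimal sketch of this regularization-and-compositing step follows. Replacing each detected mask by its bounding rectangle is one simple regularization choice, not an algorithm fixed by the patent, and retrieve_window is a hypothetical helper (one possible retrieval rule is sketched in the embodiment below).

```python
# Sketch of step 3: regularize window masks to rectangles and paste
# retrieved window textures onto the generated wall texture. Using the
# bounding rectangle is one simple regularization choice; the patent does
# not fix the algorithm. retrieve_window is a hypothetical helper.
import cv2
import numpy as np

def regularize(mask: np.ndarray):
    """Replace an irregular binary window mask with its bounding rectangle."""
    ys, xs = np.nonzero(mask)
    return cv2.boundingRect(np.column_stack([xs, ys]).astype(np.int32))

def composite(wall: np.ndarray, masks, window_library) -> np.ndarray:
    """Paste one retrieved window texture per regularized window position."""
    facade = wall.copy()
    for mask in masks:
        x, y, w, h = regularize(mask)
        facade[y:y + h, x:x + w] = retrieve_window(window_library, w, h)
    return facade
```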
The invention uses image generation and window extraction to produce realistic wall textures for the building facade, and preserves the windows' real layout information and actual sizes according to the extracted window mask files.
The invention focuses on the automatic generation of replacement textures for building facades: by replacing distorted and hollow facades while keeping the real spatial position, size, layout and other information of the facade windows, the three-dimensional building model looks more regular and uniform after facade replacement, achieving the purpose of a complete building facade.
The beneficial effects of the invention are as follows:
1. the invention automatically extracts the window masks on the building facade, saving the time of manually measuring window positions and sizes;
2. the facade textures generated by the invention keep the true positions and structural layout of the windows on the original facade, ensuring the authenticity of the window layout on the facade;
3. the facade images generated by the invention are not limited to a 512 x 512 image size; larger images can be generated, which better suits the large facades found in actual models.
Drawings
FIG. 1 is a flow chart of building facade texture generation in an embodiment of the present invention;
FIG. 2 is a flow chart of the Mask R-CNN network in an embodiment of the invention;
FIG. 3 is a schematic diagram of a window mask file extracted in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the GAN network structure in an embodiment of the present invention;
FIG. 5 is a diagram of the Pix2PixHD generator network in an embodiment of the invention;
FIG. 6 is a diagram illustrating the computation of the Pix2PixHD discriminator network loss in an embodiment of the present invention;
FIG. 7 is the input building facade image in an embodiment of the present invention;
FIG. 8 is the regularized window mask file in an embodiment of the present invention;
FIG. 9 shows the building facade result generated in an embodiment of the invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Examples
In order to extract the window information and layout structure of a building facade, this embodiment extracts windows with a Mask R-CNN network, obtaining the spatial position information and mask data of the windows on the facade; generates a building facade wall texture image with a Pix2PixHD network; then regularizes the window mask data, retrieves window textures of consistent structure and style from a material library, generates window textures that preserve the spatial layout and geometric structure, attaches them to the generated wall texture, and replaces the original texture data of the building model through texture mapping.
The basic steps of the algorithm of the embodiment are as follows:
(1) Produce the building facade window extraction training samples. Prepare a building facade dataset, manually annotate the windows with labeling software, store the annotations in Json format, and augment the data with random rotations by multiples of 90 degrees, horizontal flipping and scaling transforms, so as to improve the network's segmentation results and the model's generalization.
(2) Fine-tune the parameters of and train the Mask R-CNN network on the window training dataset, test the trained model, and extract the position information and mask files of the building facade windows.
(3) Prepare a building facade wall dataset. The input is a noise image and the label is a real wall texture image. Augment the image pairs with scaling transforms.
(4) Fine-tune the parameters of and train the Pix2PixHD network. After training, test the model to obtain the generated building facade wall texture image.
(5) Regularize the window mask files and complete any undetected windows, then attach the windows to the generated facade wall texture image according to the windows' real layout and size information recorded in the mask files. Finally, map the generated facade texture image onto the three-dimensional model through texture mapping.
This embodiment is further described below:
the specific implementation flow chart of the building elevation texture replacement in this embodiment is shown in fig. 1, firstly, parameter adjustment and model training are performed on a model according to a prepared building elevation texture data set, and then a building elevation with a proper size is generated by using the trained model. And then, acquiring a window of the original building elevation by using a Mask-Rcnn, detecting and positioning the window, acquiring a binary file of the window of the building elevation, carrying out Mask regularization on the acquired binary file of the window, selecting a window with proper size and style from a window database prepared in advance by using the regularized window Mask file, attaching the window to the generated building elevation, acquiring the building elevation with a real window layout, and finally mapping the texture image of the acquired building elevation onto a three-dimensional building model.
The field of deep learning has made major breakthroughs in recent years, with most research results based on perception: a computer perceives objects and recognizes content by imitating human thinking. The generative adversarial network (GAN) was proposed by Goodfellow et al. in 2014 and, despite its short history, has had a great impact on the field of artificial intelligence. In the GAN game, the data distribution produced by the generator network (G) is fitted to the real data distribution: the generator receives random noise and outputs a generated building facade picture. The discriminator network (D) computes, from an input facade picture, the probability that the picture is generated or real. The two networks update in opposition according to the returned results and balance each other, and the dynamics finally reach a Nash equilibrium; the GAN network structure is shown in fig. 4. Owing to its strong performance, GAN has gradually been applied in many directions of image processing, including image translation, image restoration, style transfer and image generation. Generating a building facade amounts to generating pictures with a GAN network; because facades in actual models are large, this embodiment uses a Mask R-CNN network to extract and segment window positions and a Pix2PixHD framework to generate high-definition facade textures.
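The adversarial game described above can be made concrete with a generic PyTorch sketch. This illustrates the GAN objective only, not the patent's training code; the discriminator D is assumed to end in a sigmoid so that it outputs probabilities.

```python
# Generic GAN objective sketch: D learns to score real wall textures as 1
# and generated ones as 0, while G learns to make D score its output as 1.
# Illustrative only; D is assumed to output sigmoid probabilities.
import torch
import torch.nn.functional as F

def d_step(D, G, real, noise):
    """Discriminator step: push real -> 1 and generated -> 0."""
    fake = G(noise).detach()                 # do not backpropagate into G here
    real_p, fake_p = D(real), D(fake)
    return (F.binary_cross_entropy(real_p, torch.ones_like(real_p)) +
            F.binary_cross_entropy(fake_p, torch.zeros_like(fake_p)))

def g_step(D, G, noise):
    """Generator step: maximize the chance D mistakes the fake for real."""
    fake_p = D(G(noise))
    return F.binary_cross_entropy(fake_p, torch.ones_like(fake_p))
```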
The Mask R-CNN network structure is shown in fig. 2: a backbone feature extraction network and a feature pyramid network (FPN) perform convolutional feature extraction and fuse low-level information; a region proposal network (RPN) generates proposal windows; the feature windows of each level are then fused and pooled into fixed-size feature maps; finally, classification, bounding-box regression and window mask prediction are performed on the fixed-size feature maps. The windows in the collected facade dataset are annotated manually with labeling software, the data are augmented with random rotations by multiples of 90 degrees, horizontal flipping and scaling transforms, and the model is then trained. The window mask results extracted in the model testing stage are shown in fig. 3.
The workflow of the Pix2PixHD network is shown in figs. 5 and 6: G denotes the generator network and D the discriminator network. The generator produces the data distribution of the target domain from the input data, i.e. it maximizes the probability that the discriminator errs, so that the discriminator mistakes the generated image for a real sample rather than a generated fake; the discriminator aims to maximize the discrimination loss, accurately judging the generator's output and distinguishing the generated images from the label images of the target domain. Generator and discriminator oppose each other during GAN training and finally learn their respective optimal states together, reaching a Nash equilibrium.
To improve the resolution of the generated image, Pix2PixHD designs a coarse-to-fine generator comprising two sub-networks, as shown in fig. 5. G1 is a global generator network and G2 a local enhancement network. The input image first passes through G2 to obtain local shallow features; it is also downsampled by a factor of 2 and fed to G1, a complete encoder-decoder structure that captures the image's global features; finally the local features from G2 and the global features from G1 are added element-wise (Element-wise Add) and passed on through G2. G2 thereby doubles the resolution of the image generated by G1. For the discriminator, to better produce high-resolution pictures, Pix2PixHD adopts the multi-scale discriminator shown in fig. 6: three discriminator components with identical network structure but different image scales, where the images from the generator and the label images are downsampled by factors of 1, 2 and 4, respectively. The downsampled label pictures and generated pictures are given to the three components; the discriminator for the lowest-resolution picture has the largest receptive field and the strongest global sense of the generated image, while the discriminator for the highest-resolution image captures the richest details.
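The two design points just described, the element-wise addition of G1's global features into G2 and the three discriminators fed with images downsampled by factors of 1, 2 and 4, can be sketched as follows; this is an illustrative reconstruction, not the original Pix2PixHD code, and g1, g2_front and g2_back are hypothetical module names.

```python
# Sketch of Pix2PixHD's multi-scale discrimination and coarse-to-fine
# generation as described above. Illustrative reconstruction only.
import torch.nn.functional as F

def multiscale_d(discriminators, image):
    """Run 3 same-architecture discriminators on 1x, 2x, 4x downsampled input."""
    outputs = []
    for i, D in enumerate(discriminators):           # i = 0, 1, 2
        scaled = F.avg_pool2d(image, 2 ** i) if i else image
        outputs.append(D(scaled))
    return outputs

def coarse_to_fine(g1, g2_front, g2_back, image):
    """G2's front end extracts local features; G1 sees the 2x-downsampled
    image for global features; the two maps are added element-wise."""
    local_feat = g2_front(image)
    global_feat = g1(F.avg_pool2d(image, 2))
    return g2_back(local_feat + global_feat)         # element-wise add
```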
The implementation flow of this embodiment, shown in fig. 1, divides into building facade texture dataset creation, model parameter fine-tuning and training, and model testing.
The specific implementation process is as follows:
(1) Create a facade texture dataset for model training. Because facade images are relatively large, they cannot be fed directly into the network for training; moreover, an oversized training set consumes too much memory during training and slows model training. The training data therefore need to be cropped to a suitable size. Finally, data augmentation, mainly horizontal flipping and scaling transforms, increases the diversity of the dataset.
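A minimal sketch of this cropping and augmentation follows; the 512 x 512 crop size and the 0.8 to 1.2 scaling range are example choices, not values mandated by the patent.

```python
# Sketch of dataset preparation: crop large facade textures to a trainable
# size, then augment with horizontal flips and scaling transforms.
# The crop size and scaling range are assumptions for the example.
import random
from PIL import Image

def random_crop(img: Image.Image, size: int = 512) -> Image.Image:
    x = random.randint(0, img.width - size)    # assumes img is larger than size
    y = random.randint(0, img.height - size)
    return img.crop((x, y, x + size, y + size))

def augment(img: Image.Image) -> Image.Image:
    if random.random() < 0.5:                  # horizontal flip
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    s = random.uniform(0.8, 1.2)               # scaling transform
    return img.resize((int(img.width * s), int(img.height * s)))
```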
(2) Model parameter fine-tuning and training. Fine-tune the hyperparameters of the model, such as learning rate, number of iterations and batch size; observe the training-loss and validation-loss curves during training, and stop training early when the training loss no longer decreases or the validation-set accuracy no longer increases.
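The early-stopping rule stated here can be sketched as follows; the patience value is an assumption, since the patent describes the rule only qualitatively.

```python
# Sketch of the early-stopping rule above: stop when the validation metric
# has not improved for `patience` epochs. patience=10 is an assumption.
def train_with_early_stopping(train_epoch, validate, max_epochs=200, patience=10):
    best, stale = float("-inf"), 0
    for _ in range(max_epochs):
        train_epoch()                     # one pass over the training set
        metric = validate()               # e.g. validation-set accuracy
        if metric > best:
            best, stale = metric, 0
        else:
            stale += 1
            if stale >= patience:         # no improvement: stop early
                break
    return best
```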
(3) Model testing. Call the trained model for testing; inputting a noise image yields the building facade texture image generated by the model.
(4) Segment and extract the windows in the original facade image with the Mask R-CNN network and obtain their position information; regularize the window mask file, retrieve suitable windows from the material library prepared in advance according to the regularized masks, and attach them at the proper positions of the generated facade. The resulting facade texture is then attached to the three-dimensional model through texture mapping.
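One simple way to retrieve a suitable window from the material library is nearest-aspect-ratio matching, sketched below; the patent does not specify the retrieval criterion, so this matching rule is an assumption.

```python
# Sketch of window retrieval from the prepared material library: pick the
# texture whose aspect ratio best matches the regularized opening, then
# resize it to fit. The nearest-aspect-ratio rule is an assumption.
import cv2

def retrieve_window(window_library, w, h):
    """window_library: list of window texture images as numpy arrays."""
    target = w / h
    best = min(window_library,
               key=lambda img: abs(img.shape[1] / img.shape[0] - target))
    return cv2.resize(best, (w, h))       # fit the regularized opening
```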
This embodiment has been fully verified in experiments. Fig. 8 shows the new mask file after size regularization of the extracted window mask file (shown in fig. 3): because the windows in the mask file extracted by the Mask R-CNN network have irregular sizes and shapes, all window sizes are regularized with a regularization algorithm. Fig. 9 shows the result of attaching window files retrieved from the prepared window database onto the generated facade; comparison with fig. 7 shows that the layout positions of the windows remain unchanged. Finally, the result of fig. 9 is mapped onto the three-dimensional building model by texture mapping, yielding a building facade texture that preserves the window structure of the original facade shown in fig. 7.
The foregoing examples merely illustrate specific embodiments of the invention; they are described in detail but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.

Claims (4)

1. A method for automatically generating building facade replacement textures, characterized by comprising the following steps:
step 1, extracting windows with a Mask R-CNN network to obtain the spatial position information and mask data of the windows on the building facade;
step 2, generating a building facade wall texture image with a Pix2PixHD network;
step 3, regularizing the window mask data, retrieving window textures of consistent structure and style from a material library, generating window textures that preserve the spatial layout and geometric structure, attaching them to the generated wall texture, and replacing the original texture data of the building model through texture mapping.
2. The method for automatically generating building facade replacement textures according to claim 1, characterized in that step 1 is specifically as follows:
produce the building facade window extraction training samples: prepare a building facade dataset, manually annotate the windows with labeling software, store the annotations in Json format, and augment the data with random rotations by multiples of 90 degrees, horizontal flipping and scaling transforms, so as to improve the network's segmentation results and the model's generalization;
fine-tune the parameters of and train the Mask R-CNN network on the window training dataset, test the trained model, and extract the spatial position information and mask files of the building facade windows.
3. The method for automatically generating building facade replacement textures according to claim 2, characterized in that step 2 is specifically as follows:
prepare a building facade wall dataset: the input is a noise image and the label is a real wall texture image; augment the image pairs with scaling transforms;
fine-tune the parameters of and train the Pix2PixHD network: after training, test the model to obtain the generated building facade wall texture image.
4. The method for automatically generating building facade replacement textures according to claim 3, characterized in that step 3 is specifically as follows:
regularize the window mask files and complete any undetected windows, then attach the windows to the generated facade wall texture image according to the windows' real layout and size information recorded in the mask files; finally, map the generated facade texture image onto the three-dimensional model through texture mapping.
CN202310735232.0A 2023-06-20 2023-06-20 Automatic generation method for building facade replacement texture Pending CN116958381A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310735232.0A CN116958381A (en) 2023-06-20 2023-06-20 Automatic generation method for building facade replacement texture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310735232.0A CN116958381A (en) 2023-06-20 2023-06-20 Automatic generation method for building facade replacement texture

Publications (1)

Publication Number Publication Date
CN116958381A true

Family

ID=88441824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310735232.0A Pending CN116958381A (en) 2023-06-20 2023-06-20 Automatic generation method for building facade replacement texture

Country Status (1)

Country Link
CN (1) CN116958381A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114066768A (en) * 2021-11-24 2022-02-18 武汉大势智慧科技有限公司 Building facade image restoration method, device, equipment and storage medium
CN114117614A (en) * 2021-12-01 2022-03-01 武汉大势智慧科技有限公司 Method and system for automatically generating building facade texture

Similar Documents

Publication Publication Date Title
CN111797716B (en) Single target tracking method based on Siamese network
CN109636905B (en) Environment semantic mapping method based on deep convolutional neural network
CN110852267B (en) Crowd density estimation method and device based on optical flow fusion type deep neural network
CN109191369A (en) 2D pictures turn method, storage medium and the device of 3D model
CN113449594B (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN110188835B (en) Data-enhanced pedestrian re-identification method based on generative confrontation network model
CN112163498B (en) Method for establishing pedestrian re-identification model with foreground guiding and texture focusing functions and application of method
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN114117614A (en) Method and system for automatically generating building facade texture
CN110210431B (en) Point cloud semantic labeling and optimization-based point cloud classification method
CN110852182A (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
CN114943876A (en) Cloud and cloud shadow detection method and device for multi-level semantic fusion and storage medium
CN104463962B (en) Three-dimensional scene reconstruction method based on GPS information video
CN106683125A (en) RGB-D image registration method based on 2D/3D mode switching
CN115719445A (en) Seafood identification method based on deep learning and raspberry type 4B module
CN113610024B (en) Multi-strategy deep learning remote sensing image small target detection method
CN113076806A (en) Structure-enhanced semi-supervised online map generation method
CN115115847B (en) Three-dimensional sparse reconstruction method and device and electronic device
CN113920254B (en) Monocular RGB (Red Green blue) -based indoor three-dimensional reconstruction method and system thereof
CN116958381A (en) Automatic generation method for building facade replacement texture
CN117011692A (en) Road identification method and related device
CN113139965A (en) Indoor real-time three-dimensional semantic segmentation method based on depth map
CN112115771A (en) Gait image synthesis method based on star-shaped generation confrontation network
CN117238018B (en) Multi-granularity-based incremental deep and wide network living body detection method, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination