CN115731477A - Image recognition method, illegal construction detection method, terminal device, and storage medium - Google Patents


Info

Publication number
CN115731477A
CN115731477A (application CN202211396331.2A)
Authority
CN
China
Prior art keywords
image
mask
building
unmanned aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211396331.2A
Other languages
Chinese (zh)
Inventor
黄积晟
任宇鹏
李乾坤
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211396331.2A
Publication of CN115731477A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method, an illegal construction detection method, a terminal device and a computer storage medium, wherein the image recognition method comprises the following steps: acquiring an unmanned aerial vehicle image collected from a target area; inputting the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model; extracting a feature image of the unmanned aerial vehicle image through the feature extraction model; generating a target building detection frame in the feature image through the feature detection model; generating a mask of the target building on the feature image according to the target building detection frame through the mask generation model; and acquiring building information in the unmanned aerial vehicle image based on the mask output by the instance segmentation network. By providing an instance segmentation network tailored to buildings, the image recognition method extracts building masks from unmanned aerial vehicle images efficiently.

Description

Image recognition method, illegal construction detection method, terminal device, and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image recognition method, an illegal construction detection method, a terminal device, and a computer storage medium.
Background
Regulating illegal construction has always been an important part of city management. Traditional manual patrols for illegal construction are time-consuming and labor-intensive, and supervision of illegal construction is neither timely nor effective enough. Moreover, because of the limited viewing angles of such patrols, unauthorized building and extension on rooftops cannot be discovered in time, so that substantial time and effort must later be invested in demolition and rebuilding work.
Disclosure of Invention
The application provides an image identification method, an illegal construction detection method, a terminal device and a computer storage medium.
One technical solution adopted by the present application is to provide an image recognition method, including:
acquiring an unmanned aerial vehicle image acquired based on a target area;
inputting the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model;
extracting a feature image of the unmanned aerial vehicle image through the feature extraction model;
generating a target building detection frame in the feature image through the feature detection model;
generating a mask of a target building on the feature image according to the target building detection frame through the mask generation model;
and acquiring building information in the unmanned aerial vehicle image based on the mask output by the instance segmentation network.
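For illustration, the claimed steps can be sketched as a minimal pipeline; the three model objects and their call signatures here are assumptions for exposition, not the patent's actual interfaces:

```python
def recognize(drone_image, feature_extractor, detector, mask_generator):
    """Hedged sketch of the claimed flow: extract features, detect
    target-building boxes, then generate one mask per detection box."""
    features = feature_extractor(drone_image)            # feature image
    boxes = detector(features)                           # target building detection frames
    return [mask_generator(features, box) for box in boxes]
```

Any concrete instance segmentation network exposing these three stages could be dropped into this skeleton.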
The mask generation model comprises a first multilayer perceptron, a second multilayer perceptron, a deconvolution layer and a convolution layer which are connected in sequence.
The mask generation model further comprises a maximization processing layer and an average processing layer, which are arranged in parallel between the second multilayer perceptron and the deconvolution layer and are used to remove hole information from the target building detection frame during mask generation.
The loss function of the instance segmentation network at least comprises a mask loss, wherein the mask loss is calculated using a sampling-point loss function, and the sampling-point loss function calculates the mask loss of the instance segmentation network using the difference between the predicted mask output and the actual mask output at a plurality of sampling points in the unmanned aerial vehicle image.
The image identification method further comprises the following steps:
in the training process of the instance segmentation network, acquiring a training feature map of a training image;
obtaining the confidences of all pixel points in the training feature map;
selecting a plurality of sampling points in the training feature map according to the confidences of all the pixel points;
and training the instance segmentation network by using the difference between the predicted mask output and the actual mask output of the plurality of sampling points.
Selecting a plurality of sampling points in the training feature map according to the confidences of all the pixel points comprises the following steps:
acquiring a first confidence that each pixel point belongs to a foreground category and a second confidence that it belongs to a background category;
determining the uncertainty of each pixel point according to the sum of the absolute value of the first confidence and the absolute value of the second confidence;
and taking the pixel points whose uncertainty is greater than or equal to a preset threshold as the sampling points selected from the training feature map.
The image identification method further comprises the following steps:
in the training process of the instance segmentation network, acquiring a training image and its building rooftop mask;
extracting a mask of an illegal building from the training image or an illegal-building database;
and pasting the illegal-building mask onto the building rooftop mask of the training image to form a new training image.
After generating a mask of the target building on the feature image according to the target building detection frame through the mask generation model, the image recognition method further comprises:
acquiring a building rooftop mask output by the instance segmentation network;
generating a corresponding minimum circumscribed rectangle based on the building rooftop mask, and acquiring the region within the minimum circumscribed rectangle that does not belong to the building rooftop mask as a region to be regularized;
generating a maximum inscribed rectangle in the region to be regularized;
and cropping the minimum circumscribed rectangle according to the maximum inscribed rectangle to obtain a regularized building rooftop mask.
Another technical solution adopted by the present application is to provide an illegal construction detection method, including:
acquiring an unmanned aerial vehicle image collected from a target area, and acquiring an orthographic map slice corresponding to the unmanned aerial vehicle image;
acquiring a building rooftop mask and an illegal-building mask on the unmanned aerial vehicle image, wherein the masks are acquired in the same way as in the image recognition method above;
obtaining a reprojected image of the orthographic map slice reprojected into the coordinate system of the unmanned aerial vehicle image based on the building rooftop mask;
acquiring a change-detection region result for the target area based on the image difference between the reprojected image and the unmanned aerial vehicle image;
and combining the change-detection region result with the illegal-building detection result of the illegal-building mask to obtain illegal-building detection information for the unmanned aerial vehicle image.
Another technical solution adopted by the present application is to provide a terminal device, where the terminal device includes a memory and a processor coupled to the memory;
wherein the memory is adapted to store program data and the processor is adapted to execute the program data to implement the image recognition method and/or the violation detection method as described above.
Another technical solution adopted by the present application is to provide a computer storage medium for storing program data, which when executed by a computer, is used to implement the image recognition method and/or the violation detection method as described above.
The beneficial effects of this application are: the terminal device acquires an unmanned aerial vehicle image collected from a target area; inputs the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model; extracts a feature image of the unmanned aerial vehicle image through the feature extraction model; generates a target building detection frame in the feature image through the feature detection model; generates a mask of the target building on the feature image according to the target building detection frame through the mask generation model; and acquires building information in the unmanned aerial vehicle image based on the mask output by the instance segmentation network. By providing an instance segmentation network tailored to buildings, the image recognition method extracts building masks from unmanned aerial vehicle images efficiently.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a first embodiment of an image recognition method provided in the present application;
FIG. 2 is a block diagram of an embodiment of the instance segmentation network provided herein;
FIG. 3 is a flowchart illustrating a second embodiment of an image recognition method according to the present application;
FIG. 4 is a schematic flow chart diagram illustrating a third embodiment of an image recognition method provided by the present application;
FIG. 5 is a schematic diagram of a simulation data generation process provided herein;
FIG. 6 is a schematic flow chart diagram illustrating a fourth embodiment of an image recognition method provided by the present application;
FIG. 7 is a schematic view of a regularized building roof provided herein;
FIG. 8 is a schematic flow chart diagram illustrating an embodiment of a violation detection method provided by the present application;
FIG. 9 is a schematic diagram of a general flow of the violation detection method provided by the present application;
fig. 10 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
FIG. 11 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an image recognition method according to a first embodiment of the present disclosure.
The image recognition method is applied to an image recognition apparatus, which may be a server, or a system in which a server and a local terminal cooperate. Correspondingly, the parts included in the image recognition apparatus, such as units, sub-units, modules and sub-modules, may all be disposed in the server, or may be distributed between the server and the local terminal.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules, for example, software or software modules for providing distributed servers, or as a single software or software module, and is not limited herein. In some possible implementations, the image recognition method of the embodiments of the present application may be implemented by a processor calling computer readable instructions stored in a memory.
Specifically, as shown in fig. 1, the image recognition method according to the embodiment of the present application specifically includes the following steps:
step S11: and acquiring an unmanned aerial vehicle image acquired based on the target area.
In this embodiment of the application, while the unmanned aerial vehicle patrols the target area, the image recognition apparatus reads the unmanned aerial vehicle image or the unmanned aerial vehicle video stream, and thereby acquires the unmanned aerial vehicle image from the image or video stream.
Step S12: inputting the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model.
In the embodiment of the application, the image recognition apparatus inputs the unmanned aerial vehicle image acquired in step S11 into the pre-trained instance segmentation network, and the instance segmentation network recognizes building targets in the unmanned aerial vehicle image, such as building roofs and illegal buildings, and generates corresponding masks.
Specifically, please refer to fig. 2; fig. 2 is a schematic structural diagram of an embodiment of the instance segmentation network provided in the present application.
As shown in fig. 2, the instance segmentation network of the present application specifically includes a feature extraction model, a feature detection model, and a mask generation model.
The feature extraction model adopts a Swin Transformer as the encoder to strengthen the network's ability to extract image features. As shown in FIG. 2, the Swin Transformer includes four pooling layers, each of which reduces the resolution of the input feature map, expanding the receptive field layer by layer as in a CNN. Through the Swin Transformer, the feature image of the unmanned aerial vehicle image is reduced by one scale after each pooling layer, and finally five feature images of different scales are obtained.
The feature detection model adopts an FCOS (Fully Convolutional One-Stage Object Detection) network to extract the detection frame of the target building. The FCOS network is a one-stage network that achieves anchor-free, proposal-free target detection based on the idea of center-ness, with a recall rate no worse than that of anchor-based detection algorithms.
The mask generation model predicts a mask of size 28×28 using an SAG-Mask network structure, and finally scales it to the size of the corresponding detection box. In the decoder provided by the application, part of SAG-Mask is replaced with MLPs: the feature map of the last layer of the Swin Transformer encoder is combined with the results of the FCOS network to extract a feature map of size N×14×14, where N represents the predicted number of objects. The SAG-Mask structure uses two MLPs (multilayer perceptrons), then applies max (a maximization processing layer) and avg (an average processing layer) in parallel to remove holes, adds the two results together, and finally outputs the final result through a deconvolution layer and one convolution layer.
The maximization processing layer and the average processing layer are arranged in parallel between the second multilayer perceptron and the deconvolution layer, and are used to remove hole information from the target building detection frame during mask generation.
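A minimal numerical sketch of this mask head, assuming per-box features on a 14×14 grid and using nearest-neighbour upsampling in place of the deconvolution and final convolution (layer sizes here are illustrative assumptions, not the patent's values):

```python
import numpy as np

def mlp(x, w):
    """Single linear layer with ReLU, standing in for a multilayer perceptron."""
    return np.maximum(x @ w, 0.0)

def mask_head(x, w1, w2):
    """x: (N, 196, C) per-box features on a flattened 14x14 grid.
    Two MLPs, then parallel max/avg branches summed together (the
    hole-removal step), then upsampling to the 28x28 mask resolution."""
    h = mlp(mlp(x, w1), w2)                       # (N, 196, hidden)
    h = h.max(axis=-1) + h.mean(axis=-1)          # parallel max + avg, added together
    h = h.reshape(-1, 14, 14)                     # back to the spatial grid
    # nearest-neighbour upsample stands in for the stride-2 deconvolution + conv
    return h.repeat(2, axis=1).repeat(2, axis=2)  # (N, 28, 28) mask scores
```

In the actual network the last two lines would be learned (a deconvolution layer followed by one convolution layer); this sketch only illustrates the data flow.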
The loss function of the instance segmentation network shown in fig. 2 is composed of four parts, namely a target classification loss, a center position loss, a regression loss, and a mask loss. The mask loss is calculated using a sampling-point loss function, which calculates the mask loss of the instance segmentation network using the difference between the predicted mask output and the actual mask output at a plurality of sampling points in the unmanned aerial vehicle image.
Specifically, for the process of selecting sampling points and calculating the loss function with them in the instance segmentation network, please refer to fig. 3; fig. 3 is a flowchart illustrating a second embodiment of the image recognition method according to the present application.
Specifically, as shown in fig. 3, the image recognition method according to the embodiment of the present application specifically includes the following steps:
step S21: in the training process of the example segmentation network, a training feature map of a training image is obtained.
In the embodiment of the present application, in the training process of the example segmentation network, the image recognition device extracts a training feature map of 2 × h × w of a training image by using the feature extraction model.
Step S22: the confidences of all pixel points in the training feature map are obtained.
In the embodiment of the application, the image recognition apparatus uses the feature detection model to predict, for each pixel point in the training feature map, a first confidence that it belongs to the foreground category and a second confidence that it belongs to the background category. The foreground includes illegal buildings and building roofs.
Step S23: a plurality of sampling points are selected in the training feature map according to the confidences of all the pixel points.
In the embodiment of the present application, for each H × W training feature map, the image recognition apparatus selects K sampling points according to the uncertainty of each pixel point, instead of using the entire training feature map, to calculate the loss function.
The uncertainty of a pixel point can be determined from its predicted confidences for the different categories. Specifically, the image recognition apparatus obtains the first confidence that each pixel point on the training feature map belongs to the foreground category and the second confidence that it belongs to the background category, and then calculates the uncertainty of the pixel point from the absolute values of the two confidences. The specific formula is as follows:
uncertainty = -(torch.abs(gt_class_logits1)) - (torch.abs(gt_class_logits2))
where uncertainty is the uncertainty of the pixel point, gt_class_logits1 is the first confidence, gt_class_logits2 is the second confidence, and torch.abs is the absolute-value function. Logits near zero indicate an unsure prediction, so they yield an uncertainty close to the maximum value of zero.
The image recognition apparatus takes the pixel points whose uncertainty is greater than or equal to a preset threshold as the sampling points.
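This selection rule can be sketched directly in NumPy (the threshold value used in the example is an assumption; in the text it is a preset hyperparameter):

```python
import numpy as np

def select_sampling_points(fg_logits, bg_logits, threshold):
    """Return (row, col) indices of pixels whose uncertainty meets the
    threshold. uncertainty = -|fg| - |bg|: logits near zero (an unsure
    prediction) give values near 0, confident logits give strongly
    negative values, so thresholding keeps the uncertain pixels."""
    uncertainty = -np.abs(fg_logits) - np.abs(bg_logits)
    return np.argwhere(uncertainty >= threshold)
```

For example, with confident logits (5, 4) and unsure logits (0.1, 0.2) on a 2×2 map, only the unsure pixels survive a threshold of -1.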
Step S24: the instance segmentation network is trained using the difference between the predicted mask output and the actual mask output of the plurality of sampling points.
In the embodiment of the present application, the image recognition apparatus trains the instance segmentation network using the difference between the predicted mask output and the actual mask output at the sampling points selected in step S23, with the training objective of making the predicted mask output of the instance segmentation network approach the actual mask output.
In the image recognition method of the embodiment of the application, using the sampling-point loss function effectively improves the training speed of the network. In addition, when the network is trained with a simulated data set, there are differences between the simulated data and a real annotated data set and the annotation precision is lower, so using the sampling-point loss function further reduces the influence of data-set errors.
Next, a process of generating a simulation data set according to the present application is described, specifically referring to fig. 4 and fig. 5, where fig. 4 is a schematic flowchart of a third embodiment of an image recognition method according to the present application, and fig. 5 is a schematic diagram of a simulation data generation process according to the present application.
In the embodiment of the application, a small data set annotated with building roofs and the illegal buildings on those roofs is first used to train the instance segmentation network; after training, the instance segmentation network predicts building roofs and illegal buildings for unannotated unmanned aerial vehicle images. The illegal buildings are then extracted from these images to form an illegal-building data set.
Specifically, as shown in fig. 4, the image recognition method according to the embodiment of the present application specifically includes the following steps:
step S31: in the training process of the example segmentation network, a training image and a building roof mask thereof are obtained.
In an embodiment of the present application, a mask generation model of an example segmented network generates a building rooftop mask on a training image. As shown in fig. 5, the original image of the training image is shown in the upper left corner, and the building roof mask and the illegal building mask of the training image are shown in the upper right corner.
Step S32: the mask of an illegal building is extracted from the training image or the illegal-building database.
In the embodiment of the present application, the image recognition apparatus randomly extracts the mask of an illegal building from the training image or the illegal-building database; for example, illegal-building masks are extracted from the training image at the bottom left of fig. 5 and from the training image at the top right.
Step S33: the illegal-building mask is pasted onto the building rooftop mask of the training image to form a new training image.
In the embodiment of the present application, the image recognition apparatus pastes the illegal-building mask onto the building rooftop mask of the training image to form a new training image, such as the simulated data generated in the lower right corner of fig. 5.
The attachment rules provided by the embodiment of the application include but are not limited to:
1. the illegal-building mask must lie entirely within the building rooftop mask;
2. the illegal-building mask must have one edge coinciding with an edge of the building rooftop mask;
3. it must not overlap any other illegal-building mask.
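The first and third rules can be checked directly on boolean mask arrays; the sketch below is a hypothetical helper (rule 2, the edge-coincidence constraint, is geometric and omitted here for brevity):

```python
import numpy as np

def can_paste(violation_mask, rooftop_mask, existing_masks):
    """violation_mask / rooftop_mask / existing_masks[i]: boolean arrays
    already placed in the same image coordinates.
    Rule 1: every violation pixel lies inside the rooftop mask.
    Rule 3: the violation overlaps no previously pasted violation mask.
    (Rule 2, edge coincidence, is not checked in this sketch.)"""
    inside = bool(np.all(rooftop_mask[violation_mask]))
    no_overlap = all(not np.any(violation_mask & m) for m in existing_masks)
    return inside and no_overlap
```

A simulated sample would only be generated when such a check passes for the candidate placement.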
Step S13: the feature image of the unmanned aerial vehicle image is extracted through the feature extraction model.
Step S14: a target building detection frame is generated in the feature image through the feature detection model.
Step S15: and generating a mask of the target building on the feature image according to the target building detection frame through a mask generation model.
In the embodiment of the present application, functions of the feature extraction model, the feature detection model, and the mask generation model in application are substantially the same as those in a training process, and refer to the description of the training process. Wherein the mask generation model may generate a building rooftop mask and a violation building mask.
Through the above process, the instance segmentation network outputs a building rooftop mask. However, as a product of instance segmentation, the rooftop mask generally has an irregular shape and may contain many unreasonable sharp corners. Since building roofs generally have a rectangular shape, the image recognition apparatus of the present application can regularize the building roof using a model-driven method; specifically refer to fig. 6, which is a flowchart illustrating a fourth embodiment of the image recognition method provided by the present application.
Specifically, as shown in fig. 6, the image recognition method according to the embodiment of the present application specifically includes the following steps:
step S41: a building rooftop mask of an instance segmentation network output is obtained.
Step S42: and generating a corresponding minimum external rectangle based on the building roof mask code, and acquiring a region to be normalized, which does not belong to the building roof mask code, in the minimum external rectangle.
In the embodiment of the application, the image recognition device marks out the area of the building roof mask within the minimum circumscribed rectangle by recognizing the shape of the building roof mask, and marks the rest of the area as the area to be normalized.
Step S43: and generating a maximum inscribed rectangle in the area to be normalized.
In the embodiment of the application, the image recognition device generates the maximum inscribed rectangles in each connected region to be normalized, and the number of the maximum inscribed rectangles is consistent with the number of the connected regions to be normalized.
Step S44: and cutting the minimum external matrix according to the maximum internal matrix to obtain the normalized roof mask of the building.
In the embodiment of the present application, the image recognition device cuts each of the belonged areas to be regularized according to the maximum inscribed rectangle determined in step S43, and reserves the area where the building roof mask is located, thereby realizing regularization of the building roof mask.
Further, the image recognition apparatus may continue to regularize the building rooftop mask iteratively until it approximates the shape of the actual building roof, as in the schematic diagram of the regularized building roof shown in fig. 7. The number of iterations can be set as a preset count, or whether to continue iterating can be decided by checking whether the area of the region to be regularized in each iteration exceeds a preset area threshold, where the preset area threshold can be determined from the area of the building rooftop mask, i.e., the mask area multiplied by a proportionality coefficient.
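An axis-aligned simplification of this iterate-and-cut regularization (the patent uses true minimum circumscribed and maximum inscribed rectangles, which may be rotated; here both are approximated by axis-aligned bounding boxes, and the stopping threshold is the mask area times an assumed coefficient):

```python
import numpy as np

def regularize_roof_mask(mask, coeff=0.05, max_iter=10):
    """Start from the mask's circumscribed rectangle, repeatedly cut away
    the bounding box of the region that is not roof while keeping the
    roof pixels themselves; stop when the leftover non-roof area falls
    below coeff * mask area, or after max_iter iterations."""
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    rect = np.zeros_like(mask)
    rect[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True  # circumscribed rectangle
    for _ in range(max_iter):
        residual = rect & ~mask                    # region to be regularized
        if residual.sum() < coeff * mask.sum():    # preset area threshold reached
            break
        ry, rx = np.nonzero(residual)
        rect[ry.min():ry.max() + 1, rx.min():rx.max() + 1] = False  # cut the residual's box
        rect |= mask                               # never discard actual roof pixels
    return rect
```

On an L-shaped mask, for example, the first cut removes exactly the notch's bounding box, so the result converges to the L itself.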
Step S16: building information in the unmanned aerial vehicle image is acquired based on the mask output by the instance segmentation network.
In the embodiment of the application, the image recognition apparatus obtains the building rooftop mask and the illegal-building mask through the instance segmentation of steps S11 to S15, and thereby obtains building information in the unmanned aerial vehicle image, including the position of the illegal building, the time of the illegal building, the type of the illegal building, and the like.
In the embodiment of the application, the image recognition apparatus acquires an unmanned aerial vehicle image collected from a target area; inputs the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model; extracts a feature image of the unmanned aerial vehicle image through the feature extraction model; generates a target building detection frame in the feature image through the feature detection model; generates a mask of the target building on the feature image according to the target building detection frame through the mask generation model; and acquires building information in the unmanned aerial vehicle image based on the mask output by the instance segmentation network. By providing an instance segmentation network tailored to buildings, the image recognition method extracts building masks from unmanned aerial vehicle images efficiently.
By providing an instance segmentation network tailored to building roofs and illegal buildings, the method can efficiently extract building rooftop masks and illegal-building masks; the data simulation method for illegal buildings can automatically and efficiently simulate illegal buildings on roofs, greatly increasing the data volume and reducing the heavy workload of data annotation; finally, building regularization structures the building masks, giving more regular and visually cleaner results.
Further, based on the image recognition method, the application also provides an illegal-construction supervision scheme based on unmanned aerial vehicle inspection images that comprehensively uses deep-learning registration, instance segmentation and historical-image change analysis. The scheme can automatically extract and analyze illegal buildings in a specific area, inspect illegal construction over a wide area without blind angles and at low cost, improve the efficiency of discovering and rapidly handling illegal construction, and achieve supervision of existing illegal construction, detection of illegal construction in progress, and prevention of new illegal construction.
Among them, deep-learning instance segmentation mainly detects existing illegal constructions, construction workers and related construction equipment; deep-learning image registration mainly registers multi-temporal unmanned aerial vehicle images against the map; and deep-learning change analysis mainly detects changed regions between multi-temporal unmanned aerial vehicle images and the map.
Referring to fig. 8 and fig. 9, fig. 8 is a schematic flowchart of an embodiment of the illegal-construction detection method provided in the present application, and fig. 9 is a schematic overall flowchart of the illegal-construction detection method provided in the present application.
The technical process of the deep-learning illegal-construction detection and monitoring method based on unmanned aerial vehicle images specifically refers to the overall flowchart illustrated in fig. 9. The illegal-construction detection method mainly comprises three modules: a first module (namely patent 2 in fig. 2) for base-map positioning, cropping and transformation; a second module (namely patent 3 in fig. 2) for instance segmentation of unmanned aerial vehicle images/video streams; and a third module (namely patent 1 in fig. 2) for change detection and post-processing. The workflows of the first, second and third modules are introduced below with reference to the flowchart of an embodiment of the illegal-construction detection method shown in fig. 8.
The illegal-construction detection method is applied to an illegal-construction detection device, where the device may be a server, or a system in which a server and the detection device cooperate with each other. Accordingly, the parts included in the illegal-construction detection device, for example its units, sub-units, modules and sub-modules, may all be disposed in the server, or may be disposed separately in the server and the detection device.
Further, the server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple software modules (for example, modules for providing distributed services) or as a single software module, which is not limited here. In some possible implementations, the illegal-construction detection method of the embodiments of the present application may be implemented by a processor calling computer-readable instructions stored in a memory.
Specifically, as shown in fig. 8, the illegal-construction detection method of the embodiment of the present application includes the following steps:
Step S51: acquiring an unmanned aerial vehicle image captured over the target area, and acquiring an ortho-map slice corresponding to the unmanned aerial vehicle image.
In the embodiment of the application, the illegal-construction detection device reads GPS information, such as the longitude and latitude of the target area, from the EXIF (Exchangeable Image File Format) data of the unmanned aerial vehicle image, and crops from the ortho-map an ortho-map slice covering a range equivalent to that of the unmanned aerial vehicle image.
In other embodiments, the illegal-construction detection device may expand the map area determined by the GPS information of the unmanned aerial vehicle image by a preset margin, so that, taking camera distortion, calculation errors and the like into account, the ortho-map slice is guaranteed to provide all pixel information matching the unmanned aerial vehicle image. Specifically, the device positions the unmanned aerial vehicle image through the GPS information and crops the corresponding area of the base map according to the width and height of the unmanned aerial vehicle image, where the cropping width and height are slightly larger than those of the image. This compensates for the positioning error of the unmanned aerial vehicle image and improves the success rate of the subsequent registration between the ortho-map slice and the unmanned aerial vehicle image.
Specifically, the illegal-construction detection device acquires a first image size of the unmanned aerial vehicle image and determines a second image size of the ortho-map slice according to the first image size, where the second image size is slightly larger than the first image size. The device then determines the positioning range of the ortho-map slice in the ortho-map according to the positioning information, and crops the ortho-map slice out of the ortho-map according to the second image size and the positioning range.
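The padded cropping described above can be sketched in Python. The function name, the pixel-space center (assumed to have already been converted from the EXIF GPS coordinates into ortho-map pixel coordinates) and the 20% padding ratio are illustrative assumptions, not the patent's actual implementation:

```python
import numpy as np

def crop_ortho_slice(ortho, center_px, drone_w, drone_h, pad_ratio=0.2):
    """Crop an ortho-map slice around the drone image's GPS-derived
    center pixel, padded so it is slightly larger than the drone image
    footprint (to absorb positioning error and camera distortion)."""
    cx, cy = center_px
    half_w = int(drone_w * (1 + pad_ratio) / 2)   # half of padded width
    half_h = int(drone_h * (1 + pad_ratio) / 2)   # half of padded height
    H, W = ortho.shape[:2]
    x0, x1 = max(0, cx - half_w), min(W, cx + half_w)
    y0, y1 = max(0, cy - half_h), min(H, cy + half_h)
    return ortho[y0:y1, x0:x1]
```

The clamping to the ortho-map borders reflects the practical case where the drone footprint lies near the edge of the available base map.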
Step S52: acquiring a building roof mask and an illegal-building mask on the unmanned aerial vehicle image.
In the embodiment of the application, the illegal-construction detection device acquires the building roof mask and the illegal-building mask on the unmanned aerial vehicle image through the image recognition method of the above embodiments; the specific process is not repeated here.
Step S53: acquiring a re-projected image obtained by re-projecting the ortho-map slice into the coordinate system of the unmanned aerial vehicle image based on the building roof mask.
In the embodiment of the present application, as shown in fig. 9, the illegal-construction detection device calculates a first homography matrix between the ortho-map slice and the unmanned aerial vehicle image according to the position of the building roof mask, and then re-projects the ortho-map slice onto the coordinate system of the unmanned aerial vehicle image according to the first homography matrix to obtain the re-projected image.
Among them, a homography matrix (Homography Matrix) is the matrix used in a perspective transformation, which describes the mapping relationship between two planes. It can be understood that the matrix is called a homography because the relationship between the two planes is deterministic, so the transformation can be represented by a unique matrix (up to a scale factor).
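As an illustration of how a homography maps points of one plane onto another, the following NumPy sketch applies a 3x3 homography to a set of 2-D pixel coordinates. In practice the re-projection step would warp the entire ortho-map slice with an image-warping routine; this point-wise version and its function name are illustrative only:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2-D points:
    lift to homogeneous coordinates, multiply, then de-homogenize."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3) homogeneous
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide out scale
```

For the identity matrix the points are unchanged, and for a pure-translation homography they are shifted, which matches the planar mapping the paragraph describes.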
Step S54: acquiring a change-detection-area result of the target area based on image difference information between the re-projected image and the unmanned aerial vehicle image.
Continuing with the third module in fig. 9, the illegal-construction detection device takes the images or video key frames obtained during the unmanned aerial vehicle inspection and the ortho-map slice after homography transformation as the input of the change detection module, so as to analyze the changes of the corresponding area between the two moments.
In the embodiment of the application, the illegal-construction detection device acquires image difference information between the re-projected image and the unmanned aerial vehicle image, such as pixel value differences and pixel value distributions, and compares them to generate the change-detection-area result of the target area, that is, to analyze the changed regions of the target area.
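A minimal sketch of pixel-level difference information is shown below; the function name and threshold value are illustrative assumptions, and a production change-detection module (such as the twin model of fig. 9) would be far more elaborate:

```python
import numpy as np

def change_mask(reproj, drone_img, thresh=30):
    """Return a boolean mask that is True where the absolute intensity
    difference between the re-projected map and the drone image exceeds
    the threshold. Casting to a signed type avoids uint8 wrap-around."""
    diff = np.abs(reproj.astype(np.int16) - drone_img.astype(np.int16))
    return diff > thresh
```

Connected regions of True pixels in such a mask would correspond to candidate changed areas of the target region.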
Step S55: combining the change-detection-area result with the illegal-building detection result of the illegal-building mask to acquire illegal-construction detection information of the unmanned aerial vehicle image.
In the embodiment of the application, the illegal-construction detection device fuses the instance segmentation results of illegal buildings, performs IoU (Intersection over Union) calculation on overlapping regions, removes regions with a high overlap rate, obtains the final illegal-building detection result, and forms alarm information. The illegal-construction detection information includes the position, time and type of the illegal-construction behavior, and the like.
As shown in fig. 9, on the one hand, the illegal-construction detection device obtains the change detection result of the ortho-map slice and the unmanned aerial vehicle image through a twin (Siamese) change detection model, expressed as predicted change detection frames; on the other hand, the device segments the roof and illegal-building instances on the unmanned aerial vehicle image through the pre-trained model, obtaining the illegal buildings on the unmanned aerial vehicle image, expressed as illegal-building masks. Further, the device fuses the predicted change detection frames and the illegal-building masks, thereby post-processing the change-detection-area result to obtain the final illegal-construction detection information.
Specifically, the illegal-construction detection device calculates the overlap rate between the predicted change detection frames and the illegal-building masks, removes the predicted change detection frames or illegal-building masks whose overlap rate is greater than a preset threshold, and forms the illegal-construction detection information from the remaining predicted change detection frames and illegal-building masks.
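The overlap-rate filtering in this step can be sketched as follows, with detections represented as (x0, y0, x1, y1) corner tuples. Treating the illegal-building masks by their bounding boxes, the 0.5 threshold, and the policy of dropping duplicated change frames while keeping the masks are illustrative simplifications of one possible fusion rule:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def fuse_detections(change_boxes, mask_boxes, thresh=0.5):
    """Drop predicted change-detection frames that overlap an
    illegal-building mask box above the threshold (duplicate evidence),
    then report the remaining frames together with all mask boxes."""
    kept = [c for c in change_boxes
            if all(box_iou(c, m) <= thresh for m in mask_boxes)]
    return kept + list(mask_boxes)
```

A change frame that coincides with a segmented illegal building is thus reported only once, while change frames with no matching mask survive as independent alarms.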
The above embodiments are only examples of the present application and do not limit its technical scope; any minor modification, equivalent change or improvement made to the above embodiments according to the spirit of the present application still falls within the technical scope of the present application.
Continuing to refer to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of a terminal device provided in the present application. The terminal device 600 of the embodiment of the present application includes a processor 61, a memory 62, an input-output device 63, and a bus 64.
The processor 61, the memory 62, and the input/output device 63 are respectively connected to the bus 64, the memory 62 stores program data, and the processor 61 is configured to execute the program data to implement the image recognition method and/or the violation detection method according to the above embodiments.
In the embodiment of the present application, the processor 61 may also be referred to as a CPU (Central Processing Unit). The processor 61 may be an integrated circuit chip having signal processing capabilities. The processor 61 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor 61 may be any conventional processor or the like.
Still referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application. The computer storage medium 700 stores program data 71, and the program data 71, when executed by a processor, is used to implement the image recognition method and/or the illegal-construction detection method of the foregoing embodiments.
The embodiments of the present application may be implemented in the form of software functional units and, when sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The above description covers only embodiments of the present application and does not limit its patent scope. All equivalent structures or equivalent process transformations made using the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the present application.

Claims (11)

1. An image recognition method, characterized in that the image recognition method comprises:
acquiring an unmanned aerial vehicle image acquired based on a target area;
inputting the unmanned aerial vehicle image into a pre-trained instance segmentation network, wherein the instance segmentation network comprises a feature extraction model, a feature detection model and a mask generation model;
extracting a feature image of the unmanned aerial vehicle image through the feature extraction model;
generating a target building detection frame in the feature image through the feature detection model;
generating a mask of a target building on the feature image according to the target building detection frame through the mask generation model;
and acquiring building information in the unmanned aerial vehicle image based on the mask output by the instance segmentation network.
2. The image recognition method according to claim 1,
the mask generating model comprises a first multilayer perceptron, a second multilayer perceptron, a deconvolution layer and a convolution layer which are connected in sequence.
3. The image recognition method according to claim 2,
the mask generation model further comprises a maximization processing layer and an averaging processing layer, the maximization processing layer and the averaging processing layer are arranged in parallel between the second multilayer perceptron and the deconvolution layer, and the maximization processing layer and the averaging processing layer are used for removing hole information of the target building detection frame in the mask generation process.
4. The image recognition method according to claim 1,
the loss function of the instance segmentation network at least comprises a mask loss, wherein the mask loss is calculated by a sampling-point loss function, and the sampling-point loss function calculates the loss of the instance segmentation network using difference information between the predicted mask output and the actual mask output of a plurality of sampling points in the unmanned aerial vehicle image.
5. The image recognition method according to claim 4,
the image recognition method further comprises the following steps:
in the training process of the instance segmentation network, acquiring a training feature map of a training image;
acquiring the confidence coefficients of all pixel points in the training feature map;
selecting a plurality of sampling points in the training feature map according to the confidence coefficients of all the pixel points;
and training the instance segmentation network by using difference information between the predicted mask output and the actual mask output of the plurality of sampling points.
6. The image recognition method according to claim 5,
the selecting a plurality of sampling points in the training feature map according to the confidence coefficients of all the pixel points comprises:
acquiring a first confidence coefficient that each pixel point belongs to a foreground category and a second confidence coefficient that it belongs to a background category;
determining the uncertainty of each pixel point according to the sum of the absolute value of the first confidence coefficient and the absolute value of the second confidence coefficient;
and taking, among all the pixel points, those whose uncertainty is greater than or equal to a preset threshold as the sampling points selected from the training feature map.
7. The image recognition method according to claim 5,
the image recognition method further comprises the following steps:
in the training process of the instance segmentation network, acquiring a training image and a building roof mask thereof;
extracting an illegal-building mask from the training image or an illegal-building database;
and pasting the illegal-building mask onto the building roof mask of the training image to form a new training image.
8. The image recognition method according to claim 1,
after the mask of the target building is generated on the feature image according to the target building detection frame through the mask generation model, the image identification method further includes:
acquiring a building roof mask output by the instance segmentation network;
generating a corresponding minimum bounding rectangle based on the building roof mask, and acquiring a region to be regularized in the minimum bounding rectangle that does not belong to the building roof mask;
generating a maximum inscribed rectangle in the region to be regularized;
and cropping the minimum bounding rectangle according to the maximum inscribed rectangle to obtain a regularized building roof mask.
9. An illegal-construction detection method, the method comprising:
acquiring an unmanned aerial vehicle image acquired based on a target area, and acquiring an orthographic map slice corresponding to the unmanned aerial vehicle image;
acquiring a building roof mask and an illegal-building mask on the unmanned aerial vehicle image, wherein the masks are acquired in the mask acquisition manner of the image recognition method of any one of claims 1 to 8;
obtaining a re-projection image of the orthographic map slice re-projected to a coordinate system of the drone image based on the building rooftop mask;
acquiring a change detection area result of the target area based on the image difference information of the reprojected image and the unmanned aerial vehicle image;
and combining the change-detection-area result with the illegal-building detection result of the illegal-building mask to acquire illegal-construction detection information of the unmanned aerial vehicle image.
10. A terminal device, comprising a memory and a processor coupled to the memory;
wherein the memory is adapted to store program data, and the processor is adapted to execute the program data to implement the image recognition method of any one of claims 1 to 8 and/or the illegal-construction detection method of claim 9.
11. A computer storage medium for storing program data which, when executed by a computer, is adapted to implement the image recognition method of any one of claims 1 to 8 and/or the illegal-construction detection method of claim 9.
CN202211396331.2A 2022-11-04 2022-11-04 Image recognition method, illicit detection method, terminal device, and storage medium Pending CN115731477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211396331.2A CN115731477A (en) 2022-11-04 2022-11-04 Image recognition method, illicit detection method, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211396331.2A CN115731477A (en) 2022-11-04 2022-11-04 Image recognition method, illicit detection method, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
CN115731477A true CN115731477A (en) 2023-03-03

Family

ID=85294931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211396331.2A Pending CN115731477A (en) 2022-11-04 2022-11-04 Image recognition method, illicit detection method, terminal device, and storage medium

Country Status (1)

Country Link
CN (1) CN115731477A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671496A (en) * 2023-12-05 2024-03-08 杭州中汇通航航空科技有限公司 Unmanned aerial vehicle application result automatic comparison method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination