CN116664845B - Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism - Google Patents

Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Info

Publication number
CN116664845B
CN116664845B (application CN202310935833.6A)
Authority
CN
China
Prior art keywords
block
level
correlation matrix
segmentation
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310935833.6A
Other languages
Chinese (zh)
Other versions
CN116664845A (en)
Inventor
聂秀山
方静远
宁阳
袭肖明
郭杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jianzhu University filed Critical Shandong Jianzhu University
Priority to CN202310935833.6A priority Critical patent/CN116664845B/en
Publication of CN116664845A publication Critical patent/CN116664845A/en
Application granted granted Critical
Publication of CN116664845B publication Critical patent/CN116664845B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of image segmentation and provides an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism. The method predicts the target segmentation area of a construction site scene image to be segmented with a trained segmentation model. Training proceeds as follows: extract the feature map of a construction site scene image training sample; obtain a plurality of label vectors from the segmentation labels; apply block partitioning and max pooling to the feature map and derive block-level classification labels from the label vectors; divide the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, map the block-level feature maps to a plurality of block-level CAMs; establish a block-level correlation matrix from the block-level feature maps and block-level CAMs; and optimize the hyperparameters of the segmentation model according to the model output and the segmentation labels. The method performs target segmentation on construction site scene images and effectively enables intelligent safety monitoring of construction sites.

Description

Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
Technical Field
The application belongs to the technical field of image segmentation, and particularly relates to an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Semantic segmentation is an important topic in computer vision. Its main task is to assign a class label to every pixel of an image, and it plays an important role in fields such as autonomous driving, computer vision, medical image analysis, and computer-aided diagnosis.
A smart construction site applies informatization means such as computer technology, artificial intelligence, sensing technology, and virtual reality to achieve scientific management and intelligent production on the construction site. Construction projects generally involve short construction periods, heavy tasks, high risk, and difficult management. At present, site management relies mainly on inspections and spot checks, which suffer from poor timeliness and high supervision costs; as a result, violations occur more frequently, and the safety, quality, and progress of the construction site cannot be effectively guaranteed.
With the development of artificial intelligence, AI techniques have gradually been applied to auxiliary supervision systems for construction sites, where deep-learning-based image recognition algorithms are applied to the pictures captured by site cameras and tower cranes. At present, segmentation methods for smart construction site scene images consider either only global long-range dependency information or only short-range dependency information, and both limitations affect the accuracy of the image segmentation results.
Disclosure of Invention
In order to solve the technical problems in the background art, the application provides an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism, which can perform target segmentation on construction site scene images, effectively realize intelligent safety monitoring of the construction site, improve production management efficiency, and ensure safe construction.
In order to achieve the above purpose, the present application adopts the following technical scheme:
a first aspect of the present application provides a method for intelligent worker image segmentation based on inter-block contrast attention mechanisms.
An intelligent construction site image segmentation method based on an inter-block contrast attention mechanism comprises the following steps:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
the training process of the segmentation model comprises the following steps: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
Further, the process of establishing the block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs includes: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively, and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories.
Further, the calculation of the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix includes: taking the channels in the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
Further, the mapping the block-level correlation matrix to the global correlation matrix includes: the block-level correlation matrix is mapped to a global correlation matrix through the full connection layer.
Further, the calculation of the positive-sample similarity and the negative-sample similarity of the global correlation matrix includes: taking the channels in the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
Further, after obtaining the output feature map, the method further comprises: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model.
Further, the loss function includes:
a classification prediction loss for the plurality of block-level CAMs obtained by mapping the block-level feature maps under the supervision of the block-level classification labels;
a semantic segmentation loss for the target segmentation region;
and a contrast loss between inter-block long-range dependencies and intra-block long-range dependencies.
A second aspect of the present application provides an intelligent construction site image segmentation system based on an inter-block contrast attention mechanism.
An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
A third aspect of the present application provides a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism as described in the first aspect above.
A fourth aspect of the application provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism as described in the first aspect above when executing the program.
Compared with the prior art, the application has the beneficial effects that:
for the monitoring images transmitted from site cameras and tower cranes, the application performs feature extraction through a neural network and pixel-level classification of the input feature maps to obtain the segmentation area of each object in the input image, which facilitates further detection of violations and potential safety hazards on the construction site.
The application introduces contrastive learning into the semantic segmentation task under a supervised setting: pixels with the same label are drawn closer in the feature space, while pixels with different labels are pushed relatively farther apart, which further strengthens the representational power of the features. Because both attention mechanisms and contrastive learning perform well in semantic segmentation tasks, the two are combined, and the channel correlation matrix between the feature map and the class activation map (Class Activation Mapping, CAM) is forced through the contrast loss to have higher confidence, so that a robust and accurate target segmentation result of the construction site scene image is obtained and the segmentation precision is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
FIG. 1 is a flow chart of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism according to the present application;
FIG. 2 is a block diagram of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism according to the present application.
Detailed Description
The application will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of the present disclosure. It should be noted that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the logical functions specified in the various embodiments. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
Example 1
As shown in fig. 1 and fig. 2, this embodiment provides an intelligent construction site image segmentation method based on an inter-block contrast attention mechanism. The embodiment is illustrated by applying the method to a server; it can be understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through their interaction. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. In this embodiment, the method includes the steps of:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
The following describes the specific scheme of this embodiment in detail:
1. Feature extraction
For an input image, a backbone network (e.g., VGG or ResNet-101) first maps it to a feature map F ∈ R^(H×W×C), where H and W are the height and width of the feature map and C is the number of channels.
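As a minimal sketch of this step (assuming a PyTorch/torchvision implementation, which the patent does not prescribe), the backbone extraction could look as follows; the ResNet-101 truncation point and tensor shapes are illustrative assumptions.

```python
import torch
import torchvision

# Illustrative sketch: map an input image to a feature map F of shape
# (B, C, H, W) with a ResNet-101 backbone truncated before its pooling and
# classification head. Backbone choice and truncation point are assumptions.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet101(weights=None).children())[:-2]
)

image = torch.randn(1, 3, 512, 512)   # dummy construction-site scene image
with torch.no_grad():
    F = backbone(image)               # (1, 2048, 16, 16): C=2048, H=W=16
print(F.shape)
```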
2. Building block level feature map
First, one-hot encoding (One-Hot Encoding) is applied to the segmentation labels to obtain K label vectors, where K is the number of categories in the dataset. The label map is partitioned into blocks, and max pooling within each block yields the block-level classification labels. The feature map F is then divided into Np × Np blocks (e.g., Np = 4, i.e., 16 block-level feature maps, each of height h and width w). Under the supervision of the block-level classification labels, a convolution maps each block-level feature map to a block-level CAM with K channels. The resulting block-level feature maps have dimension Np × Np × h × w × C, and the block-level CAMs have dimension Np × Np × h × w × K.
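A hedged sketch of this block construction step follows; the helper name to_blocks, the choice of a 1x1 convolution as the CAM classifier, and the multi-label form of the block-level targets are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

# Assumptions: Np = 4 blocks per side, K classes, feature map `feat` of shape
# (B, C, H, W), dense segmentation label of shape (B, H, W).
Np, K = 4, 21
B, C, H, W = 2, 2048, 16, 16
feat  = torch.randn(B, C, H, W)
label = torch.randint(0, K, (B, H, W))

# One-hot encode the segmentation label: (B, K, H, W).
onehot = nnf.one_hot(label, K).permute(0, 3, 1, 2).float()

# Block-wise max pooling gives a multi-label block-level classification target
# of shape (B, K, Np, Np): a class is present in a block if any pixel has it.
block_label = nnf.adaptive_max_pool2d(onehot, (Np, Np))

# A 1x1 convolution maps the feature map to K class-activation channels;
# splitting the result into Np x Np blocks yields the block-level CAMs.
classifier = nn.Conv2d(C, K, kernel_size=1)
cam = classifier(feat)                          # (B, K, H, W)

def to_blocks(x, n):
    """Split (B, D, H, W) into (B, n*n, D, H//n, W//n) non-overlapping blocks."""
    b, d, h, w = x.shape
    x = x.view(b, d, n, h // n, n, w // n)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(b, n * n, d, h // n, w // n)

block_feat = to_blocks(feat, Np)                # (B, Np*Np, C, h, w)
block_cam  = to_blocks(cam, Np)                 # (B, Np*Np, K, h, w)
```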
3. Mining long inter-class dependencies at the block level
The Np × Np block-level feature maps are reshaped to dimension hw × C, and the Np × Np block-level CAMs are reshaped to dimension K × hw. Multiplying each reshaped CAM with the corresponding reshaped feature map yields Np × Np matrices of size K × C, which establish the correlation between the K categories and the C channels, i.e., Np × Np correlation matrices T in which the entry T(i, j) reflects the correlation between category i and channel j. In this way, long-range dependencies between channels and categories are established within each block. By constructing the correlation matrix T, the long-range dependencies between the channels within a block and the dataset categories are mined, so that long-range dependency information is obtained while the pixel-level classification task's need for fine-grained short-range dependency information is still met.
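The block-level correlation matrices could be computed as below (shapes mirror the previous sketch; dummy tensors are used so the snippet stands alone, and the tensor names are illustrative).

```python
import torch

# block_feat: (B, Nb, C, h, w), block_cam: (B, Nb, K, h, w), with Nb = Np*Np.
B, Nb, C, K, h, w = 2, 16, 2048, 21, 4, 4
block_feat = torch.randn(B, Nb, C, h, w)
block_cam  = torch.randn(B, Nb, K, h, w)

feat_flat = block_feat.reshape(B, Nb, C, h * w)        # (B, Nb, C, hw)
cam_flat  = block_cam.reshape(B, Nb, K, h * w)         # (B, Nb, K, hw)

# (B, Nb, K, hw) @ (B, Nb, hw, C) -> (B, Nb, K, C): entry (i, j) of each
# block's matrix T reflects how strongly channel j responds to category i.
T = torch.matmul(cam_flat, feat_flat.transpose(-1, -2))
```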
4. Automatic acquisition of optimal samples
A learnable weight matrix A is maintained for selecting the positive samples of contrastive learning; it contains K normalized weight vectors, one per category. Specifically, for a given category, the channels with higher response values in the correlation matrix T are taken as positive samples and the channels with lower response values as negative samples, and the positive samples adaptively receive higher weight coefficients. Meanwhile, so that the information of all samples in a block is taken into account, the coefficient matrix 1 - A is used as the adaptive weight for selecting negative samples.
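A sketch of this sample-weighting idea, under the assumption that A has shape K x C and is kept row-normalized with a softmax; the patent does not fix these details.

```python
import torch
import torch.nn as nn

# A holds one normalized weight vector per category; high weights mark the
# channels treated as positives, and 1 - A weights the negatives so that no
# channel's information is discarded. Shapes and normalization are assumptions.
K, C = 21, 2048
A_logits = nn.Parameter(torch.zeros(K, C))
A = torch.softmax(A_logits, dim=-1)            # K normalized weight vectors

T = torch.randn(2, 16, K, C)                   # block-level correlation matrices
pos_sim = (A * T).sum(dim=-1)                  # (B, Nb, K) weighted positive similarity
neg_sim = ((1.0 - A) * T).sum(dim=-1)          # (B, Nb, K) weighted negative similarity
```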
5. Construction of contrast attention mechanism between blocks
Block-level contrastive learning is introduced to force the correlation matrix T to have stronger characterization capability: if a channel is highly correlated with a class, its response value in T should be large. A fully connected layer maps the Np × Np block-level correlation matrices T to a global correlation matrix G, i.e., G = FC(T), where FC denotes a linear layer whose input dimension is the number of blocks and whose output dimension is 1, thereby capturing the inter-block long-range dependencies. Using the automatic optimal-sample selection strategy, for each category the channels with high response values in both the block-level and the global correlation matrices are taken as positive samples, and the channels with low response values as negative samples. The positive-sample similarities are given higher weights through the weight matrix A and summed as the positive similarity, and the negative-sample similarities are weighted by 1 - A and summed as the negative similarity. The contrast loss is then computed; it reduces the distance between positive samples and increases the distance between negative samples, i.e., for a given category, the channels related to that category become more similar under the contrast loss, while the channels unrelated to it are pushed apart.
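The aggregation and the contrast term might be wired up as below; the loss form (an InfoNCE-style two-term contrast) and the temperature value are assumptions, since the text only states that a contrast loss is computed from the weighted positive and negative similarities.

```python
import torch
import torch.nn as nn

B, Nb, K, C = 2, 16, 21, 2048
T = torch.randn(B, Nb, K, C)                      # block-level correlation matrices
A = torch.softmax(torch.zeros(K, C), dim=-1)      # weight matrix (see previous sketch)

# Linear layer with input dimension Nb and output dimension 1 aggregates the
# block-level matrices into a global correlation matrix G of shape (B, K, C).
fc = nn.Linear(Nb, 1)
G = fc(T.permute(0, 2, 3, 1)).squeeze(-1)

def weighted_sims(M):                             # M: (..., K, C)
    return (A * M).sum(-1), ((1.0 - A) * M).sum(-1)

tau = 0.1                                         # temperature (assumption)

def contrast(pos, neg):
    # Pull the weighted positive similarity up, push the negative one down.
    return -torch.log(torch.sigmoid((pos - neg) / tau)).mean()

pos_b, neg_b = weighted_sims(T)                   # block level
pos_g, neg_g = weighted_sims(G)                   # global level
L_con = contrast(pos_b, neg_b) + contrast(pos_g, neg_g)
```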
Since the positive and negative samples ultimately used for the contrast loss computation come from every block, the above operations extend block-level semantic information to the global level. Furthermore, an attention mechanism is introduced based on the block-level feature maps and the block-level CAMs. A conventional self-attention model usually adopts the Query-Key-Value (QKV) formulation: the input feature map undergoes linear transformations W_q, W_k and W_v to obtain the Q, K and V feature maps; a correlation matrix is obtained from Q and K by scaled dot product, and matrix multiplication with V yields the output feature map. The present application instead constructs the attention mechanism from the input feature map and the CAM; the output feature map is obtained by passing the block-level CAMs through a linear transformation and using the result, together with the block-level feature maps and block-level CAMs, in the attention computation.
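How the CAM-driven attention replaces the QKV projections is only outlined in the text; the following sketch is one plausible arrangement, where the softmax placement, the residual connection, and the projection size are assumptions.

```python
import torch
import torch.nn as nn

B, Nb, C, K, h, w = 2, 16, 2048, 21, 4, 4
block_feat = torch.randn(B, Nb, C, h, w)
block_cam  = torch.randn(B, Nb, K, h, w)

proj = nn.Linear(K, K, bias=False)                        # linear transform of the CAM

cam_t  = proj(block_cam.flatten(3).transpose(-1, -2))     # (B, Nb, hw, K)
feat_f = block_feat.flatten(3)                            # (B, Nb, C, hw)

# Channel-to-category attention built from the feature map and the CAM,
# playing the role that Q.K^T plays in standard self-attention.
attn = torch.softmax(torch.matmul(feat_f, cam_t), dim=-1) # (B, Nb, C, K)
out  = torch.matmul(attn, cam_t.transpose(-1, -2))        # (B, Nb, C, hw)
out  = out.reshape(B, Nb, C, h, w) + block_feat           # residual (assumption)
```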
That is, the channel-category correlation matrix is acquired first. Next, the long-range dependencies between categories within each block are acquired, and a global correlation matrix of long-range inter-category dependencies is obtained by aggregating the block-level correlation matrices. Finally, after sampling positive and negative samples, the contrast loss is computed and back-propagated through the network, forcing the model to learn a more structured channel-category correlation matrix via contrastive learning. In this way, intra-block and inter-block long-range dependencies are established simultaneously, satisfying the semantic segmentation task's reliance on both global semantic information and fine-grained information.
6. Acquiring semantic segmentation masks
The inter-block contrast attention operation yields an output feature map, which is reshaped into a feature map with the same dimensions as the input feature map. An up-sampling operation then produces a semantic segmentation mask of the same size as the original image, yielding an accurate and robust segmentation result, and the segmentation loss is computed from the segmentation result and the segmentation label.
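A sketch of this mask-generation step; the from_blocks helper (the inverse of the earlier block split), the 1x1 prediction head, and bilinear up-sampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

B, Np, C, h, w, K = 2, 4, 2048, 4, 4, 21
out_blocks = torch.randn(B, Np * Np, C, h, w)   # output of the contrast attention
img_h, img_w = 512, 512

def from_blocks(x, n):
    """Inverse of the earlier block split: (B, n*n, D, h, w) -> (B, D, n*h, n*w)."""
    b, _, d, hh, ww = x.shape
    x = x.view(b, n, n, d, hh, ww).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, d, n * hh, n * ww)

feat_out = from_blocks(out_blocks, Np)                     # (B, C, H, W)
logits = nn.Conv2d(C, K, kernel_size=1)(feat_out)          # per-pixel class scores
mask = nnf.interpolate(logits, size=(img_h, img_w),
                       mode="bilinear", align_corners=False)
label = torch.randint(0, K, (B, img_h, img_w))
L_seg = nnf.cross_entropy(mask, label)                     # segmentation loss
```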
7. Computing the loss and back-propagating gradients
The CAM classification prediction loss L_cam under the guidance of the block-level classification labels, the semantic segmentation loss L_seg, and the contrast loss L_con of the inter-block contrast attention are combined to obtain the final loss L, which is back-propagated. The application defines it as:
L = λ1·L_cam + λ2·L_seg + λ3·L_con,
where λ1, λ2 and λ3 are the weight coefficients of the respective losses, set to 1, 0.4 and 1 through extensive experiments.
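In code, the combined objective is a weighted sum; the per-term values below are placeholders standing in for the L_cam, L_seg and L_con computed in the earlier sketches.

```python
import torch

L_cam = torch.tensor(0.7, requires_grad=True)   # block-level CAM classification loss
L_seg = torch.tensor(1.2, requires_grad=True)   # semantic segmentation loss
L_con = torch.tensor(0.3, requires_grad=True)   # inter-block contrast loss

lam1, lam2, lam3 = 1.0, 0.4, 1.0                # weights reported above
L = lam1 * L_cam + lam2 * L_seg + lam3 * L_con
L.backward()                                    # gradient back-propagation
```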
8. Model test and application
The test set is fed into the trained segmentation model, the predicted segmentation results are output, and model performance is evaluated with the mean intersection-over-union (mIoU).
The tested model can then be used for the construction site image segmentation task: for images captured by site cameras and tower cranes, the segmentation areas of the targets in the images are obtained through the segmentation model.
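For evaluation, a minimal mIoU computation could look like this (the standard per-class intersection-over-union averaged over classes; the class count and input sizes are illustrative).

```python
import torch

def mean_iou(pred, target, num_classes):
    """Per-class IoU averaged over the classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().float()
        union = ((pred == c) | (target == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean()

pred   = torch.randint(0, 21, (512, 512))       # predicted segmentation mask
target = torch.randint(0, 21, (512, 512))       # ground-truth mask
print(mean_iou(pred, target, 21))
```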
At the local level, this embodiment uses a block-partitioned attention mechanism to establish correlations between the channels and categories of the original feature map; compared with a conventional attention mechanism, the intra-block attention mechanism better helps the network focus on mining finer-grained information. At the global level, this embodiment proposes inter-block contrastive learning: first, in each channel-category correlation matrix, the channels with higher response values are taken as positive samples of the corresponding category and the channels with lower response values as negative samples. The intra-block positive and negative samples of each category are then extended to global positive and negative samples, prompting the model to acquire fine-grained local semantic information while mining more discriminative global semantic information. Notably, for selecting positive and negative samples this embodiment maintains a learnable weight matrix, which ensures that positive and negative samples are selected appropriately without losing the information of any sample.
By applying the block partitioning operation to the feature map and the CAM and computing their channel correlations in the block dimension, the application mines the dependency of image categories on channels. To give the embedding space of the channel correlation matrix between the generated feature map and the CAM stronger representational power, the application proposes a block-level contrast attention mechanism that not only models the long-range dependencies of the image but also establishes short-range dependencies between channels and categories based on block-level features, thereby satisfying the semantic segmentation task's requirements for both coarse-grained and fine-grained information. Contrastive learning further emphasizes the correlation between channels and categories, and the block-level positive and negative samples of each category are fused into global positive and negative samples, establishing inter-block correlations so that the segmentation model has better representational capability and robust segmentation performance.
Example two
This embodiment provides an intelligent construction site image segmentation system based on an inter-block contrast attention mechanism.
An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model. It should be noted that the prediction module and the segmentation model training module correspond to the examples and application scenarios implemented by the steps in Embodiment 1, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules described above may be implemented as part of a computer system, for example as a set of computer-executable instructions.
Example III
This embodiment provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism described in the embodiment above.
Example IV
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism described in the embodiment above when executing the program.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. An intelligent construction site image segmentation method based on an inter-block contrast attention mechanism, characterized by comprising the following steps:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
the training process of the segmentation model comprises the following steps: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs, specifically: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories; mapping the block-level correlation matrix to a global correlation matrix, specifically: mapping the block-level correlation matrix to the global correlation matrix through a fully connected layer; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix to obtain an output feature map, and obtaining the output result of the segmentation model from the output feature map, specifically: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels to obtain the trained segmentation model.
2. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the calculation of the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix comprises: taking the channels in the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
3. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the calculation of the positive-sample similarity and the negative-sample similarity of the global correlation matrix comprises: taking the channels in the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
4. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the loss function comprises:
a classification prediction loss for the plurality of block-level CAMs obtained by mapping the block-level feature maps under the supervision of the block-level classification labels;
a semantic segmentation loss for the target segmentation region;
and a contrast loss between inter-block long-range dependencies and intra-block long-range dependencies.
5. An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, characterized by comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs, specifically: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories; mapping the block-level correlation matrix to a global correlation matrix, specifically: mapping the block-level correlation matrix to the global correlation matrix through a fully connected layer; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix to obtain an output feature map, and obtaining the output result of the segmentation model from the output feature map, specifically: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels to obtain the trained segmentation model.
6. A computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of any one of claims 1-4.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of any one of claims 1-4.
CN202310935833.6A 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism Active CN116664845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310935833.6A CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310935833.6A CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Publications (2)

Publication Number Publication Date
CN116664845A CN116664845A (en) 2023-08-29
CN116664845B true CN116664845B (en) 2023-10-13

Family

ID=87717426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310935833.6A Active CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Country Status (1)

Country Link
CN (1) CN116664845B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118230261B (en) * 2024-05-27 2024-07-19 四川省建筑科学研究院有限公司 Intelligent construction site construction safety early warning method and system based on image data


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801104A (en) * 2021-01-20 2021-05-14 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
CN113657393A (en) * 2021-08-16 2021-11-16 山东建筑大学 Shape prior missing image semi-supervised segmentation method and system
WO2023056889A1 (en) * 2021-10-09 2023-04-13 百果园技术(新加坡)有限公司 Model training and scene recognition method and apparatus, device, and medium
WO2023102223A1 (en) * 2021-12-03 2023-06-08 Innopeak Technology, Inc. Cross-coupled multi-task learning for depth mapping and semantic segmentation
CN114283162A (en) * 2021-12-27 2022-04-05 河北工业大学 Real scene image segmentation method based on contrast self-supervision learning
CN114359873A (en) * 2022-01-06 2022-04-15 中南大学 Weak supervision vehicle feasible region segmentation method integrating road space prior and region level characteristics
CN115019039A (en) * 2022-05-26 2022-09-06 湖北工业大学 Example segmentation method and system combining self-supervision and global information enhancement
CN115953784A (en) * 2022-12-27 2023-04-11 江南大学 Laser coding character segmentation method based on residual error and feature blocking attention
CN116229465A (en) * 2023-02-27 2023-06-06 哈尔滨工程大学 Ship weak supervision semantic segmentation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang, Pingping et al.; Deep gated attention networks for large-scale street-level scene segmentation; Pattern Recognition; pp. 702-714 *
彭启伟, 冯杰, 吕进, 余磊, 程鼎; Research on semantic segmentation methods based on a global attention mechanism (基于全局注意力机制的语义分割方法研究); Modern Information Technology (现代信息科技), No. 04; pp. 110-112 *
李宾皑, 李颖, 郝鸣阳, 顾书玉; A survey of weakly supervised semantic segmentation methods (弱监督学习语义分割方法综述); Digital Communication World (数字通信世界), No. 07; pp. 263-265 *

Also Published As

Publication number Publication date
CN116664845A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
JP2022058915A (en) Method and device for training image recognition model, method and device for recognizing image, electronic device, storage medium, and computer program
US20210406592A1 (en) Method and apparatus for visual question answering, computer device and medium
JP2021531541A (en) Systems and methods for geolocation prediction
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN109977832B (en) Image processing method, device and storage medium
JP7273129B2 (en) Lane detection method, device, electronic device, storage medium and vehicle
CN113408662B (en) Image recognition and training method and device for image recognition model
CN116664845B (en) Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
CN112954399B (en) Image processing method and device and computer equipment
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN114913339A (en) Training method and device of feature map extraction model
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN112132867B (en) Remote sensing image change detection method and device
CN113657596B (en) Method and device for training model and image recognition
CN114419338B (en) Image processing method, image processing device, computer equipment and storage medium
CN115937993A (en) Living body detection model training method, living body detection device and electronic equipment
CN116188478A (en) Image segmentation method, device, electronic equipment and storage medium
CN112990041B (en) Remote sensing image building extraction method based on improved U-net
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN112529116A (en) Scene element fusion processing method, device and equipment and computer storage medium
CN112215205A (en) Target identification method and device, computer equipment and storage medium
CN116194964A (en) System and method for training machine learning visual attention models
Tan et al. BSIRNet: A road extraction network with bidirectional spatial information reasoning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant