CN116664845B - Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism - Google Patents

Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Info

Publication number
CN116664845B
CN116664845B (application CN202310935833.6A)
Authority
CN
China
Prior art keywords
block
level
correlation matrix
segmentation
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310935833.6A
Other languages
Chinese (zh)
Other versions
CN116664845A (en)
Inventor
聂秀山
方静远
宁阳
袭肖明
郭杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Jianzhu University
Original Assignee
Shandong Jianzhu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Jianzhu University filed Critical Shandong Jianzhu University
Priority to CN202310935833.6A priority Critical patent/CN116664845B/en
Publication of CN116664845A publication Critical patent/CN116664845A/en
Application granted granted Critical
Publication of CN116664845B publication Critical patent/CN116664845B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of image segmentation and provides an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism. The method predicts the target segmentation area of a construction site scene image to be segmented with a trained segmentation model. Training proceeds as follows: extract the feature map of a construction site scene image training sample; obtain a plurality of label vectors from the segmentation labels; apply block partitioning and max pooling to the feature map and derive block-level classification labels from the label vectors; divide the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, map the block-level feature maps to a plurality of block-level CAMs; establish a block-level correlation matrix from the block-level feature maps and block-level CAMs; and optimize the hyperparameters of the segmentation model according to the model output and the segmentation labels. The method performs target segmentation on construction site scene images and effectively enables intelligent safety monitoring of construction sites.

Description

Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
Technical Field
The application belongs to the technical field of image segmentation, and particularly relates to an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Semantic segmentation is an important topic in computer vision. Its main task is to assign a class label to every pixel of an image, and it plays an important role in fields such as autonomous driving, computer vision, medical image analysis, and computer-aided diagnosis.
A smart construction site applies informatization means such as computer technology, artificial intelligence, sensing technology, and virtual reality to achieve scientific management and intelligent production on the construction site. Construction projects generally involve short construction periods, heavy tasks, high risk, and difficult management. At present, site management relies mainly on inspections and spot checks, which suffer from poor timeliness and high supervision costs; as a result, violations occur more frequently, and the safety, quality, and progress of the construction site cannot be effectively guaranteed.
With the development of artificial intelligence, AI techniques have gradually been applied to auxiliary supervision systems for construction sites, where deep-learning-based image recognition algorithms are applied to the pictures captured by site cameras and tower cranes. At present, segmentation methods for smart construction site scene images consider either only global long-range dependency information or only short-range dependency information, and both limitations affect the accuracy of the image segmentation results.
Disclosure of Invention
In order to solve the technical problems in the background art, the application provides an intelligent construction site image segmentation method and system based on an inter-block contrast attention mechanism, which can perform target segmentation on construction site scene images, effectively realize intelligent safety monitoring of the construction site, improve production management efficiency, and ensure safe construction.
In order to achieve the above purpose, the present application adopts the following technical scheme:
a first aspect of the present application provides a method for intelligent worker image segmentation based on inter-block contrast attention mechanisms.
An intelligent construction site image segmentation method based on an inter-block contrast attention mechanism comprises the following steps:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
the training process of the segmentation model comprises the following steps: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
Further, the process of establishing the block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs includes: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively, and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories.
Further, the calculation of the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix includes: taking the channels in the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
Further, the mapping the block-level correlation matrix to the global correlation matrix includes: the block-level correlation matrix is mapped to a global correlation matrix through the full connection layer.
Further, the calculation of the positive-sample similarity and the negative-sample similarity of the global correlation matrix includes: taking the channels in the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
Further, after obtaining the output feature map, the method further comprises: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model.
Further, the loss function includes:
a classification prediction loss for the plurality of block-level CAMs obtained by mapping the block-level feature maps under the supervision of the block-level classification labels;
a semantic segmentation loss for the target segmentation region;
and a contrast loss between inter-block long-range dependencies and intra-block long-range dependencies.
A second aspect of the present application provides an intelligent construction site image segmentation system based on an inter-block contrast attention mechanism.
An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
A third aspect of the present application provides a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism as described in the first aspect above.
A fourth aspect of the application provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism as described in the first aspect above when executing the program.
Compared with the prior art, the application has the beneficial effects that:
for the monitoring images transmitted from site cameras and tower cranes, the application performs feature extraction through a neural network and pixel-level classification of the input feature maps to obtain the segmentation area of each object in the input image, which facilitates further detection of violations and potential safety hazards on the construction site.
The application introduces contrastive learning into the semantic segmentation task under a supervised setting: pixels with the same label are drawn closer in the feature space, while pixels with different labels are pushed relatively farther apart, which further strengthens the representational power of the features. Because both attention mechanisms and contrastive learning perform well in semantic segmentation tasks, the two are combined, and the channel correlation matrix between the feature map and the class activation map (Class Activation Mapping, CAM) is forced through the contrast loss to have higher confidence, so that a robust and accurate target segmentation result of the construction site scene image is obtained and the segmentation precision is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
FIG. 1 is a flow chart of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism according to the present application;
FIG. 2 is a block diagram of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism according to the present application.
Detailed Description
The application will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and systems according to various embodiments of the present disclosure. It should be noted that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the logical functions specified in the various embodiments. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
Example 1
As shown in fig. 1 and fig. 2, this embodiment provides an intelligent construction site image segmentation method based on an inter-block contrast attention mechanism. The embodiment is illustrated by applying the method to a server; it can be understood that the method may also be applied to a terminal, or to a system comprising a terminal and a server and implemented through their interaction. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, or smart watch. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited herein. In this embodiment, the method includes the steps of:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model.
The following describes the specific scheme of this embodiment in detail:
1. Feature extraction
For an input image, a backbone network (e.g., VGG or ResNet-101) first maps it to a feature map F ∈ R^(H×W×C), where H and W are the height and width of the feature map and C is the number of channels.
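As a minimal sketch of this step (assuming a PyTorch/torchvision implementation, which the patent does not prescribe), the backbone extraction could look as follows; the ResNet-101 truncation point and tensor shapes are illustrative assumptions.

```python
import torch
import torchvision

# Illustrative sketch: map an input image to a feature map F of shape
# (B, C, H, W) with a ResNet-101 backbone truncated before its pooling and
# classification head. Backbone choice and truncation point are assumptions.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet101(weights=None).children())[:-2]
)

image = torch.randn(1, 3, 512, 512)   # dummy construction-site scene image
with torch.no_grad():
    F = backbone(image)               # (1, 2048, 16, 16): C=2048, H=W=16
print(F.shape)
```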
2. Building block level feature map
First, one-hot encoding (One-Hot Encoding) is applied to the segmentation labels to obtain K label vectors, where K is the number of categories in the dataset. The label map is partitioned into blocks, and max pooling within each block yields the block-level classification labels. The feature map F is then divided into Np × Np blocks (e.g., Np = 4, i.e., 16 block-level feature maps, each of height h and width w). Under the supervision of the block-level classification labels, a convolution maps each block-level feature map to a block-level CAM with K channels. The resulting block-level feature maps have dimension Np × Np × h × w × C, and the block-level CAMs have dimension Np × Np × h × w × K.
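A hedged sketch of this block construction step follows; the helper name to_blocks, the choice of a 1x1 convolution as the CAM classifier, and the multi-label form of the block-level targets are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

# Assumptions: Np = 4 blocks per side, K classes, feature map `feat` of shape
# (B, C, H, W), dense segmentation label of shape (B, H, W).
Np, K = 4, 21
B, C, H, W = 2, 2048, 16, 16
feat  = torch.randn(B, C, H, W)
label = torch.randint(0, K, (B, H, W))

# One-hot encode the segmentation label: (B, K, H, W).
onehot = nnf.one_hot(label, K).permute(0, 3, 1, 2).float()

# Block-wise max pooling gives a multi-label block-level classification target
# of shape (B, K, Np, Np): a class is present in a block if any pixel has it.
block_label = nnf.adaptive_max_pool2d(onehot, (Np, Np))

# A 1x1 convolution maps the feature map to K class-activation channels;
# splitting the result into Np x Np blocks yields the block-level CAMs.
classifier = nn.Conv2d(C, K, kernel_size=1)
cam = classifier(feat)                          # (B, K, H, W)

def to_blocks(x, n):
    """Split (B, D, H, W) into (B, n*n, D, H//n, W//n) non-overlapping blocks."""
    b, d, h, w = x.shape
    x = x.view(b, d, n, h // n, n, w // n)
    return x.permute(0, 2, 4, 1, 3, 5).reshape(b, n * n, d, h // n, w // n)

block_feat = to_blocks(feat, Np)                # (B, Np*Np, C, h, w)
block_cam  = to_blocks(cam, Np)                 # (B, Np*Np, K, h, w)
```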
3. Mining long inter-class dependencies at the block level
The Np × Np block-level feature maps are reshaped to dimension hw × C, and the Np × Np block-level CAMs are reshaped to dimension K × hw. Multiplying each reshaped CAM with the corresponding reshaped feature map yields Np × Np matrices of size K × C, which establish the correlation between the K categories and the C channels, i.e., Np × Np correlation matrices T in which the entry T(i, j) reflects the correlation between category i and channel j. In this way, long-range dependencies between channels and categories are established within each block. By constructing the correlation matrix T, the long-range dependencies between the channels within a block and the dataset categories are mined, so that long-range dependency information is obtained while the pixel-level classification task's need for fine-grained short-range dependency information is still met.
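The block-level correlation matrices could be computed as below (shapes mirror the previous sketch; dummy tensors are used so the snippet stands alone, and the tensor names are illustrative).

```python
import torch

# block_feat: (B, Nb, C, h, w), block_cam: (B, Nb, K, h, w), with Nb = Np*Np.
B, Nb, C, K, h, w = 2, 16, 2048, 21, 4, 4
block_feat = torch.randn(B, Nb, C, h, w)
block_cam  = torch.randn(B, Nb, K, h, w)

feat_flat = block_feat.reshape(B, Nb, C, h * w)        # (B, Nb, C, hw)
cam_flat  = block_cam.reshape(B, Nb, K, h * w)         # (B, Nb, K, hw)

# (B, Nb, K, hw) @ (B, Nb, hw, C) -> (B, Nb, K, C): entry (i, j) of each
# block's matrix T reflects how strongly channel j responds to category i.
T = torch.matmul(cam_flat, feat_flat.transpose(-1, -2))
```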
4. Automatic acquisition of optimal samples
A learnable weight matrix A is maintained for selecting the positive samples of contrastive learning; it contains K normalized weight vectors, one per category. Specifically, for a given category, the channels with higher response values in the correlation matrix T are taken as positive samples and the channels with lower response values as negative samples, and the positive samples adaptively receive higher weight coefficients. Meanwhile, so that the information of all samples in a block is taken into account, the coefficient matrix 1 - A is used as the adaptive weight for selecting negative samples.
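A sketch of this sample-weighting idea, under the assumption that A has shape K x C and is kept row-normalized with a softmax; the patent does not fix these details.

```python
import torch
import torch.nn as nn

# A holds one normalized weight vector per category; high weights mark the
# channels treated as positives, and 1 - A weights the negatives so that no
# channel's information is discarded. Shapes and normalization are assumptions.
K, C = 21, 2048
A_logits = nn.Parameter(torch.zeros(K, C))
A = torch.softmax(A_logits, dim=-1)            # K normalized weight vectors

T = torch.randn(2, 16, K, C)                   # block-level correlation matrices
pos_sim = (A * T).sum(dim=-1)                  # (B, Nb, K) weighted positive similarity
neg_sim = ((1.0 - A) * T).sum(dim=-1)          # (B, Nb, K) weighted negative similarity
```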
5. Construction of contrast attention mechanism between blocks
Block-level contrastive learning is introduced to force the correlation matrix T to have stronger characterization capability: if a channel is highly correlated with a class, its response value in T should be large. A fully connected layer maps the Np × Np block-level correlation matrices T to a global correlation matrix G, i.e., G = FC(T), where FC denotes a linear layer whose input dimension is the number of blocks and whose output dimension is 1, thereby capturing the inter-block long-range dependencies. Using the automatic optimal-sample selection strategy, for each category the channels with high response values in both the block-level and the global correlation matrices are taken as positive samples, and the channels with low response values as negative samples. The positive-sample similarities are given higher weights through the weight matrix A and summed as the positive similarity, and the negative-sample similarities are weighted by 1 - A and summed as the negative similarity. The contrast loss is then computed; it reduces the distance between positive samples and increases the distance between negative samples, i.e., for a given category, the channels related to that category become more similar under the contrast loss, while the channels unrelated to it are pushed apart.
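The aggregation and the contrast term might be wired up as below; the loss form (an InfoNCE-style two-term contrast) and the temperature value are assumptions, since the text only states that a contrast loss is computed from the weighted positive and negative similarities.

```python
import torch
import torch.nn as nn

B, Nb, K, C = 2, 16, 21, 2048
T = torch.randn(B, Nb, K, C)                      # block-level correlation matrices
A = torch.softmax(torch.zeros(K, C), dim=-1)      # weight matrix (see previous sketch)

# Linear layer with input dimension Nb and output dimension 1 aggregates the
# block-level matrices into a global correlation matrix G of shape (B, K, C).
fc = nn.Linear(Nb, 1)
G = fc(T.permute(0, 2, 3, 1)).squeeze(-1)

def weighted_sims(M):                             # M: (..., K, C)
    return (A * M).sum(-1), ((1.0 - A) * M).sum(-1)

tau = 0.1                                         # temperature (assumption)

def contrast(pos, neg):
    # Pull the weighted positive similarity up, push the negative one down.
    return -torch.log(torch.sigmoid((pos - neg) / tau)).mean()

pos_b, neg_b = weighted_sims(T)                   # block level
pos_g, neg_g = weighted_sims(G)                   # global level
L_con = contrast(pos_b, neg_b) + contrast(pos_g, neg_g)
```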
Since the positive and negative samples ultimately used for the contrast loss computation come from every block, the above operations extend block-level semantic information to the global level. Furthermore, an attention mechanism is introduced based on the block-level feature maps and the block-level CAMs. A conventional self-attention model usually adopts the Query-Key-Value (QKV) formulation: the input feature map undergoes linear transformations W_q, W_k and W_v to obtain the Q, K and V feature maps; a correlation matrix is obtained from Q and K by scaled dot product, and matrix multiplication with V yields the output feature map. The present application instead constructs the attention mechanism from the input feature map and the CAM; the output feature map is obtained by passing the block-level CAMs through a linear transformation and using the result, together with the block-level feature maps and block-level CAMs, in the attention computation.
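How the CAM-driven attention replaces the QKV projections is only outlined in the text; the following sketch is one plausible arrangement, where the softmax placement, the residual connection, and the projection size are assumptions.

```python
import torch
import torch.nn as nn

B, Nb, C, K, h, w = 2, 16, 2048, 21, 4, 4
block_feat = torch.randn(B, Nb, C, h, w)
block_cam  = torch.randn(B, Nb, K, h, w)

proj = nn.Linear(K, K, bias=False)                        # linear transform of the CAM

cam_t  = proj(block_cam.flatten(3).transpose(-1, -2))     # (B, Nb, hw, K)
feat_f = block_feat.flatten(3)                            # (B, Nb, C, hw)

# Channel-to-category attention built from the feature map and the CAM,
# playing the role that Q.K^T plays in standard self-attention.
attn = torch.softmax(torch.matmul(feat_f, cam_t), dim=-1) # (B, Nb, C, K)
out  = torch.matmul(attn, cam_t.transpose(-1, -2))        # (B, Nb, C, hw)
out  = out.reshape(B, Nb, C, h, w) + block_feat           # residual (assumption)
```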
That is, the channel-category correlation matrix is acquired first. Next, the long-range dependencies between categories within each block are acquired, and a global correlation matrix of long-range inter-category dependencies is obtained by aggregating the block-level correlation matrices. Finally, after sampling positive and negative samples, the contrast loss is computed and back-propagated through the network, forcing the model to learn a more structured channel-category correlation matrix via contrastive learning. In this way, intra-block and inter-block long-range dependencies are established simultaneously, satisfying the semantic segmentation task's reliance on both global semantic information and fine-grained information.
6. Acquiring semantic segmentation masks
The inter-block contrast attention operation yields an output feature map, which is reshaped into a feature map with the same dimensions as the input feature map. An up-sampling operation then produces a semantic segmentation mask of the same size as the original image, yielding an accurate and robust segmentation result, and the segmentation loss is computed from the segmentation result and the segmentation label.
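A sketch of this mask-generation step; the from_blocks helper (the inverse of the earlier block split), the 1x1 prediction head, and bilinear up-sampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as nnf

B, Np, C, h, w, K = 2, 4, 2048, 4, 4, 21
out_blocks = torch.randn(B, Np * Np, C, h, w)   # output of the contrast attention
img_h, img_w = 512, 512

def from_blocks(x, n):
    """Inverse of the earlier block split: (B, n*n, D, h, w) -> (B, D, n*h, n*w)."""
    b, _, d, hh, ww = x.shape
    x = x.view(b, n, n, d, hh, ww).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(b, d, n * hh, n * ww)

feat_out = from_blocks(out_blocks, Np)                     # (B, C, H, W)
logits = nn.Conv2d(C, K, kernel_size=1)(feat_out)          # per-pixel class scores
mask = nnf.interpolate(logits, size=(img_h, img_w),
                       mode="bilinear", align_corners=False)
label = torch.randint(0, K, (B, img_h, img_w))
L_seg = nnf.cross_entropy(mask, label)                     # segmentation loss
```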
7. Computing the loss and back-propagating gradients
The CAM classification prediction loss L_cam under the guidance of the block-level classification labels, the semantic segmentation loss L_seg, and the contrast loss L_con of the inter-block contrast attention are combined to obtain the final loss L, which is back-propagated. The application defines it as:
L = λ1·L_cam + λ2·L_seg + λ3·L_con,
where λ1, λ2 and λ3 are the weight coefficients of the respective losses, set to 1, 0.4 and 1 through extensive experiments.
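In code, the combined objective is a weighted sum; the per-term values below are placeholders standing in for the L_cam, L_seg and L_con computed in the earlier sketches.

```python
import torch

L_cam = torch.tensor(0.7, requires_grad=True)   # block-level CAM classification loss
L_seg = torch.tensor(1.2, requires_grad=True)   # semantic segmentation loss
L_con = torch.tensor(0.3, requires_grad=True)   # inter-block contrast loss

lam1, lam2, lam3 = 1.0, 0.4, 1.0                # weights reported above
L = lam1 * L_cam + lam2 * L_seg + lam3 * L_con
L.backward()                                    # gradient back-propagation
```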
8. Model test and application
The test set is fed into the trained segmentation model, the predicted segmentation results are output, and model performance is evaluated with the mean intersection-over-union (mIoU).
The tested model can then be used for the construction site image segmentation task: for images captured by site cameras and tower cranes, the segmentation areas of the targets in the images are obtained through the segmentation model.
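For evaluation, a minimal mIoU computation could look like this (the standard per-class intersection-over-union averaged over classes; the class count and input sizes are illustrative).

```python
import torch

def mean_iou(pred, target, num_classes):
    """Per-class IoU averaged over the classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().float()
        union = ((pred == c) | (target == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean()

pred   = torch.randint(0, 21, (512, 512))       # predicted segmentation mask
target = torch.randint(0, 21, (512, 512))       # ground-truth mask
print(mean_iou(pred, target, 21))
```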
At the local level, this embodiment uses a block-partitioned attention mechanism to establish correlations between the channels and categories of the original feature map; compared with a conventional attention mechanism, the intra-block attention mechanism better helps the network focus on mining finer-grained information. At the global level, this embodiment proposes inter-block contrastive learning: first, in each channel-category correlation matrix, the channels with higher response values are taken as positive samples of the corresponding category and the channels with lower response values as negative samples. The intra-block positive and negative samples of each category are then extended to global positive and negative samples, prompting the model to acquire fine-grained local semantic information while mining more discriminative global semantic information. Notably, for selecting positive and negative samples this embodiment maintains a learnable weight matrix, which ensures that positive and negative samples are selected appropriately without losing the information of any sample.
By applying the block partitioning operation to the feature map and the CAM and computing their channel correlations in the block dimension, the application mines the dependency of image categories on channels. To give the embedding space of the channel correlation matrix between the generated feature map and the CAM stronger representational power, the application proposes a block-level contrast attention mechanism that not only models the long-range dependencies of the image but also establishes short-range dependencies between channels and categories based on block-level features, thereby satisfying the semantic segmentation task's requirements for both coarse-grained and fine-grained information. Contrastive learning further emphasizes the correlation between channels and categories, and the block-level positive and negative samples of each category are fused into global positive and negative samples, establishing inter-block correlations so that the segmentation model has better representational capability and robust segmentation performance.
Example two
This embodiment provides an intelligent construction site image segmentation system based on an inter-block contrast attention mechanism.
An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs; mapping the block-level correlation matrix to a global correlation matrix; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix; obtaining an output feature map and, from it, the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels, thereby obtaining the trained segmentation model. It should be noted that the prediction module and the segmentation model training module correspond to the examples and application scenarios implemented by the steps in Embodiment 1, but are not limited to the disclosure of Embodiment 1. It should also be noted that the modules described above may be implemented as part of a computer system, for example as a set of computer-executable instructions.
Example III
This embodiment provides a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism described in the embodiment above.
Example IV
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism described in the embodiment above when executing the program.
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (7)

1. An intelligent construction site image segmentation method based on an inter-block contrast attention mechanism, characterized by comprising the following steps:
based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
the training process of the segmentation model comprises the following steps: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs, specifically: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories; mapping the block-level correlation matrix to a global correlation matrix, specifically: mapping the block-level correlation matrix to the global correlation matrix through a fully connected layer; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix to obtain an output feature map, and obtaining the output result of the segmentation model from the output feature map, specifically: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels to obtain the trained segmentation model.
2. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the calculation of the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix comprises: taking the channels in the block-level correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the block-level correlation matrix.
3. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the calculation of the positive-sample similarity and the negative-sample similarity of the global correlation matrix comprises: taking the channels in the global correlation matrix whose response values exceed a set value as positive samples and the remaining channels as negative samples, introducing a weight matrix, and performing contrastive learning to obtain the positive-sample similarity and the negative-sample similarity of the global correlation matrix.
4. The intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of claim 1, wherein the loss function comprises:
a classification prediction loss for the plurality of block-level CAMs obtained by mapping the block-level feature maps under the supervision of the block-level classification labels;
a semantic segmentation loss for the target segmentation region;
and a contrast loss between inter-block long-range dependencies and intra-block long-range dependencies.
5. An intelligent construction site image segmentation system based on an inter-block contrast attention mechanism, characterized by comprising:
a prediction module configured to: based on a to-be-segmented construction site scene image, predicting a target segmentation area of the construction site scene image by adopting a trained segmentation model;
a segmentation model training module configured to: acquiring construction site scene image training samples annotated with segmentation labels; extracting the feature map of a construction site scene image training sample; applying one-hot encoding to the segmentation labels to obtain a plurality of label vectors; applying block partitioning and max pooling to the feature map and obtaining block-level classification labels from the label vectors; dividing the feature map into a plurality of block-level feature maps; under the supervision of the block-level classification labels, mapping the block-level feature maps to obtain a plurality of block-level CAMs; establishing a block-level correlation matrix based on the plurality of block-level feature maps and the plurality of block-level CAMs, specifically: applying a matrix transformation to the block-level feature maps and the block-level CAMs respectively and then performing matrix multiplication to obtain a block-level correlation matrix of the long-range dependencies between channels and categories; mapping the block-level correlation matrix to a global correlation matrix, specifically: mapping the block-level correlation matrix to the global correlation matrix through a fully connected layer; calculating the positive-sample similarity and negative-sample similarity of the block-level correlation matrix and the positive-sample similarity and negative-sample similarity of the global correlation matrix to obtain an output feature map, and obtaining the output result of the segmentation model from the output feature map, specifically: processing the output feature map so that its dimensions match those of the feature map, and obtaining through up-sampling a semantic segmentation mask of the same size as the feature map, which is the output result of the segmentation model; and optimizing the hyperparameters of the segmentation model with a loss function according to the output result of the segmentation model and the segmentation labels to obtain the trained segmentation model.
6. A computer readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of any one of claims 1-4.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps in the intelligent construction site image segmentation method based on the inter-block contrast attention mechanism of any one of claims 1-4.
CN202310935833.6A 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism Active CN116664845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310935833.6A CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310935833.6A CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Publications (2)

Publication Number Publication Date
CN116664845A CN116664845A (en) 2023-08-29
CN116664845B true CN116664845B (en) 2023-10-13

Family

ID=87717426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310935833.6A Active CN116664845B (en) 2023-07-28 2023-07-28 Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism

Country Status (1)

Country Link
CN (1) CN116664845B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118230261B (en) * 2024-05-27 2024-07-19 四川省建筑科学研究院有限公司 Intelligent construction site construction safety early warning method and system based on image data


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801104A (en) * 2021-01-20 2021-05-14 吉林大学 Image pixel level pseudo label determination method and system based on semantic segmentation
CN113657393A (en) * 2021-08-16 2021-11-16 山东建筑大学 Shape prior missing image semi-supervised segmentation method and system
WO2023056889A1 (en) * 2021-10-09 2023-04-13 百果园技术(新加坡)有限公司 Model training and scene recognition method and apparatus, device, and medium
WO2023102223A1 (en) * 2021-12-03 2023-06-08 Innopeak Technology, Inc. Cross-coupled multi-task learning for depth mapping and semantic segmentation
CN114283162A (en) * 2021-12-27 2022-04-05 河北工业大学 Real scene image segmentation method based on contrast self-supervision learning
CN114359873A (en) * 2022-01-06 2022-04-15 中南大学 Weak supervision vehicle feasible region segmentation method integrating road space prior and region level characteristics
CN115019039A (en) * 2022-05-26 2022-09-06 湖北工业大学 Example segmentation method and system combining self-supervision and global information enhancement
CN115953784A (en) * 2022-12-27 2023-04-11 江南大学 Laser coding character segmentation method based on residual error and feature blocking attention
CN116229465A (en) * 2023-02-27 2023-06-06 哈尔滨工程大学 Ship weak supervision semantic segmentation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Zhang, Pingping et al.; Deep gated attention networks for large-scale street-level scene segmentation; Pattern Recognition; pp. 702-714 *
彭启伟, 冯杰, 吕进, 余磊, 程鼎; Research on semantic segmentation methods based on a global attention mechanism (基于全局注意力机制的语义分割方法研究); Modern Information Technology (现代信息科技), No. 04; pp. 110-112 *
李宾皑, 李颖, 郝鸣阳, 顾书玉; A survey of weakly supervised semantic segmentation methods (弱监督学习语义分割方法综述); Digital Communication World (数字通信世界), No. 07; pp. 263-265 *

Also Published As

Publication number Publication date
CN116664845A (en) 2023-08-29

Similar Documents

Publication Publication Date Title
JP2022058915A (en) Method and device for training image recognition model, method and device for recognizing image, electronic device, storage medium, and computer program
US20210406592A1 (en) Method and apparatus for visual question answering, computer device and medium
JP2021531541A (en) Systems and methods for geolocation prediction
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN109977832B (en) Image processing method, device and storage medium
JP7273129B2 (en) Lane detection method, device, electronic device, storage medium and vehicle
CN113408662B (en) Image recognition and training method and device for image recognition model
CN116664845B (en) Intelligent engineering image segmentation method and system based on inter-block contrast attention mechanism
US20230030431A1 (en) Method and apparatus for extracting feature, device, and storage medium
CN112954399B (en) Image processing method and device and computer equipment
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN114913339A (en) Training method and device of feature map extraction model
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN112132867B (en) Remote sensing image change detection method and device
CN113657596B (en) Method and device for training model and image recognition
CN114419338B (en) Image processing method, image processing device, computer equipment and storage medium
CN115937993A (en) Living body detection model training method, living body detection device and electronic equipment
CN116188478A (en) Image segmentation method, device, electronic equipment and storage medium
CN112990041B (en) Remote sensing image building extraction method based on improved U-net
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN112529116A (en) Scene element fusion processing method, device and equipment and computer storage medium
CN112215205A (en) Target identification method and device, computer equipment and storage medium
CN116194964A (en) System and method for training machine learning visual attention models
Tan et al. BSIRNet: A road extraction network with bidirectional spatial information reasoning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant