CN117853732A - Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method - Google Patents

Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method

Info

Publication number
CN117853732A
Authority
CN
China
Prior art keywords
self
feature
terahertz
dangerous goods
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410087737.5A
Other languages
Chinese (zh)
Inventor
吴衡
郭梓杰
罗劭娟
陈梅云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202410087737.5A priority Critical patent/CN117853732A/en
Publication of CN117853732A publication Critical patent/CN117853732A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method, which comprises the following steps: performing data enhancement on a terahertz human body hidden dangerous goods security inspection image dataset; pre-training a self-supervised learning image mask modeling enhancement model with the enhanced dataset and obtaining the encoder parameters of the self-supervised learning image mask modeling enhancement model; taking the encoder of the self-supervised learning image mask modeling enhancement model as the initial feature extraction backbone network of an instance segmentation model; migrating the encoder parameters and fine-tuning the initial feature extraction backbone network of the instance segmentation model on the non-enhanced terahertz human body hidden dangerous goods security inspection image dataset to obtain the feature extraction backbone network of the instance segmentation model; and inputting the human body hidden dangerous goods security inspection image to be segmented into the feature extraction backbone network of the instance segmentation model, extracting multi-scale features, integrating the multi-scale features, and performing dynamic decoupling to obtain the dangerous goods detection and segmentation result.

Description

Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method
Technical Field
The invention belongs to the field of terahertz image dangerous goods instance segmentation, and particularly relates to a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method.
Background
Terahertz radiation lies between the microwave and infrared bands, with wavelengths from 30 μm to 3000 μm, and is widely used because of its strong penetrability and harmlessness to the human body. Terahertz security inspection cameras (TSSCs) use terahertz waves to scan the human body without contact and can penetrate various materials to detect hidden objects. Compared with traditional metal detectors, a TSSC can detect explosive liquids and powders and can display the shape and position of suspicious objects, so it has great application potential in the security inspection field. Early dangerous goods detection techniques for terahertz security inspection images were mostly based on manual identification and image processing; the few schemes that adopted deep learning depended on large amounts of high-quality data and could not achieve low-cost, high-precision detection, which greatly limited the practicality of terahertz security inspection equipment. With the development of deep learning and self-supervised learning, self-supervised learning can enhance the feature extraction capability of a model without relying on large amounts of labeled data. In addition, structural re-parameterization methods can balance inference speed and accuracy. Therefore, achieving low-cost, high-precision dangerous goods instance segmentation with only a small amount of labeled sample data is very helpful for the application and development of terahertz image detection and segmentation technology.
Disclosure of Invention
Aiming at the problems of high cost and low precision of dangerous goods instance segmentation in terahertz images, caused by poor terahertz image quality and difficult sample labeling, the invention provides a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method, which combines a self-supervised learning image mask modeling method with a deep learning instance segmentation algorithm to reduce the training cost of dangerous goods segmentation and detection in terahertz images and to improve detection precision.
In order to achieve the above purpose, the invention provides a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method, which specifically comprises the following steps:
acquiring a terahertz human body hidden dangerous article security inspection image data set, and carrying out data enhancement on the terahertz human body hidden dangerous article security inspection image data set;
pre-training a self-supervision learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset to obtain encoder parameters of the self-supervision learning image mask modeling enhancement model;
extracting a backbone network by taking an encoder of the self-supervision learning image mask modeling enhancement model as an initial feature of an example segmentation model;
migrating the encoder parameters, and fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image data set to obtain a feature extraction backbone network of the instance segmentation model;
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and dynamically decoupling to obtain the detection segmentation result of the dangerous goods.
Optionally, obtaining the terahertz human body hidden dangerous article security inspection image dataset, and performing data enhancement on the terahertz human body hidden dangerous article security inspection image dataset includes:
acquiring a plurality of target images by using terahertz imaging equipment;
labeling outlines of dangerous goods in a plurality of target images, converting coordinates of outline points into tag data, and acquiring a terahertz human body hidden dangerous goods security inspection image data set;
adopting an automatic data enhancement method based on MedAugment to automatically enhance the terahertz human body hidden dangerous article security inspection image data set;
the automatic data enhancement method based on the MedAugment comprises a pixel enhancement space and a space enhancement space.
Optionally, the feature extraction capability of the self-supervised learning image mask modeling enhancement model is realized by a Spark-based image mask modeling method;
the Spark-based image mask modeling method comprises an encoder and a decoder, wherein the enhanced terahertz human body hidden dangerous goods security inspection image dataset is randomly shielded, non-shielded pixels are used as sparse voxels, sparse convolution is adopted for encoding, the encoder outputs a plurality of layers of characteristic diagrams, the characteristics of target levels are obtained, and a multi-level characteristic fusion module collects a plurality of characteristics of the target levels for optimization and transmits the characteristics to the decoder for image reconstruction.
Optionally, before the multi-level feature fusion module collects a plurality of features of the target level to optimize and transmits the features to the decoder for image reconstruction, the multi-level feature fusion module further includes:
in each hierarchy, based on the characteristics of the target level, adjusting each j-th level through maximum pooling and up-sampling operation to realize characteristic size alignment, and adopting convolution to realize channel number alignment, namely:
wherein i ≥ 1 and j < l; Conv(·) denotes a 1×1 convolution that changes the number of channels to C_i; Up(·) denotes bilinear interpolation; Id(·) denotes the identity mapping; M(·) denotes max pooling of a feature map to H_i×W_i resolution; S'_ij is the feature map output after the j-th level is aligned in the i-th layer; H_i and W_i are the reference height and width of the i-th layer feature map; and l is the number of encoder output layers.
Optionally, pre-training the self-supervised learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset, and acquiring encoder parameters of the self-supervised learning image mask modeling enhancement model includes:
the self-supervision learning image mask modeling enhancement model comprises a feature refinement fusion module;
taking the self-supervision learning image mask modeling enhancement model as an encoder for self-supervision learning, and fusing the images in the enhanced terahertz human body hidden dangerous article security inspection image dataset through the feature refinement fusion module to obtain a fused feature map;
filling all blank positions of sparse feature mapping, adopting the decoder to decode and reconstruct the fused feature map, completing the pre-training of the self-supervision learning image mask modeling enhancement model, and further obtaining encoder parameters of the self-supervision learning image mask modeling enhancement model.
Optionally, the pre-training process is:
wherein Γ_i(·) and f_i denote the hidden function of the MFRF fusion mechanism and the fused feature map, respectively; i denotes the i-th output layer; Ψ_i(·) and D_i denote the mask-embedding [M_i] filling operation and the filled feature map, respectively; B_i(·) denotes the hidden function of the decoder's consecutive blocks; and S_1, S_2, S_3 correspond to feature maps of different scales in the backbone.
Optionally, migrating the encoder parameter, fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous article security inspection image dataset, and obtaining the feature extraction backbone network of the instance segmentation model includes:
inputting the unreinforced terahertz human body hidden dangerous goods security inspection image data set into the instance segmentation model by migrating the encoder parameters, and performing fine tuning training on an initial feature extraction backbone network of the instance segmentation model by utilizing a structure re-parameterization module to obtain a feature extraction backbone network of the instance segmentation model;
the structure re-parameterization module is used for carrying out convolution and separation on a given initial feature map, obtaining a first feature map and a second feature map, inputting the first feature map into the DBB module, fusing the feature map output by the DBB module with the first feature map and the second feature map, and outputting a target feature map through convolution operation;
the structure re-parameterization module is as follows:
f_1, f_2 = Split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))
wherein f_1 and f_2 are feature maps, f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) denote the convolution and the hidden function of the DBB module, Split(·) and Concat(·) denote the separation and fusion operations, respectively, and D is the output feature map.
Optionally, performing fine tuning training on the initial feature extraction backbone network of the example segmentation model further includes: optimizing the loss function by adopting an AdamW function, namely:
Loss_b = Loss_CIoU + Loss_DFL
wherein Loss(Θ) is the loss function, N is the number of detection layers, Loss_b is the bounding-box regression loss, Loss_c is the classification loss, α_1 and α_2 are weight coefficients of the loss function, Loss_CIoU is the bounding-box loss, and Loss_DFL is the distribution focal loss of the predicted box.
Optionally, inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and performing dynamic decoupling, where obtaining the detection segmentation result of the dangerous goods includes:
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the instance segmentation model, and extracting multi-scale features through a feature extraction backbone network of the instance segmentation model;
integrating the multi-scale features, and dynamically decoupling through a dynamic decoupling head module to obtain the dangerous goods detection segmentation result;
the dangerous goods detection segmentation result comprises a dangerous goods detection frame, a dangerous goods category code number, a dangerous goods segmentation mask and a result image of prediction confidence.
Optionally, the dynamic decoupling head module is configured to take a plurality of feature maps with successively halved sizes, reduce their channel numbers by convolution to obtain a plurality of convolved feature maps, input them into the dynamic attention module, and perform dangerous goods feature detection and segmentation with the decoupling head;
the dynamic decoupling head module is:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_1, F_2, F_3)
O_l = Seg_l(A)
wherein F_1, F_2, F_3, F_s, F_m, F_b are all feature maps, with F_1 ∈ R^(128×80×80), F_2 ∈ R^(256×40×40), F_3 ∈ R^(512×20×20); Dyhead_k(·) denotes the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks; Seg_l(·) denotes the hidden function of the decoupled detection head; Conv(·) denotes the convolution function; O_l denotes the output feature map of the l-th layer decoupling head; and A is the output feature map processed by the dynamic attention module.
The invention has the following beneficial effects:
according to the invention, through a self-supervision learning image mask modeling method and a multi-level feature refinement fusion mechanism, the feature extraction capability of the model can be enhanced under the training of a small number of samples, so that the cost of data annotation and model training is reduced; a structural re-parameterization module is designed in the feature extraction backbone network, so that the reasoning speed and the detection accuracy of the example segmentation model can be improved; in addition, a dynamic decoupling head is designed in the detection head, so that the parameter number of the example segmentation model can be effectively reduced, and the detection precision can be improved; the method is beneficial to application research of the terahertz security inspection image dangerous object example segmentation technology.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a self-supervised learning image mask modeling network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the instance segmentation algorithm network architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a structural re-parameterized module architecture according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a dynamic decoupling head module architecture according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 1, this embodiment provides a self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method, which includes: first, constructing a terahertz human body hidden dangerous goods security inspection image dataset and automatically enhancing it to build another dataset for self-supervised learning; then, performing self-supervised learning image mask modeling with the enhanced dataset and using it to pre-train the backbone of the instance segmentation model; and finally, migrating the parameters of the self-supervised pre-trained backbone and fine-tuning the instance segmentation algorithm on the non-enhanced dataset to obtain the human body hidden dangerous goods instance segmentation image. The specific implementation steps are as follows:
Step one: construct a terahertz human body hidden dangerous goods security inspection image dataset, and automatically enhance it to build another dataset for self-supervised learning;
N target images, denoted i = 1, 2, …, N, are captured with a terahertz imaging apparatus. The outlines of the dangerous goods in the N target images are marked with points, the points are connected into irregular contours, and the coordinates of the contour points are converted to generate label data, yielding input image sets A and Q containing Z and M images for instance segmentation training and verification, respectively. An automatic data enhancement method based on MedAugment is adopted, as shown in fig. 1. The method consists of two enhancement spaces: a pixel enhancement space A_p containing six enhancement operations and a spatial enhancement space A_s containing eight enhancement operations. A highly randomized operation sampling strategy is employed to sample operations from the two enhancement spaces; each enhancement branch is set to consist of M data enhancement operations performed sequentially, and n combinations sampled from A_p and A_s are generated, corresponding to the n enhancement branches. After automatic data enhancement of the Z training images, a dataset B containing Z′ = 5×Z images is obtained for self-supervised learning training.
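As an illustration of the branch-sampling scheme just described, the following sketch builds enhancement branches from two operation pools. The concrete torchvision transforms, the branch count n = 4, and the two operations per branch are assumptions rather than the exact MedAugment operation sets; with n = 4 augmented copies plus the originals, the dataset grows to the Z′ = 5×Z size mentioned above.

```python
import random
from torchvision import transforms as T

# Assumed stand-ins for the two enhancement spaces described above:
# a pixel-level space A_p (six operations) and a spatial space A_s (eight operations).
PIXEL_SPACE = [
    T.ColorJitter(brightness=0.3), T.ColorJitter(contrast=0.3),
    T.GaussianBlur(kernel_size=3), T.RandomAdjustSharpness(sharpness_factor=2),
    T.RandomAutocontrast(), T.RandomEqualize(),
]
SPATIAL_SPACE = [
    T.RandomHorizontalFlip(p=1.0), T.RandomVerticalFlip(p=1.0),
    T.RandomRotation(degrees=15), T.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    T.RandomAffine(degrees=0, scale=(0.9, 1.1)), T.RandomAffine(degrees=0, shear=10),
    T.RandomPerspective(p=1.0), T.RandomResizedCrop(size=640, scale=(0.8, 1.0)),
]

def build_branches(n_branches=4, ops_per_branch=2):
    """Sample n enhancement branches, each a random sequence of operations
    drawn from the combined pixel/spatial operation pools."""
    pool = PIXEL_SPACE + SPATIAL_SPACE
    return [T.Compose(random.sample(pool, ops_per_branch)) for _ in range(n_branches)]

def augment_dataset(images, n_branches=4, ops_per_branch=2):
    """Return the original images plus n augmented copies of each,
    so Z training images become Z' = (n + 1) * Z images."""
    branches = build_branches(n_branches, ops_per_branch)
    augmented = list(images)
    for img in images:
        augmented.extend(branch(img) for branch in branches)
    return augmented
```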
Step two: perform self-supervised learning image mask modeling with the enhanced dataset, which is used to pre-train the backbone of the instance segmentation model;
As shown in fig. 2, the feature extraction capability of the self-supervised learning image mask modeling enhancement model is realized by a Spark-based image mask modeling method. The method comprises an encoder and a decoder: an image from the enhanced dataset is taken as input and randomly masked, the unmasked pixels are treated as sparse voxels of a 3D point cloud, and sparse convolution is adopted for encoding. An image is then reconstructed from the multi-scale encoded features with a UNet-style layered decoder. After the self-supervised learning training is completed, only the parameters of the encoder are retained. The feature extraction backbone of the instance segmentation model serves as the encoder of the image mask reconstruction model, and the parameters retained after training are migrated to the instance segmentation model for fine-tuning training, so that the feature extraction capability of the instance segmentation model can be enhanced with few samples and the cost of data labeling and model training can be reduced.
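The masking-and-reconstruction idea above can be sketched as follows. This is a dense emulation that zeroes the masked patches instead of running a true sparse-convolution backend, and the patch size, mask ratio, L2-on-masked-pixels loss, and the generic `encoder`/`decoder` modules are assumptions for illustration only.

```python
import torch
import torch.nn as nn

def random_patch_mask(batch, height, width, patch=32, mask_ratio=0.6, device="cpu"):
    """Randomly occlude square patches; the returned mask is 1 where pixels are kept."""
    gh, gw = height // patch, width // patch
    keep = (torch.rand(batch, 1, gh, gw, device=device) > mask_ratio).float()
    return keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)

class MaskedL2Loss(nn.Module):
    """Reconstruction loss computed only over the occluded pixels."""
    def forward(self, recon, target, keep_mask):
        occluded = 1.0 - keep_mask
        return ((recon - target) ** 2 * occluded).sum() / occluded.sum().clamp(min=1.0)

def pretrain_step(encoder, decoder, images, optimizer, criterion=MaskedL2Loss()):
    """One pre-training step: mask, encode the visible pixels (emulated densely
    here by zeroing the occluded regions), reconstruct, and back-propagate."""
    b, _, h, w = images.shape
    keep = random_patch_mask(b, h, w, device=images.device)
    recon = decoder(encoder(images * keep))
    loss = criterion(recon, images, keep)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```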
In the feature fusion process between the encoder and the decoder, a multi-level feature refinement fusion mechanism (MFRF) is designed and integrated into the Spark method to improve the model's ability to capture multi-level semantic information, thereby improving the quality of the learned representations for the instance segmentation task. As shown in fig. 2, the designed multi-level feature refinement fusion mechanism is implemented as follows: for an enhanced dataset image taken as the input image F ∈ R^(H×W×C), the encoder outputs feature maps at l levels, with S_i denoting the features of the i-th level. The MFRF module gathers the features {S_1, S_2, …, S_(l-1)} (1 ≤ i < l), optimizes them, and delivers them to the decoder for further image reconstruction, while the l-th level features are delivered directly to the decoder. In each level i, taking S_i as the reference, each j-th level is adjusted by a max pooling (Maxpool) or up-sampling operation to align with the feature size of level i, and a 1×1 convolution is used to align the channel number, as follows:
wherein i ≥ 1 and j < l; Conv(·) denotes a 1×1 convolution that changes the number of channels to C_i; Up(·) denotes bilinear interpolation; Id(·) denotes the identity mapping; M(·) denotes max pooling of a feature map to H_i×W_i resolution; S'_ij is the feature map output after the j-th level is aligned in the i-th layer; H_i and W_i are the reference height and width of the i-th layer feature map; and l is the number of encoder output layers.
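A minimal sketch of the alignment step just described, assuming a per-level module (here called LevelAlign, a hypothetical name) that resizes a feature map to the reference resolution H_i×W_i by max pooling or bilinear interpolation and projects it to C_i channels with a 1×1 convolution; the internals of the feature refinement module itself are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelAlign(nn.Module):
    """Align a feature map S_j to the reference resolution and channel count of
    level i: max pooling to shrink, bilinear interpolation to enlarge, identity
    if the size already matches, then a 1x1 convolution to C_i channels."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, s_j, ref_size):
        h_i, w_i = ref_size
        h_j, w_j = s_j.shape[-2:]
        if (h_j, w_j) == (h_i, w_i):
            x = s_j                                            # Id(.)
        elif h_j > h_i:
            x = F.adaptive_max_pool2d(s_j, (h_i, w_i))         # M(.)
        else:
            x = F.interpolate(s_j, size=(h_i, w_i),
                              mode="bilinear", align_corners=False)  # Up(.)
        return self.proj(x)                                    # 1x1 Conv -> C_i channels
```

The aligned maps S'_ij for all levels j can then be handed to the feature refinement module for fusion.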
After all level-i feature maps are aligned, the information is filtered and the feature maps are fused by the feature refinement module (FRM) shown in fig. 2 to obtain f_i, which reduces conflicting information and improves the representability of tiny features. Taking the second level as an example, the MFRF output is aligned with the size of the input feature map S_2, and the expression is as follows:
f_2 = FRM(S'_21, S'_22, S'_23)
wherein FRM(·) denotes the hidden function of the feature refinement module.
The complete self-supervised learning training process can be expressed as follows: the backbone of the instance segmentation model is taken as the encoder trained by self-supervised learning, and the designed multi-level feature refinement fusion mechanism is used to refine and fuse the feature maps S_1, S_2 and S_3, which correspond to feature maps of different scales in the backbone. After all blank positions of the sparse feature maps are filled, a decoder formed by three consecutive blocks {B_3, B_2, B_1} and up-sampling layers decodes and reconstructs images from the fused feature maps, and the pre-training is then completed. After pre-training, only the parameters of the encoder are retained and transferred to the backbone of the detector for fine-tuning training. The complete pre-training process is expressed as follows:
wherein Γ_i(·) and f_i denote the hidden function of the MFRF fusion mechanism and the fused feature map, respectively; i denotes the i-th output layer; Ψ_i(·) and D_i denote the mask-embedding [M_i] filling operation and the filled feature map, respectively; and B_i(·) denotes the hidden function of the decoder's consecutive blocks.
Step three: migrate the parameters of the self-supervised pre-trained backbone and fine-tune the instance segmentation algorithm on the non-enhanced dataset to obtain the human body hidden dangerous goods instance segmentation image;
As shown in fig. 3, the instance segmentation algorithm obtains the instance segmentation image of dangerous goods hidden on the human body by fine-tuning an instance segmentation network consisting of a re-parameterizable feature extraction backbone pre-trained by self-supervised learning, a neck that integrates multi-scale features, and a lightweight dynamic decoupling head. The non-enhanced dataset is input to the instance segmentation model as the training dataset. Specifically, an image of the detected object in the training dataset, I ∈ R^(3×640×640), is input into the instance segmentation network model, which outputs the human body hidden dangerous goods segmentation image O ∈ R^(C×H×W). The mathematical model can be expressed as follows:
O = O(x, y, mask) = Φ(I, Ψ)
wherein O(x, y, mask) denotes the dangerous goods detection image, Φ(·) denotes the instance segmentation neural network model, Ψ denotes the parameters of the neural network, (x, y) denotes the pixel coordinates of the output detection box, mask denotes the dangerous goods segmentation mask, and C, H, W denote the channel number, height and width of the image, respectively. If dangerous goods exist in the detected object image, an image with detection boxes and segmentation masks is output, and the category code and prediction confidence of the dangerous goods are displayed on the detection boxes; otherwise, if no dangerous goods exist in the detected object image, the output image is consistent with the input image.
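For illustration, a hypothetical output container and post-processing step consistent with the behaviour described above (the output stays identical to the input when no dangerous goods are detected); the field names and box format are assumptions.

```python
from dataclasses import dataclass
import torch

@dataclass
class HazardDetections:
    """Hypothetical container for the output O = Phi(I, Psi) described above."""
    boxes: torch.Tensor    # (N, 4) detection-box pixel coordinates
    classes: torch.Tensor  # (N,) dangerous-goods category codes
    scores: torch.Tensor   # (N,) prediction confidences
    masks: torch.Tensor    # (N, H, W) binary segmentation masks

def render_result(image: torch.Tensor, det: HazardDetections) -> torch.Tensor:
    """Keep the output consistent with the input when nothing is detected;
    otherwise darken the predicted mask regions as a simple visual marker."""
    if det.boxes.numel() == 0:
        return image
    hazard = det.masks.any(dim=0).to(image.dtype)  # union of instance masks, (H, W)
    return image * (1.0 - 0.5 * hazard)            # broadcasts over the channel dimension
```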
In the feature extraction backbone, the backbone parameters obtained from self-supervised pre-training are transferred, and the instance segmentation algorithm is trained with structural re-parameterization and fine-tuning, which enhances the feature extraction capability of the model, improves generalization, and further improves the detection precision of hidden dangerous goods.
A structural re-parameterization module, C2DB, is designed in the feature extraction process, so that the network decouples the complex multi-branch structure used in the training stage from the plain structure used in the inference stage, thereby improving the inference speed and detection precision of the instance segmentation model. As shown in fig. 4, the designed structural re-parameterization module is implemented as follows: given a C×H×W = 16×160×160 feature map f, the feature map is first convolved and split to obtain two C/2×H×W = 8×160×160 feature maps f_1 and f_2; one of them, f_1, is input into n = 1 DBB module as shown in fig. 4. The DBB module consists of four branches in the model training stage and is equivalently converted into a single-branch structure in the inference stage to improve inference speed. Finally, the feature map f_d1 output by the n = 1 DBB module is fused with f_1 and f_2, and the feature map D is output after a convolution operation. The mathematical model can be expressed as follows:
f_1, f_2 = Split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))
wherein f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) denote the convolution and the hidden function of the DBB module, Split(·) and Concat(·) denote the separation and fusion operations, respectively, and D is the output feature map.
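A simplified sketch of the C2DB computation above. The SimpleRepBranch class stands in for the four-branch DBB block with only a 3×3 and a 1×1 branch and no batch normalization, but it shows the same training-time multi-branch / inference-time single-branch folding principle; all class names and channel choices are assumptions.

```python
import torch
import torch.nn as nn

class SimpleRepBranch(nn.Module):
    """Simplified stand-in for the DBB block: only a 3x3 and a 1x1 branch (no BN),
    summed during training and foldable into a single 3x3 conv for inference."""

    def __init__(self, channels):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, 1, bias=True)
        self.fused = None                      # set by reparameterize()

    def forward(self, x):
        if self.fused is not None:             # single-branch inference path
            return self.fused(x)
        return self.conv3(x) + self.conv1(x)   # multi-branch training path

    @torch.no_grad()
    def reparameterize(self):
        """Fold the 1x1 kernel into the centre of the 3x3 kernel."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          3, padding=1, bias=True)
        weight = self.conv3.weight.clone()
        weight[:, :, 1:2, 1:2] += self.conv1.weight
        fused.weight.copy_(weight)
        fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        self.fused = fused

class C2DB(nn.Module):
    """Sketch of the C2DB flow: f1, f2 = Split(Conv(f)); f_d1 = DBB(f1);
    D = Conv(Concat(f1, f2, f_d1))."""

    def __init__(self, channels=16):
        super().__init__()
        half = channels // 2
        self.pre = nn.Conv2d(channels, channels, 1)
        self.dbb = SimpleRepBranch(half)
        self.post = nn.Conv2d(3 * half, channels, 1)

    def forward(self, f):
        f1, f2 = torch.chunk(self.pre(f), 2, dim=1)
        fd1 = self.dbb(f1)
        return self.post(torch.cat([f1, f2, fd1], dim=1))
```

Calling reparameterize() after training switches the branch to its single-path form, which is the training/inference decoupling the text refers to.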
In the feature fusion neck shown in fig. 3, a neck capable of integrating multi-scale features is designed with the structural re-parameterization module C2DB, which reduces the overall parameter count of the instance segmentation model, enhances the feature extraction capability of the model, and improves the detection precision of hidden dangerous goods.
As shown in fig. 3, in order to reduce the parameter count of the instance segmentation model and improve detection accuracy, a dynamic decoupling head as shown in fig. 5 is designed in the detection head. Given three input feature maps F_i (i = 1, 2, 3) with successively halved sizes, F_1 ∈ R^(128×80×80), F_2 ∈ R^(256×40×40), F_3 ∈ R^(512×20×20), the channel numbers are reduced by 1×1 convolution to obtain F_s ∈ R^(64×80×80), F_m ∈ R^(64×40×40), F_b ∈ R^(64×20×20), which are input into a dynamic attention module consisting of k = 6 Dyhead blocks shown in fig. 5, and dangerous goods feature detection and segmentation is then performed with the decoupling head shown in fig. 5. The mathematical model can be expressed as follows:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_1, F_2, F_3)
O_l = Seg_l(A)
wherein A is the output feature map processed by the dynamic attention module, Dyhead_k(·) denotes the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks, Seg_l(·) denotes the hidden function of the decoupled detection head, Conv(·) denotes the convolution function, and O_l denotes the output feature map of the l-th layer decoupling head.
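A hedged sketch of the decoupled head structure described above. The lightweight channel-attention block stands in for the k stacked Dyhead blocks (which combine scale, spatial and task attention), and num_classes and the mask-coefficient dimension are assumed values.

```python
import torch
import torch.nn as nn

class DecoupledSegHead(nn.Module):
    """Sketch of the dynamic decoupling head: 1x1 convs reduce the three scales
    to 64 channels, a lightweight channel-attention block stands in for the k
    stacked Dyhead blocks, and separate branches predict boxes, classes and
    mask coefficients at each scale."""

    def __init__(self, in_channels=(128, 256, 512), mid=64, num_classes=3, mask_dim=32):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in in_channels)
        self.attn = nn.Sequential(                       # stand-in for Dyhead_k(.)
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(mid, mid, 1), nn.Sigmoid())
        self.box_branch = nn.Conv2d(mid, 4, 3, padding=1)
        self.cls_branch = nn.Conv2d(mid, num_classes, 3, padding=1)
        self.mask_branch = nn.Conv2d(mid, mask_dim, 3, padding=1)

    def forward(self, feats):                            # feats = (F_1, F_2, F_3)
        outputs = []
        for reduce, f in zip(self.reduce, feats):
            x = reduce(f)                                # F_s, F_m, F_b
            x = x * self.attn(x)                         # simplified dynamic re-weighting
            outputs.append((self.box_branch(x),
                            self.cls_branch(x),
                            self.mask_branch(x)))
        return outputs                                   # one (box, cls, mask) triple per scale
```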
In the fine-tuning training process of the instance segmentation algorithm, the loss function Loss(Θ) is optimized with AdamW, expressed as follows:
Loss_b = Loss_CIoU + Loss_DFL
wherein N is the number of detection layers, Loss_b is the bounding-box regression loss, Loss_c is the classification loss, α_1 and α_2 are weight coefficients of the loss function, Loss_CIoU is the bounding-box loss, and Loss_DFL is the distribution focal loss of the predicted box.
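Since the full expression for Loss(Θ) is not given above, the weighted combination below is only an assumption built from the listed terms: the α values are placeholders and a segmentation-mask loss term is omitted because its form is not stated. It merely shows how the per-layer box and classification losses would be summed and then optimized with AdamW.

```python
import torch

def total_loss(per_layer_terms, alpha1=7.5, alpha2=0.5):
    """Assumed combination over the N detection layers: each layer contributes a
    weighted box-regression loss Loss_b = Loss_CIoU + Loss_DFL and a
    classification loss Loss_c."""
    loss = torch.zeros(())
    for loss_ciou, loss_dfl, loss_c in per_layer_terms:
        loss_b = loss_ciou + loss_dfl
        loss = loss + alpha1 * loss_b + alpha2 * loss_c
    return loss

# The text states AdamW is used to optimise Loss(Theta); a typical setup would be:
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
```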
After m = 200 training epochs, the optimized parameters are obtained.
For an image I captured by the terahertz security inspection scanning camera, after the feature information is extracted by the re-parameterizable feature extraction backbone, integrated by the multi-scale feature neck and processed by the lightweight dynamic decoupling head, the dangerous goods detection and segmentation result is obtained, i.e., a result image containing the dangerous goods detection box, the dangerous goods category code, the dangerous goods segmentation mask and the prediction confidence is produced by the network with the trained and optimized neural network parameters.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method, characterized by comprising the following steps:
acquiring a terahertz human body hidden dangerous article security inspection image data set, and carrying out data enhancement on the terahertz human body hidden dangerous article security inspection image data set;
pre-training a self-supervision learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset to obtain encoder parameters of the self-supervision learning image mask modeling enhancement model;
extracting a backbone network by taking an encoder of the self-supervision learning image mask modeling enhancement model as an initial feature of an example segmentation model;
migrating the encoder parameters, and fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image data set to obtain a feature extraction backbone network of the instance segmentation model;
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and dynamically decoupling to obtain the detection segmentation result of the dangerous goods.
2. The method of claim 1, wherein obtaining the terahertz human body concealed dangerous goods security inspection image dataset and performing data enhancement on the terahertz human body concealed dangerous goods security inspection image dataset comprises:
acquiring a plurality of target images by using terahertz imaging equipment;
labeling outlines of dangerous goods in a plurality of target images, converting coordinates of outline points into tag data, and acquiring a terahertz human body hidden dangerous goods security inspection image data set;
adopting an automatic data enhancement method based on MedAugment to automatically enhance the terahertz human body hidden dangerous article security inspection image data set;
the automatic data enhancement method based on the MedAugment comprises a pixel enhancement space and a space enhancement space.
3. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 1, wherein the feature extraction capability of the self-supervised learning image mask modeling enhancement model is realized by a Spark-based image mask modeling method;
the Spark-based image mask modeling method comprises an encoder and a decoder, wherein the enhanced terahertz human body hidden dangerous goods security inspection image dataset is randomly shielded, non-shielded pixels are used as sparse voxels, sparse convolution is adopted for encoding, the encoder outputs a plurality of layers of characteristic diagrams, the characteristics of target levels are obtained, and a multi-level characteristic fusion module collects a plurality of characteristics of the target levels for optimization and transmits the characteristics to the decoder for image reconstruction.
4. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 3, wherein the multi-level feature fusion module collects and optimizes a plurality of features of the target levels and transmits them to the decoder for image reconstruction, and the method further comprises:
in each hierarchy, based on the characteristics of the target level, adjusting each j-th level through maximum pooling and up-sampling operation to realize characteristic size alignment, and adopting convolution to realize channel number alignment, namely:
wherein i ≥ 1 and j < l; Conv(·) denotes a 1×1 convolution that changes the number of channels to C_i; Up(·) denotes bilinear interpolation; Id(·) denotes the identity mapping; M(·) denotes max pooling of a feature map to H_i×W_i resolution; S'_ij is the feature map output after the j-th level is aligned in the i-th layer; H_i and W_i are the reference height and width of the i-th layer feature map; and l is the number of encoder output layers.
5. The method of claim 4, wherein pre-training the self-supervised learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous goods security inspection image dataset and obtaining encoder parameters of the self-supervised learning image mask modeling enhancement model comprises:
the self-supervision learning image mask modeling enhancement model comprises a feature refinement fusion module;
taking the self-supervision learning image mask modeling enhancement model as an encoder for self-supervision learning, and fusing the images in the enhanced terahertz human body hidden dangerous article security inspection image dataset through the feature refinement fusion module to obtain a fused feature map;
filling all blank positions of sparse feature mapping, adopting the decoder to decode and reconstruct the fused feature map, completing the pre-training of the self-supervision learning image mask modeling enhancement model, and further obtaining encoder parameters of the self-supervision learning image mask modeling enhancement model.
6. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 5, wherein the pre-training process is as follows:
wherein Γ_i(·) and f_i denote the hidden function of the MFRF fusion mechanism and the fused feature map, respectively; i denotes the i-th output layer; Ψ_i(·) and D_i denote the mask-embedding [M_i] filling operation and the filled feature map, respectively; B_i(·) denotes the hidden function of the decoder's consecutive blocks; and S_1, S_2, S_3 correspond to feature maps of different scales in the backbone.
7. The method of claim 1, wherein migrating the encoder parameters, fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image dataset, and obtaining the feature extraction backbone network of the instance segmentation model comprises:
inputting the unreinforced terahertz human body hidden dangerous goods security inspection image data set into the instance segmentation model by migrating the encoder parameters, and performing fine tuning training on an initial feature extraction backbone network of the instance segmentation model by utilizing a structure re-parameterization module to obtain a feature extraction backbone network of the instance segmentation model;
the structure re-parameterization module is used for carrying out convolution and separation on a given initial feature map, obtaining a first feature map and a second feature map, inputting the first feature map into the DBB module, fusing the feature map output by the DBB module with the first feature map and the second feature map, and outputting a target feature map through convolution operation;
the structure re-parameterization module is as follows:
f_1, f_2 = Split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))
wherein f_1 and f_2 are feature maps, f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) denote the convolution and the hidden function of the DBB module, Split(·) and Concat(·) denote the separation and fusion operations, respectively, and D is the output feature map.
8. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 7, wherein the fine-tuning training of the initial feature extraction backbone network of the instance segmentation model further comprises: optimizing the loss function with an AdamW optimizer, namely:
Loss_b = Loss_CIoU + Loss_DFL
wherein Loss(Θ) is the loss function, N is the number of detection layers, Loss_b is the bounding-box regression loss, Loss_c is the classification loss, α_1 and α_2 are weight coefficients of the loss function, Loss_CIoU is the bounding-box loss, and Loss_DFL is the distribution focal loss of the predicted box.
9. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 1, wherein inputting the human body hidden dangerous goods security inspection image to be segmented into the feature extraction backbone network of the instance segmentation model, extracting multi-scale features, integrating the multi-scale features and performing dynamic decoupling to obtain the dangerous goods detection and segmentation result comprises:
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the instance segmentation model, and extracting multi-scale features through a feature extraction backbone network of the instance segmentation model;
integrating the multi-scale features, and dynamically decoupling through a dynamic decoupling head module to obtain the dangerous goods detection segmentation result;
the dangerous goods detection segmentation result comprises a dangerous goods detection frame, a dangerous goods category code number, a dangerous goods segmentation mask and a result image of prediction confidence.
10. The self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method according to claim 9, wherein the dynamic decoupling head module is configured to take a plurality of feature maps with successively halved sizes, reduce their channel numbers by convolution to obtain a plurality of convolved feature maps, input them into the dynamic attention module, and perform dangerous goods feature detection and segmentation with the decoupling head;
the dynamic decoupling head module is:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_1, F_2, F_3)
O_l = Seg_l(A)
wherein F_1, F_2, F_3, F_s, F_m, F_b are all feature maps, with F_1 ∈ R^(128×80×80), F_2 ∈ R^(256×40×40), F_3 ∈ R^(512×20×20); Dyhead_k(·) denotes the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks; Seg_l(·) denotes the hidden function of the decoupled detection head; Conv(·) denotes the convolution function; O_l denotes the output feature map of the l-th layer decoupling head; and A is the output feature map processed by the dynamic attention module.
CN202410087737.5A 2024-01-22 2024-01-22 Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method Pending CN117853732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410087737.5A CN117853732A (en) 2024-01-22 2024-01-22 Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410087737.5A CN117853732A (en) 2024-01-22 2024-01-22 Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method

Publications (1)

Publication Number Publication Date
CN117853732A true CN117853732A (en) 2024-04-09

Family

ID=90536166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410087737.5A Pending CN117853732A (en) 2024-01-22 2024-01-22 Self-supervision re-digitizable terahertz image dangerous object instance segmentation method

Country Status (1)

Country Link
CN (1) CN117853732A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978892A (en) * 2019-03-21 2019-07-05 浙江啄云智能科技有限公司 A kind of intelligent safety inspection method based on terahertz imaging
CN113947604A (en) * 2021-10-26 2022-01-18 北京地平线信息技术有限公司 Instance segmentation and instance segmentation network training methods and apparatuses, medium, and device
CN115019039A (en) * 2022-05-26 2022-09-06 湖北工业大学 Example segmentation method and system combining self-supervision and global information enhancement
CN115937774A (en) * 2022-12-06 2023-04-07 天津大学 Security inspection contraband detection method based on feature fusion and semantic interaction
CN116630334A (en) * 2023-04-23 2023-08-22 中国科学院自动化研究所 Method, device, equipment and medium for real-time automatic segmentation of multi-segment blood vessel
CN117095158A (en) * 2023-08-23 2023-11-21 广东工业大学 Terahertz image dangerous article detection method based on multi-scale decomposition convolution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KEYU TIAN et al.: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling", arXiv, 10 January 2023, pages 1-16 *

Similar Documents

Publication Publication Date Title
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN110738697B (en) Monocular depth estimation method based on deep learning
CN109242888B (en) Infrared and visible light image fusion method combining image significance and non-subsampled contourlet transformation
CN109655019A (en) Cargo volume measurement method based on deep learning and three-dimensional reconstruction
CN111612807A (en) Small target image segmentation method based on scale and edge information
CN112651978A (en) Sublingual microcirculation image segmentation method and device, electronic equipment and storage medium
Gu et al. Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks
Cui et al. Improved swin transformer-based semantic segmentation of postearthquake dense buildings in urban areas using remote sensing images
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN116229452B (en) Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN113610070A (en) Landslide disaster identification method based on multi-source data fusion
Hwang et al. Lidar depth completion using color-embedded information via knowledge distillation
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN116664856A (en) Three-dimensional target detection method, system and storage medium based on point cloud-image multi-cross mixing
Wang et al. SAR-to-optical image translation with hierarchical latent features
US20240161304A1 (en) Systems and methods for processing images
CN112633123B (en) Heterogeneous remote sensing image change detection method and device based on deep learning
CN112686830B (en) Super-resolution method of single depth map based on image decomposition
Zhao et al. Squnet: An high-performance network for crater detection with dem data
CN117475428A (en) Three-dimensional target detection method, system and equipment
CN113537397B (en) Target detection and image definition joint learning method based on multi-scale feature fusion
CN117237256A (en) Shallow sea coral reef monitoring data acquisition method, device and equipment
US20230281877A1 (en) Systems and methods for 3d point cloud densification
CN117853732A (en) Self-supervised re-parameterizable terahertz image dangerous goods instance segmentation method
Aldahoul et al. Space object recognition with stacking of CoAtNets using fusion of RGB and depth images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination