CN117853732A - Self-supervised re-parameterizable terahertz image dangerous object instance segmentation method
- Publication number: CN117853732A (application CN202410087737.5A)
- Authority: CN (China)
- Prior art keywords: self, feature, terahertz, dangerous goods, human body
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a self-supervised, re-parameterizable terahertz image dangerous object instance segmentation method, comprising the following steps: performing data enhancement on a terahertz human body hidden dangerous goods security inspection image dataset; pre-training a self-supervised learning image mask modeling enhancement model on the enhanced dataset and obtaining the encoder parameters of the model; taking the encoder of the self-supervised learning image mask modeling enhancement model as the initial feature extraction backbone network of an instance segmentation model; migrating the encoder parameters and fine-tuning the initial feature extraction backbone network on the un-enhanced terahertz human body hidden dangerous goods security inspection image dataset to obtain the feature extraction backbone network of the instance segmentation model; and inputting the human body hidden dangerous goods security inspection image to be segmented into the feature extraction backbone network of the instance segmentation model, extracting multi-scale features, integrating them, and dynamically decoupling to obtain the dangerous goods detection and segmentation results.
Description
Technical Field
The invention belongs to the field of terahertz image dangerous object instance segmentation, and in particular relates to a self-supervised, re-parameterizable terahertz image dangerous object instance segmentation method.
Background
Terahertz radiation lies between microwave and infrared, with a wavelength range of 30 μm–3000 μm, and has been widely used because of its strong penetrability and harmlessness to the human body. Terahertz security inspection cameras (TSSCs) use terahertz rays to scan a human body without contact and can penetrate various materials to detect hidden objects. Compared with traditional metal detectors, a TSSC can detect explosives, liquids and powders, and can display the shape and position of suspicious objects, so it has great application potential in the security inspection field. Early terahertz security inspection image dangerous goods detection technologies were mostly based on manual identification and image processing schemes; the few schemes adopting deep learning depend on a large amount of high-quality data and cannot achieve low-cost, high-precision detection, which greatly limits the practicability of terahertz security inspection equipment. With the development of deep learning and self-supervised learning, a self-supervised training scheme can enhance the feature extraction capability of a model without depending on a large amount of labeled data. In addition, structural re-parameterization methods can balance inference speed against accuracy. Therefore, realizing low-cost, high-precision dangerous goods instance segmentation with only a small amount of labeled sample data is very helpful for the application and development of terahertz image detection and segmentation technology.
Disclosure of Invention
Aiming at the problems of high cost and low precision of dangerous goods instance segmentation in terahertz images, caused by poor terahertz image quality and difficult sample labeling, the invention provides a self-supervised, re-parameterizable terahertz image dangerous goods instance segmentation method that combines a self-supervised learning image mask modeling method with a deep learning instance segmentation algorithm to reduce the training cost of dangerous goods segmentation and detection in terahertz images and improve detection precision.
In order to achieve the above purpose, the invention provides a self-supervised, re-parameterizable terahertz image dangerous goods instance segmentation method, which specifically comprises the following steps:
acquiring a terahertz human body hidden dangerous article security inspection image data set, and carrying out data enhancement on the terahertz human body hidden dangerous article security inspection image data set;
pre-training a self-supervision learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset to obtain encoder parameters of the self-supervision learning image mask modeling enhancement model;
taking the encoder of the self-supervised learning image mask modeling enhancement model as the initial feature extraction backbone network of the instance segmentation model;
migrating the encoder parameters, and fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image data set to obtain a feature extraction backbone network of the instance segmentation model;
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and dynamically decoupling to obtain the detection segmentation result of the dangerous goods.
Optionally, obtaining the terahertz human body hidden dangerous article security inspection image dataset, and performing data enhancement on the terahertz human body hidden dangerous article security inspection image dataset includes:
acquiring a plurality of target images by using terahertz imaging equipment;
labeling outlines of dangerous goods in a plurality of target images, converting coordinates of outline points into tag data, and acquiring a terahertz human body hidden dangerous goods security inspection image data set;
adopting an automatic data enhancement method based on MedAugment to automatically enhance the terahertz human body hidden dangerous article security inspection image data set;
the automatic data enhancement method based on the MedAugment comprises a pixel enhancement space and a space enhancement space.
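A minimal sketch of the branch-sampling strategy described above follows; the concrete operation names and the branch count are assumptions for illustration — the patent only fixes the sizes of the two spaces (six pixel-level and eight spatial operations):

```python
import random

# Hypothetical operation names -- the source only states that MedAugment
# uses a pixel enhancement space (6 ops) and a spatial enhancement space (8 ops).
PIXEL_SPACE = ["brightness", "contrast", "sharpness", "posterize",
               "gaussian_noise", "gaussian_blur"]
SPATIAL_SPACE = ["rotate", "translate_x", "translate_y", "shear_x",
                 "shear_y", "scale", "horizontal_flip", "vertical_flip"]

def sample_branches(n_branches=4, ops_per_branch=2, seed=0):
    """Sample n enhancement branches, each a sequence of operations drawn
    randomly from the union of the two enhancement spaces."""
    rng = random.Random(seed)
    pool = PIXEL_SPACE + SPATIAL_SPACE
    return [rng.sample(pool, ops_per_branch) for _ in range(n_branches)]

branches = sample_branches()
```

Applying, say, four such branches to each of the Z training images and keeping the originals would yield a five-fold enhanced set.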
Optionally, the feature extraction capability of the self-supervised learning image mask modeling enhancement model is obtained by a SparK-based image mask modeling method;
the SparK-based image mask modeling method comprises an encoder and a decoder: images from the enhanced terahertz human body hidden dangerous goods security inspection image dataset are randomly masked, the non-masked pixels are treated as sparse voxels and encoded with sparse convolution, the encoder outputs feature maps at several levels to obtain the target-level features, and a multi-level feature fusion module collects the target-level features, optimizes them, and passes them to the decoder for image reconstruction.
Optionally, before the multi-level feature fusion module collects the target-level features for optimization and passes them to the decoder for image reconstruction, the method further includes:
in each level i, taking the target-level features as reference, adjusting each j-th level through max pooling or up-sampling operations to align the feature sizes, and using convolution to align the channel numbers, namely:

S′_ij = Conv(T_ij(S_j)), T_ij(·) ∈ {M(·), Id(·), Up(·)}

wherein 1 ≤ i, j < l; Conv(·) denotes a 1×1 convolution that changes the channel number to C_i; Up(·) denotes bilinear interpolation; Id(·) denotes identity mapping; M(·) denotes max pooling the feature map S_j to H_i×W_i resolution; S′_ij is the feature map output after the j-th level is aligned to the i-th layer; H_i and W_i are the reference height and width of the i-th layer feature map; and l is the number of encoder output layers.
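As a concrete illustration of the alignment step, a minimal NumPy sketch is given below; the integer pooling/upsampling factors, nearest-neighbour interpolation (standing in for the bilinear Up(·)), and random 1×1 weights are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def max_pool(x, k):
    """Max-pool a (C, H, W) feature map by an integer factor k -- the M(.) case."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling by integer factor k (a stand-in for
    the bilinear Up(.) to keep the sketch dependency-free)."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def conv1x1(x, weight):
    """1x1 convolution: a channel-mixing matmul. weight has shape (C_out, C_in)."""
    c, h, w = x.shape
    return (weight @ x.reshape(c, -1)).reshape(weight.shape[0], h, w)

def align(s_j, target_hw, weight):
    """Resize S_j to the i-th level's H_i x W_i, then align channel count to C_i."""
    h_j = s_j.shape[1]
    if h_j > target_hw:          # larger map  -> max pooling   M(.)
        s = max_pool(s_j, h_j // target_hw)
    elif h_j < target_hw:        # smaller map -> upsampling    Up(.)
        s = upsample_nearest(s_j, target_hw // h_j)
    else:                        # same size   -> identity      Id(.)
        s = s_j
    return conv1x1(s, weight)

s_big = np.random.rand(32, 80, 80)    # a j-th level map, larger than target
w = np.random.rand(64, 32)            # 1x1 conv weight, C_i = 64
out = align(s_big, 40, w)
```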
Optionally, pre-training the self-supervised learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset, and acquiring encoder parameters of the self-supervised learning image mask modeling enhancement model includes:
the self-supervision learning image mask modeling enhancement model comprises a feature refinement fusion module;
taking the self-supervision learning image mask modeling enhancement model as an encoder for self-supervision learning, and fusing the images in the enhanced terahertz human body hidden dangerous article security inspection image dataset through the feature refinement fusion module to obtain a fused feature map;
filling all blank positions of the sparse feature maps, decoding and reconstructing the fused feature map with the decoder, completing the pre-training of the self-supervised learning image mask modeling enhancement model, and thereby obtaining the encoder parameters of the self-supervised learning image mask modeling enhancement model.
Optionally, the pre-training process is:

f_i = Γ_i(S_1, S_2, S_3)
D_i = ψ_i(f_i, [M_i])
R_i = B_i(D_i)

wherein Γ_i(·) and f_i respectively denote the hidden function of the MFRF fusion mechanism and the fused feature map, i denotes the i-th output layer, ψ_i(·) and D_i respectively denote the filling operation with mask embedding [M_i] and the filled feature map, B_i(·) denotes the implicit function of the decoder's consecutive blocks, R_i is the decoded output of the i-th block, and S_1, S_2, S_3 correspond to the feature maps of different scales in the trunk.
Optionally, migrating the encoder parameter, fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous article security inspection image dataset, and obtaining the feature extraction backbone network of the instance segmentation model includes:
inputting the un-enhanced terahertz human body hidden dangerous goods security inspection image dataset into the instance segmentation model after migrating the encoder parameters, and performing fine-tuning training on the initial feature extraction backbone network of the instance segmentation model with the structural re-parameterization module to obtain the feature extraction backbone network of the instance segmentation model;
the structure re-parameterization module is used for carrying out convolution and separation on a given initial feature map, obtaining a first feature map and a second feature map, inputting the first feature map into the DBB module, fusing the feature map output by the DBB module with the first feature map and the second feature map, and outputting a target feature map through convolution operation;
the structure re-parameterization module is as follows:
f_1, f_2 = Split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))

wherein f_1 and f_2 are the separated feature maps, f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) denote the convolution and the hidden function of the DBB module, Split(·) and Concat(·) denote the separation and concatenation (fusion) operations respectively, and D is the output feature map.
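The equivalence that underlies the DBB module's train-time/inference-time conversion — a multi-branch linear block collapses into a single kernel — can be checked numerically. The sketch below uses two parallel 1×1 convolutions plus an identity branch as a stand-in for the four-branch DBB; it is not the actual DBB branch set:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution over a (C, H, W) map as a channel matmul."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w_a = rng.standard_normal((8, 8))
w_b = rng.standard_normal((8, 8))

# Training-stage multi-branch block: two parallel 1x1 convs plus identity.
y_train = conv1x1(x, w_a) + conv1x1(x, w_b) + x

# Inference-stage re-parameterized single branch: fold all branches into
# one kernel (the identity branch is an identity matrix in kernel space).
w_fused = w_a + w_b + np.eye(8)
y_infer = conv1x1(x, w_fused)
```

Because convolution is linear, the fused single-branch output matches the multi-branch output exactly, which is why the single-branch form can be used at inference time without any accuracy loss.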
Optionally, the fine-tuning training of the initial feature extraction backbone network of the instance segmentation model further includes optimizing the loss function with the AdamW optimizer, namely:

Loss(Θ) = Σ_{n=1}^{N} (α_1·Loss_b + α_2·Loss_c)
Loss_b = Loss_CIoU + Loss_DFL

wherein Loss(Θ) is the loss function, N is the number of detection layers, Loss_b is the bounding box regression loss function, Loss_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss terms, Loss_CIoU is the bounding box loss function, and Loss_DFL is the distribution focal loss function.
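The weighted multi-layer combination above can be sketched as a small helper; the α values and the per-layer loss tuples are illustrative placeholders, not values fixed by the patent:

```python
def total_loss(per_layer, a1=7.5, a2=0.5):
    """Loss(Θ) = Σ_n (α1·Loss_b + α2·Loss_c), with Loss_b = CIoU + DFL.

    per_layer: list of (loss_ciou, loss_dfl, loss_cls) tuples, one per
    detection layer.  The default α weights are illustrative only.
    """
    return sum(a1 * (ciou + dfl) + a2 * cls
               for ciou, dfl, cls in per_layer)
```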
Optionally, inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and performing dynamic decoupling, where obtaining the detection segmentation result of the dangerous goods includes:
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the instance segmentation model, and extracting multi-scale features through a feature extraction backbone network of the instance segmentation model;
integrating the multi-scale features, and dynamically decoupling through a dynamic decoupling head module to obtain the dangerous goods detection segmentation result;
the dangerous goods detection and segmentation result comprises a result image with the dangerous goods detection frame, the dangerous goods category code, the dangerous goods segmentation mask and the prediction confidence.
Optionally, the dynamic decoupling head module is configured to take a plurality of feature maps whose sizes are successively halved, reduce their channel numbers by convolution to obtain a plurality of convolved feature maps, input these into the dynamic attention module, and perform dangerous goods feature detection and segmentation with the decoupling heads;
the dynamic decoupling head module is:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_s, F_m, F_b)
O_l = Seg_l(A)

wherein F_1 ∈ R^{128×80×80}, F_2 ∈ R^{256×40×40}, F_3 ∈ R^{512×20×20} are the input feature maps and F_s, F_m, F_b the channel-reduced feature maps; Dyhead_k(·) denotes the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks; Seg_l(·) denotes the hidden function of the l-th decoupled detection head; Conv(·) denotes the convolution function; O_l denotes the output feature map of the l-th layer decoupling head; and A is the output feature map processed by the dynamic attention module.
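A shape-level sketch of the channel-reduction stage of the dynamic decoupling head follows; the Dyhead attention blocks and the Seg heads themselves are not implemented here, and the random 1×1 weights are placeholders:

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution over a (C, H, W) map as a channel matmul."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

rng = np.random.default_rng(1)
f1 = rng.standard_normal((128, 80, 80))   # F_1
f2 = rng.standard_normal((256, 40, 40))   # F_2
f3 = rng.standard_normal((512, 20, 20))   # F_3

# 1x1 convolutions reduce every scale to 64 channels before the
# dynamic attention module (the Dyhead blocks, not implemented here).
f_s = conv1x1(f1, rng.standard_normal((64, 128)))
f_m = conv1x1(f2, rng.standard_normal((64, 256)))
f_b = conv1x1(f3, rng.standard_normal((64, 512)))
```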
The invention has the following beneficial effects:
Through the self-supervised learning image mask modeling method and the multi-level feature refinement fusion mechanism, the invention can enhance the feature extraction capability of the model when trained on a small number of samples, reducing the cost of data annotation and model training. A structural re-parameterization module designed into the feature extraction backbone network improves both the inference speed and the detection accuracy of the instance segmentation model. In addition, a dynamic decoupling head designed into the detection head effectively reduces the parameter count of the instance segmentation model while improving detection precision. The method thus supports applied research on the terahertz security inspection image dangerous object instance segmentation technique.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a self-supervised re-parameterizable terahertz image dangerous object instance segmentation method in an embodiment of the invention;
FIG. 2 is a schematic diagram of a self-supervised learning image mask modeling network architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an example partitioning algorithm network architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a structural re-parameterized module architecture according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a dynamic decoupling head module architecture according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 1, this embodiment provides a self-supervised, re-parameterizable terahertz image dangerous object instance segmentation method, comprising: first, constructing a terahertz human body hidden dangerous goods security inspection image dataset and automatically enhancing it to build a second dataset for self-supervised learning; then, performing self-supervised learning image mask modeling with the enhanced dataset to pre-train the trunk of the instance segmentation model; and finally, migrating the parameters of the self-supervised pre-trained trunk and fine-tuning the instance segmentation algorithm on the un-enhanced dataset to obtain human body hidden dangerous goods instance segmentation images. The specific implementation steps are as follows:
Step one, constructing a terahertz human body hidden dangerous goods security inspection image dataset, and automatically enhancing it to build a second dataset for self-supervised learning;
N target images, indexed i = 1, 2, …, N, are captured with a terahertz imaging device. The outlines of dangerous goods in the N target images are marked with points, the points are connected into an irregular outline, and the coordinates of the contour points are then converted to generate label data, yielding input image sets A and Q containing Z and M images for instance segmentation training and verification respectively. An automatic data enhancement method based on MedAugment is adopted, as shown in fig. 1. The method consists of two enhancement spaces, namely a pixel enhancement space A_p containing six enhancement operations and a spatial enhancement space A_s containing eight enhancement operations. A highly randomized operation sampling strategy draws operations from the two enhancement spaces; each enhancement branch is set to consist of M data enhancement operations performed sequentially, generating n combinations sampled from A_p and A_s, corresponding to the number of enhancement branches n. After automatic data enhancement of the Z training images, a dataset B containing Z′ = 5×Z images is obtained for self-supervised learning training.
Step two, performing self-supervised learning image mask modeling with the enhanced dataset to pre-train the trunk of the instance segmentation model;
As shown in fig. 2, the feature extraction capability of the self-supervised learning image mask modeling enhancement model is obtained by a SparK-based image mask modeling method. The method comprises an encoder and a decoder: an image from the enhanced dataset is used as input and randomly masked, the non-masked pixels are treated like the sparse voxels of a 3D point cloud, and sparse convolution is used for encoding. An image is reconstructed from the multi-scale encoded features by a UNet-style layered decoder. After self-supervised training is completed, only the parameters of the encoder are retained. The feature extraction trunk of the instance segmentation model serves as the encoder of the image mask reconstruction model, and the parameters retained after training are migrated to the instance segmentation model for fine-tuning, which enhances the feature extraction capability of the instance segmentation model under few-sample conditions and reduces data labeling and model training costs.
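The random masking step can be sketched as follows; the 16-pixel patch size and 60% mask ratio are assumptions (the patent does not state them), and a dense boolean mask stands in for the sparse-convolution machinery:

```python
import numpy as np

def random_patch_mask(h, w, patch=16, mask_ratio=0.6, seed=0):
    """SparK-style random masking: drop a fixed ratio of non-overlapping
    patches.  The surviving (True) pixels act as the sparse 'voxels'
    that would be fed to the sparse-convolution encoder.
    Patch size and mask ratio here are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    gh, gw = h // patch, w // patch
    n_keep = round(gh * gw * (1 - mask_ratio))
    keep = rng.choice(gh * gw, size=n_keep, replace=False)
    grid = np.zeros(gh * gw, dtype=np.uint8)
    grid[keep] = 1
    # expand the patch-level grid back to pixel resolution
    return np.kron(grid.reshape(gh, gw),
                   np.ones((patch, patch), dtype=np.uint8)).astype(bool)

mask = random_patch_mask(640, 640)
```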
In the feature fusion process between the encoder and the decoder, a multi-level feature refinement fusion mechanism (MFRF) is designed and integrated into the SparK method to improve the model's ability to capture multi-level semantic information, thereby improving the quality of the representations learned for the instance segmentation task. As shown in fig. 2, the multi-level feature refinement fusion mechanism is implemented as follows: for an enhanced-dataset image taken as the input image F ∈ R^{H×W×C}, the encoder outputs feature maps at l levels, with S_i denoting the features of the i-th level, where 1 ≤ i < l. The MFRF module gathers the features {S_1, S_2, …, S_{l-1}}, optimizes them, and delivers them to the decoder for further image reconstruction, while the l-th level features are passed directly to the decoder. In each level i, taking S_i as reference, each j-th level is adjusted through max pooling (Maxpool) or up-sampling operations to align with the i-th feature size, and a 1×1 convolution aligns the channel numbers, as follows:

S′_ij = Conv(T_ij(S_j)), T_ij(·) ∈ {M(·), Id(·), Up(·)}

wherein 1 ≤ i, j < l; Conv(·) denotes a 1×1 convolution that changes the channel number to C_i; Up(·) denotes bilinear interpolation; Id(·) denotes identity mapping; M(·) denotes max pooling the feature map S_j to H_i×W_i resolution; S′_ij is the feature map output after the j-th level is aligned to the i-th layer; H_i and W_i are the reference height and width of the i-th layer feature map; and l is the number of encoder output layers.
After all feature maps of the i-th level are aligned, a feature refinement module (FRM), as shown in fig. 2, filters the information and fuses the feature maps to obtain f_i, reducing conflicting information and improving the representability of tiny features. Taking the second level as an example, the MFRF output feature size is aligned with the input feature map S_2, and the expression is as follows:

f_2 = FRM(S′_21, S′_22, S′_23)

wherein FRM(·) denotes the hidden function of the feature refinement module.
The complete self-supervised learning training process can be expressed as follows: the trunk of the instance segmentation model is taken as the encoder trained by self-supervised learning, and the designed multi-level feature refinement fusion mechanism fuses the feature maps S_1, S_2 and S_3; the fused feature maps correspond to the feature maps of different scales in the trunk. After all blank positions of the sparse feature maps are filled, a decoder formed by three consecutive blocks {B_3, B_2, B_1} and up-sampling layers decodes and reconstructs images from the fused feature maps, completing the pre-training. After pre-training, only the parameters of the encoder are retained and transferred to the backbone of the detector for fine-tuning. The complete pre-training process can be expressed as:

f_i = Γ_i(S_1, S_2, S_3)
D_i = ψ_i(f_i, [M_i])
R_i = B_i(D_i)

wherein Γ_i(·) and f_i respectively denote the hidden function of the MFRF fusion mechanism and the fused feature map, i denotes the i-th output layer, ψ_i(·) and D_i respectively denote the filling operation with mask embedding [M_i] and the filled feature map, B_i(·) denotes the implicit function of the decoder's consecutive blocks, and R_i is the decoded output of the i-th block.
Step three, migrating the parameters of the self-supervised pre-trained trunk, and fine-tuning the instance segmentation algorithm on the un-enhanced dataset to obtain human body hidden dangerous goods instance segmentation images;
As shown in fig. 3, the instance segmentation algorithm obtains instance segmentation images of dangerous objects hidden on the human body by fine-tuning an instance segmentation network consisting of a re-parameterizable feature extraction trunk pre-trained by self-supervised learning, a neck that integrates multi-scale features, and a lightweight dynamic decoupling head. The un-enhanced dataset is input as the training dataset of the instance segmentation model. Specifically, an image I ∈ R^{3×640×640} of the detected object in the training dataset is input into the instance segmentation network, which outputs the human body hidden dangerous goods segmentation image O ∈ R^{C×H×W}. The mathematical model can be expressed as follows:
O=O(x,y,mask)=Φ(I,Ψ)
wherein O(x, y, mask) denotes the dangerous goods detection image, Φ(·) denotes the instance segmentation neural network model, Ψ is the parameter set of the neural network, (x, y) denotes the pixel coordinates of the output detection frame, mask denotes the dangerous goods segmentation mask, and C, H, W respectively denote the channel number, height and width of the image. If dangerous goods exist in the detected-object image, an image with detection frames and segmentation masks is output, with the dangerous goods category code and prediction confidence displayed on each detection frame; otherwise, if no dangerous goods exist in the detected-object image, the output image is consistent with the input image.
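The output contract described above — detections drawn when dangerous goods are present, the input passed through unchanged otherwise — can be sketched as a small post-processing function; the detection data structure and confidence threshold are hypothetical, since the patent does not fix them:

```python
import numpy as np

def postprocess(image, detections, conf_thres=0.25):
    """Mimic the described output behaviour: if any dangerous-goods
    detection clears the confidence threshold, return the image together
    with (box, class_code, confidence, mask) tuples; otherwise return
    the input image unchanged with an empty detection list.
    `detections` is a hypothetical list of dicts."""
    kept = [d for d in detections if d["conf"] >= conf_thres]
    if not kept:
        return image, []
    return image, [(d["box"], d["cls"], d["conf"], d["mask"]) for d in kept]

img = np.zeros((3, 8, 8))
out_empty, dets_empty = postprocess(img, [])
out_hit, dets_hit = postprocess(
    img,
    [{"conf": 0.9, "box": (0, 0, 4, 4), "cls": 1, "mask": None},
     {"conf": 0.1, "box": (1, 1, 2, 2), "cls": 0, "mask": None}],
)
```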
In the feature extraction trunk, the trunk parameters obtained by self-supervised pre-training undergo transfer learning, and the instance segmentation algorithm is trained with structural re-parameterization and fine-tuning, which enhances the feature extraction capability of the model, improves generalization, and further improves the detection accuracy for hidden dangerous goods.
A structural re-parameterization module, C2DB, is designed into the feature extraction process so that the network decouples the complex multi-branch structure of the training stage from the plain structure of the inference stage, improving both the inference speed and the detection accuracy of the instance segmentation model. As shown in fig. 4, the designed structural re-parameterization module works as follows: given a C×H×W = 16×160×160 feature map f, the feature map is first convolved and split into two C/2×H×W = 8×160×160 feature maps f_1 and f_2. One of them, f_1, is input into n = 1 DBB modules as shown in fig. 4. The DBB module consists of four branches in the model training stage and is equivalently converted into a single-branch structure in the inference stage to improve inference speed. Finally, the feature map f_d1 output by the n = 1 DBB modules is fused with f_1 and f_2, and the feature map D is output after a convolution operation. The mathematical model can be expressed as follows:
f_1, f_2 = Split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))

wherein f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) denote the convolution and the hidden function of the DBB module, and Split(·) and Concat(·) denote the separation and fusion operations respectively.
In the feature fusion neck shown in fig. 3, a neck that integrates multi-scale features is designed with the structural re-parameterization module C2DB, which reduces the overall parameter count of the instance segmentation model, enhances its feature extraction capability, and improves the detection accuracy for hidden dangerous goods.
As shown in fig. 3, to reduce the parameter count of the instance segmentation model and improve detection accuracy, a dynamic decoupling head as shown in fig. 5 is designed into the detection head. Given three input feature maps of successively halved size, F_i, i = 1, 2, 3, with F_1 ∈ R^{128×80×80}, F_2 ∈ R^{256×40×40}, F_3 ∈ R^{512×20×20}, the channel numbers are reduced by 1×1 convolution to obtain F_s ∈ R^{64×80×80}, F_m ∈ R^{64×40×40}, F_b ∈ R^{64×20×20}, which are input into a dynamic attention module consisting of k = 6 Dyhead blocks shown in fig. 5; the decoupling heads shown in fig. 5 then perform dangerous goods feature detection and segmentation. The mathematical model can be expressed as follows:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_1, F_2, F_3)
O_l = Seg_l(A)
wherein A is the output feature map processed by the dynamic attention module, Dyhead_k(·) represents the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks, Seg_l(·) represents the hidden function of the decoupled detection head, Conv(·) represents the convolution function, and O_l represents the output feature map of the l-th decoupled head layer.
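The channel reduction and attention stack described above can be sketched as follows. The Dyhead block here is a toy sigmoid gate standing in for the actual scale-, spatial- and task-aware attention; only the channel counts, spatial sizes, and k=6 stacking come from the description.

```python
import numpy as np

def conv1x1(x, out_ch, rng):
    # 1x1 convolution as a channel-mixing matmul: (C_in, H, W) -> (out_ch, H, W)
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.1
    return np.einsum('oc,chw->ohw', w, x)

rng = np.random.default_rng(1)
F1 = rng.standard_normal((128, 80, 80))
F2 = rng.standard_normal((256, 40, 40))
F3 = rng.standard_normal((512, 20, 20))

# 1x1 convs reduce every scale to 64 channels before the attention stack
Fs, Fm, Fb = (conv1x1(F, 64, rng) for F in (F1, F2, F3))

def dyhead_block(x):
    # Toy stand-in for one Dyhead block: a sigmoid-gated per-channel
    # reweighting derived from the global mean of each channel.
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=(1, 2), keepdims=True)))
    return x * gate

A = Fs
for _ in range(6):          # k = 6 consecutive Dyhead blocks
    A = dyhead_block(A)
print(Fs.shape, Fm.shape, Fb.shape, A.shape)
```

Each block preserves the feature-map shape, so the output A keeps the 64×80×80 layout of its input scale.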
In the fine-tuning training process of the instance segmentation algorithm, the AdamW optimizer is adopted to optimize the Loss function Loss(Θ), expressed as follows:
Loss(Θ) = Σ_(i=1)^N (α_1·Loss_b + α_2·Loss_c)
Loss_b = Loss_CIoU + Loss_DFL
wherein N is the number of detection layers, Loss_b is the bounding-box regression loss function, Loss_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss functions, Loss_CIoU is the bounding-box loss function, and Loss_DFL is the distribution focal loss function for the prediction boxes.
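Assuming the per-layer losses have already been computed, the weighted aggregation described by the wherein clause can be sketched as below; the α values are placeholders, since the source does not state their numerical weights.

```python
def total_loss(loss_ciou, loss_dfl, loss_cls, a1=1.0, a2=1.0):
    # Per detection layer: Loss_b = Loss_CIoU + Loss_DFL, then a weighted
    # sum of box and classification terms accumulated over the N layers.
    # a1, a2 are assumed weights, not values from the source.
    loss_b = [c + d for c, d in zip(loss_ciou, loss_dfl)]
    return sum(a1 * b + a2 * c for b, c in zip(loss_b, loss_cls))

# N = 3 detection layers with toy per-layer loss values
print(total_loss([1.0, 2.0, 0.5], [0.5, 0.5, 0.5], [0.2, 0.3, 0.1]))
```

With unit weights this reduces to the plain sum of all per-layer terms, which makes the role of α_1 and α_2 easy to see: they trade off localization against classification.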
After m=200 rounds of training, the optimized network parameters can be obtained.
For an image I captured by a terahertz security inspection scanning camera, after the feature information is extracted by the re-parameterizable feature extraction backbone, integrated by the multi-scale feature fusion neck, and processed by the lightweight dynamic decoupling head, the dangerous goods detection and segmentation result can be obtained, namely a result image comprising the dangerous goods detection frame, the dangerous goods category code, the dangerous goods segmentation mask and the prediction confidence, computed with the neural network parameters optimized by training.
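The end-to-end inference flow above can be summarized as a composition of three stages. The stand-ins below are illustrative only (they do not reproduce the trained backbone, neck, or head); they show only the shape of the pipeline and of its result dictionary.

```python
import numpy as np

def segment_hazards(image, backbone, neck, head):
    # Inference pipeline of the description: re-parameterized feature
    # extraction backbone -> multi-scale feature fusion neck -> lightweight
    # dynamic decoupled head producing the detection/segmentation result.
    return head(neck(backbone(image)))

# Opaque toy stand-ins for the three trained stages (assumptions, not the
# patented network): the backbone emits three halved scales, the neck
# passes the finest scale through, the head emits one toy detection.
backbone = lambda img: [img, img[::2, ::2], img[::4, ::4]]
neck = lambda feats: feats[0]
head = lambda a: {"box": (0, 0, 10, 10), "cls": 1,
                  "mask": a > a.mean(), "conf": 0.9}

image = np.zeros((160, 160))
result = segment_hazards(image, backbone, neck, head)
print(sorted(result))  # ['box', 'cls', 'conf', 'mask']
```

The result mirrors the fields named in the description: detection frame, category code, segmentation mask, and prediction confidence.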
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A self-supervision and re-digitization terahertz image dangerous object example segmentation method is characterized by comprising the following steps of:
acquiring a terahertz human body hidden dangerous article security inspection image data set, and carrying out data enhancement on the terahertz human body hidden dangerous article security inspection image data set;
pre-training a self-supervision learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous article security inspection image dataset to obtain encoder parameters of the self-supervision learning image mask modeling enhancement model;
extracting a backbone network by taking an encoder of the self-supervision learning image mask modeling enhancement model as an initial feature of an example segmentation model;
migrating the encoder parameters, and fine-tuning an initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image data set to obtain a feature extraction backbone network of the instance segmentation model;
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the feature extraction backbone network of the example segmentation model, extracting multi-scale features, integrating the multi-scale features, and dynamically decoupling to obtain the detection segmentation result of the dangerous goods.
2. The method of claim 1, wherein obtaining the terahertz human body concealed dangerous goods security inspection image dataset and performing data enhancement on the terahertz human body concealed dangerous goods security inspection image dataset comprises:
acquiring a plurality of target images by using terahertz imaging equipment;
labeling outlines of dangerous goods in a plurality of target images, converting coordinates of outline points into tag data, and acquiring a terahertz human body hidden dangerous goods security inspection image data set;
adopting an automatic data enhancement method based on MedAugment to automatically enhance the terahertz human body hidden dangerous article security inspection image data set;
the automatic data enhancement method based on the MedAugment comprises a pixel enhancement space and a space enhancement space.
3. The method for segmenting the self-supervision and re-digitizable terahertz image dangerous object example according to claim 1, wherein the self-supervision learning image mask modeling enhancement model enhances its feature extraction capability based on the Spark image mask modeling method;
the Spark-based image mask modeling method comprises an encoder and a decoder, wherein the enhanced terahertz human body hidden dangerous goods security inspection image dataset is randomly occluded, non-occluded pixels are treated as sparse voxels and encoded with sparse convolution, the encoder outputs multiple layers of feature maps to obtain target-level features, and a multi-level feature fusion module collects the target-level features for optimization and transmits them to the decoder for image reconstruction.
4. The method for segmenting the self-supervision and re-digitizable terahertz image dangerous object example as claimed in claim 3, wherein the multi-level feature fusion module collects a plurality of features of the target level to optimize and transmits the features to the decoder for image reconstruction, and further comprises:
in each hierarchy, based on the features of the target level, each j-th level is adjusted through maximum pooling and up-sampling operations to realize feature-size alignment, and convolution is adopted to realize channel-number alignment, namely:
wherein i ≥ 1 and j < l; Conv(·) represents a 1×1 convolution changing the number of channels to C_i; Up(·) represents bilinear interpolation; Id(·) represents identity mapping; M(·) represents max-pooling the feature map S_i to H_i×W_i resolution; S'_ij is the feature map output after the j-th level is aligned to the i-th layer; H_i is the reference height of the i-th layer feature map, W_i is the reference width of the i-th layer feature map; and l is the number of output layers of the encoder.
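Under the stated alignment rule, a minimal NumPy sketch might look like the following; nearest-neighbour resizing stands in for the bilinear Up(·), and a channel-mixing matmul stands in for the 1×1 Conv(·). The level sizes in the usage are assumptions for illustration.

```python
import numpy as np

def upsample_nearest(x, H, W):
    # Nearest-neighbour stand-in for the bilinear Up(.) of the claim
    c, h, w = x.shape
    return x[:, np.arange(H) * h // H][:, :, np.arange(W) * w // W]

def max_pool_to(x, H, W):
    # M(.): max-pool a larger map down to H x W (assumes exact division)
    c, h, w = x.shape
    return x.reshape(c, H, h // H, W, w // W).max(axis=(2, 4))

def align(S, C_i, H_i, W_i, rng):
    # Resize level-j features to the i-th layer's H_i x W_i grid, then a
    # 1x1 convolution (channel-mixing matmul) maps the channels to C_i.
    c, h, w = S.shape
    if (h, w) == (H_i, W_i):
        r = S                               # Id(.): identity mapping
    elif h > H_i:
        r = max_pool_to(S, H_i, W_i)        # shrink via max pooling
    else:
        r = upsample_nearest(S, H_i, W_i)   # enlarge via interpolation
    w1 = rng.standard_normal((C_i, c)) * 0.1
    return np.einsum('oc,chw->ohw', w1, r)

rng = np.random.default_rng(0)
S2 = rng.standard_normal((256, 40, 40))     # level-j features (assumed sizes)
aligned = align(S2, 128, 80, 80, rng)       # align to an 80x80, 128-channel layer
print(aligned.shape)  # (128, 80, 80)
```

The branch chosen depends only on the size relation between the source level and the target layer, matching the Id / M / Up cases of the wherein clause.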
5. The method of claim 4, wherein pre-training the self-supervised learning image mask modeling enhancement model through the enhanced terahertz human body hidden dangerous goods security inspection image dataset and obtaining encoder parameters of the self-supervised learning image mask modeling enhancement model comprises:
the self-supervision learning image mask modeling enhancement model comprises a feature refinement fusion module;
taking the self-supervision learning image mask modeling enhancement model as an encoder for self-supervision learning, and fusing the images in the enhanced terahertz human body hidden dangerous article security inspection image dataset through the feature refinement fusion module to obtain a fused feature map;
filling all blank positions of sparse feature mapping, adopting the decoder to decode and reconstruct the fused feature map, completing the pre-training of the self-supervision learning image mask modeling enhancement model, and further obtaining encoder parameters of the self-supervision learning image mask modeling enhancement model.
6. The method for segmenting the dangerous goods instance of the self-supervision re-digitized terahertz image of claim 5, wherein the pre-training process is as follows:
wherein Γ_i(·) and f_i respectively represent the hidden function of the MFRF fusion mechanism and the fused feature map, i represents the i-th output layer, ψ_i(·) and D_i respectively represent the mask embedding [M_i] filling operation and the filled feature map, B_i(·) represents the hidden function of successive blocks of the decoder, and S_1, S_2, S_3 respectively correspond to the feature maps of different scales in the backbone.
7. The method of claim 1, wherein migrating the encoder parameters and fine-tuning the initial feature extraction backbone network of the instance segmentation model through the non-enhanced terahertz human body hidden dangerous goods security inspection image dataset to obtain the feature extraction backbone network of the instance segmentation model comprises:
inputting the unreinforced terahertz human body hidden dangerous goods security inspection image data set into the instance segmentation model by migrating the encoder parameters, and performing fine tuning training on an initial feature extraction backbone network of the instance segmentation model by utilizing a structure re-parameterization module to obtain a feature extraction backbone network of the instance segmentation model;
the structure re-parameterization module is used for carrying out convolution and separation on a given initial feature map, obtaining a first feature map and a second feature map, inputting the first feature map into the DBB module, fusing the feature map output by the DBB module with the first feature map and the second feature map, and outputting a target feature map through convolution operation;
the structure re-parameterization module is as follows:
f_1, f_2 = split(Conv(f))
f_d1 = DBB(f_1)
D = Conv(Concat(f_1, f_2, f_d1))
wherein f_1 and f_2 are feature maps, f_d1 is the feature map output by the DBB module, Conv(·) and DBB(·) represent the convolution function and the hidden function of the DBB module, split(·) and Concat(·) represent the separation and fusion operations, respectively, and D is the output feature map.
8. The method for segmenting the example of the dangerous goods of the self-supervision and re-digitizable terahertz image according to claim 7, wherein the fine-tuning training of the initial feature extraction backbone network of the example segmentation model further comprises: optimizing the loss function by adopting the AdamW optimizer, namely:
Loss(Θ) = Σ_(i=1)^N (α_1·Loss_b + α_2·Loss_c)
Loss_b = Loss_CIoU + Loss_DFL
wherein Loss(Θ) is the loss function, N is the number of detection layers, Loss_b is the bounding-box regression loss function, Loss_c is the classification loss function, α_1 and α_2 are the weight coefficients of the loss functions, Loss_CIoU is the bounding-box loss function, and Loss_DFL is the distribution focal loss function for the prediction boxes.
9. The method for segmenting the dangerous goods instance of the self-supervision re-digitized terahertz image according to claim 1, wherein inputting the human body hidden dangerous goods security inspection image to be segmented into the feature extraction backbone network of the instance segmentation model, extracting multi-scale features, integrating the multi-scale features and performing dynamic decoupling, and obtaining the dangerous goods detection segmentation result comprises the following steps:
inputting the security inspection image of the hidden dangerous goods of the human body to be segmented into the instance segmentation model, and extracting multi-scale features through a feature extraction backbone network of the instance segmentation model;
integrating the multi-scale features, and dynamically decoupling through a dynamic decoupling head module to obtain the dangerous goods detection segmentation result;
the dangerous goods detection segmentation result comprises a dangerous goods detection frame, a dangerous goods category code number, a dangerous goods segmentation mask and a result image of prediction confidence.
10. The method for segmenting the dangerous goods instance of the self-supervision re-digitized terahertz image according to claim 9, wherein the dynamic decoupling head module is configured to: given a plurality of feature maps of successively halved size, reduce the number of channels by convolution to obtain a plurality of convolved feature maps, input them into the dynamic attention module, and perform dangerous goods feature detection and segmentation with the decoupled head;
the dynamic decoupling head module is:
F_s = Conv(F_1), F_m = Conv(F_2), F_b = Conv(F_3)
A = Dyhead_k(F_1, F_2, F_3)
O_l = Seg_l(A)
wherein F_1, F_2, F_3, F_s, F_m and F_b are all feature maps, with F_1 ∈ R^(128×80×80), F_2 ∈ R^(256×40×40), F_3 ∈ R^(512×20×20), F_s ∈ R^(64×80×80), F_m ∈ R^(64×40×40), F_b ∈ R^(64×20×20); Dyhead_k(·) represents the hidden function of the dynamic attention module consisting of k consecutive Dyhead blocks, Seg_l(·) represents the hidden function of the decoupled detection head, Conv(·) represents the convolution function, O_l represents the output feature map of the l-th decoupled head layer, and A is the output feature map processed by the dynamic attention module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410087737.5A CN117853732A (en) | 2024-01-22 | 2024-01-22 | Self-supervision re-digitizable terahertz image dangerous object instance segmentation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117853732A true CN117853732A (en) | 2024-04-09 |
Family
ID=90536166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410087737.5A Pending CN117853732A (en) | 2024-01-22 | 2024-01-22 | Self-supervision re-digitizable terahertz image dangerous object instance segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117853732A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978892A (en) * | 2019-03-21 | 2019-07-05 | 浙江啄云智能科技有限公司 | A kind of intelligent safety inspection method based on terahertz imaging |
CN113947604A (en) * | 2021-10-26 | 2022-01-18 | 北京地平线信息技术有限公司 | Instance segmentation and instance segmentation network training methods and apparatuses, medium, and device |
CN115019039A (en) * | 2022-05-26 | 2022-09-06 | 湖北工业大学 | Example segmentation method and system combining self-supervision and global information enhancement |
CN115937774A (en) * | 2022-12-06 | 2023-04-07 | 天津大学 | Security inspection contraband detection method based on feature fusion and semantic interaction |
CN116630334A (en) * | 2023-04-23 | 2023-08-22 | 中国科学院自动化研究所 | Method, device, equipment and medium for real-time automatic segmentation of multi-segment blood vessel |
CN117095158A (en) * | 2023-08-23 | 2023-11-21 | 广东工业大学 | Terahertz image dangerous article detection method based on multi-scale decomposition convolution |
Non-Patent Citations (1)
Title |
---|
KEYU TIAN et al.: "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling", arXiv, 10 January 2023, pages 1-16 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |