CN114926667A - Image identification method based on cloud edge-end cooperation - Google Patents

Image identification method based on cloud edge-end cooperation

Info

Publication number
CN114926667A
CN114926667A (application CN202210850570.4A)
Authority
CN
China
Prior art keywords
anchor point
network
sub
model
cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210850570.4A
Other languages
Chinese (zh)
Other versions
CN114926667B (en)
Inventor
朱兆亚
朱吕甫
刘鸿涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Jushi Technology Co ltd
Original Assignee
Anhui Jushi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Jushi Technology Co ltd filed Critical Anhui Jushi Technology Co ltd
Priority to CN202210850570.4A (granted as CN114926667B)
Publication of CN114926667A
Application granted
Publication of CN114926667B
Legal status: Active (Current)
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image identification method based on cloud-edge-end cooperation, which solves problems of conventional end-to-end image detection methods such as data-upload delay, large bandwidth demand and low analysis accuracy. The main scheme comprises the following steps: S1, the edge constructs a MobileNet detection network model, performs forward propagation calculation on the uploaded image and correspondingly generates anchor information; the model extracts and outputs the optimal anchor information and judges possible misjudgment from the optimal anchor information detected for the current picture. S2, the cloud constructs a RetinaNet detection network model and rechecks the pictures the edge has flagged; it finally outputs whether the optimal anchor information is consistent with the anchor information uploaded by the edge model and correspondingly judges whether the edge model produced a false detection. S3, the edge model extracts the anchor information corresponding to the false-detection region of the picture as a feature vector output through a sub-network, and performs cosine similarity matching and judgment against the feature vectors corresponding to the anchors of the remaining pictures judged as false detections.

Description

Image identification method based on cloud edge cooperation
Technical Field
The invention relates to the technical field of cloud edge image recognition, in particular to an image recognition method based on cloud edge collaboration.
Background
Intelligent video analysis in the power production environment is becoming increasingly important for safe power production. Traditional intelligent video analysis methods suffer from weak robustness, while purely cloud-based or purely edge-based intelligent video methods are uneconomical, delayed or prone to false alarms, none of which meets the requirements of safe power production. In recent years, cloud computing and deep learning have achieved outstanding results in the field of intelligent video analysis, and applying them to safe power production has become a research hotspot; however, power safety scenarios involve challenges such as scattered physical distribution and complex natural conditions. Therefore, an intelligent video analysis method based on a cloud-edge collaborative framework combined with deep learning is adopted to effectively support safe power production.
Intelligent video analysis by mainstream end-to-end methods faces the following challenges:
If intelligent video analysis is carried out in the cloud, a high-precision deep learning model can be trained with cloud computing, but uploading the large-scale video image data collected by terminal equipment to the cloud often suffers huge delay caused by limited network bandwidth;
If edge computing is carried out offline, raw data such as images and videos can be obtained from nearby terminal nodes, but the limited computing power of edge devices forces them to use lightweight deep learning models for video analysis, so analysis accuracy is often not guaranteed.
Therefore, how to achieve low-delay, high-precision computation through a cloud-edge collaborative framework for intelligent video analysis and thereby guarantee safe power production has become an important problem.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides an image identification method based on cloud edge cooperation, which can realize low-delay and high-precision calculation.
In order to solve the technical problems, the invention adopts the technical scheme that: the image identification method based on cloud edge cooperation comprises the following steps:
s1, constructing a MobileNet detection network model by the edge terminal, carrying out forward propagation calculation on the uploaded image and correspondingly generating an anchor point
The method comprises the steps that information is obtained, a model predicts anchor point information through a sub-network, then inhibits the extraction of optimal anchor point information and outputs the optimal anchor point information, judges misjudgment according to the optimal anchor point information detected by a current picture, and uploads a misjudgment picture and the optimal anchor point information in a cloud;
s2, establishing a RetinaNet detection network model by the cloud, and rechecking the images uploaded at the edge end in S1 to finally output the optimal output
If the anchor point information is consistent with the anchor point information uploaded by the side end model, judging whether the side end model has false detection or not correspondingly, and if so, marking a false judgment picture and issuing the false judgment picture to the side end;
s3, the picture is issued in S2 by the asynchronous reasoned side model, and the anchor point information corresponding to the false detection area in the picture is extracted as the characteristics
And outputting the vectors through a sub-network, performing cosine similarity matching on the feature vectors and the feature vectors corresponding to the anchor points of the residual judgment false detection pictures, judging that the feature vectors are false judgment if the feature vectors are higher than a similarity threshold value, and asynchronously uploading the feature vectors lower than the similarity threshold value to the cloud model for rechecking.
Further, the cloud model construction steps are as follows:
constructing a backbone ResNet50 residual network;
fusing different feature levels of ResNet50 through bottom-up, top-down and lateral connections using a feature pyramid network FPN, and correspondingly generating feature maps;
adding different scales to the anchor sizes corresponding to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of categories and the length-4 vector holds the box coordinates; anchors whose IoU is greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network for predicting the probability that a target appears and a bounding-box sub-network for predicting the coordinate offsets of the candidate regions generated from the anchors, wherein the classification sub-network loss function is calculated by Cross Entropy Loss and the bounding-box sub-network loss function is calculated by Smooth L1 Loss.
Further, the edge model construction steps are as follows:
sequentially constructing a 10-layer convolutional neural network, wherein layers 1 and 2 perform dimension reduction through two-dimensional convolutions with a kernel size of 3, layers 3, 4, 5, 7 and 9 are inverse residual convolution layers, and layers 6, 8 and 10 are inverse residual convolution layers that introduce a spatial attention mechanism;
fusing different feature layers of the MobileNet through bottom-up, top-down and lateral connections using a feature pyramid network FPN, and correspondingly generating feature maps;
adding different scales to the anchor sizes corresponding to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of categories and the length-4 vector holds the box coordinates; anchors whose IoU is greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network, a regression sub-network and a fully connected sub-network, wherein the classification sub-network, the regression sub-network and their corresponding loss functions are similar to those of the cloud model, and the loss function of the fully connected sub-network is calculated based on the softmax loss.
Further, the Cross Entropy Loss function is defined as follows:

$$CE = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \alpha\, y_{ic} \log(p_{ic})$$

wherein N is the number of samples, C is the number of classes, $y_{ic}$ is the label information indicating whether the i-th sample belongs to class c (its value is 1 if it does and 0 otherwise), $\alpha$ is a hyper-parameter, and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c.

Define $t_i$ as the coordinate vector of the relative position between the i-th prediction region and the anchor reference region,

$$t_i = (t_x, t_y, t_w, t_h),$$

and $t_i^{*}$ as the coordinate vector of the relative position between the i-th target real (ground-truth) region and the anchor reference region,

$$t_i^{*} = (t_x^{*}, t_y^{*}, t_w^{*}, t_h^{*}),$$

where

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},$$

$$t_x^{*} = \frac{x^{*} - x_a}{w_a},\quad t_y^{*} = \frac{y^{*} - y_a}{h_a},\quad t_w^{*} = \log\frac{w^{*}}{w_a},\quad t_h^{*} = \log\frac{h^{*}}{h_a},$$

wherein (x, y) represents the center coordinates and (h, w) the height and width of a region border; $x$, $x_a$ and $x^{*}$ respectively represent the center abscissas of the prediction region, the anchor and the manually annotated real region; $y$, $y_a$ and $y^{*}$ respectively represent their center ordinates; $h$, $h_a$ and $h^{*}$ respectively represent their border heights; and $w$, $w_a$ and $w^{*}$ respectively represent their border widths.

The Smooth L1 Loss function is defined as follows:

$$\mathrm{Smooth}_{L1}(x) = \begin{cases} 0.5x^{2}, & |x| < 1, \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
further, the cosine similarity matching calculation formula is as follows:
Figure 245673DEST_PATH_IMAGE018
wherein, the first and the second end of the pipe are connected with each other,
Figure 173178DEST_PATH_IMAGE019
the feature vector of the anchor point information corresponding to the false detection area,
Figure 374352DEST_PATH_IMAGE020
determining false detection picture anchor points for the rest of the edge model
The corresponding feature vector.
Compared with the prior art, the invention has the following beneficial effects: the cloud-edge collaborative framework achieves low-delay, high-precision computation; the cloud model and the edge model of the whole framework operate cooperatively, reliably and stably; large batches of false-detection pictures in the same time period can be judged; the models judge quickly and accurately; and the resulting intelligent video analysis can further guarantee safe power production.
Drawings
The disclosure of the present invention is illustrated with reference to the accompanying drawings. It is to be understood that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention. In the drawings, like reference numerals are used to refer to like parts. Wherein:
fig. 1 schematically shows a cloud edge collaboration flow chart according to an embodiment of the present invention;
fig. 2 schematically shows a network structure diagram of a cloud model according to an embodiment of the invention;
FIG. 3 is a diagram schematically illustrating an edge-side model inverse residual structure backbone network according to an embodiment of the present invention;
FIG. 4 schematically shows a diagram of a proposed edge model network framework according to an embodiment of the present invention;
fig. 5 schematically shows a side end model sub-network framework diagram proposed according to an embodiment of the present invention.
Detailed Description
It is easily understood that, according to the technical solution of the present invention, a person skilled in the art can propose various alternative structural modes and implementation modes without changing the spirit of the present invention. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical aspects of the present invention, and should not be construed as all of the present invention or as limitations or limitations on the technical aspects of the present invention.
An embodiment according to the present invention is shown in conjunction with fig. 1-5.
The image identification method based on cloud edge cooperation comprises the following steps:
s1, constructing a MobileNet detection network model by the edge terminal, carrying out forward propagation calculation on the uploaded image and correspondingly generating an anchor point
The method comprises the steps that information is obtained, a model predicts anchor point information through a sub-network, then inhibits the extraction of optimal anchor point information and outputs the optimal anchor point information, judges misjudgment according to the optimal anchor point information detected by a current picture, and uploads a misjudgment picture and the optimal anchor point information in a cloud;
s2, constructing a RetinaNet detection network model at the cloud end, and rechecking the edge-end uploaded pictures in S1 to finally output the best images
If the anchor point information is consistent with the anchor point information uploaded by the side end model, judging whether the side end model has false detection or not correspondingly, and if so, marking a false judgment picture and issuing the false judgment picture to the side end;
s3, the picture is issued in S2 by the asynchronous reasoned side model, and the anchor point information corresponding to the false detection area in the picture is extracted as the characteristics
And outputting the vectors through a sub-network, performing cosine similarity matching on the feature vectors and the feature vectors corresponding to the anchor points of the residual judgment false detection pictures, judging that the feature vectors are false judgment if the feature vectors are higher than a similarity threshold value, and asynchronously uploading the feature vectors lower than the similarity threshold value to the cloud model for rechecking.
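For illustration only, the following Python sketch shows the control flow of steps S1 to S3. The names edge_model, cloud_model and false_alarm_db are hypothetical interfaces introduced for this sketch and are not part of the described implementation.

```python
def process_frame(frame, edge_model, cloud_model, false_alarm_db):
    """One S1-S3 round trip for a single monitoring frame (illustrative sketch).

    edge_model.detect(frame)          -> (detections, embedding) or (None, None)
    cloud_model.recheck(frame, dets)  -> True if the cloud confirms the edge detections
    false_alarm_db                    -> store of embeddings of confirmed false detections
    """
    detections, embedding = edge_model.detect(frame)      # S1: edge-side MobileNet inference
    if detections is None:
        return None                                       # nothing detected, nothing to report

    if false_alarm_db.matches(embedding):                 # S3: local cosine-similarity filter
        return None                                       # judged a repeat false alarm at the edge

    if not cloud_model.recheck(frame, detections):        # S2: cloud-side RetinaNet recheck
        false_alarm_db.add(embedding)                     # cloud marked it a misdetection
        return None
    return detections                                     # detection confirmed
```

In the described method the S2 recheck and the S3 matching run asynchronously; the sequential calls above are arranged this way only for readability.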
As shown in fig. 2, for the establishment of the cloud RetinaNet detection network model:
backbone network
A backbone ResNet50 residual network is constructed: based on the residual mapping H(x) = F(x) + x, the 5 blocks Res1, Res2, Res3, Res4 and Res5 are built in sequence, with down-sampling rates of 2^1, 2^2, 2^3, 2^4 and 2^5 respectively; in general, RetinaNet selects the 3 modules Res3, Res4 and Res5 as the initial detection layers.
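As a hedged illustration of this backbone (not the patented code), the PyTorch sketch below exposes the Res3, Res4 and Res5 stages of torchvision's ResNet-50; the class name ResNet50Backbone is an assumption made for this example.

```python
import torch.nn as nn
from torchvision.models import resnet50

class ResNet50Backbone(nn.Module):
    """ResNet-50 trunk returning the Res3/Res4/Res5 feature maps (strides 8, 16, 32)."""

    def __init__(self):
        super().__init__()
        net = resnet50(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.res2 = net.layer1   # stride 4
        self.res3 = net.layer2   # stride 8
        self.res4 = net.layer3   # stride 16
        self.res5 = net.layer4   # stride 32

    def forward(self, x):
        c2 = self.res2(self.stem(x))
        c3 = self.res3(c2)
        c4 = self.res4(c3)
        c5 = self.res5(c4)
        return c3, c4, c5        # the three initial detection layers used by RetinaNet
```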
Feature pyramid network
The different feature layers of ResNet50 are fused through bottom-up, top-down and lateral connections using a feature pyramid network FPN. The bottom-up and top-down pathways generate the feature maps P3, P4, P5, P6 and P7 from Res3, Res4 and Res5, where P3 to P5 are calculated from Res3 to Res5 and P6 to P7 let the model better detect large objects thanks to a larger receptive field. This operation guarantees that each layer has an appropriate resolution and strong semantic features; combined with the target detection algorithm and Focal Loss, it improves object detection performance.
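A minimal FPN sketch in PyTorch follows; the input channel counts (512, 1024, 2048 for Res3 to Res5) and the 256 output channels are assumptions typical of ResNet-50/RetinaNet, and P6/P7 are produced by strided convolutions as described above.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Top-down feature pyramid over Res3/Res4/Res5, producing P3..P7."""

    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])
        self.p6 = nn.Conv2d(in_channels[-1], out_channels, 3, stride=2, padding=1)
        self.p7 = nn.Conv2d(out_channels, out_channels, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        l3, l4, l5 = [lat(c) for lat, c in zip(self.lateral, (c3, c4, c5))]
        # top-down pathway: upsample and add (lateral connections)
        m5 = l5
        m4 = l4 + F.interpolate(m5, size=l4.shape[-2:], mode="nearest")
        m3 = l3 + F.interpolate(m4, size=l3.shape[-2:], mode="nearest")
        p3, p4, p5 = [s(m) for s, m in zip(self.smooth, (m3, m4, m5))]
        # extra coarse levels for large objects, as in RetinaNet
        p6 = self.p6(c5)
        p7 = self.p7(F.relu(p6))
        return p3, p4, p5, p6, p7
```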
Feature map anchor points for models
RetinaNet borrows the idea of the Region Proposal Network (RPN) from Faster R-CNN. The anchor sizes at the 5 levels P3, P4, P5, P6 and P7 range from 32^2 to 512^2, and the aspect ratios at each pyramid level are {1:2, 1:1, 2:1}. To predict denser targets, the anchors of each aspect ratio are further given three different scales {2^0, 2^1, 2^2}, so each level has 9 anchors in total. Each anchor is assigned a one-hot vector of length K and a vector of length 4, where K is the number of categories and the length-4 vector holds the box coordinates, similar to the RPN; anchors whose IoU is greater than 0.5 are regarded as positive samples.
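For illustration, anchor generation under the ratios and scales quoted above can be sketched as follows; the base size and stride in the example call are assumptions, not values fixed by the description.

```python
import numpy as np

def cell_anchors(base_size, ratios=(0.5, 1.0, 2.0), scales=(2**0, 2**1, 2**2)):
    """The 9 anchors (x1, y1, x2, y2) of one feature-map cell, centred at (0, 0)."""
    boxes = []
    for r in ratios:                      # r = height / width
        for s in scales:
            area = (base_size * s) ** 2
            w = np.sqrt(area / r)         # keep the area while setting h / w = r
            h = w * r
            boxes.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(boxes)

def grid_anchors(anchors, feat_h, feat_w, stride):
    """Tile the per-cell anchors over a feat_h x feat_w feature map."""
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    sx, sy = np.meshgrid(xs, ys)
    shifts = np.stack([sx, sy, sx, sy], axis=-1).reshape(-1, 1, 4)
    return (anchors[None, :, :] + shifts).reshape(-1, 4)

# Example: anchors of one pyramid level, assuming base size 32 and stride 8.
p3_anchors = grid_anchors(cell_anchors(32), feat_h=52, feat_w=52, stride=8)
```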
Sub-network and loss function
The classification sub-network predicts, for each anchor, the probability that a target appears. It is a small FCN attached to the FPN: 4 3 × 3 convolutions, each with C filters and activated by ReLU, are stacked on the feature map of each level, followed by a final 3 × 3 convolution layer with K × A filters, which outputs, for each of the A anchors, the probabilities of the K categories.
Finally, Cross Entropy Loss is used to predict categories. Because of the imbalance between positive and negative samples, a hyper-parameter $\alpha$ is introduced to control the weight that positive and negative samples contribute to the overall classification loss. The new loss function $CE$ is defined as follows:

$$CE = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \alpha\, y_{ic} \log(p_{ic})$$

where N is the number of samples, C is the number of classes, $y_{ic}$ is the label information indicating whether the i-th sample belongs to class c (1 if it does, 0 otherwise), and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c.

To address the problem of easily and hardly classified samples, a modulating factor $(1 - p_{ic})^{\gamma}$ is added to $CE$, where $\gamma$ is a hyper-parameter, and the Focal Loss function is defined as follows:

$$FL = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \alpha\,(1 - p_{ic})^{\gamma}\, y_{ic} \log(p_{ic})$$
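A hedged PyTorch sketch of the class-balanced Focal Loss written out above, operating on per-class probabilities and one-hot labels; the defaults α = 0.25 and γ = 2.0 are the values from the original Focal Loss paper and are only assumptions here, since the description does not fix them.

```python
import torch

def focal_loss(probs: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """FL = -1/N * sum_i sum_c alpha * (1 - p_ic)^gamma * y_ic * log(p_ic).

    probs   -- (N, C) predicted probabilities p_ic in (0, 1)
    targets -- (N, C) one-hot labels y_ic
    """
    probs = probs.clamp(1e-7, 1.0 - 1e-7)            # numerical stability for log()
    loss = -alpha * (1.0 - probs) ** gamma * targets * torch.log(probs)
    return loss.sum(dim=1).mean()                    # sum over classes, mean over samples
```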
the bounding sub-network is used for localization, which can predict the coordinate offset of each Anchor generating candidate region. The frame prediction sub-network and the classification sub-network are processed in parallel, the two structures are similar, and 4 3 × 3 convolutions are superposed on feature of each hierarchy, each convolution layer has C filters and is activated by ReLU, and finally 3 × 3 convolution layers with 4 × A filters are added, wherein 4 is prediction of frame regression 4 coordinates. In the bounding box regression task, the Loss function typically uses Smooth L1 Loss. Let ti denote the coordinate vector of the relative position of the ith prediction region and the Anchor reference region
Figure 470539DEST_PATH_IMAGE030
Wherein x, y, w, h respectively represent the x coordinate and y coordinate of the center of the prediction region and the width and height,
Figure 227143DEST_PATH_IMAGE031
coordinate vector representing relative position of ith target real area and Anchor reference area
Figure 941021DEST_PATH_IMAGE032
Figure 547189DEST_PATH_IMAGE033
Wherein, the first and the second end of the pipe are connected with each other,
Figure 942398DEST_PATH_IMAGE034
which represents the coordinates of the center of the circle,
Figure 553508DEST_PATH_IMAGE035
indicating the height and width of the region's bounding box,
Figure 641550DEST_PATH_IMAGE036
respectively representing the central abscissas of the predicted area, Anchor and the artificially marked real area,
Figure 970900DEST_PATH_IMAGE037
respectively represents the central vertical coordinates of the prediction region, Anchor and the artificially marked real region,
Figure 232117DEST_PATH_IMAGE015
respectively represents the frame heights of the prediction region, Anchor and the artificially marked real region,
Figure 199198DEST_PATH_IMAGE016
and the frame widths of the prediction region, the Anchor and the artificially marked real region are respectively represented.
The Smooth L1 Loss is defined as follows:
Figure 458141DEST_PATH_IMAGE038
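The box parameterisation and Smooth L1 Loss above can be written compactly as in the sketch below; the (x1, y1, x2, y2) tensor layout and the helper names are assumptions made for this example.

```python
import torch

def encode_boxes(boxes: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
    """(t_x, t_y, t_w, t_h) of `boxes` relative to `anchors`; both are (N, 4) in (x1, y1, x2, y2)."""
    bw, bh = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    bx, by = boxes[:, 0] + 0.5 * bw, boxes[:, 1] + 0.5 * bh
    aw, ah = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    ax, ay = anchors[:, 0] + 0.5 * aw, anchors[:, 1] + 0.5 * ah
    return torch.stack([(bx - ax) / aw, (by - ay) / ah,
                        torch.log(bw / aw), torch.log(bh / ah)], dim=1)

def smooth_l1(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Smooth L1: 0.5 * x^2 if |x| < 1, |x| - 0.5 otherwise, averaged over elements."""
    diff = (pred - target).abs()
    return torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()
```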
as shown in fig. 3, 4 and 5, for the establishment of the detection network model based on MobileNet at the edge:
backbone network
The specific steps of the whole construction of the neural network structure are as follows:
build the first layer of the neural network, the zeroth layer being convolutional layer (conv 2d _ 1): the convolution kernel size is 3 × 3, the number of kernels is 32, and the step size is 2. An input image having an input size of 416 × 416 × 3 is subjected to convolution processing, and the output image size is 208 × 208 × 32.
Building a second layer of the neural network, the second layer being a convolutional layer (conv 2d _ 2): the convolution kernel size is 3 × 3, the number of kernels is 64, and the step size is 2. An input image having a size of 208 × 208 × 32 is subjected to convolution processing, and the output image has a size of 208 × 208 × 64.
The third layer of the neural network is an inverse residual convolution layer (bneck_1): the inverse residual convolution comprises two 1 × 1 convolutions and one 3 × 3 convolution, each convolution layer being followed by a BN layer and a ReLU activation function. After the 208 × 208 × 64 feature map passes through the inverse residual convolution, the output feature map of size 208 × 208 × 64 is output to bneck_2;
The fourth layer of the neural network is an inverse residual convolution layer (bneck_2): the inverse residual convolution comprises two 1 × 1 convolutions and one 3 × 3 convolution, each convolution layer being followed by a BN layer and a ReLU activation function. After the 208 × 208 × 64 feature map passes through the inverse residual convolution, the output feature map of size 104 × 104 × 128 is output to bneck_3;
fifth layer of the neural network is constructed, and fifth layer is an inverse residual convolution layer (bneck _ 3): the feature map with the size of 104 × 104 × 128 is subjected to inverse residual convolution, and the output feature map with the size of 52 × 52 × 256 is output to samBnegk _ 1.
The sixth layer of the neural network is constructed as a sam inverse residual convolution layer (samBnegk_1): the 52 × 52 × 256 feature map is subjected to the sam inverse residual convolution, and the output feature map of size 52 × 52 × 256 is output to bneck_4.
The seventh layer of the neural network is constructed as an inverse residual convolution layer (bneck_4): the 52 × 52 × 256 feature map is subjected to the inverse residual convolution, and the output feature map of size 26 × 26 × 512 is output to samBnegk_2.
The eighth layer of the neural network is constructed as a sam inverse residual convolution layer (samBnegk_2): the 26 × 26 × 512 feature map is subjected to the sam inverse residual convolution, and the output feature map of size 26 × 26 × 512 is output to bneck_5.
The ninth layer of the neural network is constructed as an inverse residual convolution layer (bneck_5): the 26 × 26 × 512 feature map is subjected to the inverse residual convolution, and the output feature map of size 13 × 13 × 1024 is output to samBnegk_3.
The tenth layer of the neural network is constructed as a sam inverse residual convolution layer (samBnegk_3): the 13 × 13 × 1024 feature map is subjected to the sam inverse residual convolution, and the output feature map of size 13 × 13 × 1024 is output to bneck_6.
Throughout the 10-layer convolutional neural network, layers 3, 4, 5, 7 and 9 are inverse residual convolution layers. The inverted residual structure (Inverted Residuals) used in these layers is shown in fig. 3; the residual connection is applied if and only if the input and output have the same number of channels.
In the entire 10-layer convolutional neural network, layers 6, 8 and 10 are inverse residual convolution layers that introduce a spatial attention mechanism (Spatial Attention Module). In each of these 3 layers a spatial attention module is added to the inverse residual convolution layer, as shown in fig. 4. The module takes the three-dimensional feature map F produced by the feature extraction network as input and generates a two-dimensional map that represents the importance of each region. Considering that the weight of a local feature should not be determined only by the features of the current region but should also take context information into account, the network does not directly apply a 1 × 1 convolution; instead it applies two-dimensional convolutions with a kernel size of 3 to F to reduce the channel dimension, shrinking the number of output channels to 1/r of the original each time, until it is smaller than r.
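A minimal PyTorch sketch of one inverse residual block with the spatial attention described above follows; the expansion factor, the reduction ratio r = 4 and the module names are assumptions made for illustration only, not the patented layer configuration.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Shrink channels with repeated 3x3 convolutions (divide by r each step) down to a 1-channel map."""

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        convs, c = [], channels
        while c > r:                                  # reduce until the channel count drops below r
            nxt = max(c // r, 1)
            convs += [nn.Conv2d(c, nxt, 3, padding=1), nn.ReLU(inplace=True)]
            c = nxt
        convs.append(nn.Conv2d(c, 1, 3, padding=1))   # final 1-channel importance map
        self.body = nn.Sequential(*convs)

    def forward(self, x):
        return x * torch.sigmoid(self.body(x))        # reweight every spatial position

class SamBneck(nn.Module):
    """Inverted residual block (1x1 expand, depthwise 3x3, 1x1 project) with spatial attention."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 4):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        self.attn = SpatialAttention(out_ch)
        self.use_residual = stride == 1 and in_ch == out_ch   # shortcut only when shapes match

    def forward(self, x):
        y = self.attn(self.block(x))
        return x + y if self.use_residual else y
```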
Feature pyramid network
Different feature layers of the MobileNet are fused through bottom-up, top-down and lateral connections using a feature pyramid network FPN. The bottom-up and top-down pathways generate the feature maps P1, P2 and P3 from samBnegk_1, samBnegk_2 and samBnegk_3. Thanks to the larger receptive field obtained, this operation guarantees that each layer has an appropriate resolution and strong semantic features, which improves object detection performance.
Feature map anchor points for models
The anchor sizes corresponding to the 3 levels P1, P2 and P3 are 13^2, 26^2 and 52^2 respectively, and the aspect ratios at each pyramid level are {1:2, 1:1, 2:1}. To predict denser objects, the anchors of each aspect ratio are further given three different scales {2^0, 2^1, 2^2}, so each level has 9 anchors in total. Each anchor is assigned a one-hot vector of length K and a vector of length 4, where K is the number of categories and the length-4 vector holds the box coordinates, similar to RetinaNet; anchors whose IoU is greater than 0.5 are regarded as positive samples.
Sub-networks
Compared with RetinaNet, the MobileNet model further adds a fully connected sub-network for embedding-space learning on top of the existing classification and regression sub-networks. The classification and regression sub-networks and their corresponding loss functions are similar to those of RetinaNet and are not described again here. The structure of the fully connected sub-network is shown in fig. 5: the prediction head is flattened into a one-dimensional vector, and a fully connected network then converts the flattened one-dimensional vector into a 128-dimensional vector to learn the embedding space.
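The fully connected embedding sub-network can be sketched as below; the flattened input size depends on the prediction-head shape and is left to the caller, while the 128-dimensional output follows the description (the L2 normalisation is an assumption that matches the later cosine matching).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    """Flatten the prediction head and project it to a 128-d embedding vector."""

    def __init__(self, in_features: int, embed_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(in_features, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.flatten(x, start_dim=1)       # (N, C, H, W) -> (N, C*H*W)
        return F.normalize(self.fc(x), dim=1)   # unit-length embedding for cosine matching
```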
Loss function
For the fully connected sub-network, the loss function is the additive angular margin loss, obtained by modifying the softmax loss.

The softmax loss function is as follows:

$$L_{1} = -\frac{1}{m}\sum_{i=1}^{m} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_{j}^{T} x_i + b_{j}}}$$

wherein m is the number of samples, n is the number of classes, $x_i$ is the feature vector of the i-th sample, $y_i$ is the class to which the i-th sample belongs, $W_j$ is the weight vector of the j-th class, and $b_j$ is the bias term of the j-th class.

The bias $b_j$ is first set to 0, and the inner product of the weight and the input is then expressed by the following equation:

$$W_{j}^{T} x_i = \|W_j\|\,\|x_i\|\cos\theta_{j} \qquad (1)$$

$W_j$ is then L2-regularized so that $\|W_j\| = 1$; the regularization divides each value in the vector $W_j$ by $\|W_j\|$, giving a new $W_j$ whose modulus is 1. From equation (1) the following equation can be derived:

$$L_{2} = -\frac{1}{m}\sum_{i=1}^{m} \log \frac{e^{s\cos\theta_{y_i}}}{e^{s\cos\theta_{y_i}} + \sum_{j=1, j\neq y_i}^{n} e^{s\cos\theta_{j}}}$$

On the one hand, the input $x_i$ is also L2-regularized and multiplied by a scale parameter s; on the other hand, $\theta_{y_i}$ is replaced by $\theta_{y_i} + m$, i.e. an additive angular margin m is applied to the angle of the target class. This part is the core of the MobileNet detection network; the formula is very simple, and the margin m is 0.5 by default. Thus the following formula (4) is obtained:

$$L_{3} = -\frac{1}{m}\sum_{i=1}^{m} \log \frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j=1, j\neq y_i}^{n} e^{s\cos\theta_{j}}} \qquad (4)$$

namely the additive angular margin loss, subject to the constraint (5), whose first two items are exactly the regularization of the weights and of the input features:

$$W_{j} = \frac{W_{j}^{*}}{\|W_{j}^{*}\|},\qquad x_{i} = \frac{x_{i}^{*}}{\|x_{i}^{*}\|},\qquad \cos\theta_{j} = W_{j}^{T} x_{i} \qquad (5)$$
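A hedged PyTorch sketch of the additive angular margin loss derived above; the margin m = 0.5 matches the stated default, while the scale s = 64 and the class name are assumptions made for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAngularMarginLoss(nn.Module):
    """Add a margin m to the target-class angle, scale by s, then apply cross-entropy."""

    def __init__(self, embed_dim: int, num_classes: int, s: float = 64.0, m: float = 0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # cos(theta_j) = normalised class weight . normalised feature (biases fixed to 0)
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1.0 + 1e-7, 1.0 - 1e-7))
        one_hot = F.one_hot(labels, num_classes=cosine.size(1)).bool()
        theta = torch.where(one_hot, theta + self.m, theta)   # margin only on the target class
        logits = self.s * torch.cos(theta)
        return F.cross_entropy(logits, labels)
```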
after the cloud side end models are respectively established, the specific detection steps are as follows:
for edge detection:
the method comprises the steps that a side end receives monitoring images uploaded by video equipment, the images are input into a trained MobileNet detection model, forward propagation calculation is carried out on the model, high-level and bottom-level semantic fusion is carried out on feature maps of different scales generated by forward propagation on the basis of an FPN structure, feature maps of 3 different scales are generated, and corresponding 9 pieces of anchor point information are generated in 5 different feature maps;
the feature graph and the anchor point information respectively enter a classification sub-network and a regression sub-network, the classification sub-network predicts the category information of the anchor point, the regression sub-network predicts the position information of the anchor point, and the full-connection sub-network is used for predicting 128 feature vectors of an embedding space, so that the extraction of the feature of the detection target is realized;
and if the current picture detects the target, the edge terminal uploads the current picture to the cloud for rechecking through the network and judges whether the current picture has false detection or not.
After the cloud receives the recheck picture:
The cloud receives the image uploaded by the edge and inputs it into the trained RetinaNet detection model, which performs forward propagation calculation; based on the FPN structure, high-level and low-level semantics of the feature maps of different scales generated by forward propagation are fused to produce feature maps of 5 different scales, and the corresponding 9 anchors are generated for each of the 5 feature maps;
The feature maps and the anchor information respectively enter the classification sub-network and the regression sub-network: the classification sub-network predicts the class information of the anchors and the regression sub-network predicts their position information; all anchors are then subjected to non-maximum suppression, the optimal target-detection anchor is extracted, and the remaining anchors are ignored;
The RetinaNet detection model outputs its final predicted target and position information, which is compared with the target and position information uploaded by the edge MobileNet detection model; if they are inconsistent, the prediction result of the edge MobileNet detection model is judged to be wrong, i.e. the detection result of the current picture is a misjudgment at the edge, and the cloud marks the picture as a misjudgment and issues it to the edge through the network.
After the edge asynchronously receives the photo and the false-detection information issued by the cloud:
The edge asynchronously performs inference on the photos issued by the cloud, extracts the false-detection region as a 128-dimensional feature vector output through the fully connected sub-network, and asynchronously stores the 128-dimensional feature vector together with the marking information;
Similarity matching: if the MobileNet detection model outputs a target and its position information, cosine similarity matching is performed between the 128-dimensional feature vector $F_1$ output by the MobileNet detection network and a stored false-detection feature vector $F_2$ on the edge; the similarity matching calculation formula is

$$\mathrm{sim}(F_1, F_2) = \frac{F_1 \cdot F_2}{\|F_1\|\,\|F_2\|}$$
The false-alarm threshold is set to 0.6. In the subsequent picture detection process, if the matched similarity value is higher than the threshold, the currently detected picture is regarded as a false alarm; if it is lower than the threshold, the edge judges that a target has been identified and, at the same time, asynchronously uploads the picture to the cloud for rechecking.
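The false-alarm filtering step can be sketched as follows, using the 0.6 threshold from the description; the function and variable names are illustrative assumptions.

```python
import numpy as np

FALSE_ALARM_THRESHOLD = 0.6   # threshold stated in the description

def cosine_similarity(v1: np.ndarray, v2: np.ndarray) -> float:
    """Cosine similarity between two 128-d embedding vectors."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12))

def is_false_alarm(candidate: np.ndarray, stored_false_vectors) -> bool:
    """True if the new detection's embedding matches any stored false-detection embedding."""
    return any(cosine_similarity(candidate, v) > FALSE_ALARM_THRESHOLD
               for v in stored_false_vectors)
```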
The technical scope of the present invention is not limited to the above description, and those skilled in the art can make various changes and modifications to the above-described embodiments without departing from the technical spirit of the present invention, and such changes and modifications should fall within the protective scope of the present invention.

Claims (5)

1. The image identification method based on cloud edge-end cooperation is characterized by comprising the following steps:
s1, constructing a MobileNet detection network model by the edge terminal, carrying out forward propagation calculation on the uploaded image and correspondingly generating an anchor point
The method comprises the steps that information is obtained, a model predicts anchor point information through a sub-network, then inhibits the extraction of optimal anchor point information and outputs the optimal anchor point information, judges misjudgment according to the optimal anchor point information detected by a current picture, and uploads a misjudgment picture and the optimal anchor point information in a cloud;
s2, constructing a RetinaNet detection network model at the cloud end, and rechecking the edge-end uploaded pictures in S1 to finally output the best images
If the anchor point information is consistent with the anchor point information uploaded by the side end model, judging whether the side end model has false detection or not correspondingly, and if so, marking a false judgment picture and issuing the false judgment picture to the side end;
s3, issuing pictures in the asynchronous reasoned S2 of the edge-side model, extracting anchor point information corresponding to the false detection area in the pictures into a feature vector and outputting the feature vector through a sub-network, performing cosine similarity matching on the feature vector and the feature vector corresponding to the anchor point of the residual judged false detection pictures, judging whether the feature vector is higher than a similarity threshold value as false judgment, and asynchronously uploading the image lower than the similarity threshold value to the cloud-side model for reexamination.
2. The image recognition method based on cloud-edge collaboration as claimed in claim 1, wherein the cloud model construction steps are as follows:
constructing a backbone ResNet50 residual network;
fusing different feature levels of ResNet50 through bottom-up, top-down and lateral connections using a feature pyramid network FPN, and correspondingly generating feature maps;
adding different scales to the anchor sizes corresponding to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of categories and the length-4 vector holds the box coordinates; anchors whose IoU is greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network for predicting the probability that a target appears and a bounding-box sub-network for predicting the coordinate offsets of the candidate regions generated from the anchors, wherein the classification sub-network loss function is calculated by Cross Entropy Loss and the bounding-box sub-network loss function is calculated by Smooth L1 Loss.
3. The image recognition method based on cloud-edge collaboration as claimed in claim 1 or 2, wherein the edge model construction steps are as follows:
sequentially constructing a 10-layer convolutional neural network, wherein layers 1 and 2 perform dimension reduction through two-dimensional convolutions with a kernel size of 3, layers 3, 4, 5, 7 and 9 are inverse residual convolution layers, and layers 6, 8 and 10 are inverse residual convolution layers that introduce a spatial attention mechanism;
fusing different feature layers of the MobileNet through bottom-up, top-down and lateral connections using a feature pyramid network FPN, and correspondingly generating feature maps;
adding different scales to the anchor sizes corresponding to the feature maps, and giving each anchor a one-hot vector of length K and a vector of length 4, wherein K is the number of categories and the length-4 vector holds the box coordinates; anchors whose IoU is greater than 0.5 are regarded as positive samples;
and constructing sub-networks comprising a classification sub-network, a regression sub-network and a fully connected sub-network, wherein the classification sub-network, the regression sub-network and their corresponding loss functions are similar to those of the cloud model, and the loss function of the fully connected sub-network is calculated based on the softmax loss.
4. The image recognition method based on cloud-edge collaboration as claimed in claim 2, wherein the Cross Entropy Loss function is defined as follows:

$$CE = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \alpha\, y_{ic} \log(p_{ic})$$

wherein N is the number of samples, C is the number of classes, $y_{ic}$ is the label information indicating whether the i-th sample belongs to class c (its value is 1 if it does and 0 otherwise), $\alpha$ is a hyper-parameter, and $p_{ic}$ is the confidence that the i-th sample is predicted to belong to class c;

$t_i$ is defined as the coordinate vector of the relative position between the i-th prediction region and the anchor reference region,

$$t_i = (t_x, t_y, t_w, t_h),$$

and $t_i^{*}$ as the coordinate vector of the relative position between the i-th target real region and the anchor reference region,

$$t_i^{*} = (t_x^{*}, t_y^{*}, t_w^{*}, t_h^{*}),$$

where

$$t_x = \frac{x - x_a}{w_a},\quad t_y = \frac{y - y_a}{h_a},\quad t_w = \log\frac{w}{w_a},\quad t_h = \log\frac{h}{h_a},$$

$$t_x^{*} = \frac{x^{*} - x_a}{w_a},\quad t_y^{*} = \frac{y^{*} - y_a}{h_a},\quad t_w^{*} = \log\frac{w^{*}}{w_a},\quad t_h^{*} = \log\frac{h^{*}}{h_a},$$

wherein (x, y) represents the center coordinates and (h, w) the height and width of a region border; $x$, $x_a$ and $x^{*}$ respectively represent the center abscissas of the prediction region, the anchor and the manually annotated real region; $y$, $y_a$ and $y^{*}$ respectively represent their center ordinates; $h$, $h_a$ and $h^{*}$ respectively represent their border heights; and $w$, $w_a$ and $w^{*}$ respectively represent their border widths;

and the Smooth L1 Loss function is defined as follows:

$$\mathrm{Smooth}_{L1}(x) = \begin{cases} 0.5x^{2}, & |x| < 1, \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
5. The image recognition method based on cloud-edge cooperation according to claim 1, wherein the cosine similarity matching calculation formula is as follows:

$$\mathrm{sim}(F_1, F_2) = \frac{F_1 \cdot F_2}{\|F_1\|\,\|F_2\|},$$

wherein $F_1$ is the feature vector of the anchor information corresponding to the false-detection region, and $F_2$ is the feature vector corresponding to the anchor of a picture the edge model has already judged to be a false detection.
CN202210850570.4A 2022-07-20 2022-07-20 Image identification method based on cloud edge cooperation Active CN114926667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210850570.4A CN114926667B (en) 2022-07-20 2022-07-20 Image identification method based on cloud edge cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210850570.4A CN114926667B (en) 2022-07-20 2022-07-20 Image identification method based on cloud edge cooperation

Publications (2)

Publication Number Publication Date
CN114926667A true CN114926667A (en) 2022-08-19
CN114926667B CN114926667B (en) 2022-11-08

Family

ID=82815564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210850570.4A Active CN114926667B (en) 2022-07-20 2022-07-20 Image identification method based on cloud edge cooperation

Country Status (1)

Country Link
CN (1) CN114926667B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934298A (en) * 2023-01-12 2023-04-07 南京南瑞信息通信科技有限公司 Front-end and back-end cooperation electric power monitoring MEC unloading method, system and storage medium
CN116055338A (en) * 2023-03-28 2023-05-02 杭州觅睿科技股份有限公司 False alarm eliminating method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784685A (en) * 2020-07-17 2020-10-16 国网湖南省电力有限公司 Power transmission line defect image identification method based on cloud edge cooperative detection
CN111967305A (en) * 2020-07-01 2020-11-20 华南理工大学 Real-time multi-scale target detection method based on lightweight convolutional neural network
CN113408087A (en) * 2021-05-25 2021-09-17 国网湖北省电力有限公司检修公司 Substation inspection method based on cloud side system and video intelligent analysis
CN113989209A (en) * 2021-10-21 2022-01-28 武汉大学 Power line foreign matter detection method based on fast R-CNN
WO2022082692A1 (en) * 2020-10-23 2022-04-28 华为技术有限公司 Lithography hotspot detection method and apparatus, and storage medium and device
CN114697324A (en) * 2022-03-07 2022-07-01 南京理工大学 Real-time video analysis and processing method based on edge cloud cooperation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967305A (en) * 2020-07-01 2020-11-20 华南理工大学 Real-time multi-scale target detection method based on lightweight convolutional neural network
CN111784685A (en) * 2020-07-17 2020-10-16 国网湖南省电力有限公司 Power transmission line defect image identification method based on cloud edge cooperative detection
WO2022082692A1 (en) * 2020-10-23 2022-04-28 华为技术有限公司 Lithography hotspot detection method and apparatus, and storage medium and device
CN113408087A (en) * 2021-05-25 2021-09-17 国网湖北省电力有限公司检修公司 Substation inspection method based on cloud side system and video intelligent analysis
CN113989209A (en) * 2021-10-21 2022-01-28 武汉大学 Power line foreign matter detection method based on fast R-CNN
CN114697324A (en) * 2022-03-07 2022-07-01 南京理工大学 Real-time video analysis and processing method based on edge cloud cooperation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115934298A (en) * 2023-01-12 2023-04-07 南京南瑞信息通信科技有限公司 Front-end and back-end cooperation electric power monitoring MEC unloading method, system and storage medium
CN115934298B (en) * 2023-01-12 2024-05-31 南京南瑞信息通信科技有限公司 Front-end and back-end collaborative power monitoring MEC unloading method, system and storage medium
CN116055338A (en) * 2023-03-28 2023-05-02 杭州觅睿科技股份有限公司 False alarm eliminating method, device, equipment and medium
CN116055338B (en) * 2023-03-28 2023-08-11 杭州觅睿科技股份有限公司 False alarm eliminating method, device, equipment and medium

Also Published As

Publication number Publication date
CN114926667B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN112308019B (en) SAR ship target detection method based on network pruning and knowledge distillation
CN114926667B (en) Image identification method based on cloud edge cooperation
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
Bakkay et al. BSCGAN: Deep background subtraction with conditional generative adversarial networks
Kim et al. High-speed drone detection based on yolo-v8
CN114462555B (en) Multi-scale feature fusion power distribution network equipment identification method based on raspberry group
KR20200007084A (en) Ship detection method and system based on multi-dimensional features of scene
CN108537824B (en) Feature map enhanced network structure optimization method based on alternating deconvolution and convolution
CN109063549B (en) High-resolution aerial video moving target detection method based on deep neural network
CN113469050A (en) Flame detection method based on image subdivision classification
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
CN113536972B (en) Self-supervision cross-domain crowd counting method based on target domain pseudo label
CN111753677A (en) Multi-angle remote sensing ship image target detection method based on characteristic pyramid structure
CN116485717B (en) Concrete dam surface crack detection method based on pixel-level deep learning
KR102391853B1 (en) System and Method for Processing Image Informaion
CN115862066A (en) Improved YOLOv5 lightweight community scene downlight detection method
CN111753732A (en) Vehicle multi-target tracking method based on target center point
CN109447014A (en) A kind of online behavioral value method of video based on binary channels convolutional neural networks
CN110503049B (en) Satellite video vehicle number estimation method based on generation countermeasure network
Yang et al. C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks
CN111507353A (en) Chinese field detection method and system based on character recognition
Li et al. Fire flame image detection based on transfer learning
JP2020064364A (en) Learning device, image generating device, learning method, and learning program
CN114120076B (en) Cross-view video gait recognition method based on gait motion estimation
CN116912670A (en) Deep sea fish identification method based on improved YOLO model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Image recognition method based on cloud edge collaboration

Granted publication date: 20221108

Pledgee: Hefei high tech Company limited by guarantee

Pledgor: ANHUI JUSHI TECHNOLOGY CO.,LTD.

Registration number: Y2024980013371

PE01 Entry into force of the registration of the contract for pledge of patent right