CN110009679A - Object localization method based on a multi-scale feature convolutional neural network


Publication number
CN110009679A
Authority
CN
China
Prior art keywords
feature
layer
gradient
classification
pyramid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910148554.9A
Other languages
Chinese (zh)
Other versions
CN110009679B (en)
Inventor
孙俊
周以鹏
吴豪
吴小俊
方伟
陈祺东
李超
游琪
冒钟杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uni Entropy Intelligent Technology Wuxi Co Ltd
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN201910148554.9A
Publication of CN110009679A
Application granted
Publication of CN110009679B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an object localization method based on a multi-scale feature convolutional neural network and belongs to the field of computer vision. Many data sets in practical applications have partially missing labels and no localization annotations. To address this, the method proposes weakly supervised localization based on a multi-scale feature convolutional neural network. Its core idea is to exploit the layered structure of the network: gradient-weighted class activation mapping is applied to several convolutional layers to generate a gradient pyramid model, feature centroid positions are computed by mean filtering, connected pixel segments are generated by a confidence intensity mapping and threshold step-down module, and weakly supervised localization is performed around the largest boundary. Experimental results on standard test sets show that the algorithm can localize targets in multi-scale images with a large number of categories and achieves high accuracy.

Description

Object localization method based on a multi-scale feature convolutional neural network
Technical field
The invention belongs to the field of computer vision and specifically relates to an object localization method based on a multi-scale feature convolutional neural network.
Background art
Object localization is one of the important research directions in computer vision. Its purpose is to determine the position of a target in an image. The common approach at present is to train a supervised learning algorithm on both the category and the position information of the targets and then localize targets on a test set. In many practical applications, such as small object detection, traffic object detection, multi-modal object detection and medical object detection, data are scarce and many annotations are missing, so the requirements of neural network detection tasks cannot be met. In such applications the categories in a data set also differ widely, and the missing annotations of some targets seriously contaminate the background feature space. The classifier then struggles to distinguish known categories from the current target and tends to assign the target to the background class, which confuses the supervised model and reduces its accuracy.
Algorithms have improved in two respects. At the level of feature representation, feature engineering has grown ever larger with the development of deep learning; when the environment is complex and the target information is weak, extracting effective target features and fusing multi-dimensional, multi-scale features is very important for improving the target model. At the level of model methodology, weakly supervised learning does not depend on target labels; when a data set lacks annotations, or when the categories to detect are known but the data set is too small to complete training, such methods are easily extended to new object classes. Researchers have done much work on the feature representation of image targets. Early image processing work could extract rich feature points and corresponding feature descriptors from textured targets, and textured objects could be recognized and detected accurately and stably from these feature points and descriptors. Examples are the SIFT algorithm and other descriptors such as PCA-SIFT and SURF. Dalal et al. then proposed using histograms of oriented gradients (HOG) of local image regions as features, with a support vector machine (SVM) as the classifier, for pedestrian detection; hand-crafted features, however, require the designer to have considerable domain expertise. With the development of neural networks and deep learning, Ross Girshick et al. proposed the R-CNN, Fast R-CNN and Faster R-CNN family of algorithms, which use convolutional neural networks to build rich feature hierarchies for accurate object detection and semantic segmentation, but they use only the output of the last feature map and do not make full use of multi-scale target features. He et al. proposed SPP-Net, which adds a spatial pooling layer after the last convolutional layer so that a feature map of any size can be converted into a fixed-size feature vector. Liu et al. proposed the SSD network, which uses a single-stage detection structure and predicts from multi-scale feature maps; it improves detection accuracy but does not use low-level features and lacks connections between different feature map layers. Lin et al. proposed the feature pyramid model, which combines multi-scale feature maps and additionally fuses low-level feature maps with upsampled feature maps, yielding a more complete target model.
Weakly supervised localization with convolutional neural networks has likewise been studied extensively; these methods localize objects in an image using only image-level class labels. In recent years, Zhou et al. proposed class activation mapping (Class Activation Map, CAM), which modifies the image classification network by replacing the fully connected layers with convolutional layers and global average pooling. Its drawback is that the architecture requires the feature maps to directly precede the classification layer, so on tasks other than classification it may perform worse than general network structures. Lu et al. studied similar methods using global max pooling and log-sum-exp pooling. Selvaraju et al. built on class activation mapping and introduced gradient-weighted class activation mapping (Grad-CAM), which combines gradient signals with feature maps and requires no modification of the original network architecture. Other methods localize targets by perturbing the classifier's input image. Zeiler and Fergus perturb the input by occluding patches and classifying the occluded image; when the relevant objects are occluded, the classification score of those objects usually drops. Oquab et al. classify many patches containing a given pixel and average the classification scores of these patches to obtain the pixel's classification score; this requires multiple forward and backward passes and is inefficient. Zhang et al. introduced contrastive marginal winning probability (c-MWP) to model the top-down attention of neural classification models, which can highlight discriminative regions; it applies only to image classification tasks and localizes targets poorly.
Summary of the invention
The invention aims to provide an object localization method based on a multi-scale feature convolutional neural network. The method is an end-to-end weakly supervised localization algorithm based on multi-scale features. It makes full use of the multi-scale features of a deep neural network: gradient-weighted class activation mapping is used to generate a gradient pyramid for each predicted category, feature centroid positions are computed by mean filtering, connected pixel segments are generated by a confidence intensity mapping and threshold step-down module, and weakly supervised localization is performed around the largest boundary. Multiple experiments show that the algorithm localizes targets accurately when few labels are available and outperforms other methods.
Technical solution of the present invention:
An object localization method based on a multi-scale feature convolutional neural network, comprising the following steps:
Step 1: A single-scale image of arbitrary size is input to the convolutional neural network (ConvNet). Using the feature pyramid model and the gradient-weighted class activation mapping (Grad-CAM) algorithm, the cross-entropy error $L_{cross-entropy}$ of the classification is computed, along with the corresponding guided back-propagation gradient. The outputs of the layers of the ConvNet are $\{C_2, C_3, ..., C_l\}$. The backbone convolutional network computes the predicted category $c$ with score $y^c$; the input image $I$ has size $w \times h$. The multi-layer feature maps corresponding to these outputs are $\{F_2, F_3, ..., F_l\}$.
Step 2: The importance weights of each layer are computed; the pixel-level spatial intensity is computed on the multi-layer feature maps and passed through the ReLU activation function.
Step 3: For each layer of the gradient pyramid, upsampling and lateral connection operations are performed to obtain the superimposed intensity.
Step 4: A heatmap is computed from the superimposed intensity, and its global peak $\gamma$ is computed and scaled by the scaling factor $\sigma$ to serve as the local maximum threshold. A maximum filter and a minimum filter are applied to each heatmap to obtain the corresponding maximum-filtered and minimum-filtered maps, and the difference heatmap is computed. Pixels where the difference shows no variation are set to 0, which gives the candidate regions that contain local maximum centroids.
Step 5: Multiple candidate points are generated by repeated dilation, the best centroid is found, and the threshold is then stepped down using the scaled global peak.
Step 6: On the largest boundary obtained after the threshold step-down, the coordinates [xmin, ymin, xmax, ymax] of the largest rectangular box are selected. The predicted target categories $D_{class}$ and the coordinate set $D_{loc}$ are output for all images.
Beneficial effects of the invention: To address the lack of target localization information in the many application scenarios where data sets are incomplete and annotations are scarce, a weakly supervised object localization algorithm based on a gradient pyramid is proposed. The invention builds the gradient pyramid from back-propagated gradients and the structure of the neural network, and localizes targets by threshold step-down using only category information. The algorithm has two advantages: 1) it makes full use of multi-scale deep feature information and fuses shallow structural features with deep semantic features; 2) by finding a suitable feature centroid, it completes the localization task accurately with the threshold step-down strategy. Comparisons with other algorithms on the data sets show that the algorithm uses multi-scale feature information effectively, improves performance on weakly supervised localization tasks and generalizes well. The next research goal is to design a category-based adaptive threshold strategy and a highly robust weakly supervised non-maximum suppression, in order to solve weakly supervised object localization without location labels.
Description of the drawings
Fig. 1 shows the weakly supervised object localization network framework based on the gradient pyramid.
Fig. 2 shows the gradient pyramid.
Fig. 3 is the flow chart of weakly supervised localization.
Fig. 4 shows experimental results, where (a-1) to (a-4) are the preprocessed original images, (b-1) to (b-4) are the guided back-propagation error maps for the predicted category, (c-1) to (c-4) are the heatmaps generated by the gradient pyramid, (d-1) to (d-4) are the guided gradient pyramids, and (e-1) to (e-4) are the weakly supervised predicted region boxes.
Fig. 5 shows the experimental predicted boxes and ground-truth labels, where (a) to (h) are the predicted boxes and ground-truth labels for 8 kinds of targets.
Fig. 6 shows the comparative experimental results on PASCAL VOC2012.
Fig. 7 shows object localization results for fine-grained classification, where (a-1) to (a-3) are the preprocessed original images, (b-1) to (b-3) are the gradient pyramid heatmaps, and (c-1) to (c-3) are the weakly supervised predicted region boxes.
Specific embodiments
The technical solution of the invention is further described below with reference to specific embodiments and the drawings.
1. Data sets and evaluation metrics
(1) ImageNet-ILSVRC2012
ImageNet is a large-scale image data set established to advance computer image recognition, and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is held on it every year. Its images can be used for a variety of computer vision tasks such as image classification, object localization, object detection, video object detection and scene classification, and they carry explicit category annotations and annotations of object positions. The ILSVRC2012 set we use is the ImageNet data set released in 2012. It contains 1000 categories with about 1000 images selected per category, giving 1.2 million training images, 50,000 validation images and 150,000 test images. We use the validation set to verify the weakly supervised object localization task. The evaluation metrics are the Top-1 error and the Top-5 error, that is, the classification and localization errors of the first predicted category and of the first five predicted categories. Here $y_i$ is the correct label of sample $i$, $m$ is the total number of samples and $D$ is the sample set.
The localization error uses an intersection over union (IoU) of 0.5 as the threshold for deciding positive and negative samples:
$IoU = \frac{area(R_{pred} \cap R_{gt})}{area(R_{pred} \cup R_{gt})}$
where $R_{pred}$ is the predicted region and $R_{gt}$ is the ground-truth region. The lower the error, the better.
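The evaluation described above can be sketched as follows. This is a minimal illustration, not the evaluation code used in the experiments: the function names, the sample layout and the choice of counting an IoU exactly equal to 0.5 as a hit are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two [xmin, ymin, xmax, ymax] boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_error(samples, k=1, iou_threshold=0.5):
    """Fraction of samples with no correct localization among the top-k predictions.

    Each sample is (true_class, true_box, predictions), where predictions is a
    ranked list of (class, box) pairs. This layout is hypothetical.
    """
    wrong = 0
    for true_class, true_box, predictions in samples:
        hit = any(
            cls == true_class and iou(box, true_box) >= iou_threshold
            for cls, box in predictions[:k]
        )
        wrong += 0 if hit else 1
    return wrong / len(samples)
```

With `k=1` this gives a Top-1 style localization error and with `k=5` a Top-5 style error; a prediction only counts when both the category and the box are right.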
(2) PASCAL VOC2012 data set
The PASCAL VOC (Visual Object Classes) challenge data set is mainly used for object recognition and covers 20 object classes. Image sizes vary. The training and validation data comprise 11,530 images containing 27,450 annotated objects and 6,929 semantic segmentations.
2. Parameter settings
The experiments are based on the PyTorch deep learning library. The operating system is CentOS, the processor is an Intel Xeon E5, the graphics card is an NVIDIA Tesla K80 and the memory is 64 GB. Images are preprocessed to 224*224 with 3 channels and normalized on the three channels with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. On the VGG-19 network, layers 8, 17, 26 and 35 are used as [conv2, conv3, conv4, conv5] of the convolutional network, with output feature sizes [56*56, 28*28, 14*14, 7*7] respectively. The threshold step-down coefficient is set to 0.85 for the ImageNet data set and 0.75 for the VOC data set.
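The channel normalization above can be illustrated with a short NumPy sketch. It assumes the image has already been resized to 224*224, is stored as an H x W x 3 RGB array of 8-bit values, and is scaled to [0, 1] before the mean and standard deviation are applied; the function name `normalize` is hypothetical.

```python
import numpy as np

# Per-channel statistics quoted in the parameter settings.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image_uint8):
    """Scale an HxWx3 uint8 RGB image to [0, 1], normalize per channel, return CxHxW."""
    x = image_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD  # broadcasts over the last (channel) axis
    return np.transpose(x, (2, 0, 1))
```

The channel-first output layout matches what convolutional networks in PyTorch expect as input.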
3. Gradient pyramid
The feature pyramid model is a module for detecting targets at different convolutional layers of a deep network. It exploits the pyramidal feature hierarchy of a convolutional network, whose levels carry semantics from low to high, to build a feature pyramid with high-level semantics at all levels. The method takes a single-scale image of arbitrary size as input and outputs feature maps of proportional size at multiple levels in a fully convolutional manner. The process is independent of the backbone convolutional architecture. The pyramid structure has two parts. The first is the bottom-up pathway of the feed-forward computation, which computes a feature hierarchy of multi-scale feature maps with a scaling step of 2; the output of the last layer of each stage is chosen as the reference set of feature maps. For ResNet, the feature activations output by the residual blocks of each stage are used. For the convolutional stages CONV2, CONV3, CONV4 and CONV5, the residual block outputs are $\{C_2, C_3, C_4, C_5\}$, with strides of $\{S_2, S_3, S_4, S_5\}$ pixels respectively. The second part is the top-down pathway on the feature maps together with the lateral connections between feature levels. Higher-level features are spatially coarser but semantically stronger; the top-down pathway and lateral connections enhance the feature maps for more accurate localization. The spatial resolution is upsampled by a factor of 2, and the upsampled information is then merged with the current level by element-wise addition. This process is iterated until the pyramid is complete. The resulting set of feature maps is $\{P_2, P_3, P_4, P_5\}$, corresponding to $\{C_2, C_3, C_4, C_5\}$ and having the same sizes respectively.
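The relation between strides and level sizes can be checked with a few lines of arithmetic. The stride values (4, 8, 16, 32) are an assumption for this sketch; they are the usual strides of stages 2 to 5 and are consistent with the 56, 28, 14 and 7 pixel feature sizes quoted for a 224*224 input. The function names are hypothetical.

```python
def pyramid_level_sizes(input_size, strides=(4, 8, 16, 32)):
    """Return the side length of each pyramid level for a square input."""
    sizes = []
    for stride in strides:
        if input_size % stride != 0:
            raise ValueError(
                f"input size {input_size} is not divisible by stride {stride}"
            )
        sizes.append(input_size // stride)
    return sizes


def check_scale_step(sizes, step=2):
    """True when upsampling each deeper level by `step` matches the shallower level."""
    return all(shallow == deep * step for shallow, deep in zip(sizes, sizes[1:]))
```

The second check is what makes the element-wise addition in the top-down pathway well defined: after 2x upsampling, a deeper level has exactly the size of the level above it.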
4. Grad-CAM algorithm
In the gradient-weighted class activation mapping algorithm (Grad-CAM), because the deeper layers of a convolutional neural network capture higher-level visual structure, the gradient information flowing into the last convolutional layer is used to understand how important each neuron is for a decision about the target. To obtain the class-discriminative localization map $L^c_{Grad-CAM} \in \mathbb{R}^{u \times v}$ of width $u$ and height $v$ for any category $c$, the gradient of the score for category $c$ is computed first, that is, the partial derivative of $y^c$ with respect to the feature maps $A^k$ of the convolutional layer, $\partial y^c / \partial A^k$, where $k$ indexes the feature maps. These gradients are processed by global average pooling to obtain the neuron importance weights:
$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}}$
where $Z$ is the number of positions $(i, j)$ in the feature map. The weight $\alpha_k^c$ represents a partial linearization of the network downstream of $A$ and captures the importance of feature map $k$ for the target category $c$. The algorithm then applies the ReLU activation function to the weighted combination of the feature maps:
$L^c_{Grad-CAM} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$
With the ReLU function, the algorithm attends only to features that have a positive influence on the target category, that is, pixels whose intensity should be increased in order to increase the confidence for the class label; negative pixels are likely to belong to other categories in the image. The algorithm provides a way to visualize pixel-level spatial gradients and can discriminate fine-grained features. In addition, the algorithm upsamples the localization map to the input image size by bilinear interpolation and then uses point-wise multiplication to fuse guided back-propagation with the Grad-CAM visualization. The method can therefore discriminate both local target features and categories.
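The two Grad-CAM operations, global average pooling of the gradients followed by a ReLU over the weighted sum of feature maps, can be sketched in NumPy. The sketch assumes the activations and the gradients of the class score for one convolutional layer have already been extracted from the network (for example with framework hooks); the function name and argument layout are hypothetical.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Grad-CAM map for one layer.

    activations: array of shape (K, u, v), the K feature maps A^k.
    gradients:   array of shape (K, u, v), d y^c / d A^k for the chosen class c.
    Returns a non-negative (u, v) localization map.
    """
    # Global average pooling of the gradients gives one weight per feature map.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over the K feature maps: sum_k alpha_k^c * A^k.
    cam = np.tensordot(weights, activations, axes=1)
    # ReLU keeps only features with a positive influence on class c.
    return np.maximum(cam, 0.0)
```

Feature maps whose average gradient is negative pull the sum down, and any position that ends up below zero is clipped, which is why only evidence in favour of the class remains visible.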
5. Gradient pyramid model
The basic framework of the algorithm is shown in Fig. 1. The aim is to cope with the complex environments and data conditions that can arise in object localization, for example visual tasks with little data or no annotations. Starting from the structure of convolutional networks, which have an inherent multi-scale pyramidal shape, we compute the feature hierarchy layer by layer. The method attends not only to deep semantic information but also to shallow texture and edge information, which enriches the feature space. We make full use of the pyramidal structure of the convolutional feature hierarchy to create features with strong semantics at all scales: gradients are back-propagated to the feature map of each level and combined through the top-down pathway and lateral connections to build the gradient class mapping pyramid model, which strengthens the importance intensity of features at different scales. Our structure uses the fused gradient information to interpret features of different dimensions. First, after the feed-forward computation on the current image, the outputs of the levels are $\{C_2, C_3, ..., C_l\}$, where $l$ indexes the convolutional layers, and the output of each level is used directly as the returned feature maps $\{F_2, F_3, ..., F_l\}$. The first layer is too close to the input image and carries too little discriminative information, so it is not used. The network then computes the predicted output category $c$, and the gradient of the score of category $c$ with respect to every feature level is obtained, that is, the partial derivative of the output $y^c$ with respect to the feature maps of the $l$-th convolutional layer. Global average pooling is applied to this partial derivative information to obtain the importance weights, where the pooling range for each feature map $k$ is over the positions $i, j$.
The feature maps at each level have dimensions $\{m, n, k\}$, that is, the length, width and number of channels of a single feature. The weighted feature maps then pass through the ReLU activation layer,
which gives the feature score of each layer of the gradient pyramid.
On each gradient feature map we perform two operations. First, it is upsampled by a factor of two so that it has the same shape as the gradient map of the next layer. Then it is laterally connected with the gradient intensity map of the next layer, so that the enhanced shallow feature intensity and the deep feature intensity are fused.
This operation is repeated between layers down to the bottom level, whose output is the final fused gradient feature map.
Top-level feature maps carry greater weight than low-level feature maps, because the semantic information of high-level feature maps is more concentrated and captures more visual structure. The lateral connections between levels strengthen the gradient intensity step by step. The feature information based on the gradient pyramid is therefore richer and provides more evidence for computer vision tasks.
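The layer-by-layer fusion can be sketched as follows, assuming one non-negative class map per level has already been computed (as in Grad-CAM, applied to each layer) and that each level is half the size of the one before it. Nearest-neighbour upsampling and plain addition are assumptions of this sketch; the text only states that the deeper map is upsampled by two and laterally connected with the next layer's map. The function names are hypothetical.

```python
import numpy as np


def upsample2x(m):
    """Nearest-neighbour 2x upsampling of a 2-D map."""
    return m.repeat(2, axis=0).repeat(2, axis=1)


def fuse_gradient_pyramid(level_maps):
    """Fuse per-level class maps ordered from shallow to deep.

    level_maps: list of 2-D arrays, e.g. 56x56, 28x28, 14x14, 7x7.
    Returns the fused map at the resolution of the shallowest level.
    """
    fused = level_maps[-1]  # start from the deepest, coarsest level
    for shallower in reversed(level_maps[:-1]):
        up = upsample2x(fused)
        if up.shape != shallower.shape:
            raise ValueError(f"shape mismatch: {up.shape} vs {shallower.shape}")
        fused = shallower + up  # lateral connection by element-wise addition
    return fused
```

The returned map has the size of the shallowest level used; it would still need to be resized to the input image before a heatmap is drawn over the picture.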
6. Weakly supervised localization
The algorithm applies the gradient pyramid structure to the weakly supervised localization task: the network predicts the target category, back-propagation generates the gradient pyramid, and after mean filtering the confidence intensity mapping and threshold step-down module determines the effective feature region of the target category, so that the target is localized with weak supervision.
Fig. 3 is the flow chart of weakly supervised localization. First, the backbone convolutional network computes the predicted category $c$. From the category score, the superimposed feature intensity is generated according to the gradient pyramid. After the heatmap is computed, we take its global peak $\gamma$ and scale it by a maximum intensity factor to obtain the threshold for selecting local maxima, so that a selected local position has sufficiently high intensity. The setting of the maximum intensity factor depends on prior knowledge of the data set: partly on the average proportion of the image pixels occupied by targets in the data set, and partly on how fine-grained the image classification is. We use the average proportion as the initial value and then tune the factor as a hyperparameter. To pick out the salient feature points in the heatmap, we apply a maximum filter and a minimum filter to each heatmap and compute the difference heatmap, which gives the candidate regions that contain local maximum centroids.
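One way to read the filtering step is the common peak-detection recipe sketched below: a pixel is kept when it equals the maximum of its neighbourhood, the neighbourhood is not flat (maximum minus minimum is non-zero), and its value reaches the scaled global peak. This is an interpretation under stated assumptions, not the exact procedure of the method; the window size, the default value of `sigma` and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_filter(heat, size, reducer, pad_value):
    """Apply `reducer` (np.max or np.min) over size x size neighbourhoods."""
    pad = size // 2
    padded = np.pad(heat, pad, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(2, 3))


def local_peaks(heat, sigma=0.5, size=3):
    """Return (row, col) positions of local maxima above sigma * global peak.

    `size` is assumed to be odd so that the window is centred on the pixel.
    """
    heat = np.asarray(heat, dtype=float)
    threshold = sigma * heat.max()              # scaled global peak
    max_f = _window_filter(heat, size, np.max, -np.inf)
    min_f = _window_filter(heat, size, np.min, np.inf)
    diff = max_f - min_f                        # difference heatmap
    is_peak = (heat == max_f) & (diff > 0) & (heat >= threshold)
    return np.argwhere(is_peak)
```

Pixels in flat neighbourhoods have a zero difference and are discarded, which corresponds to setting pixels with an unchanged difference to 0 before candidate regions are formed.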
For all local maxima in the image that exceed the threshold, we use dilation operations to accumulate them into multiple candidate points and select the centroid of each accumulated component as the center of a predicted bounding box. Once the centroid is determined, we apply the threshold step-down and use the scaled peak $\gamma_{local}$ to obtain the region that belongs to the centroid. For multiple local centroids that lie far apart, we use non-maximum suppression to choose the target box.
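The step from a centroid to a bounding box can be illustrated with a small region-growing sketch. It keeps the pixels that are at least a fraction of the centroid's intensity and are connected to the centroid, then takes the enclosing rectangle. How the method's step-down coefficient maps to this cut-off is not recoverable from the text, so `level_ratio` is a hypothetical parameter, and 4-connectivity is likewise an assumption.

```python
from collections import deque

import numpy as np


def box_around_centroid(heat, centroid, level_ratio):
    """Bounding box [xmin, ymin, xmax, ymax] of the region connected to `centroid`.

    heat:        2-D non-negative heatmap.
    centroid:    (row, col) of the selected centroid.
    level_ratio: fraction of the centroid intensity used as the cut-off (assumption).
    """
    heat = np.asarray(heat, dtype=float)
    h, w = heat.shape
    cy, cx = int(centroid[0]), int(centroid[1])
    mask = heat >= level_ratio * heat[cy, cx]
    seen = np.zeros((h, w), dtype=bool)
    seen[cy, cx] = True
    queue = deque([(cy, cx)])
    ys, xs = [cy], [cx]
    while queue:  # breadth-first flood fill over 4-connected neighbours
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                seen[ny, nx] = True
                queue.append((ny, nx))
                ys.append(ny)
                xs.append(nx)
    return [min(xs), min(ys), max(xs), max(ys)]
```

Lowering `level_ratio` step by step grows the connected region, and with it the box, which is one plausible reading of stepping the threshold down from the peak.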
The weakly supervised network structure based on the gradient pyramid requires no retraining of the original network and relies only on the classification decision of the original network, so it is faster. The model is also highly interpretable because it is built on feature visualization: unlike other network structures, for every image we know which features contribute positively to the classification decision, so deciding the target position from feature intensity is more trustworthy. On top of the gradient heatmap we also provide a spatial visualization method for fine-grained category importance: once the target region has been located, we fuse the back-propagated derivative with the gradient pyramid by point-wise multiplication to build the guided gradient pyramid $E^c$ (GGP, Guided Grad-Pyramid).
$E^c = S^c \odot I$ (9)
where $S^c$ is the gradient intensity map of category $c$ and $I$ is the back-propagated derivative of the error with respect to the image. This visualization is both high-resolution and class-discriminative, and it clearly identifies fine-grained features of the target in the image (such as stripes, ears and eyes). This helps us assess the classification ability of the model and guides model adjustment for more precise weakly supervised localization.
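Equation (9) is a point-wise product, which can be sketched in NumPy as below. The sketch assumes the intensity map has already been resized to the input resolution and that the back-propagated derivative has one value per channel and pixel; the function name and array layouts are hypothetical.

```python
import numpy as np


def guided_grad_pyramid(intensity_map, guided_backprop):
    """Point-wise product of the class intensity map and the image gradient.

    intensity_map:   (H, W) gradient intensity map of category c.
    guided_backprop: (C, H, W) back-propagated derivative with respect to the image.
    Returns a (C, H, W) array; the map is broadcast over the channels.
    """
    if intensity_map.shape != guided_backprop.shape[1:]:
        raise ValueError("intensity map and gradient must share the same H x W")
    return intensity_map[None, :, :] * guided_backprop
```

Positions where the intensity map is zero are suppressed entirely, so only gradient detail inside the class-relevant region survives in the visualization.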
7. Intra-group comparative experiments
To verify the effectiveness of the gradient pyramid on different convolutional network structures, we ran network structure comparison experiments with VGG-19, ResNet50 and ResNet101. The experiments run inference directly on the original networks, which are classification networks whose labels contain only category information. No image was used for training with target position information; the whole data set is treated as a data set without position annotations.
Table 1: Backbone network comparison experiment
Table 1 lists the results of our algorithm on three backbone convolutional networks, with the localization error and classification error measured on the 50,000 validation images of ILSVRC2012. The results show that the algorithm localizes targets well under different backbone structures. The classification error depends on the pre-training of the network. The results also show that the deeper the network structure, the better the fusion effect of the gradient pyramid.
For the gradient pyramid network structure, we ran corresponding computational complexity experiments on the ImageNet data set with the VGG backbone, covering the backbone computation, gradient pyramid generation and multi-layer gradient pyramid fusion.
Table 2: Running time of the gradient pyramid network structure
The four feature maps are saved during the forward and backward computation of the backbone network; since other algorithms also save intermediate results, this adds no extra running time. In the sampling and stacking operations, the feature map shape of each layer is fixed, and the larger the gradient map, the longer the operation takes. The time for single-layer sampling and stacking is much smaller than the time for computing the gradient map. The complexity can be summarized as a constant backbone time τ plus a gradient pyramid generation time O(n), where n is the number of stacked layers; because the gradient information of the first few layers is weak, generally only the last 4 levels are used as feature maps. Table 2 gives the average time of each layer's operations. With preprocessing included, the average processing speed on the data set is 10 FPS.
8. comparative experiments is analyzed
In order to verify the Weakly supervised performance of grad pyramid, we are had chosen into Backprop, c-MWP, Grad-CAM, 3 The algorithm that kind occurs in recent years compares.Backprop algorithm is directly visualized using back-propagation gradient, not plus Pondization operation and activation;C-MWP algorithm is using having entered to compare marginal winning probability, for simulating the nerve that can protrude distinguishable region Disaggregated model.Grad-CAM algorithm is returned merely with the last layer character gradient.Table 3 is various algorithms in ImageNet- Weakly supervised locating effect on ILSVRC2012.Error is divided into premium class positioning and error in classification, and the positioning of first five class is missed with classification Difference.The lower numerical value the better.
Table 3. Algorithm comparison experiment
To compare against the other algorithms, we use the VGG-19 network in place of the ResNet101 network. As Table 3 shows, our algorithm ranks first on the standard metrics. In top-1 localization error it outperforms the second-place Grad-CAM algorithm by 4.1 percentage points and the c-MWP algorithm by 18 percentage points, which reflects our algorithm's strong localization of the main target. In top-5 localization error, our algorithm outperforms the second-place algorithm by 0.8 percentage points and the c-MWP algorithm by 18 percentage points; when predicting multiple fine-grained classes, position localization is more accurate. For classification error, because all methods use the same backbone network and no extra training is performed, the classification error is unchanged.
We also fine-tune on the VOC2012 dataset. The fine-tuning targets the classification task only and adds no target localization information. The entire dataset is treated as a dataset without position annotations.
The 20 object classes are grouped into 4 categories, and we compute the ratio of the number of predicted boxes with IoU above 0.5 to the number of all target boxes. The results are shown in Fig. 6:
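The ratio described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `iou` and `localization_ratio` are hypothetical, boxes are assumed to be `[xmin, ymin, xmax, ymax]` lists, and each predicted box is assumed to be already paired with its ground-truth box.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [xmin, ymin, xmax, ymax]."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_ratio(pred_boxes, true_boxes, threshold=0.5):
    """Fraction of target boxes whose paired prediction has IoU above threshold."""
    if not true_boxes:
        return 0.0
    hits = sum(1 for p, t in zip(pred_boxes, true_boxes) if iou(p, t) > threshold)
    return hits / len(true_boxes)
```

For example, `localization_ratio([[0, 0, 2, 2]], [[0, 0, 2, 2]])` counts the single prediction as a hit because its IoU with the target is 1.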
It can be seen that the algorithm predicts animals, indoor objects, and people better than vehicles. For all 4 categories, our algorithm performs better than the other algorithms.
9. Analysis of experimental results
Fig. 4 shows the weakly supervised localization results of the algorithm on 4 categories. Across different types of targets, our algorithm successfully identifies the contours and edge details of the target during guided back-propagation, and the gradient pyramid heat map yields strongly correlated deep features, providing a reliable basis for target task decisions. Fig. 5 compares the output of our algorithm with the ground-truth labels: the algorithm localizes the edges and contour structure of the target in the deep features and finds the optimal target position on multi-scale targets. For fine-grained classes and for different classes alike, with an appropriate stepwise threshold reduction the weakly supervised localization task is completed accurately across multiple scenes and multiple classes. Fig. 7 shows the localization results under fine-grained classification. The three images are all different subclasses of the dog class. The localization focuses on the target's face, a feature region of high confidence; the limb regions contribute less, and the background contribution is essentially ignored. High-confidence features help determine the core region for weakly supervised localization.

Claims (3)

1. An object localization method based on a multi-scale feature convolutional neural network, characterized in that the steps are as follows:
Step 1: input a single-scale image I of arbitrary size into the convolutional neural network ConNet; using the feature pyramid model and the gradient-weighted class activation mapping (Grad-CAM) algorithm, calculate the cross-entropy classification error $L_{cross\text{-}entropy}$ and the corresponding guided back-propagation gradient; the outputs of the layers of ConNet are $\{C_2, C_3, \ldots, C_l\}$; the backbone convolutional network predicts the class c, with class score $y^c$; the input image I has size w*h; the multi-layer feature maps corresponding to these outputs are $\{F_2, F_3, \ldots, F_l\}$;
Step 2: calculate the importance weight of each layer, and compute the pixel-level spatial intensity on the multi-layer feature maps using the ReLU activation function;
Step 3: for each layer of the gradient pyramid, perform the upsampling and lateral connection operations to obtain the superimposed intensity;
Step 4: after computing the heat map from the superimposed intensity, compute the global peak γ and scale it by the scaling factor σ to serve as the local maximum threshold; apply a maximum filter and a minimum filter to each heat map, obtaining the result after maximum mean filtering and the result after minimum mean filtering, and compute the difference heat map; set the pixels whose difference is constant to 0, thereby obtaining candidate regions with local maximum centroids;
Step 5: generate multiple candidate points through repeated dilation, find the best centroid, and then perform a stepwise threshold reduction using the scaled global peak;
Step 6: from the maximum boundary obtained after the stepwise reduction, select the coordinates [xmin, ymin, xmax, ymax] of the largest rectangular box; output the predicted target classes $D_{class}$ and the coordinate set $D_{loc}$ for all images.
2. The object localization method according to claim 1, characterized in that, in step 1, the pyramid structure in the feature pyramid model mainly comprises two aspects: the first aspect is the bottom-up path in the feedforward computation, which computes a feature hierarchy composed of multi-scale feature maps with a scaling step of 2, where the output of the last layer of each stage is selected as the reference set of feature maps; for the ResNet network, the feature activations output by the residual blocks of each stage are used; for the different convolutional layers, the residual block outputs are $\{C_2, C_3, \ldots, C_l\}$, with strides of $\{S_2, S_3, \ldots, S_l\}$ pixels respectively; the second aspect is the top-down path on the feature maps and the lateral connections between feature layers; high-level features are relatively coarse but carry stronger semantic information, and the top-down path and lateral connections enhance the feature maps for more accurate localization; the spatial resolution is upsampled by a factor of 2, and the upsampled information is then merged with the current layer's information by element-wise addition; this process is iterated until the pyramid is constructed; the feature map set is $\{P_2, P_3, \ldots, P_l\}$, corresponding to $\{C_2, C_3, \ldots, C_l\}$, each pair having the same size.
3. The object localization method according to claim 1 or 2, characterized in that the gradient pyramid uses the fused gradient information to interpret features of different dimensions, with the following specific steps:
(1.1) take a single-scale image of arbitrary size as input; after the feedforward computation on the current image, the outputs of the levels are $\{C_2, C_3, \ldots, C_l\}$, where l indexes the different convolutional layers; the output of each level is used directly as the returned feature maps $\{F_2, F_3, \ldots, F_l\}$;
(1.2) the network predicts the output class c, and the gradient of the score of each class c with respect to all feature layers is computed, i.e. the partial derivative $\partial y^c / \partial F_l^k$ of the output $y^c$ with respect to the feature map $F_l^k$ of convolutional layer l; global average pooling is applied to the partial derivatives, over the positions i, j of the sub-block k of each feature map, to obtain the importance weights;
The feature maps at each level have dimensions {m, n, k}, i.e. the height, width, and number of channels of a single feature; after passing through the ReLU activation layer, the feature score of each layer of the current gradient pyramid is obtained;
Two operations are applied to each gradient feature map: first, it is upsampled by a factor of two so that its shape matches that of the next layer's gradient map; then it is laterally connected with the next layer's gradient intensity map, so that shallow feature intensity is enhanced and fused with deep feature intensity; the upsampling function is implemented by image interpolation; the output of the bottom-most gradient feature map is then obtained, where L denotes the number of network layers;
The top-level feature maps carry greater weight than the bottom-level feature maps, because the semantic information of high-level feature maps is more concentrated and captures more visual structure; through the lateral connections between layers, the gradient intensity is enhanced step by step.
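The gradient pyramid steps in the claims above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation. All function names are hypothetical. Feature maps and gradients are assumed to be lists of 2D channel arrays. Nearest-neighbour interpolation stands in for the unspecified interpolation method. The max/min filtering, dilation, and stepwise reduction of steps 4 and 5 are replaced by a single threshold at σ times the global peak.

```python
def channel_weights(grads):
    """Global average pooling of each channel's gradient map over positions i, j."""
    return [sum(map(sum, g)) / (len(g) * len(g[0])) for g in grads]


def layer_intensity(features, grads):
    """ReLU of the weighted channel sum: a Grad-CAM style map for one layer."""
    weights = channel_weights(grads)
    rows, cols = len(features[0]), len(features[0][0])
    return [[max(0.0, sum(w * f[r][c] for w, f in zip(weights, features)))
             for c in range(cols)] for r in range(rows)]


def upsample2x(m):
    """Nearest-neighbour 2x upsampling (assumed interpolation method)."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(maps):
    """Fuse intensity maps ordered from deepest (smallest) to shallowest (largest).

    Each map is assumed to be exactly twice the size of the previous one.
    """
    acc = maps[0]
    for m in maps[1:]:
        up = upsample2x(acc)
        # Lateral connection: element-wise addition with the next layer's map.
        acc = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(up, m)]
    return acc


def bbox_from_heatmap(heat, sigma):
    """Box [xmin, ymin, xmax, ymax] around pixels >= sigma * global peak."""
    peak = max(max(row) for row in heat)
    thr = sigma * peak
    pts = [(x, y) for y, row in enumerate(heat)
           for x, v in enumerate(row) if v >= thr]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return [min(xs), min(ys), max(xs), max(ys)]
```

A caller would compute `layer_intensity` once per pyramid level, pass the resulting maps to `fuse` from deepest to shallowest, and then call `bbox_from_heatmap` on the fused map with a chosen σ between 0 and 1.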
CN201910148554.9A 2019-02-28 2019-02-28 Target positioning method based on multi-scale feature convolutional neural network Active CN110009679B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910148554.9A CN110009679B (en) 2019-02-28 2019-02-28 Target positioning method based on multi-scale feature convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910148554.9A CN110009679B (en) 2019-02-28 2019-02-28 Target positioning method based on multi-scale feature convolutional neural network

Publications (2)

Publication Number Publication Date
CN110009679A (en) 2019-07-12
CN110009679B (en) 2022-01-04

Family

ID=67166129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910148554.9A Active CN110009679B (en) 2019-02-28 2019-02-28 Target positioning method based on multi-scale feature convolutional neural network

Country Status (1)

Country Link
CN (1) CN110009679B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017079529A1 (en) * 2015-11-04 2017-05-11 Nec Laboratories America, Inc. Universal correspondence network
US10007863B1 (en) * 2015-06-05 2018-06-26 Gracenote, Inc. Logo recognition in images and videos
CN108846446A (en) * 2018-07-04 2018-11-20 国家新闻出版广电总局广播科学研究院 The object detection method of full convolutional network is merged based on multipath dense feature
CN109035251A (en) * 2018-06-06 2018-12-18 杭州电子科技大学 One kind being based on the decoded image outline detection method of Analysis On Multi-scale Features
CN109166094A (en) * 2018-07-11 2019-01-08 华南理工大学 A kind of insulator breakdown positioning identifying method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ADITYA CHATTOPADHAY 等: "Grad-CAM++: Generalized Gradient-based Visual Explanations for Deep Convolutional Networks", 《2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION》 *
SUNGMIN LEE 等: "Robust Tumor Localization with Pyramid Grad-CAM", 《ARXIV:1805.11393V1》 *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852324A (en) * 2019-08-23 2020-02-28 上海撬动网络科技有限公司 Deep neural network-based container number detection method
CN110504029A (en) * 2019-08-29 2019-11-26 腾讯医疗健康(深圳)有限公司 A kind of medical image processing method, medical image recognition method and device
CN110517771A (en) * 2019-08-29 2019-11-29 腾讯医疗健康(深圳)有限公司 A kind of medical image processing method, medical image recognition method and device
CN110504029B (en) * 2019-08-29 2022-08-19 腾讯医疗健康(深圳)有限公司 Medical image processing method, medical image identification method and medical image identification device
WO2021036616A1 (en) * 2019-08-29 2021-03-04 腾讯科技(深圳)有限公司 Medical image processing method, medical image recognition method and device
CN110619369A (en) * 2019-09-23 2019-12-27 常熟理工学院 Fine-grained image classification method based on feature pyramid and global average pooling
CN110619369B (en) * 2019-09-23 2020-12-11 常熟理工学院 Fine-grained image classification method based on feature pyramid and global average pooling
CN110752028A (en) * 2019-10-21 2020-02-04 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111104962B (en) * 2019-11-05 2023-04-18 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN110910366A (en) * 2019-11-18 2020-03-24 湖北工业大学 Visualization method of brain nuclear magnetic resonance abnormal image based on 3D CAM
CN110910366B (en) * 2019-11-18 2023-10-24 湖北工业大学 Visualization method of brain nuclear magnetic resonance abnormal image based on 3D CAM
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN111275688B (en) * 2020-01-19 2023-12-12 合肥工业大学 Small target detection method based on context feature fusion screening of attention mechanism
CN111275688A (en) * 2020-01-19 2020-06-12 合肥工业大学 Small target detection method based on context feature fusion screening of attention mechanism
CN111461182A (en) * 2020-03-18 2020-07-28 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN111414910A (en) * 2020-03-18 2020-07-14 上海嘉沃光电科技有限公司 Small target enhancement detection method and device based on double convolutional neural network
CN111414910B (en) * 2020-03-18 2023-05-02 上海嘉沃光电科技有限公司 Small target enhancement detection method and device based on double convolution neural network
CN111461182B (en) * 2020-03-18 2023-04-18 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN111652350A (en) * 2020-05-07 2020-09-11 清华大学深圳国际研究生院 Neural network visual interpretation method and weak supervision object positioning method
CN111709294A (en) * 2020-05-18 2020-09-25 杭州电子科技大学 Express delivery personnel identity identification method based on multi-feature information
CN111709294B (en) * 2020-05-18 2023-07-14 杭州电子科技大学 Express delivery personnel identity recognition method based on multi-feature information
CN111754519B (en) * 2020-05-27 2024-04-30 浙江工业大学 Class activation mapping-based countermeasure method
CN111754519A (en) * 2020-05-27 2020-10-09 浙江工业大学 Countermeasure defense method based on class activation mapping
CN111986150B (en) * 2020-07-17 2024-02-09 万达信息股份有限公司 The method comprises the following steps of: digital number pathological image Interactive annotation refining method
CN111986150A (en) * 2020-07-17 2020-11-24 万达信息股份有限公司 Interactive marking refinement method for digital pathological image
CN112163530B (en) * 2020-09-30 2024-04-09 江南大学 SSD small target detection method based on feature enhancement and sample selection
CN112163530A (en) * 2020-09-30 2021-01-01 江南大学 SSD small target detection method based on feature enhancement and sample selection
CN112287999B (en) * 2020-10-27 2022-06-14 厦门大学 Weak supervision target positioning method for correcting gradient by using convolutional neural network
CN112287999A (en) * 2020-10-27 2021-01-29 厦门大学 Weak supervision target positioning method utilizing convolutional neural network to correct gradient
CN112420174A (en) * 2020-11-04 2021-02-26 湖北工业大学 Autism cerebral magnetic resonance image visualization method based on 3D Grad-CAM
CN112465909B (en) * 2020-12-07 2022-09-20 南开大学 Class activation mapping target positioning method and system based on convolutional neural network
CN112465909A (en) * 2020-12-07 2021-03-09 南开大学 Class activation mapping target positioning method and system based on convolutional neural network
CN112580661A (en) * 2020-12-11 2021-03-30 江南大学 Multi-scale edge detection method under deep supervision
CN112580661B (en) * 2020-12-11 2024-03-08 江南大学 Multi-scale edge detection method under deep supervision
CN112686249B (en) * 2020-12-22 2022-01-25 中国人民解放军战略支援部队信息工程大学 Grad-CAM attack method based on anti-patch
CN112686249A (en) * 2020-12-22 2021-04-20 中国人民解放军战略支援部队信息工程大学 Grad-CAM attack method based on anti-patch
CN112651407B (en) * 2020-12-31 2023-10-20 中国人民解放军战略支援部队信息工程大学 CNN visualization method based on discriminative deconvolution
CN112651407A (en) * 2020-12-31 2021-04-13 中国人民解放军战略支援部队信息工程大学 CNN visualization method based on discriminative deconvolution
CN113537555A (en) * 2021-06-03 2021-10-22 太原理工大学 Traffic sub-region model prediction sliding mode boundary control method considering disturbance
CN113505670B (en) * 2021-06-29 2023-06-23 西南交通大学 Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
CN113505670A (en) * 2021-06-29 2021-10-15 西南交通大学 Remote sensing image weak supervision building extraction method based on multi-scale CAM and super-pixels
CN113762412A (en) * 2021-09-26 2021-12-07 国网四川省电力公司电力科学研究院 Power distribution network single-phase earth fault identification method, system, terminal and medium
CN113855048A (en) * 2021-10-22 2021-12-31 武汉大学 Electroencephalogram signal visualization distinguishing method and system for autism spectrum disorder
CN114067316A (en) * 2021-11-23 2022-02-18 燕山大学 Rapid identification method based on fine-grained image classification
CN114067316B (en) * 2021-11-23 2024-05-03 燕山大学 Rapid identification method based on fine-granularity image classification
CN114241274A (en) * 2021-11-30 2022-03-25 电子科技大学 Small target detection method based on super-resolution multi-scale feature fusion
CN114241274B (en) * 2021-11-30 2023-04-07 电子科技大学 Small target detection method based on super-resolution multi-scale feature fusion

Also Published As

Publication number Publication date
CN110009679B (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN110009679A (en) A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
Gou et al. Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines
CN108319964B (en) Fire image recognition method based on mixed features and manifold learning
WO2017190574A1 (en) Fast pedestrian detection method based on aggregation channel features
CN111914664A (en) Vehicle multi-target detection and track tracking method based on re-identification
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
Chen et al. Research on recognition of fly species based on improved RetinaNet and CBAM
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN107368778A (en) Method for catching, device and the storage device of human face expression
CN105893946A (en) Front face image detection method
Zhang et al. A pedestrian detection method based on SVM classifier and optimized Histograms of Oriented Gradients feature
CN110119726A (en) A kind of vehicle brand multi-angle recognition methods based on YOLOv3 model
CN110929593A (en) Real-time significance pedestrian detection method based on detail distinguishing and distinguishing
CN104504395A (en) Method and system for achieving classification of pedestrians and vehicles based on neural network
CN109711416A (en) Target identification method, device, computer equipment and storage medium
CN107480585A (en) Object detection method based on DPM algorithms
CN108734200A (en) Human body target visible detection method and device based on BING features
CN110084284A (en) Target detection and secondary classification algorithm and device based on region convolutional neural networks
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
Cai et al. Vehicle Detection Based on Deep Dual‐Vehicle Deformable Part Models
CN113902978B (en) Depth learning-based interpretable SAR image target detection method and system
CN112347967B (en) Pedestrian detection method fusing motion information in complex scene
CN109284752A (en) A kind of rapid detection method of vehicle
Huang et al. Occluded suspect search via channel-guided mechanism
CN107679528A (en) A kind of pedestrian detection method based on AdaBoost SVM Ensemble Learning Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220812

Address after: Room 1603-12, No. 8, Financial Second Street, Economic Development Zone, Wuxi City, Jiangsu Province, 214125

Patentee after: Uni-Entropy Intelligent Technology (Wuxi) Co., Ltd.

Address before: No. 1800 Lihu Avenue, Wuxi City, Jiangsu Province, 214122

Patentee before: Jiangnan University