CN106650813A - Image understanding method based on depth residual error network and LSTM - Google Patents

Image understanding method based on depth residual error network and LSTM

Info

Publication number
CN106650813A
Authority
CN
China
Prior art keywords
image
lstm
residual error
depth residual
error network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611226528.6A
Other languages
Chinese (zh)
Other versions
CN106650813B (en)
Inventor
胡丹
袁东芝
余卫宇
李楚怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201611226528.6A priority Critical patent/CN106650813B/en
Publication of CN106650813A publication Critical patent/CN106650813A/en
Application granted granted Critical
Publication of CN106650813B publication Critical patent/CN106650813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image understanding method based on a deep residual network and an LSTM. The method comprises the following steps: first, a deep residual network model is built to extract abstract image features, which are stored as a feature matrix; an LSTM model with a dynamic attention mechanism then dynamically forms a suitable feature vector from the feature matrix; finally, the LSTM model generates natural language (English) from the feature vector. The method exploits the advantages of the deep residual network in image feature extraction and of the LSTM in time-series modeling; together they form an encoder-decoder framework that converts image content information into natural language, thereby extracting deep information from the image.

Description

An image understanding method based on a deep residual network and LSTM
Technical field
The present invention relates to image semantic understanding and the field of deep learning, and in particular to an image understanding method based on a deep residual network and LSTM (Long Short-Term Memory).
Background art
Image understanding refers to the understanding of image semantics. It is a science that takes the image as its object and knowledge as its core, studying what targets exist in an image, the relationships between those targets, and what scene the image depicts.
The input of image understanding is image data, and its output is knowledge, which belongs to the high-level content of the image processing research field. On the basis of image recognition, it further studies the properties of the individual targets in an image and their relationships, derives an understanding of the meaning of the image content and an explanation of the original objective scene, and in turn guides and plans behavior.
At present, commonly used image understanding methods are mainly based on low-level features combined with classifiers. Image processing algorithms such as the wavelet transform, the scale-invariant feature transform (SIFT), and edge extraction first extract features from the image; image recognition and reasoning algorithms such as latent Dirichlet allocation (LDA), hidden Markov models (HMM), and support vector machines (SVM) then classify the extracted features and build a semantic model. From the standpoint of algorithm implementation, the image understanding algorithms in common use suffer from poor generalization, low robustness, strong local dependence, difficult implementation, and low recognition rates.
Summary of the invention
The invention discloses an image understanding method based on a deep residual network and LSTM. The method exploits the advantages of the deep residual network in image feature extraction and of the LSTM in time-series modeling: the deep residual network and the LSTM model form an encoder-decoder framework that converts image content information into natural language, achieving the goal of extracting the deep information of an image.
The object of the present invention is achieved by the following technical scheme: an image understanding method based on a deep residual network and LSTM, characterized in that it applies a deep residual network model for extracting abstract features from the input image and an LSTM model for generating natural language from the abstract features. The method specifically comprises the following steps:
S1: download the training data sets;
S2: preprocess the data in the data sets of step S1;
S3: train the deep residual network model;
S4: train the LSTM model;
S5: extract the abstract features of the image to be recognized with the deep residual network model trained in step S3;
S6: input the features extracted in step S5 into the LSTM model trained in step S4; the LSTM model generates natural language from the features.
Preferably, the training data sets are downloaded in step S1 as follows: the ImageNet and MS-COCO public image data sets are downloaded from the two websites http://www.image-net.org and http://mscoco.org respectively. The ImageNet data set is divided into a training image set and a test image set; the MS-COCO data set is likewise divided into a training image set and a test image set, and each picture has five corresponding natural language sentences describing its content information.
Preferably, the preprocessing of step S2 covers two cases, the ImageNet data set and the MS-COCO data set:
For the ImageNet data set: each image is scaled to 256 × 256, five standard-size 224 × 224 crops are then taken from the top, middle, bottom, left, and right of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image-class" pair serves as one datum (a sketch of this cropping is given below);
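A minimal sketch of this five-crop preprocessing in Python with Pillow; the patent does not pin down the exact crop offsets, so the positions below are illustrative assumptions:

```python
from PIL import Image

def imagenet_crops(path):
    """Scale an image to 256 x 256, then take five 224 x 224 standard-size
    crops from the top, middle, bottom, left, and right of the image."""
    img = Image.open(path).convert("RGB").resize((256, 256))
    boxes = {                      # (left, upper, right, lower), each 224 x 224
        "top":    (16, 0, 240, 224),
        "middle": (16, 16, 240, 240),
        "bottom": (16, 32, 240, 256),
        "left":   (0, 16, 224, 240),
        "right":  (32, 16, 256, 240),
    }
    return {name: img.crop(box) for name, box in boxes.items()}
```

Each crop is then saved in a pair with the image's class label to form one "standard-size image-class" datum.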
For the MS-COCO data set, the preprocessing steps are as follows:
S2.1: each natural language sentence is saved in a pair with its corresponding image; one "image-natural sentence" pair serves as one datum;
S2.2: the image of each "image-natural sentence" pair is scaled with its aspect ratio kept unchanged and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its natural sentence; one "standard-size image-natural sentence" pair serves as one datum;
S2.3: the words occurring in all natural sentences are counted, de-duplicated, and sorted, and the total number of words is denoted K; each word is represented by a K-dimensional one-hot column vector whose entry at the word's index is 1 and whose other entries are 0; such a vector is called a word vector, and all "word, word vector" pairs constitute a dictionary DIC of length K;
S2.4: the natural sentences of the "image-natural sentence" pairs are represented with word vectors based on the dictionary DIC, so that a natural sentence y of length C can be expressed as y = {y_1, y_2, …, y_C}, y_i ∈ R^K (a sketch of steps S2.3-S2.4 follows).
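A minimal sketch of the dictionary construction and word-vector encoding of steps S2.3-S2.4 in Python; the toy sentences and helper names are illustrative, not from the patent:

```python
def build_dictionary(sentences):
    """Count, de-duplicate, and sort the words of all natural sentences, then
    map each word to a K-dimensional one-hot word vector."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    K = len(words)
    dic = {}
    for idx, w in enumerate(words):
        vec = [0] * K
        vec[idx] = 1               # 1 at the word's index, 0 elsewhere
        dic[w] = vec
    return dic, K

def encode_sentence(sentence, dic):
    """Represent a natural sentence y of length C as {y_1, ..., y_C}, y_i in R^K."""
    return [dic[w] for w in sentence.lower().split()]

sentences = ["a dog runs on the grass", "a man rides a horse"]
dic, K = build_dictionary(sentences)       # dictionary DIC of length K
y = encode_sentence("a dog runs", dic)     # three one-hot vectors of length K
```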
Preferably, the deep residual network model trained in step S3 comprises 46 convolution blocks (denoted "conv + subscript"), 2 pooling layers, 1 fully connected layer, and 1 softmax classifier. In each convolution block, the data are first normalized with batch normalization (BN), a nonlinear transformation is then applied with the rectified linear unit (ReLU), and the convolution operation is carried out last (a sketch of such a block follows). Training uses stochastic gradient descent (SGD) and back-propagation (BP), with the preprocessed ImageNet data set ("standard-size image-class" pairs) as samples. For each sample, the standard-size image is propagated forward through the network and a predicted class is output after the softmax layer; the difference between the predicted class and the true class is then propagated back to the head of the network, and the stochastic gradient descent algorithm adjusts the network parameters during back-propagation. The sample-input process is repeated until the network converges.
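As a sketch, one convolution block in the BN, then ReLU, then convolution order described above might look like the following in PyTorch; the channel counts and input size are illustrative (the patent's exact configuration is given in Fig. 2):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: batch normalization, then ReLU, then convolution."""
    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

# e.g. "conv2_1_a, 1*1, 64, 1": 1 x 1 kernel, 64 output feature maps, stride 1
block = ConvBlock(in_channels=256, out_channels=64, kernel_size=1)
x = torch.randn(1, 256, 56, 56)
print(block(x).shape)    # torch.Size([1, 64, 56, 56])
```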
Preferably, the LSTM model trained in step S4 has a basic structure composed of LSTM neurons. The LSTM model contains C layers of LSTM neurons (C is the preset maximum length of a natural sentence) and outputs C words in sequence. Here, the preprocessed MS-COCO data set ("standard-size image-natural sentence" pairs) serves as the samples. The steps of training the LSTM model are as follows:
S4.1: the standard-size image is input into the deep residual network of step S3, and an abstract feature matrix of size 7*7*2048 = 49*2048 is extracted from the end of the conv5_3_c convolution block; it is denoted a = {a_1, …, a_L}, a_i ∈ R^D, where L = 49 and D = 2048 (a sketch of this extraction follows);
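Sketched in PyTorch, the extraction of the L × D feature matrix from the last convolution stage might look like this; torchvision's resnet50 stands in here for the patent's own 50-layer network, which is an assumption for illustration:

```python
import torch
import torch.nn as nn
import torchvision

# Keep everything up to the last convolution stage (drop avgpool and fc):
# a 224 x 224 input then yields a 7 x 7 x 2048 feature map.
resnet = torchvision.models.resnet50()
backbone = nn.Sequential(*list(resnet.children())[:-2])

x = torch.randn(1, 3, 224, 224)           # one standard-size image
fmap = backbone(x)                        # (1, 2048, 7, 7)
a = fmap.flatten(2).squeeze(0).t()        # (49, 2048): a_1..a_L, a_i in R^D
```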
S4.2: for each moment t, an image content vector is dynamically generated according to the following equations (a sketch of this attention step follows):
e_ti = f_att(a_i, h_{t-1})
α_ti = exp(e_ti) / Σ_{k=1}^{L} exp(e_tk)
ẑ_t = Σ_{i=1}^{L} α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_{t-1} is the hidden state of the previous moment, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features moment t attends to more, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector;
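A minimal PyTorch sketch of this soft attention, assuming a single-hidden-layer perceptron for f_att; the layer widths are illustrative:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """e_ti = f_att(a_i, h_{t-1}); alpha = softmax(e); z_hat = sum_i alpha_ti * a_i."""
    def __init__(self, feat_dim=2048, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.proj_a = nn.Linear(feat_dim, attn_dim)    # f_att as a small MLP
        self.proj_h = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, a, h_prev):
        # a: (L, D) feature matrix; h_prev: (hidden_dim,) previous hidden state
        e = self.score(torch.tanh(self.proj_a(a) + self.proj_h(h_prev))).squeeze(-1)
        alpha = torch.softmax(e, dim=0)                # (L,) weights, sum to 1
        z_hat = (alpha.unsqueeze(-1) * a).sum(dim=0)   # (D,) image content vector
        return z_hat, alpha

attn = SoftAttention()
a = torch.randn(49, 2048)          # L = 49, D = 2048 from the 7 x 7 x 2048 map
h_prev = torch.randn(512)
z_hat, alpha = attn(a, h_prev)
```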
S4.3: for each moment t, the forward conduction of the LSTM neuron can be expressed as (a sketch of one such step follows):
i_t = σ(W_i E y_{t-1} + U_i h_{t-1} + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_{t-1} + U_f h_{t-1} + Z_f ẑ_t + b_f)
c_t = f_t c_{t-1} + i_t tanh(W_c E y_{t-1} + U_c h_{t-1} + Z_c ẑ_t + b_c)
o_t = σ(W_o E y_{t-1} + U_o h_{t-1} + Z_o ẑ_t + b_o)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^{-x})^{-1}; i_t, f_t, c_t, o_t, and h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at moment t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, and Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, and b_o are bias terms learned by the LSTM model; E ∈ R^{m×K} is a randomly initialized embedding matrix, m is a constant, and y_{t-1} is the word output by the LSTM model at the previous moment. At t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_{i=1}^{L} a_i)
h_0 = f_init,h((1/L) Σ_{i=1}^{L} a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons;
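Sketched in PyTorch, one forward step of this decoder and its initialization might look like the following; a custom cell is written out so the Z·ẑ_t context term is explicit, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class AttnLSTMCell(nn.Module):
    """LSTM cell whose four gates also receive the image content vector z_hat."""
    def __init__(self, embed_dim, hidden_dim, feat_dim):
        super().__init__()
        self.w = nn.Linear(embed_dim, 4 * hidden_dim)               # W_*, b_*
        self.u = nn.Linear(hidden_dim, 4 * hidden_dim, bias=False)  # U_*
        self.z = nn.Linear(feat_dim, 4 * hidden_dim, bias=False)    # Z_*

    def forward(self, Ey_prev, h_prev, c_prev, z_hat):
        gates = self.w(Ey_prev) + self.u(h_prev) + self.z(z_hat)
        i, f, o, g = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c_prev + i * torch.tanh(g)   # c_t = f_t c_{t-1} + i_t tanh(...)
        h = o * torch.tanh(c)                # h_t = o_t tanh(c_t)
        return h, c

# c_0 and h_0 come from two independent MLPs applied to the mean feature vector.
feat_dim, hidden_dim, embed_dim = 2048, 512, 256
f_init_c = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
f_init_h = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.Tanh())
a = torch.randn(49, feat_dim)
c0, h0 = f_init_c(a.mean(dim=0)), f_init_h(a.mean(dim=0))

cell = AttnLSTMCell(embed_dim, hidden_dim, feat_dim)
h1, c1 = cell(torch.randn(embed_dim), h0, c0, torch.randn(feat_dim))
```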
S4.4: for each moment t, the output word y_t is obtained by solving the following optimization problem:
min(−log p(y_t | a, y_{t-1}) + λ Σ_{i=1}^{L} (1 − Σ_{t=1}^{C} α_ti)^2)
where λ is a constant and C is the maximum length of a natural sentence in the samples;
S4.5: the difference between the predicted natural sentence and the natural sentence in the sample is computed with the cross-entropy loss, and the model is then trained with the back-propagation (BP) algorithm and the RMSProp-based stochastic gradient descent (SGD) algorithm so as to minimize the cross entropy (a sketch of this loss follows).
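A minimal sketch of this objective in PyTorch: the cross entropy over the C output words plus the λ Σ_i (1 − Σ_t α_ti)² attention term from step S4.4; the tensor shapes and λ value are illustrative:

```python
import torch
import torch.nn.functional as F

def caption_loss(logits, targets, alphas, lam=1.0):
    """Cross entropy between predicted and sample sentences, plus the
    regularizer lam * sum_i (1 - sum_t alpha_ti)^2, which pushes the total
    attention each image location receives over the sentence toward 1."""
    # logits: (C, K) word scores; targets: (C,) word indices; alphas: (C, L)
    ce = F.cross_entropy(logits, targets)         # -log p(y_t | a, y_{t-1}), averaged
    reg = ((1.0 - alphas.sum(dim=0)) ** 2).sum()  # sum over the L locations
    return ce + lam * reg

C, K, L = 16, 10000, 49
logits = torch.randn(C, K, requires_grad=True)
targets = torch.randint(0, K, (C,))
alphas = torch.softmax(torch.randn(C, L), dim=1)
loss = caption_loss(logits, targets, alphas)
loss.backward()   # BP; in training, step an RMSProp optimizer afterwards
```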
S4.6: steps S4.1-S4.5 are repeated for each sample in the MS-COCO data set;
S4.7: steps S4.1-S4.6 are repeated 20 times.
Preferably, the specific steps of extracting the features of the image to be recognized in step S5 are:
S7.1: the image is preprocessed in the manner used for the ImageNet data set in step S2;
S7.2: the preprocessed image is input into the deep residual network trained in step S3, and the abstract feature matrix is extracted from the end of the bottom convolution block; its size is 7*7*2048 = 49*2048.
Preferably, in step S6 the LSTM model generates a natural sentence from the image features: for each moment t, where 0 ≤ t < C, a word is generated using steps S4.1-S4.4, and all the words are connected in sequence to form the natural sentence (an end-to-end decoding sketch follows).
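A compact greedy-decoding sketch of steps S5-S6, assuming the simplified attention of the earlier sketch (a linear score instead of the full MLP f_att); with randomly initialized weights it emits arbitrary words, so it only illustrates the control flow:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def generate_sentence(a, idx2word, C=16, hid=512, emb=256, K=10000):
    """For t = 0..C-1: attend over the feature matrix a (L x D), advance the
    LSTM one step, take the most probable word, and join the words."""
    feat_dim = a.size(1)
    attn_h = nn.Linear(hid, feat_dim)          # simplified attention score
    cell = nn.LSTMCell(emb + feat_dim, hid)    # gates see [E y_{t-1}; z_hat]
    embed = nn.Embedding(K, emb)
    out = nn.Linear(hid, K)

    h, c = torch.zeros(1, hid), torch.zeros(1, hid)
    word = torch.zeros(1, dtype=torch.long)    # start-of-sentence token index
    words = []
    for t in range(C):
        e = a @ attn_h(h).squeeze(0)                            # (L,) scores
        alpha = torch.softmax(e, dim=0)
        z_hat = (alpha.unsqueeze(-1) * a).sum(0, keepdim=True)  # (1, D)
        h, c = cell(torch.cat([embed(word), z_hat], dim=-1), (h, c))
        word = out(h).argmax(dim=-1)                            # y_t
        words.append(idx2word.get(word.item(), "<unk>"))
    return " ".join(words)

a = torch.randn(49, 2048)                      # feature matrix from step S5
print(generate_sentence(a, idx2word={}))
```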
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The method applies deep learning theory and trains the deep residual network model and the LSTM model with a large number of image samples; it can automatically learn general patterns in images, and it is robust and widely applicable.
2. The deep residual network adopted by the method has a deep 50-layer structure that can fully extract the abstract features of an image. At the same time, the method employs an LSTM model, which can properly model time series such as natural language and convert the feature vector into natural language. Combining the deep residual network with the LSTM network clearly improves the accuracy of image understanding.
3. The invention introduces a dynamic attention mechanism that can dynamically generate a suitable feature vector from the feature matrix extracted by the deep residual network, so that the LSTM can dynamically focus on different positions of the image.
Description of the drawings
Fig. 1 is a flow chart of an image understanding method based on a deep residual network and LSTM according to an embodiment of the present invention;
Fig. 2 shows the structure of the deep residual network of step (3) of the method;
Fig. 3 shows the concrete structure of a convolution block in the deep residual network model of step (3);
Fig. 4 shows the structure of an LSTM neuron in the LSTM model of step (4).
Specific embodiment
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
The flow chart of the method of the present invention is shown in Fig. 1; the method comprises the following steps:
(1) Download the training data sets: download the ImageNet and MS-COCO public image data sets from the two websites http://www.image-net.org and http://mscoco.org respectively. The ImageNet data set is divided into a training image set and a test image set; the training image set contains 1000 classes with 1300 pictures per class, and the test image set contains 50000 pictures. The MS-COCO data set is divided into a training image set and a test image set; the training image set contains 82783 pictures, the test image set contains 40504 pictures, and each picture has five corresponding natural language sentences describing its content information.
(2) Preprocess:
For the ImageNet data set: each image is scaled to 256 × 256, five standard-size 224 × 224 crops are then taken from the top, middle, bottom, left, and right of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image-class" pair serves as one datum;
For the MS-COCO data set, the preprocessing steps are as follows:
2.1. Each natural language sentence is saved in a pair with its corresponding image; one "image-natural sentence" pair serves as one datum;
2.2. The image of each "image-natural sentence" pair is scaled with its aspect ratio kept unchanged and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its natural sentence; one "standard-size image-natural sentence" pair serves as one datum;
2.3. The words occurring in all natural sentences are counted, de-duplicated, and sorted, and the total number of words is denoted K; each word is represented by a K-dimensional one-hot column vector whose entry at the word's index is 1 and whose other entries are 0; such a vector is called a word vector, and all "word, word vector" pairs constitute a dictionary DIC of length K;
2.4. The natural sentences of the "image-natural sentence" pairs are represented with word vectors based on the dictionary DIC, so that a natural sentence y of length C can be expressed as y = {y_1, y_2, …, y_C}, y_i ∈ R^K.
(3) Train the deep residual network model: the structure of the deep residual network is shown in Fig. 2. It comprises 46 convolution blocks (denoted "conv + subscript"), 2 pooling layers, 1 fully connected layer, and a softmax classifier. In each convolution block, the data are first normalized with batch normalization (BN), a nonlinear transformation is then applied with the rectified linear unit (ReLU), and the convolution operation is carried out last. Training uses stochastic gradient descent (SGD) and back-propagation (BP), with the preprocessed ImageNet data set ("standard-size image-class" pairs) as samples. The concrete parameters are indicated in Fig. 2; for example, "conv2_1_a, 1*1, 64, 1" means that the convolution block is named conv2_1_a, its convolution kernel size is 1 × 1, its stride is 1, and it outputs 64 feature maps. A sketch of the training loop follows.
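A minimal sketch of this SGD + back-propagation loop in PyTorch; torchvision's resnet50 and the unspecified data loader stand in for the patent's 50-layer network and the preprocessed "standard-size image-class" pairs, which is an assumption for illustration:

```python
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(num_classes=1000)  # stand-in for Fig. 2's network
criterion = nn.CrossEntropyLoss()   # difference between predicted and true class
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def train_epoch(loader):
    """One pass over the 'standard-size image-class' samples; repeat until convergence."""
    for images, labels in loader:
        optimizer.zero_grad()
        logits = model(images)       # forward propagation, class scores at the end
        loss = criterion(logits, labels)
        loss.backward()              # propagate the error back to the network head
        optimizer.step()             # SGD adjusts the network parameters
```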
(4) Train the LSTM model: Fig. 4 shows the LSTM neuron, of which the basic structure of the LSTM model is composed. The LSTM model contains C layers of LSTM neurons (C is the preset maximum length of a natural sentence) and outputs C words in sequence. Here, the preprocessed MS-COCO data set ("standard-size image-natural sentence" pairs) serves as the samples. The steps of training the LSTM model are as follows:
4.1. The standard-size image is input into the deep residual network of step (3), and an abstract feature matrix of size 7*7*2048 = 49*2048 is extracted from the end of the conv5_3_c convolution block; it is denoted a = {a_1, …, a_L}, a_i ∈ R^D, where L = 49 and D = 2048;
4.2. For each moment t, an image content vector is dynamically generated according to the following equations:
e_ti = f_att(a_i, h_{t-1})
α_ti = exp(e_ti) / Σ_{k=1}^{L} exp(e_tk)
ẑ_t = Σ_{i=1}^{L} α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_{t-1} is the hidden state of the previous moment, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features moment t attends to more, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector;
4.3. For each moment t, the forward conduction of the LSTM neuron can be expressed as:
i_t = σ(W_i E y_{t-1} + U_i h_{t-1} + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_{t-1} + U_f h_{t-1} + Z_f ẑ_t + b_f)
c_t = f_t c_{t-1} + i_t tanh(W_c E y_{t-1} + U_c h_{t-1} + Z_c ẑ_t + b_c)
o_t = σ(W_o E y_{t-1} + U_o h_{t-1} + Z_o ẑ_t + b_o)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^{-x})^{-1}; i_t, f_t, c_t, o_t, and h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at moment t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, and Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, and b_o are bias terms learned by the LSTM model; E ∈ R^{m×K} is a randomly initialized embedding matrix, m is a constant, and y_{t-1} is the word output by the LSTM model at the previous moment. At t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_{i=1}^{L} a_i)
h_0 = f_init,h((1/L) Σ_{i=1}^{L} a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons;
4.4. For each moment t, the output word y_t is obtained by solving the following optimization problem:
min(−log p(y_t | a, y_{t-1}) + λ Σ_{i=1}^{L} (1 − Σ_{t=1}^{C} α_ti)^2)
where λ is a constant and C is the maximum length of a natural sentence in the samples;
4.5. The difference between the predicted natural sentence and the natural sentence in the sample is computed with the cross-entropy loss, and the model is then trained with the back-propagation (BP) algorithm and the RMSProp-based stochastic gradient descent (SGD) algorithm so as to minimize the cross entropy.
4.6. Steps 4.1-4.5 are repeated for each sample in the MS-COCO data set.
4.7. Steps 4.1-4.6 are repeated 20 times.
(5) Extract the abstract features of the image to be recognized with the deep residual network model trained in step (3): the image is first preprocessed in the manner used for the ImageNet data set in step (2), the preprocessed image is then input into the deep residual network trained in step (3), and the abstract feature matrix is extracted from the end of the bottom convolution block; its size is 7*7*2048 = 49*2048.
(6) Input the abstract features extracted in step (5) into the LSTM model trained in step (4): for each moment t, where 0 ≤ t < C, a word is generated using steps 4.1-4.4, and all the words are connected in sequence to form the natural sentence.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by the above embodiment. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the present invention is an equivalent replacement and is included within the protection scope of the present invention.

Claims (8)

1. An image understanding method based on a deep residual network and LSTM, characterized in that it applies a deep residual network model for extracting abstract features from an input image and an LSTM model for generating natural language from the abstract features; the method specifically comprises the following steps:
S1: download the training data sets;
S2: preprocess the data in the data sets of step S1;
S3: train the deep residual network model;
S4: train the LSTM model;
S5: extract the abstract features of the image to be recognized with the deep residual network model trained in step S3;
S6: input the features extracted in step S5 into the LSTM model trained in step S4; the LSTM model generates a natural sentence from the features.
2. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the data sets of step S1 are the two downloaded public image data sets ImageNet and MS-COCO.
3. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the preprocessing of step S2 covers two cases, the ImageNet data set and the MS-COCO data set:
For the ImageNet data set: each image is scaled to 256 × 256, five standard-size 224 × 224 crops are then taken from the top, middle, bottom, left, and right of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image-class" pair serves as one datum;
For the MS-COCO data set, the preprocessing steps are as follows:
S2.1: each natural language sentence is saved in a pair with its corresponding image; one "image-natural sentence" pair serves as one datum;
S2.2: the image of each "image-natural sentence" pair is scaled with its aspect ratio kept unchanged and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its natural sentence; one "standard-size image-natural sentence" pair serves as one datum;
S2.3: the words occurring in all natural sentences are counted, de-duplicated, and sorted, and the total number of words is denoted K; each word is represented by a K-dimensional one-hot column vector whose entry at the word's index is 1 and whose other entries are 0; such a vector is called a word vector, and all "word, word vector" pairs constitute a dictionary DIC of length K;
S2.4: the natural sentences of the "image-natural sentence" pairs are represented with word vectors based on the dictionary DIC, so that a natural sentence y of length C can be expressed as y = {y_1, y_2, …, y_C}, y_i ∈ R^K.
4. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the structure of the deep residual network model of step S3 comprises multiple convolution blocks, pooling layers, a fully connected layer, and a softmax classifier; in each convolution block, the data are first normalized with batch normalization, a nonlinear transformation is then applied with the rectified linear unit, and the convolution operation is carried out last.
5. The image understanding method based on a deep residual network and LSTM according to claim 1 or 4, characterized in that training the deep residual network model in step S3 uses stochastic gradient descent and back-propagation, with the "standard-size image-class" pairs of the preprocessed ImageNet data set as samples; for each sample, the standard-size image is propagated forward through the network and a predicted class is output after the softmax layer; the difference between the predicted class and the true class is then propagated back to the head of the network, and the stochastic gradient descent algorithm adjusts the network parameters during back-propagation; the sample-input process is repeated until the network converges.
6. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that in step S4 the LSTM model contains C layers of LSTM neurons, where C is the preset maximum length of a natural sentence, and outputs C words in sequence; the "standard-size image-natural sentence" pairs of the preprocessed MS-COCO data set serve as samples; the steps of training the LSTM model are as follows:
S4.1: the standard-size image is input into the deep residual network of step S3, and an abstract feature matrix of size 7*7*2048 = 49*2048 is extracted from the end of the bottom convolution block; it is denoted a = {a_1, …, a_L}, a_i ∈ R^D, where L = 49, D = 2048, and 0 ≤ i ≤ L;
S4.2: for each moment t, an image content vector is dynamically generated according to the following equations:
e_ti = f_att(a_i, h_{t-1})
α_ti = exp(e_ti) / Σ_{k=1}^{L} exp(e_tk)
ẑ_t = Σ_{i=1}^{L} α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_{t-1} is the hidden state of the previous moment, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features moment t attends to more, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector;
S4.3: for each moment t, the forward conduction of the LSTM neuron can be expressed as:
i_t = σ(W_i E y_{t-1} + U_i h_{t-1} + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_{t-1} + U_f h_{t-1} + Z_f ẑ_t + b_f)
c_t = f_t c_{t-1} + i_t tanh(W_c E y_{t-1} + U_c h_{t-1} + Z_c ẑ_t + b_c)
o_t = σ(W_o E y_{t-1} + U_o h_{t-1} + Z_o ẑ_t + b_o)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^{-x})^{-1}; i_t, f_t, c_t, o_t, and h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at moment t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, and Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, and b_o are bias terms learned by the LSTM model; E ∈ R^{m×K} is a randomly initialized embedding matrix, m is a constant, and y_{t-1} is the word output by the LSTM model at the previous moment; at t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_{i=1}^{L} a_i)
h_0 = f_init,h((1/L) Σ_{i=1}^{L} a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons;
S4.4: for each moment t, the output word y_t is obtained by solving the following optimization problem:
min(−log(p(y_t | a, y_{t-1})) + λ Σ_{i=1}^{L} (1 − Σ_{t=1}^{C} α_ti)^2)
where λ is a constant and C is the maximum length of a natural sentence in the samples;
S4.5: for each moment t, the difference between the predicted natural sentence and the natural sentence in the sample is computed with the cross-entropy loss, and the model is then trained with the back-propagation algorithm and the RMSProp-based stochastic gradient descent algorithm so as to minimize the cross entropy;
S4.6: steps S4.1-S4.5 are repeated for each sample in the MS-COCO data set;
S4.7: steps S4.1-S4.6 are repeated 20 times.
7. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the specific steps of extracting the features of the image to be recognized in step S5 are:
S7.1: the image is preprocessed in the manner used for the ImageNet data set in step S2;
S7.2: the preprocessed image is input into the deep residual network trained in step S3, and the abstract feature matrix is extracted from the end of the bottom convolution block; its size is 7*7*2048 = 49*2048.
8. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that in step S6 the LSTM model generates a natural sentence from the features: for each moment t, where 0 ≤ t < C, a word is generated using steps S4.1-S4.4, and all the words are connected in sequence to form the natural sentence.
CN201611226528.6A 2016-12-27 2016-12-27 A kind of image understanding method based on depth residual error network and LSTM Active CN106650813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611226528.6A CN106650813B (en) 2016-12-27 2016-12-27 A kind of image understanding method based on depth residual error network and LSTM


Publications (2)

Publication Number Publication Date
CN106650813A true CN106650813A (en) 2017-05-10
CN106650813B CN106650813B (en) 2019-11-15

Family

ID=58832759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611226528.6A Active CN106650813B (en) 2016-12-27 2016-12-27 A kind of image understanding method based on depth residual error network and LSTM

Country Status (1)

Country Link
CN (1) CN106650813B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150117760A1 (en) * 2013-10-30 2015-04-30 Nec Laboratories America, Inc. Regionlets with Shift Invariant Neural Patterns for Object Detection
CN104463878A (en) * 2014-12-11 2015-03-25 南京理工大学 Novel depth image local descriptor method
CN105631479A (en) * 2015-12-30 2016-06-01 中国科学院自动化研究所 Imbalance-learning-based depth convolution network image marking method and apparatus
CN105930841A (en) * 2016-05-13 2016-09-07 百度在线网络技术(北京)有限公司 Method and device for automatic semantic annotation of image, and computer equipment

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101984B (en) * 2017-06-20 2022-04-08 北京中科奥森数据科技有限公司 Image identification method and device based on convolutional neural network
CN109101984A (en) * 2017-06-20 2018-12-28 北京中科奥森数据科技有限公司 A kind of image-recognizing method and device based on convolutional neural networks
CN107368831A (en) * 2017-07-19 2017-11-21 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
CN107368831B (en) * 2017-07-19 2019-08-02 中国人民解放军国防科学技术大学 English words and digit recognition method in a kind of natural scene image
CN107590443A (en) * 2017-08-23 2018-01-16 上海交通大学 Limiter stage live video automatic testing method and system based on the study of depth residual error
CN107657271B (en) * 2017-09-02 2019-11-15 西安电子科技大学 Hyperspectral image classification method based on long memory network in short-term
CN107657271A (en) * 2017-09-02 2018-02-02 西安电子科技大学 Hyperspectral image classification method based on long memory network in short-term
CN107608943A (en) * 2017-09-08 2018-01-19 中国石油大学(华东) Merge visual attention and the image method for generating captions and system of semantic notice
CN109558774A (en) * 2017-09-27 2019-04-02 中国海洋大学 Object automatic recognition system based on depth residual error network and support vector machines
CN107633520A (en) * 2017-09-28 2018-01-26 福建帝视信息科技有限公司 A kind of super-resolution image method for evaluating quality based on depth residual error network
CN107844743B (en) * 2017-09-28 2020-04-28 浙江工商大学 Image multi-subtitle automatic generation method based on multi-scale hierarchical residual error network
CN107844743A (en) * 2017-09-28 2018-03-27 浙江工商大学 A kind of image multi-subtitle automatic generation method based on multiple dimensioned layering residual error network
CN107742128A (en) * 2017-10-20 2018-02-27 百度在线网络技术(北京)有限公司 Method and apparatus for output information
CN107766894B (en) * 2017-11-03 2021-01-22 吉林大学 Remote sensing image natural language generation method based on attention mechanism and deep learning
CN107766894A (en) * 2017-11-03 2018-03-06 吉林大学 Remote sensing images spatial term method based on notice mechanism and deep learning
CN108090558A (en) * 2018-01-03 2018-05-29 华南理工大学 A kind of automatic complementing method of time series missing values based on shot and long term memory network
CN108090558B (en) * 2018-01-03 2021-06-08 华南理工大学 Automatic filling method for missing value of time sequence based on long-term and short-term memory network
CN108111860A (en) * 2018-01-11 2018-06-01 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on depth residual error network
CN108111860B (en) * 2018-01-11 2020-04-14 安徽优思天成智能科技有限公司 Video sequence lost frame prediction recovery method based on depth residual error network
CN108427729A (en) * 2018-02-23 2018-08-21 浙江工业大学 Large-scale picture retrieval method based on depth residual error network and Hash coding
WO2019169816A1 (en) * 2018-03-09 2019-09-12 中山大学 Deep neural network for fine recognition of vehicle attributes, and training method thereof
CN108416059B (en) * 2018-03-22 2021-05-18 北京市商汤科技开发有限公司 Training method and device of image description model, equipment and medium
CN108416059A (en) * 2018-03-22 2018-08-17 北京市商汤科技开发有限公司 Training method and device, equipment, medium, the program of image description model
CN110321755A (en) * 2018-03-28 2019-10-11 中移(苏州)软件技术有限公司 A kind of recognition methods and device
CN109670164A (en) * 2018-04-11 2019-04-23 东莞迪赛软件技术有限公司 Healthy the analysis of public opinion method based on the more word insertion Bi-LSTM residual error networks of deep layer
CN108648195A (en) * 2018-05-09 2018-10-12 联想(北京)有限公司 A kind of image processing method and device
CN108921911B (en) * 2018-08-01 2021-03-09 中国科学技术大学 Method for automatically converting structured picture into source code
CN108921911A (en) * 2018-08-01 2018-11-30 中国科学技术大学 The method that structuring picture is automatically converted to source code
CN109146858B (en) * 2018-08-03 2021-09-17 诚亿电子(嘉兴)有限公司 Secondary checking method for problem points of automatic optical checking equipment
CN109146858A (en) * 2018-08-03 2019-01-04 诚亿电子(嘉兴)有限公司 The secondary method of calibration of automatic optical inspection device problem
CN109117781A (en) * 2018-08-07 2019-01-01 北京飞搜科技有限公司 Method for building up, device and the more attribute recognition approaches of more attribute Recognition Models
CN109117781B (en) * 2018-08-07 2020-09-08 北京一维大成科技有限公司 Multi-attribute identification model establishing method and device and multi-attribute identification method
CN109559799A (en) * 2018-10-12 2019-04-02 华南理工大学 The construction method and the model of medical image semantic description method, descriptive model
CN109543699A (en) * 2018-11-28 2019-03-29 北方工业大学 Image abstract generation method based on target detection
CN109846477A (en) * 2019-01-29 2019-06-07 北京工业大学 A kind of brain electricity classification method based on frequency band attention residual error network
CN109948691B (en) * 2019-03-14 2022-02-18 齐鲁工业大学 Image description generation method and device based on depth residual error network and attention
CN109948691A (en) * 2019-03-14 2019-06-28 齐鲁工业大学 Iamge description generation method and device based on depth residual error network and attention
CN110032739A (en) * 2019-04-18 2019-07-19 清华大学 Chinese electronic health record name entity abstracting method and system
WO2020248841A1 (en) * 2019-06-13 2020-12-17 平安科技(深圳)有限公司 Au detection method and apparatus for image, and electronic device and storage medium
CN111667495A (en) * 2020-06-08 2020-09-15 北京环境特性研究所 Image scene analysis method and device
CN114338199A (en) * 2021-12-30 2022-04-12 广东工业大学 Attention mechanism-based malicious flow detection method and system
CN114338199B (en) * 2021-12-30 2024-01-09 广东工业大学 Malicious traffic detection method and system based on attention mechanism

Also Published As

Publication number Publication date
CN106650813B (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN106650813A (en) Image understanding method based on depth residual error network and LSTM
US20200380213A1 (en) Multitask Learning As Question Answering
Nwankpa et al. Activation functions: Comparison of trends in practice and research for deep learning
Caterini et al. Deep neural networks in a mathematical framework
LeCun et al. Deep learning
US20220043972A1 (en) Answer generating device, answer learning device, answer generating method, and answer generating program
Abe Neural networks and fuzzy systems: theory and applications
CN107871014A (en) A kind of big data cross-module state search method and system based on depth integration Hash
CN107688850A (en) A kind of deep neural network compression method
US20210232753A1 (en) Ml using n-gram induced input representation
CN112131886A (en) Method for analyzing aspect level emotion of text
CN111353040A (en) GRU-based attribute level emotion analysis method
Du et al. Efficient network construction through structural plasticity
CN114254645A (en) Artificial intelligence auxiliary writing system
CN109948163B (en) Natural language semantic matching method for dynamic sequence reading
CN107562729A (en) The Party building document representation method strengthened based on neutral net and theme
Varshitha et al. Natural language processing using convolutional neural network
CN112732879B (en) Downstream task processing method and model of question-answering task
JPWO2019187696A1 (en) Vectorizers, language processing methods and programs
Harikrishnan et al. Handwritten digit recognition with feed-forward multi-layer perceptron and convolutional neural network architectures
Jin et al. Improving deep belief networks via delta rule for sentiment classification
Habeeb et al. Reducing error rate of deep learning using auto encoder and genetic algorithms
Damadi et al. The Backpropagation algorithm for a math student
WO2022164613A1 (en) Ml using n-gram induced input representation
CN114464267A (en) Method and device for model training and product prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant