CN106650813A - Image understanding method based on depth residual error network and LSTM - Google Patents
Image understanding method based on depth residual error network and LSTM
- Publication number
- CN106650813A CN106650813A CN201611226528.6A CN201611226528A CN106650813A CN 106650813 A CN106650813 A CN 106650813A CN 201611226528 A CN201611226528 A CN 201611226528A CN 106650813 A CN106650813 A CN 106650813A
- Authority
- CN
- China
- Prior art keywords
- image
- lstm
- residual error
- depth residual
- error network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image understanding method based on a deep residual network and an LSTM. The method comprises the following steps: first, a deep residual network model is built to extract abstract image features, which are stored as a feature matrix; then, a dynamic attention mechanism in the LSTM model dynamically forms a suitable feature vector from the feature matrix; finally, the LSTM model generates natural language (English) from the feature vector. The method exploits the advantages of the deep residual network in image feature extraction and of the LSTM in time-series modeling; together, the deep residual network and the LSTM model form an encoder-decoder framework that converts image content into natural language, thereby extracting deep information from the image.
Description
Technical field
The present invention relates to the fields of image semantic understanding and deep learning, and in particular to an image understanding method based on a deep residual network and LSTM (Long Short-Term Memory).
Background technology
Image understanding refers to the understanding of image semantics. It is a science that takes the image as its object and knowledge as its core, and studies what targets exist in an image, what relations hold between the targets, and what scene the image depicts. The input of image understanding is image data, and the output is knowledge, the high-level content studied in the field of image processing. Building on image recognition, the essential further step is to study the properties and interrelations of the targets in the image, to arrive at an understanding of the meaning of the image content and an interpretation of the original objective scene, and thereby to guide planning and behavior.
Current image understanding methods are mainly based on low-level features combined with classifiers: first, image processing algorithms such as the wavelet transform, the scale-invariant feature transform (SIFT), and edge extraction are used to extract features from the image; then image recognition and reasoning algorithms such as latent Dirichlet allocation (LDA), hidden Markov models (HMM), and support vector machines (SVM) are used to classify the extracted features and build semantic models. In terms of algorithmic realization, current image understanding algorithms suffer from poor generalization, low robustness, strong local dependence, difficult implementation, and low recognition rates.
Content of the invention
The invention discloses an image understanding method based on a deep residual network and LSTM. The method exploits the advantages of the deep residual network in image feature extraction and of the LSTM in time-series modeling: the deep residual network and the LSTM model form an encoder-decoder framework that converts image content into natural language, achieving the goal of extracting deep information from the image.
The purpose of the present invention is achieved by the following technical scheme: an image understanding method based on a deep residual network and LSTM, characterized by a deep residual network model for extracting abstract features from an input image and an LSTM model for generating natural language from the abstract features. The method specifically comprises the following steps:
S1: Download the training data sets;
S2: Pre-process the data in the data sets of step S1;
S3: Train the deep residual network model;
S4: Train the LSTM model;
S5: Extract the abstract features of the image to be recognized with the deep residual network model trained in step S3;
S6: Input the features extracted in step S5 into the LSTM model trained in step S4; the LSTM model generates natural language from the features.
Preferably, the training data sets are downloaded in step S1: the ImageNet and MS-COCO public image data sets are downloaded from http://www.image-net.org and http://mscoco.org respectively. The ImageNet data set is divided into a training image set and a test image set; the MS-COCO data set is likewise divided into a training image set and a test image set, and each picture has 5 corresponding natural-language sentences describing its content.
Preferably, the pre-processing of step S2 covers two cases, the ImageNet data set and the MS-COCO data set:
For the ImageNet data set: each image is scaled to 256 × 256, then five 224 × 224 standard-size images are cropped at the upper, middle, lower, left, and right positions of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image - class" pair serves as one data item.
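The ImageNet pre-processing above can be sketched as follows (a minimal NumPy sketch; the exact pixel offsets of the five crop positions are my own assumption, since the text only names the five regions):

```python
import numpy as np

def five_crops(image_256, crop=224):
    """Return the five 224x224 crops of step S2: upper, middle (center),
    lower, left, and right regions of a 256x256 image. The exact crop
    offsets are an assumption; the patent only names the five regions."""
    h, w, _ = image_256.shape          # expected 256 x 256 x 3
    m = (h - crop) // 2                # margin for centered crops
    return {
        "upper":  image_256[:crop, m:m + crop],
        "middle": image_256[m:m + crop, m:m + crop],
        "lower":  image_256[h - crop:, m:m + crop],
        "left":   image_256[m:m + crop, :crop],
        "right":  image_256[m:m + crop, w - crop:],
    }

img = np.zeros((256, 256, 3), dtype=np.uint8)   # stand-in for a scaled image
crops = five_crops(img)
```

Each crop would then be paired with the image's class label to form one "standard-size image - class" data item.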
For the MS-COCO data set, the pre-processing steps are as follows:
S2.1: each natural-language sentence is saved in a pair with its corresponding image; one "image - natural sentence" pair serves as one data item;
S2.2: the image of each "image - natural sentence" pair is scaled with its aspect ratio kept constant and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its corresponding sentence; one "standard-size image - natural sentence" pair serves as one data item;
S2.3: the words occurring in all natural sentences are collected, de-duplicated, and sorted; the total number of words is denoted K. Each word is represented by a K-dimensional one-hot column vector in which the element at the word's index is 1 and all other elements are 0; such a vector is called a word vector, and all "word - word vector" pairs form a dictionary DIC of length K;
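Step S2.3 can be sketched as follows (NumPy; the example sentences are hypothetical):

```python
import numpy as np

def build_dictionary(sentences):
    """Collect, de-duplicate and sort the words of all sentences (step S2.3),
    then map each word to a K-dimensional one-hot column vector."""
    words = sorted({w for s in sentences for w in s.lower().split()})
    K = len(words)
    dic = {}
    for idx, w in enumerate(words):
        v = np.zeros((K, 1))
        v[idx, 0] = 1.0                # 1 at the word's index, 0 elsewhere
        dic[w] = v
    return dic, K

dic, K = build_dictionary(["a man rides a horse", "a dog runs"])
```

A sentence of length C is then represented as the sequence of the word vectors of its words.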
S2.4: the natural sentences of the "image - natural sentence" pairs are represented with word vectors based on the dictionary DIC; a natural sentence y of length C can be expressed as y = (y_1, y_2, ..., y_C), where each y_i is a word vector from DIC.
Preferably, the deep residual network model trained in step S3 comprises 46 convolution blocks (denoted "conv + subscript"), 2 pooling layers, 1 fully connected layer, and 1 softmax classifier. In each convolution block, the data are first normalized with batch normalization (BN), then transformed nonlinearly with the rectified linear unit (ReLU), and finally convolved. Training uses stochastic gradient descent (SGD) and back-propagation (BP), with the "standard-size image - class" pairs of the pre-processed ImageNet data set as samples. For each sample, the standard-size image is propagated forward through the network and a predicted class is output after the softmax layer; the difference between the predicted class and the true class is then propagated back to the head of the network, and the network parameters are adjusted with the stochastic gradient descent algorithm during back-propagation. The sample-input process is repeated until the network converges.
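The forward / softmax / back-propagation / SGD cycle described above can be sketched with a single linear layer standing in for the full residual network (toy sizes chosen for illustration, not the patent's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()                    # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy stand-in for step S3's training loop: one linear layer, 3 classes,
# 5 input features (sizes are illustrative assumptions).
W = rng.normal(scale=0.01, size=(3, 5))

def sgd_step(x, label, lr=0.1):
    """One sample: forward pass, softmax prediction, gradient of the
    cross-entropy w.r.t. W, and an SGD parameter update."""
    global W
    p = softmax(W @ x)                 # predicted class distribution
    grad = np.outer(p, x)              # dL/dW for softmax + cross-entropy
    grad[label] -= x                   # subtract x on the true-class row
    W -= lr * grad                     # SGD update during back-propagation
    return -np.log(p[label])           # cross-entropy loss on this sample

x, label = rng.normal(size=5), 2
losses = [sgd_step(x, label) for _ in range(50)]   # repeat until convergence
```

The loss shrinks as the update is repeated, mirroring "repeat the sample-input process until the network converges".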
Preferably, the LSTM model is trained in step S4: the basic structure of the LSTM model is composed of LSTM neurons. The LSTM model comprises C layers of LSTM neurons (C is the preset maximum length of a natural sentence) and can output C words in sequence. Here the "standard-size image - natural sentence" pairs of the pre-processed MS-COCO data set are used as samples. The steps for training the LSTM model are as follows:
S4.1: the standard-size image is input into the deep residual network of step S3, and the abstract feature matrix is extracted at the end of the conv5_3_c convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048, and it is denoted a = {a_1, ..., a_L}, a_i ∈ R^D, where L = 49 and D = 2048;
S4.2: for each time step t, an image content vector is dynamically generated according to the following formulas:
e_ti = f_att(a_i, h_(t-1))
α_ti = exp(e_ti) / Σ_(k=1..L) exp(e_tk)
ẑ_t = Σ_(i=1..L) α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_(t-1) is the hidden state of the previous time step, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features time step t attends to most, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector;
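The dynamic attention of step S4.2 can be sketched as follows (NumPy; the single-hidden-layer form of f_att and the sizes H and 64 are assumptions, since the text only states that f_att is a multilayer perceptron):

```python
import numpy as np

rng = np.random.default_rng(0)
L, D, H = 49, 2048, 512     # L and D as in the patent; hidden size H assumed

# Hypothetical one-hidden-layer perceptron standing in for f_att.
Wa = rng.normal(scale=0.01, size=(64, D))
Wh = rng.normal(scale=0.01, size=(64, H))
w = rng.normal(size=64)

def attend(a, h_prev):
    """Soft attention of step S4.2: scores e_ti = f_att(a_i, h_(t-1)),
    softmax weights alpha_ti, and content vector z_t = sum_i alpha_ti a_i."""
    e = np.array([w @ np.tanh(Wa @ a_i + Wh @ h_prev) for a_i in a])
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()               # weights sum to 1 over the L positions
    z = alpha @ a                      # weighted sum of the feature vectors
    return alpha, z

a = rng.normal(size=(L, D))            # 49 x 2048 abstract feature matrix
alpha, z = attend(a, rng.normal(size=H))
```

Because the weights depend on h_(t-1), each time step attends to different positions of the feature matrix.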
S4.3: for each time step t, the forward pass of the LSTM neuron can be expressed as:
i_t = σ(W_i E y_(t-1) + U_i h_(t-1) + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_(t-1) + U_f h_(t-1) + Z_f ẑ_t + b_f)
o_t = σ(W_o E y_(t-1) + U_o h_(t-1) + Z_o ẑ_t + b_o)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_c E y_(t-1) + U_c h_(t-1) + Z_c ẑ_t + b_c)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^(-x))^(-1); i_t, f_t, c_t, o_t, h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at time step t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, b_o are bias terms learned by the LSTM model; E ∈ R^(m×K) is a randomly initialized embedding matrix, m is a constant, and y_(t-1) is the word output by the LSTM model at the previous time step. At t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_(i=1..L) a_i)
h_0 = f_init,h((1/L) Σ_(i=1..L) a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons;
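The forward step of S4.3 can be sketched as follows (NumPy; all sizes are toy values and the weights are random rather than learned, purely to show the data flow through the gates):

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, H, D = 32, 100, 64, 128   # embedding, vocab, hidden, feature sizes (toy)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Weight matrices W, U, Z and bias b for each gate (i, f, o, c), plus the
# randomly initialized embedding matrix E, as in step S4.3.
P = {g: (rng.normal(scale=0.1, size=(H, m)),
         rng.normal(scale=0.1, size=(H, H)),
         rng.normal(scale=0.1, size=(H, D)),
         np.zeros(H)) for g in "ifoc"}
E = rng.normal(scale=0.1, size=(m, K))

def lstm_step(y_prev, h_prev, c_prev, z):
    """One forward step: every gate sees the embedded previous word E·y_(t-1),
    the previous hidden state h_(t-1), and the content vector z_t."""
    x = E @ y_prev                     # embed the previous one-hot word
    def gate(g, act):
        W, U, Z, b = P[g]
        return act(W @ x + U @ h_prev + Z @ z + b)
    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    c = f * c_prev + i * gate("c", np.tanh)   # memory cell update
    h = o * np.tanh(c)                        # hidden state
    return h, c

y = np.zeros(K); y[3] = 1.0            # one-hot previous word
h, c = lstm_step(y, np.zeros(H), np.zeros(H), rng.normal(size=D))
```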
S4.4: for each time step t, the output word y_t is obtained by maximizing the following formula:
log p(y_t | a, y_1, ..., y_(t-1)) − λ Σ_(i=1..L) (1 − Σ_(t=1..C) α_ti)²
where λ is a constant and C is the maximum length of the natural sentences in the samples;
S4.5: the difference between the predicted natural sentence and the natural sentence in the sample is computed according to the cross-entropy loss, and the model is then trained with the back-propagation algorithm (BP) and the RMSProp-based stochastic gradient descent (SGD) algorithm to minimize the cross-entropy.
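The patent trains with an RMSProp-based stochastic gradient descent; the standard RMSProp update it presumably refers to can be sketched as:

```python
import numpy as np

def rmsprop_update(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp step: keep a running average of the squared gradient
    and divide the step by its square root (hyperparameters are the
    usual defaults, an assumption)."""
    cache = decay * cache + (1 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w, cache = np.ones(4), np.zeros(4)
for _ in range(10):                    # gradient of the toy loss sum(w**2)
    w, cache = rmsprop_update(w, grad=2 * w, cache=cache)
```

The per-parameter scaling makes the step size roughly uniform across parameters, which is why it is a common choice for training LSTMs.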
S4.6: repeat steps S4.1-S4.5 for each sample in the MS-COCO data set.
S4.7: repeat steps S4.1-S4.6 twenty times.
Preferably, the specific steps for extracting the features of the image to be recognized in step S5 are:
S7.1: the image is pre-processed using the ImageNet pre-processing method of step S2;
S7.2: the pre-processed image is input into the deep residual network trained in step S3, and the abstract feature matrix is extracted at the end of the bottom convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048.
Preferably, in step S6 the LSTM model generates a natural sentence from the image features: for each time step t, where 0 ≤ t < C, a word is generated using steps S4.1-S4.4, and all words are connected in sequence to form the natural sentence.
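The generation loop of step S6 can be sketched as follows (the tiny vocabulary and the random stand-in for steps S4.1-S4.4 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<end>", "a", "man", "rides", "horse"]   # hypothetical tiny dictionary
C = 10                                            # maximum sentence length

def next_word_probs(features, prev_word):
    """Hypothetical stand-in for steps S4.1-S4.4 (attention + LSTM step +
    word scoring); here it just returns a random distribution."""
    return rng.dirichlet(np.ones(len(VOCAB)))

def generate_sentence(features):
    """Step S6: at each time step 0 <= t < C pick the most probable word,
    then join the words in order; stop early on an end token."""
    words, prev = [], None
    for _ in range(C):
        w = VOCAB[int(np.argmax(next_word_probs(features, prev)))]
        if w == "<end>":
            break
        words.append(w)
        prev = w
    return " ".join(words)

sentence = generate_sentence(features=np.zeros((49, 2048)))
```

In the real method, `next_word_probs` would run the attention mechanism and one LSTM step on the 49 × 2048 feature matrix.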
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The method applies deep learning theory, training the deep residual network model and the LSTM model with a large number of image samples; it can automatically learn the universal patterns in images, and it is robust and widely applicable.
2. The deep residual network adopted by the method has a deep 50-layer structure and can fully extract the abstract features of an image. At the same time, the method employs an LSTM model, which can properly model time series such as natural language and convert feature vectors into natural language. The combination of the deep residual network and the LSTM markedly improves the accuracy of image understanding.
3. The invention introduces a dynamic attention mechanism that dynamically generates a suitable feature vector from the feature matrix extracted by the deep residual network, so that the LSTM can dynamically focus on different positions of the image.
Description of the drawings
Fig. 1 is a flow chart of an image understanding method based on a deep residual network and LSTM according to an embodiment of the present invention;
Fig. 2 shows the structure of the deep residual network of step (3) in an image understanding method based on a deep residual network and LSTM according to an embodiment of the present invention;
Fig. 3 shows the concrete structure of a convolution block in the deep residual network model of step (3) in an image understanding method based on a deep residual network and LSTM according to an embodiment of the present invention;
Fig. 4 shows the structure of an LSTM neuron in the LSTM model of step (4) in an image understanding method based on a deep residual network and LSTM according to an embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to an embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
The flow chart of the method of the present invention is shown in Fig. 1; the method comprises the following steps:
(1) Download the training data sets: the ImageNet and MS-COCO public image data sets are downloaded from http://www.image-net.org and http://mscoco.org respectively. The ImageNet data set is divided into a training image set and a test image set; the training image set contains 1000 classes with 1300 pictures per class, and the test image set contains 50000 pictures. The MS-COCO data set is divided into a training image set and a test image set; the training image set contains 82783 pictures and the test image set contains 40504 pictures; each picture has 5 corresponding natural-language sentences describing its content.
(2) Pre-processing:
For the ImageNet data set: each image is scaled to 256 × 256, then five 224 × 224 standard-size images are cropped at the upper, middle, lower, left, and right positions of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image - class" pair serves as one data item.
For the MS-COCO data set, the pre-processing steps are as follows:
2.1. Each natural-language sentence is saved in a pair with its corresponding image; one "image - natural sentence" pair serves as one data item.
2.2. The image of each "image - natural sentence" pair is scaled with its aspect ratio kept constant and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its corresponding sentence; one "standard-size image - natural sentence" pair serves as one data item.
2.3. The words occurring in all natural sentences are collected, de-duplicated, and sorted; the total number of words is denoted K. Each word is represented by a K-dimensional one-hot column vector in which the element at the word's index is 1 and all other elements are 0; such a vector is called a word vector, and all "word - word vector" pairs form a dictionary DIC of length K.
2.4. The natural sentences of the "image - natural sentence" pairs are represented with word vectors based on the dictionary DIC; a natural sentence y of length C can be expressed as y = (y_1, y_2, ..., y_C), where each y_i is a word vector from DIC.
(3) Train the deep residual network model: the structure of the deep residual network, shown in Fig. 2, comprises 46 convolution blocks (denoted "conv + subscript"), 2 pooling layers, 1 fully connected layer, and 1 softmax classifier. In each convolution block, the data are first normalized with batch normalization (BN), then transformed nonlinearly with the rectified linear unit (ReLU), and finally convolved. Training uses stochastic gradient descent (SGD) and back-propagation (BP), with the "standard-size image - class" pairs of the pre-processed ImageNet data set as samples. The specific parameters are indicated in Fig. 2; for example, "conv2_1_a, 1*1, 64, 1" means that the convolution block is named conv2_1_a, its kernel size is 1 × 1, its stride is 1, and it outputs 64 feature maps.
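As a cross-check of the 7 × 7 × 2048 feature size quoted in step (4), the spatial size can be traced through an assumed standard ResNet-50-style layout (Fig. 2 is not reproduced here, so the kernels, strides, and paddings below are assumptions, not the patent's exact parameters):

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed layout: 7x7/2 stem conv, 3x3/2 max-pool, then one stride-2
# convolution at the start of each of the conv3, conv4, conv5 stages.
size = 224
for kernel, stride, padding in [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1), (3, 2, 1)]:
    size = conv_out(size, kernel, stride, padding)
# size is now the side of the final feature map, giving the
# 7*7*2048 -> 49*2048 abstract feature matrix of step 4.1.
```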
(4) Train the LSTM model: Fig. 4 shows the basic structure of the LSTM model, which is composed of LSTM neurons. The LSTM model comprises C layers of LSTM neurons (C is the preset maximum length of a natural sentence) and can output C words in sequence. Here the "standard-size image - natural sentence" pairs of the pre-processed MS-COCO data set are used as samples. The steps for training the LSTM model are as follows:
4.1. The standard-size image is input into the deep residual network of step (3), and the abstract feature matrix is extracted at the end of the conv5_3_c convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048, and it is denoted a = {a_1, ..., a_L}, a_i ∈ R^D, where L = 49 and D = 2048.
4.2. For each time step t, an image content vector is dynamically generated according to the following formulas:
e_ti = f_att(a_i, h_(t-1))
α_ti = exp(e_ti) / Σ_(k=1..L) exp(e_tk)
ẑ_t = Σ_(i=1..L) α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_(t-1) is the hidden state of the previous time step, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features time step t attends to most, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector.
4.3. For each time step t, the forward pass of the LSTM neuron can be expressed as:
i_t = σ(W_i E y_(t-1) + U_i h_(t-1) + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_(t-1) + U_f h_(t-1) + Z_f ẑ_t + b_f)
o_t = σ(W_o E y_(t-1) + U_o h_(t-1) + Z_o ẑ_t + b_o)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_c E y_(t-1) + U_c h_(t-1) + Z_c ẑ_t + b_c)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^(-x))^(-1); i_t, f_t, c_t, o_t, h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at time step t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, b_o are bias terms learned by the LSTM model; E ∈ R^(m×K) is a randomly initialized embedding matrix, m is a constant, and y_(t-1) is the word output by the LSTM model at the previous time step. At t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_(i=1..L) a_i)
h_0 = f_init,h((1/L) Σ_(i=1..L) a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons.
4.4. For each time step t, the output word y_t is obtained by maximizing the following formula:
log p(y_t | a, y_1, ..., y_(t-1)) − λ Σ_(i=1..L) (1 − Σ_(t=1..C) α_ti)²
where λ is a constant and C is the maximum length of the natural sentences in the samples.
4.5. The difference between the predicted natural sentence and the natural sentence in the sample is computed according to the cross-entropy loss, and the model is then trained with the back-propagation algorithm (BP) and the RMSProp-based stochastic gradient descent (SGD) algorithm to minimize the cross-entropy.
4.6. Repeat steps 4.1-4.5 for each sample in the MS-COCO data set.
4.7. Repeat steps 4.1-4.6 twenty times.
(5) Extract the abstract features of the image to be recognized with the deep residual network model trained in step (3): the image is first pre-processed using the ImageNet pre-processing method of step (2); the pre-processed image is then input into the deep residual network trained in step (3), and the abstract feature matrix is extracted at the end of the bottom convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048.
(6) The abstract features extracted in step (5) are input into the LSTM model trained in step (4); for each time step t, where 0 ≤ t < C, a word is generated using steps S4.1-S4.4, and all words are connected in sequence to form the natural sentence.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.
Claims (8)
1. An image understanding method based on a deep residual network and LSTM, characterized in that it applies a deep residual network model for extracting abstract features from an input image and an LSTM model for generating natural language from the abstract features, and specifically comprises the following steps:
S1: download the training data sets;
S2: pre-process the data in the data sets of step S1;
S3: train the deep residual network model;
S4: train the LSTM model;
S5: extract the abstract features of the image to be recognized with the deep residual network model trained in step S3;
S6: input the features extracted in step S5 into the LSTM model trained in step S4; the LSTM model generates a natural sentence from the features.
2. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the data sets in step S1 are the two downloaded public image data sets ImageNet and MS-COCO.
3. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the pre-processing of step S2 covers two cases, the ImageNet data set and the MS-COCO data set:
For the ImageNet data set: each image is scaled to 256 × 256, then five 224 × 224 standard-size images are cropped at the upper, middle, lower, left, and right positions of the image, and each standard-size image is saved in a pair with its corresponding class; one "standard-size image - class" pair serves as one data item;
For the MS-COCO data set, the pre-processing steps are as follows:
S2.1: each natural-language sentence is saved in a pair with its corresponding image; one "image - natural sentence" pair serves as one data item;
S2.2: the image of each "image - natural sentence" pair is scaled with its aspect ratio kept constant and cropped to a 224 × 224 standard-size image, and the standard-size image is saved in a pair with its corresponding sentence; one "standard-size image - natural sentence" pair serves as one data item;
S2.3: the words occurring in all natural sentences are collected, de-duplicated, and sorted; the total number of words is denoted K; each word is represented by a K-dimensional one-hot column vector in which the element at the word's index is 1 and all other elements are 0; such a vector is called a word vector, and all "word - word vector" pairs form a dictionary DIC of length K;
S2.4: the natural sentences of the "image - natural sentence" pairs are represented with word vectors based on the dictionary DIC; a natural sentence y of length C can be expressed as y = (y_1, y_2, ..., y_C), where each y_i is a word vector from DIC.
4. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that in step S3 the structure of the deep residual network model comprises multiple convolution blocks, pooling layers, a fully connected layer, and a softmax classifier; in each convolution block, the data are first normalized with the batch normalization method, then transformed nonlinearly with the rectified linear unit, and finally convolved.
5. The image understanding method based on a deep residual network and LSTM according to claim 1 or 4, characterized in that in step S3 the deep residual network model is trained with stochastic gradient descent and back-propagation, using the "standard-size image - class" pairs of the pre-processed ImageNet data set as samples; for each sample, the standard-size image is propagated forward through the network and a predicted class is output after the softmax layer; the difference between the predicted class and the true class is then propagated back to the head of the network, and the network parameters are adjusted with the stochastic gradient descent algorithm during back-propagation; the sample-input process is repeated until the network converges.
6. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that in step S4 the LSTM model comprises C layers of LSTM neurons, where C is the preset maximum length of a natural sentence, and outputs C words in sequence; the "standard-size image - natural sentence" pairs of the pre-processed MS-COCO data set are used as samples; the steps for training the LSTM model are as follows:
S4.1: the standard-size image is input into the deep residual network of step S3, and the abstract feature matrix is extracted at the end of the bottom convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048, and it is denoted a = {a_1, ..., a_L}, a_i ∈ R^D, where L = 49, D = 2048, and 1 ≤ i ≤ L;
S4.2: for each time step t, an image content vector is dynamically generated according to the following formulas:
e_ti = f_att(a_i, h_(t-1))
α_ti = exp(e_ti) / Σ_(k=1..L) exp(e_tk)
ẑ_t = Σ_(i=1..L) α_ti a_i
where a_i is a vector of the abstract feature matrix a, h_(t-1) is the hidden state of the previous time step, f_att is an attention model based on a multilayer perceptron that automatically determines which abstract features time step t attends to most, α_ti is the weight corresponding to a_i, and ẑ_t is the dynamically generated image content vector;
S4.3: for each time step t, the forward pass of the LSTM neuron can be expressed as:
i_t = σ(W_i E y_(t-1) + U_i h_(t-1) + Z_i ẑ_t + b_i)
f_t = σ(W_f E y_(t-1) + U_f h_(t-1) + Z_f ẑ_t + b_f)
o_t = σ(W_o E y_(t-1) + U_o h_(t-1) + Z_o ẑ_t + b_o)
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_c E y_(t-1) + U_c h_(t-1) + Z_c ẑ_t + b_c)
h_t = o_t tanh(c_t)
where σ is the sigmoid function, σ(x) = (1 + e^(-x))^(-1); i_t, f_t, c_t, o_t, h_t denote the state variables of the input gate, forget gate, memory cell, output gate, and hidden layer at time step t; W_i, U_i, Z_i, W_f, U_f, Z_f, W_o, U_o, Z_o, W_c, U_c, Z_c are weight matrices learned by the LSTM model; b_i, b_f, b_c, b_o are bias terms learned by the LSTM model; E ∈ R^(m×K) is a randomly initialized embedding matrix, m is a constant, and y_(t-1) is the word output by the LSTM model at the previous time step; at t = 0, c_t and h_t are initialized by the following formulas:
c_0 = f_init,c((1/L) Σ_(i=1..L) a_i)
h_0 = f_init,h((1/L) Σ_(i=1..L) a_i)
where f_init,c and f_init,h are two independent multilayer perceptrons;
S4.4: for each time step t, the output word y_t is obtained by solving an optimization problem in which λ is a constant and C is the maximum length of the natural sentences in the samples;
S4.5: for each time step t, the difference between the predicted natural sentence and the natural sentence in the sample is computed according to the cross-entropy loss, and the model is then trained with the back-propagation algorithm and the RMSProp-based stochastic gradient descent algorithm to minimize the cross-entropy;
S4.6: repeat steps S4.1-S4.5 for each sample in the MS-COCO data set;
S4.7: repeat steps S4.1-S4.6 twenty times.
7. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that the specific steps for extracting the features of the image to be recognized in step S5 are:
S7.1: the image is pre-processed using the ImageNet pre-processing method of step S2;
S7.2: the pre-processed image is input into the deep residual network trained in step S3, and the abstract feature matrix is extracted at the end of the bottom convolution block; its size is 7 × 7 × 2048, reshaped to 49 × 2048.
8. The image understanding method based on a deep residual network and LSTM according to claim 1, characterized in that in step S6 the LSTM model generates a natural sentence from the features: for each time step t, where 0 ≤ t < C, a word is generated using steps S4.1-S4.4, and all words are connected in sequence to form the natural sentence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611226528.6A CN106650813B (en) | 2016-12-27 | 2016-12-27 | A kind of image understanding method based on depth residual error network and LSTM |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611226528.6A CN106650813B (en) | 2016-12-27 | 2016-12-27 | A kind of image understanding method based on depth residual error network and LSTM |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106650813A true CN106650813A (en) | 2017-05-10 |
CN106650813B CN106650813B (en) | 2019-11-15 |
Family
ID=58832759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611226528.6A Active CN106650813B (en) | 2016-12-27 | 2016-12-27 | A kind of image understanding method based on depth residual error network and LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106650813B (en) |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368831A (en) * | 2017-07-19 | 2017-11-21 | 中国人民解放军国防科学技术大学 | English words and digit recognition method in a kind of natural scene image |
CN107590443A (en) * | 2017-08-23 | 2018-01-16 | 上海交通大学 | Limiter stage live video automatic testing method and system based on the study of depth residual error |
CN107608943A (en) * | 2017-09-08 | 2018-01-19 | 中国石油大学(华东) | Merge visual attention and the image method for generating captions and system of semantic notice |
CN107633520A (en) * | 2017-09-28 | 2018-01-26 | 福建帝视信息科技有限公司 | A kind of super-resolution image method for evaluating quality based on depth residual error network |
CN107657271A (en) * | 2017-09-02 | 2018-02-02 | 西安电子科技大学 | Hyperspectral image classification method based on long memory network in short-term |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104463878A (en) * | 2014-12-11 | 2015-03-25 | 南京理工大学 | Novel depth image local descriptor method |
US20150117760A1 (en) * | 2013-10-30 | 2015-04-30 | Nec Laboratories America, Inc. | Regionlets with Shift Invariant Neural Patterns for Object Detection |
CN105631479A (en) * | 2015-12-30 | 2016-06-01 | 中国科学院自动化研究所 | Imbalance-learning-based deep convolutional network image annotation method and apparatus |
CN105930841A (en) * | 2016-05-13 | 2016-09-07 | 百度在线网络技术(北京)有限公司 | Method and device for automatic semantic annotation of image, and computer equipment |
- 2016-12-27: CN application CN201611226528.6A filed; granted as CN106650813B (status: active)
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109101984B (en) * | 2017-06-20 | 2022-04-08 | 北京中科奥森数据科技有限公司 | Image identification method and device based on convolutional neural network |
CN109101984A (en) * | 2017-06-20 | 2018-12-28 | 北京中科奥森数据科技有限公司 | An image recognition method and device based on convolutional neural networks |
CN107368831A (en) * | 2017-07-19 | 2017-11-21 | 中国人民解放军国防科学技术大学 | Method for recognizing English words and digits in natural scene images |
CN107368831B (en) * | 2017-07-19 | 2019-08-02 | 中国人民解放军国防科学技术大学 | Method for recognizing English words and digits in natural scene images |
CN107590443A (en) * | 2017-08-23 | 2018-01-16 | 上海交通大学 | Restricted-rating live video automatic detection method and system based on depth residual learning |
CN107657271B (en) * | 2017-09-02 | 2019-11-15 | 西安电子科技大学 | Hyperspectral image classification method based on long short-term memory network |
CN107657271A (en) * | 2017-09-02 | 2018-02-02 | 西安电子科技大学 | Hyperspectral image classification method based on long short-term memory network |
CN107608943A (en) * | 2017-09-08 | 2018-01-19 | 中国石油大学(华东) | Image caption generation method and system fusing visual attention and semantic attention |
CN109558774A (en) * | 2017-09-27 | 2019-04-02 | 中国海洋大学 | Object automatic recognition system based on depth residual error network and support vector machines |
CN107633520A (en) * | 2017-09-28 | 2018-01-26 | 福建帝视信息科技有限公司 | A super-resolution image quality evaluation method based on depth residual error network |
CN107844743B (en) * | 2017-09-28 | 2020-04-28 | 浙江工商大学 | Image multi-subtitle automatic generation method based on multi-scale hierarchical residual error network |
CN107844743A (en) * | 2017-09-28 | 2018-03-27 | 浙江工商大学 | Image multi-subtitle automatic generation method based on multi-scale hierarchical residual error network |
CN107742128A (en) * | 2017-10-20 | 2018-02-27 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting information |
CN107766894B (en) * | 2017-11-03 | 2021-01-22 | 吉林大学 | Remote sensing image natural language generation method based on attention mechanism and deep learning |
CN107766894A (en) * | 2017-11-03 | 2018-03-06 | 吉林大学 | Remote sensing image natural language generation method based on attention mechanism and deep learning |
CN108090558A (en) * | 2018-01-03 | 2018-05-29 | 华南理工大学 | Automatic filling method for time series missing values based on long short-term memory network |
CN108090558B (en) * | 2018-01-03 | 2021-06-08 | 华南理工大学 | Automatic filling method for missing value of time sequence based on long-term and short-term memory network |
CN108111860A (en) * | 2018-01-11 | 2018-06-01 | 安徽优思天成智能科技有限公司 | Video sequence lost frame prediction recovery method based on depth residual error network |
CN108111860B (en) * | 2018-01-11 | 2020-04-14 | 安徽优思天成智能科技有限公司 | Video sequence lost frame prediction recovery method based on depth residual error network |
CN108427729A (en) * | 2018-02-23 | 2018-08-21 | 浙江工业大学 | Large-scale picture retrieval method based on depth residual error network and Hash coding |
WO2019169816A1 (en) * | 2018-03-09 | 2019-09-12 | 中山大学 | Deep neural network for fine recognition of vehicle attributes, and training method thereof |
CN108416059B (en) * | 2018-03-22 | 2021-05-18 | 北京市商汤科技开发有限公司 | Training method and device of image description model, equipment and medium |
CN108416059A (en) * | 2018-03-22 | 2018-08-17 | 北京市商汤科技开发有限公司 | Training method, apparatus, device, medium, and program for an image description model |
CN110321755A (en) * | 2018-03-28 | 2019-10-11 | 中移(苏州)软件技术有限公司 | A recognition method and device |
CN109670164A (en) * | 2018-04-11 | 2019-04-23 | 东莞迪赛软件技术有限公司 | Health public opinion analysis method based on deep multi-word-embedding Bi-LSTM residual error network |
CN108648195A (en) * | 2018-05-09 | 2018-10-12 | 联想(北京)有限公司 | An image processing method and device |
CN108921911B (en) * | 2018-08-01 | 2021-03-09 | 中国科学技术大学 | Method for automatically converting structured picture into source code |
CN108921911A (en) * | 2018-08-01 | 2018-11-30 | 中国科学技术大学 | Method for automatically converting structured pictures into source code |
CN109146858B (en) * | 2018-08-03 | 2021-09-17 | 诚亿电子(嘉兴)有限公司 | Secondary checking method for problem points of automatic optical checking equipment |
CN109146858A (en) * | 2018-08-03 | 2019-01-04 | 诚亿电子(嘉兴)有限公司 | Secondary checking method for problem points of automatic optical checking equipment |
CN109117781A (en) * | 2018-08-07 | 2019-01-01 | 北京飞搜科技有限公司 | Multi-attribute recognition model establishing method and device, and multi-attribute recognition method |
CN109117781B (en) * | 2018-08-07 | 2020-09-08 | 北京一维大成科技有限公司 | Multi-attribute identification model establishing method and device and multi-attribute identification method |
CN109559799A (en) * | 2018-10-12 | 2019-04-02 | 华南理工大学 | Medical image semantic description method, and construction method and model of the description model |
CN109543699A (en) * | 2018-11-28 | 2019-03-29 | 北方工业大学 | Image abstract generation method based on target detection |
CN109846477A (en) * | 2019-01-29 | 2019-06-07 | 北京工业大学 | EEG classification method based on frequency band attention residual error network |
CN109948691B (en) * | 2019-03-14 | 2022-02-18 | 齐鲁工业大学 | Image description generation method and device based on depth residual error network and attention |
CN109948691A (en) * | 2019-03-14 | 2019-06-28 | 齐鲁工业大学 | Image description generation method and device based on depth residual error network and attention |
CN110032739A (en) * | 2019-04-18 | 2019-07-19 | 清华大学 | Chinese electronic health record named entity extraction method and system |
WO2020248841A1 (en) * | 2019-06-13 | 2020-12-17 | 平安科技(深圳)有限公司 | Au detection method and apparatus for image, and electronic device and storage medium |
CN111667495A (en) * | 2020-06-08 | 2020-09-15 | 北京环境特性研究所 | Image scene analysis method and device |
CN114338199A (en) * | 2021-12-30 | 2022-04-12 | 广东工业大学 | Attention mechanism-based malicious traffic detection method and system |
CN114338199B (en) * | 2021-12-30 | 2024-01-09 | 广东工业大学 | Malicious traffic detection method and system based on attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN106650813B (en) | 2019-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106650813A (en) | Image understanding method based on depth residual error network and LSTM | |
US20200380213A1 (en) | Multitask Learning As Question Answering | |
Nwankpa et al. | Activation functions: Comparison of trends in practice and research for deep learning | |
Caterini et al. | Deep neural networks in a mathematical framework | |
LeCun et al. | Deep learning | |
US20220043972A1 (en) | Answer generating device, answer learning device, answer generating method, and answer generating program | |
Abe | Neural networks and fuzzy systems: theory and applications | |
CN107871014A (en) | A cross-modal big data retrieval method and system based on deep fusion hashing | |
CN107688850A (en) | A deep neural network compression method | |
US20210232753A1 (en) | Ml using n-gram induced input representation | |
CN112131886A (en) | Aspect-level sentiment analysis method for text | |
CN111353040A (en) | GRU-based attribute-level sentiment analysis method | |
Du et al. | Efficient network construction through structural plasticity | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
CN109948163B (en) | Natural language semantic matching method for dynamic sequence reading | |
CN107562729A (en) | Party-building document representation method based on neural network and topic enhancement | |
Varshitha et al. | Natural language processing using convolutional neural network | |
CN112732879B (en) | Downstream task processing method and model of question-answering task | |
JPWO2019187696A1 (en) | Vectorizers, language processing methods and programs | |
Harikrishnan et al. | Handwritten digit recognition with feed-forward multi-layer perceptron and convolutional neural network architectures | |
Jin et al. | Improving deep belief networks via delta rule for sentiment classification | |
Habeeb et al. | Reducing error rate of deep learning using auto encoder and genetic algorithms | |
Damadi et al. | The Backpropagation algorithm for a math student | |
WO2022164613A1 (en) | Ml using n-gram induced input representation | |
CN114464267A (en) | Method and device for model training and product prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||