CN106096670B - Cascaded convolutional neural network training and image detection method, apparatus and system - Google Patents

Cascaded convolutional neural network training and image detection method, apparatus and system

Info

Publication number
CN106096670B
CN106096670B (application CN201610439342.2A)
Authority
CN
China
Prior art keywords
convolutional neural
image data
neural networks
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610439342.2A
Other languages
Chinese (zh)
Other versions
CN106096670A (en)
Inventor
秦红伟
闫俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN201610439342.2A
Publication of CN106096670A
Application granted
Publication of CN106096670B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/192: Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194: References adjustable by an adaptive method, e.g. learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses a cascaded convolutional neural network training method, an image detection method, and corresponding apparatus and systems. The training method includes: processing the image data of at least a local region of an image to be learned into image data of N input regions of different sizes, where N is an integer greater than or equal to 2; feeding the image data of the N input regions respectively as the inputs of the stages of an N-stage cascaded convolutional neural network and training the stages; and associating at least one training result output by the stages, then propagating the associated training result back to every stage to adjust the parameters of each stage. Because the training result is propagated back to all stages, the parameters of every stage can be adjusted jointly, so that the cascaded convolutional neural network can reach a global optimum of its network parameters during training.

Description

Cascaded convolutional neural network training and image detection method, apparatus and system
Technical field
The present invention relates to the field of image data processing, and in particular to a cascaded convolutional neural network training and image detection method, apparatus and system.
Background technique
Object detection takes a picture as input and accurately locates all objects of certain types in it; it occupies an important position in computer vision and pattern recognition.
Traditional object detection methods based on convolutional neural networks first select a series of regions to be tested of different positions and sizes on the picture, then feed each region directly into a convolutional neural network to obtain a classification result. With a suitably designed convolutional neural network, the computer can learn hidden features of the picture directly, avoiding hand-crafted features, so the approach generalizes to the detection of objects of many classes. However, because the computation time of a convolutional neural network is usually much longer than that of hand-crafted features, detection speed and detection accuracy are hard to obtain at the same time.
Current object detection methods based on convolutional neural networks commonly use a single-stage multi-layer convolutional neural network. Since a single-stage convolutional neural network with good classification performance has a relatively complex structure, detecting each region takes a long time. If the regions to be tested are chosen with a sliding-window method that covers the whole picture, detection is slow; if an algorithm designed around certain image characteristics selects only a small number of regions that most probably contain objects, detection speed improves, but some regions containing objects may be missed at the region-selection stage. In addition, because data annotation is labor- and time-intensive, data sets often contain many negative samples and few positive samples; this imbalance between positive and negative samples during training often leads to poor training results.
A cascaded convolutional neural network, composed of several small networks of increasing complexity, can improve detection speed while still using the sliding-window selection method, and can feed networks at different stages with positive and negative samples in different proportions, which alleviates the two problems above to some extent. In a traditional cascade, however, the networks at different stages are usually trained separately, so each stage can only reach a local optimum and the overall performance of the multi-stage multi-layer network is unsatisfactory.
Summary of the invention
Embodiments of the present invention provide a cascaded convolutional neural network together with schemes for training it and for detecting with it.
According to a first aspect, an embodiment of the invention provides a cascaded convolutional neural network training method, comprising:
processing the image data of at least a local region of an image to be learned into image data of N input regions of different sizes, N being an integer greater than or equal to 2; feeding the image data of the N input regions respectively as the inputs of the stages of an N-stage cascaded convolutional neural network and training the stages, wherein each stage of the N-stage cascaded convolutional neural network corresponds to one of the N input regions; and associating at least one training result output by the stages, and propagating the associated training result back to every stage to adjust the parameters of each stage.
Optionally, training the stages comprises: obtaining the feature vector of the first-stage convolutional neural network from the output of its last layer; and obtaining the feature vector of the n-th stage convolutional neural network from the output of the last layer of the n-th stage and the feature vector of the (n-1)-th stage, where n is a positive integer and 1 < n ≤ N.
Optionally, the last layer of the first-stage convolutional neural network is a convolutional layer, and the last layer of each of the second to N-th stage convolutional neural networks is a fully connected layer.
Optionally, training the stages comprises computing at least the same-stage loss of each stage, and associating the training results comprises computing a weighted sum of at least the same-stage losses output by the stages to obtain the global loss of the N-stage cascaded convolutional neural network.
Optionally, computing the same-stage loss of each stage comprises computing each stage's same-stage loss function and its regression loss against the ground-truth bounding box, and the weighted sum is taken over the same-stage loss functions and/or the ground-truth bounding-box regression losses output by the stages to obtain the global loss of the N-stage cascaded convolutional neural network.
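Written out, the global loss described here can be expressed as follows, where $\alpha_k$ and $\beta_k$ stand for the weighting coefficients that the text leaves to be chosen empirically (the notation itself is added only for illustration and is not part of the original disclosure):

$$L_{\text{global}} = \sum_{k=1}^{N} \left( \alpha_k \, L_{\text{cls}}^{(k)} + \beta_k \, L_{\text{bbox}}^{(k)} \right)$$

Here $L_{\text{cls}}^{(k)}$ is the same-stage loss of the $k$-th stage and $L_{\text{bbox}}^{(k)}$ is its regression loss against the ground-truth bounding box; setting $\beta_k = 0$ recovers the variant that weights only the same-stage losses.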
Optionally, the size of the input region of the i-th stage convolutional neural network is smaller than the size of the input region of the j-th stage convolutional neural network, where i and j are positive integers and 1 ≤ i < j ≤ N.
An embodiment of the invention also provides a cascaded convolutional neural network training apparatus, comprising:
a learning data acquisition unit, configured to process the image data of at least a local region of an image to be learned into image data of N input regions of different sizes, N being an integer greater than or equal to 2; a training unit, configured to feed the image data of the N input regions respectively as the inputs of the stages of an N-stage cascaded convolutional neural network and to train the stages, wherein each stage of the N-stage cascaded convolutional neural network corresponds to one of the N input regions; and a back-propagation unit, configured to associate at least one training result output by the stages and to propagate the associated training result back to every stage to adjust the parameters of each stage.
Optionally, the training unit comprises: a first vector unit, configured to obtain the feature vector of the first-stage convolutional neural network from the output of its last layer; and a second vector unit, configured to obtain the feature vector of the n-th stage convolutional neural network from the output of the last layer of the n-th stage and the feature vector of the (n-1)-th stage, where n is a positive integer and 1 < n ≤ N.
Optionally, the training unit comprises: a same-stage loss unit, configured to compute at least the same-stage loss of each stage; and a global loss unit, configured to compute a weighted sum of at least the same-stage losses output by the stages to obtain the global loss of the N-stage cascaded convolutional neural network.
An embodiment of the invention also provides a computer storage medium storing computer-readable instructions for implementing any cascaded convolutional neural network training method provided by the embodiments of the present invention. For example, the instructions include: instructions for processing the image data of at least a local region of an image to be learned into image data of N input regions of different sizes, N being an integer greater than or equal to 2; instructions for feeding the image data of the N input regions respectively as the inputs of the stages of an N-stage cascaded convolutional neural network and training the stages, wherein each stage corresponds to one of the N input regions; and instructions for associating at least one training result output by the stages and propagating the associated training result back to every stage to adjust the parameters of each stage.
An embodiment of the invention also provides a cascaded convolutional neural network training system, comprising:
an image acquisition device, configured to obtain the image data of an image to be learned; a memory, configured to store a program; and a processor, configured to receive the image data of the image to be learned and to execute the program so as to carry out the operations of the training method above.
According to a second aspect, an embodiment of the invention provides an image detection method based on a cascaded convolutional neural network, comprising:
obtaining the image data of an image to be detected; and feeding the image data of the image to be detected as the input of a neural network model built by the training method above, detecting the image to be detected, and obtaining a detection result for the image to be detected.
Optionally, after obtaining the image data of the image to be detected, the method further includes: dividing the image data into multiple regions to obtain the image data of each region. Detecting the image to be detected then includes: feeding the image data of each region in turn as the input of the neural network model, detecting the image data of each region, and obtaining a detection result for each region.
Optionally, detecting the image to be detected includes: computing a classification score for the image data in at least one stage of the N-stage cascaded convolutional neural network of the neural network model; comparing the computed classification score of at least one stage with at least one predetermined score threshold; and determining from at least one comparison result whether the image data contains the target object.
Optionally, after obtaining the image data of the image to be detected, the method further includes: compressing the image data to the size of the input region of the n-th stage convolutional neural network, where n is a positive integer and 1 ≤ n ≤ N-1. Detecting the image to be detected then includes: feeding the compressed image data into the n-th stage convolutional neural network; computing a first classification score for the compressed image data with the n-th stage convolutional neural network; and, if the first classification score is smaller than a first predetermined score threshold, judging that the image data does not contain the target object.
Optionally, if the first classification score is greater than or equal to the first predetermined score threshold, the method further includes: compressing the image data to the size of the input region of the (n+1)-th stage convolutional neural network; feeding the compressed image data into the (n+1)-th stage convolutional neural network; computing a second classification score for the compressed image data with the (n+1)-th stage convolutional neural network; and, if the second classification score is smaller than a second predetermined score threshold, judging that the image data does not contain the target object.
Optionally, if the second classification score is greater than or equal to the second predetermined score threshold and n = N-1, the image data is judged to contain the target object.
Optionally, after the image data is judged to contain the target object, the method further includes: outputting characteristic information of the image data containing the target object.
An embodiment of the invention also provides an image detection apparatus based on a cascaded convolutional neural network, comprising:
an image data acquisition unit, configured to obtain the image data of an image to be detected; and a detection unit, configured to feed the image data of the image to be detected as the input of a neural network model built by the training apparatus above, to detect the image to be detected, and to obtain a detection result for the image to be detected.
Optionally, the apparatus further includes: a region division unit, configured to divide the image data into multiple regions and obtain the image data of each region; the detection unit is configured to feed the image data of each region in turn as the input of the neural network model, detect the image data of each region, and obtain a detection result for each region.
Optionally, the detection unit includes: a classification score sub-unit, configured to compute a classification score for the image data in at least one stage of the N-stage cascaded convolutional neural network of the neural network model; and a comparison unit, configured to compare the computed classification score of at least one stage with at least one predetermined score threshold and to determine from at least one comparison result whether the image data contains the target object.
Optionally, the apparatus further includes: a compression unit, configured to compress the image data to the size of the input region of the n-th stage convolutional neural network, where n is a positive integer and 1 ≤ n ≤ N-1, the detection unit being configured to feed the compressed image data into the n-th stage convolutional neural network; a first classification score sub-unit, configured to compute a first classification score for the compressed image data with the n-th stage convolutional neural network; and a judging unit, configured to judge that the image data does not contain the target object if the first classification score is smaller than the first predetermined score threshold.
An embodiment of the invention also provides a computer storage medium storing computer-readable instructions for implementing any image detection method based on a cascaded convolutional neural network provided by the embodiments of the present invention. For example, the instructions include: instructions for obtaining the image data of an image to be detected; and instructions for feeding the image data of the image to be detected as the input of a neural network model built by the training method above, detecting the image to be detected, and obtaining a detection result for the image to be detected.
An embodiment of the invention also provides an image detection system based on a cascaded convolutional neural network, comprising:
an image acquisition device, configured to obtain the image data of an image to be detected; a memory, configured to store a program; and a processor, configured to receive the image data of the image to be detected and to execute the program so as to carry out the operations of the detection method above.
According to a third aspect, an embodiment of the invention provides a cascaded convolutional neural network, comprising:
N cascaded stages of convolutional neural networks, configured to receive the image data of N input regions of different sizes and to be trained on, or to perform detection on, the image data of the N different-sized input regions respectively, N being an integer greater than or equal to 2; at least one training result output by the stages is associated, and the associated training result is propagated back to every stage.
In the technical solutions provided by the embodiments of the present invention, the image data of at least a local region of an image to be learned is processed into the image data of N input regions, which are fed into the N stages of a cascaded convolutional neural network; the N-stage cascaded convolutional neural network is then trained. Because at least one training result is associated across the stages, propagating the training result back to every stage adjusts the parameters of every stage, so that the cascaded convolutional neural network can reach a global optimum of its network parameters during training.
Brief description of the drawings
To describe the specific embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the specific embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a cascaded convolutional neural network training method in Embodiment 1 of the present invention;
Fig. 2 is a schematic block diagram of a cascaded convolutional neural network training apparatus in Embodiment 1 of the present invention;
Fig. 3 is a flow chart of an image detection method based on a cascaded convolutional neural network in Embodiment 2 of the present invention;
Fig. 4 is a flow chart of another image detection method based on a cascaded convolutional neural network in Embodiment 2 of the present invention;
Fig. 5 is a functional block diagram of an image detection apparatus based on a cascaded convolutional neural network in Embodiment 2 of the present invention;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of the present application.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
In addition, the technical features involved in the different embodiments of the invention described below may be combined with one another as long as they do not conflict.
Embodiment 1
To achieve globally optimal training of a multi-stage neural network, this embodiment discloses a cascaded convolutional neural network training method. Referring to Fig. 1, which is a flow chart of the cascaded convolutional neural network training method, the method comprises the following steps:
Step S110: obtain the image data of at least a local region of the image to be learned. In a specific embodiment, a sliding-window selection method can be used to select at least a local region of the image to be learned as a learning region of the image. In an optional embodiment, each learning region can be marked with, for example, a rectangular bounding box and labeled as containing the object to be detected or not according to its overlap with the ground-truth bounding boxes of the objects in the image, so as to facilitate learning and training of the neural network. In this embodiment, each learning region can be adjusted to a preset standard size, for example 48*48. It should be noted that in this embodiment and the following embodiments, unless stated otherwise, specific numerical values do not limit the technical solution of the embodiment and should be understood as examples given to help those skilled in the art understand the technical solution.
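The text does not give code for this region-selection and labeling step. The Python sketch below is an illustrative assumption only: the window size, stride and the 0.5 overlap threshold are made-up parameters, and the helper names are hypothetical. It shows one way sliding-window learning regions could be enumerated and labeled against ground-truth bounding boxes.

```python
def iou(box_a, box_b):
    # boxes given as (x1, y1, x2, y2); returns intersection-over-union
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    iw, ih = max(0, ix2 - ix1), max(0, iy2 - iy1)
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter + 1e-9)

def sliding_windows(img_w, img_h, win=48, stride=16):
    # enumerate square learning regions covering the image (sliding-window selection)
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)

def label_regions(img_w, img_h, gt_boxes, pos_iou=0.5):
    # mark each learning region positive if it overlaps a ground-truth box enough
    samples = []
    for win in sliding_windows(img_w, img_h):
        is_pos = any(iou(win, gt) >= pos_iou for gt in gt_boxes)
        samples.append((win, 1 if is_pos else 0))
    return samples
```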
Step S120: process the image data of at least the local region of the image to be learned into the image data of N input regions. In this embodiment, any two of the N input regions have different sizes, and N is an integer greater than or equal to 2. In an optional embodiment, the image data of at least the local region of the image to be learned can be compressed separately to obtain the image data of the N input regions.
Step S130: input the image data of the N input regions into the N stages of the cascaded convolutional neural network respectively. In this embodiment, each stage of the N-stage cascaded convolutional neural network corresponds one-to-one to one of the N input regions. A cascaded convolutional neural network has a multi-stage structure, and the attributes of the input data differ between stages (for example, different stages receive the data of input regions of different sizes); to learn from the data of the same learning region, the image data of that region therefore has to be processed (for example by compression or sampling) into the data sizes required by the inputs of the stages. In this embodiment the cascade has N stages, so image data of N input regions is needed, and each stage corresponds to one input region. In a preferred embodiment, the size of the input region of the i-th stage is smaller than that of the j-th stage, where i and j are positive integers and 1 ≤ i < j ≤ N. The following takes N = 3 as an example: the size of the input region of stage 1 is for example 12*12, that of stage 2 is for example 24*24, and that of stage 3 is for example 48*48. It should be noted that, in this embodiment, the sizes of the input regions of different stages are not required to satisfy any particular multiple relation. The preset region obtained in step S110 can be resized to 12*12, 24*24 and 48*48 respectively to serve as the input regions of the three-stage cascaded network. In this embodiment, reducing the size of the input regions of the earlier stages improves their computational efficiency, while enlarging the input regions of the later stages meets their need for finer judgment; using input regions of different sizes lets each stage balance efficiency against judgment fineness.
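A minimal sketch of turning one learning region into the three stage-specific input sizes of the N = 3 example; using OpenCV's cv2.resize here is an assumption made purely for illustration, not something the text specifies.

```python
import cv2

INPUT_SIZES = (12, 24, 48)  # one input-region size per cascade stage, as in the N = 3 example

def make_stage_inputs(region):
    """Resize one learning region (an H x W x 3 array) into the three stage-specific inputs."""
    return [cv2.resize(region, (size, size)) for size in INPUT_SIZES]
```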
Step S140: train the N-stage cascaded convolutional neural network. In a specific embodiment, the structure of each stage can be decided empirically or as needed. For example, while ensuring the recall rate, convolutional layers can be reduced appropriately to increase computation speed; when the input region is larger, training is finer but the computational complexity also rises, so convolutional layers can again be reduced appropriately. Taking N = 3 as an example: the input region of the stage-1 network (for example a "12net" network) is, say, 12*12, and the network contains, for example, 3 convolutional layers and 2 pooling layers, where each convolutional layer outputs the convolution of its input image with a convolution kernel whose parameters are adjusted by training, and each pixel output by a pooling layer is the average over a region of its input image. The same-stage loss loss0 of the stage-1 network is computed from the output of its last convolutional layer. The input region of the stage-2 network (for example a "24net" network) is, say, 24*24, and the network contains, for example, 1 convolutional layer, 1 pooling layer and 1 fully connected layer; its output is produced by the fully connected layer, and the same-stage loss loss1 of the stage-2 network is computed from it. The input region of the stage-3 network (for example a "48net" network) is, say, 48*48, and the network contains, for example, 2 convolutional layers, 2 pooling layers and 1 fully connected layer; its output is produced by the fully connected layer, and the same-stage loss loss2 of the stage-3 network is computed from it. In a preferred embodiment, each stage can also use a loss function for ground-truth bounding-box regression to compute the bounding-box regression loss of its input region, for example bbox loss0 for 12net, bbox loss1 for 24net and bbox loss2 for 48net.
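The layer counts above can be turned into concrete networks. The following PyTorch sketch is a hedged reconstruction under stated assumptions: the framework, channel counts, kernel sizes and ReLU activations are all choices made here for illustration, since the text only fixes the number of convolutional, pooling and fully connected layers per stage and the input-region sizes.

```python
import torch
import torch.nn as nn

class Net12(nn.Module):
    """Stage 1 ("12net"): 3 convolutional layers, 2 pooling layers; the last layer is convolutional."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3), nn.ReLU(),   # 12x12 -> 10x10
            nn.AvgPool2d(2),                  # -> 5x5 (each output pixel averages a region of the input)
            nn.Conv2d(16, 32, 3), nn.ReLU(),  # -> 3x3
            nn.AvgPool2d(2),                  # -> 1x1
            nn.Conv2d(32, 16, 1),             # last convolutional layer; its output gives feature vector A
        )

    def forward(self, x):                     # x: (B, 3, 12, 12)
        return self.features(x).flatten(1)    # feature vector A: (B, 16)

class Net24(nn.Module):
    """Stage 2 ("24net"): 1 convolutional layer, 1 pooling layer, 1 fully connected layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(),   # 24x24 -> 22x22
            nn.AvgPool2d(2),                  # -> 11x11
        )
        self.fc = nn.Linear(32 * 11 * 11, 32)

    def forward(self, x):                     # x: (B, 3, 24, 24)
        return self.fc(self.features(x).flatten(1))  # feature vector B: (B, 32)

class Net48(nn.Module):
    """Stage 3 ("48net"): 2 convolutional layers, 2 pooling layers, 1 fully connected layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(),   # 48x48 -> 46x46
            nn.AvgPool2d(2),                  # -> 23x23
            nn.Conv2d(32, 64, 3), nn.ReLU(),  # -> 21x21
            nn.AvgPool2d(2),                  # -> 10x10
        )
        self.fc = nn.Linear(64 * 10 * 10, 64)

    def forward(self, x):                     # x: (B, 3, 48, 48)
        return self.fc(self.features(x).flatten(1))  # feature vector C: (B, 64)
```

The same-stage losses loss0, loss1 and loss2 and the bounding-box regression losses would then be computed from these per-stage outputs, as described in the next steps.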
Step S150: adjust the parameters of each stage according to the training results. Specifically, at least one training result output by the stages is associated, and the associated training result is propagated back to every stage to adjust the parameters of each stage. In this embodiment at least one parameter of every stage is associated through the training result, so the training result of the network can be used to adjust the parameters of every layer, bringing the network closer to the global optimum. In an optional embodiment, the training result is the same-stage loss of each stage: the same-stage loss functions and/or the ground-truth bounding-box regression losses output by the stages are summed with weights to obtain the global loss of the N-stage cascaded convolutional neural network. Specifically, the global loss is obtained by a weighted sum of at least the same-stage losses of the stages; optionally, the weighted sum can be taken over the same-stage losses (loss0, loss1 and loss2) and the bounding-box regression losses of the stages (bbox loss0, bbox loss1 and bbox loss2). In a specific embodiment, the specific weighting coefficients can be determined empirically and/or experimentally.
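A hedged sketch of the weighted summation just described, again in PyTorch and under assumptions the text does not make: cross-entropy is used as the same-stage classification loss, smooth-L1 as the bounding-box regression loss, and the particular weighting coefficients are placeholders.

```python
import torch
import torch.nn.functional as F

# Illustrative weighting coefficients; the text leaves them to be set empirically/experimentally.
W_CLS = (1.0, 1.0, 1.0)
W_BBOX = (0.5, 0.5, 0.5)

def global_loss(stage_cls_logits, stage_bbox_preds, labels, gt_boxes):
    """Weighted sum of the same-stage losses (loss0, loss1, loss2) and the
    bounding-box regression losses (bbox loss0, bbox loss1, bbox loss2)."""
    total = torch.zeros(())
    for k, (logits, boxes) in enumerate(zip(stage_cls_logits, stage_bbox_preds)):
        cls_k = F.cross_entropy(logits, labels)           # same-stage loss of stage k
        pos = labels == 1                                  # only positive samples regress boxes
        bbox_k = F.smooth_l1_loss(boxes[pos], gt_boxes[pos])
        total = total + W_CLS[k] * cls_k + W_BBOX[k] * bbox_k
    return total

# Toy usage: 8 learning regions, 3 stages, 2 classes, 4 box coordinates per region.
labels = torch.tensor([1, 0, 1, 0, 1, 0, 0, 1])
gt_boxes = torch.rand(8, 4)
cls_logits = [torch.randn(8, 2, requires_grad=True) for _ in range(3)]
bbox_preds = [torch.rand(8, 4, requires_grad=True) for _ in range(3)]
loss = global_loss(cls_logits, bbox_preds, labels, gt_boxes)
loss.backward()  # one backward pass propagates the associated result to every stage at once
```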
To associate the loss functions of the stages, in an optional embodiment, training the N-stage cascaded convolutional neural network in step S140 may further include: obtaining the feature vector of the stage-1 convolutional neural network from the output of its last layer; and obtaining the feature vector of the n-th stage convolutional neural network from the output of the last layer of the n-th stage and the feature vector of the (n-1)-th stage, where n is a positive integer and 1 < n ≤ N. In a specific embodiment, a prediction label can be obtained from the feature vector of the current stage, and the same-stage loss of each stage is then computed from the prediction label and the true label, specifically using the loss function of the neural network and the ground-truth bounding-box regression loss function. The loss obtained in this way is propagated back to the corresponding positions of the network, so that the parameters of the network structure can be optimized. Taking three cascaded networks as an example: the last layer of the stage-1 network is a convolutional layer, whose output is the feature vector A of stage 1; the stage-2 network consists of a convolutional layer, a pooling layer and a fully connected layer connected in series, its last layer being the fully connected layer, whose output feature vector B is concatenated with the feature vector A of stage 1 to give the feature vector A-B of stage 2; the last layer of the stage-3 network is a fully connected layer, whose output feature vector C is concatenated with the stage-2 feature vector A-B to give the feature vector A-B-C of stage 3. In this embodiment the last layer of the stage-1 network is a convolutional layer, so the feature vector of stage 1 is the output of the last convolutional layer of that stage, while the last layer of each of the stage-2 to stage-N networks is a fully connected layer; in a specific embodiment, the feature vector output by a convolutional layer can be reshaped to match the data format of the feature vectors of the fully connected layers. Connecting the feature vectors of the different stages is what associates the parameters of the stages. In this embodiment, the stage-1 network is computed with convolutional layers only, which saves computation, while the last layer of each of the stage-2 to stage-N networks is a fully connected layer, which enables the separation of positive and negative samples.
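The chaining of feature vectors A, A-B and A-B-C can be expressed as a simple concatenation. The dimensions below match the architecture sketch above but are otherwise arbitrary, and the two-class classification head and four-value bounding-box head per stage are assumptions added for illustration.

```python
import torch
import torch.nn as nn

# Per-stage feature dimensions (assumed): A = 16, B = 32, C = 64.
DIM_A, DIM_B, DIM_C = 16, 32, 64

# Each stage gets a classification head (2 classes) and a bounding-box head (4 values)
# on top of its chained feature vector.
heads = nn.ModuleDict({
    "cls1": nn.Linear(DIM_A, 2),                  "bbox1": nn.Linear(DIM_A, 4),
    "cls2": nn.Linear(DIM_A + DIM_B, 2),          "bbox2": nn.Linear(DIM_A + DIM_B, 4),
    "cls3": nn.Linear(DIM_A + DIM_B + DIM_C, 2),  "bbox3": nn.Linear(DIM_A + DIM_B + DIM_C, 4),
})

def chain_features(feat_a, feat_b, feat_c):
    """Return the per-stage feature vectors A, A-B and A-B-C."""
    ab = torch.cat([feat_a, feat_b], dim=1)    # stage 2 also sees stage 1's features
    abc = torch.cat([ab, feat_c], dim=1)       # stage 3 also sees stages 1 and 2
    return feat_a, ab, abc

# Toy usage with a batch of 4 regions.
a, b, c = torch.randn(4, DIM_A), torch.randn(4, DIM_B), torch.randn(4, DIM_C)
f1, f2, f3 = chain_features(a, b, c)
cls_scores = [heads["cls1"](f1), heads["cls2"](f2), heads["cls3"](f3)]
```

Because the later stages classify on the concatenated vectors, back-propagating through them also updates the earlier stages, which is what ties the parameters of the stages together.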
This embodiment also discloses a cascaded convolutional neural network training apparatus. Referring to Fig. 2, which is a schematic block diagram of the structure of the cascaded convolutional neural network training apparatus, the apparatus includes a learning data acquisition unit 110, a training unit 120 and a back-propagation unit 130, in which:
the learning data acquisition unit 110 is configured to process the image data of at least a local region of the image to be learned into image data of N input regions of different sizes, N being an integer greater than or equal to 2; the training unit 120 is configured to feed the image data of the N input regions respectively as the inputs of the stages of the N-stage cascaded convolutional neural network and to train the stages, wherein each stage of the N-stage cascaded convolutional neural network corresponds to one of the N input regions; and the back-propagation unit 130 is configured to associate at least one training result output by the stages and to propagate the associated training result back to every stage to adjust the parameters of each stage.
In an optional embodiment, the training unit 120 includes: a first vector unit, configured to obtain the feature vector of the stage-1 convolutional neural network from the output of its last layer; and a second vector unit, configured to obtain the feature vector of the n-th stage convolutional neural network from the output of the last layer of the n-th stage and the feature vector of the (n-1)-th stage, where n is a positive integer and 1 < n ≤ N.
In an optional embodiment, the training unit 120 includes: a same-stage loss unit, configured to compute at least the same-stage loss of each stage; and a global loss unit, configured to compute a weighted sum of at least the same-stage losses output by the stages to obtain the global loss of the N-stage cascaded convolutional neural network.
This embodiment also discloses a cascaded convolutional neural network training system, comprising an image acquisition device, a memory and a processor, in which:
the image acquisition device is configured to obtain the image data of the image to be learned; the memory is configured to store a program; and the processor is configured to receive the image data of the image to be learned and to execute the program so as to carry out the operations of the cascaded convolutional neural network training method above.
With the cascaded convolutional neural network training method, apparatus and system provided by this embodiment, the image data of at least a local region of the image to be learned is processed into the image data of N input regions and fed into the N stages of the cascaded convolutional neural network, which is then trained. Because at least one training result is associated across the stages, propagating the training result back to every stage adjusts the parameters of every stage, so that the cascaded convolutional neural network can reach a global optimum of its network parameters during training.
Embodiment 2
This embodiment discloses an image detection method based on a cascaded convolutional neural network. Referring to Fig. 3, which is a flow chart of the image detection method based on a cascaded convolutional neural network, the method comprises the following steps:
Step S210: train a cascaded convolutional neural network model. In this embodiment, a neural network can be trained with the cascaded convolutional neural network training method disclosed in Embodiment 1 to obtain the cascaded convolutional neural network model. It should be noted that in this embodiment, step S210 trains the neural network; once training is complete, the step need not be executed again.
Step S220: obtain the image data of the image to be detected. In a specific embodiment, the image data can be pre-processed in advance to obtain the pre-processed image data of the image to be detected. Optionally, a certain value can be subtracted from the pixel values of the image data; this value can be the mean of the ImageNet data set, or a mean computed from the training picture set. In other optional embodiments, the image data can also be binarized in advance.
Step S230: detect the image data of the image to be detected and obtain the detection result. In this embodiment, a neural network model can be built with the training method of the embodiment above; after the image data of the image to be detected is obtained, it is fed as the input of that neural network model to detect the image to be detected and obtain its detection result. In an optional embodiment, when the image to be detected is detected, the location information (for example the coordinate position) of the target object in the image can be detected and output as the detection result.
It should be noted that, in a specific embodiment, step S210 is executed to train and build the cascaded convolutional neural network when image detection is performed for the first time; in subsequent detections, step S210 need not be executed again.
It should also be noted that the image data of the image to be detected obtained in step S220 can be of the preset standard size, for example 48*48.
Of course, the image data of the image to be detected obtained in step S220 can also be of another size. In that case, after step S220 the obtained image data can be divided into multiple regions to obtain the image data of each region, and the regions are then detected separately: the image data of each region is fed in turn as the input of the neural network model, each region is detected, and a detection result is obtained for each region. In an optional embodiment, when the obtained image data of the image to be detected is divided into multiple regions, each region can be adjusted to the preset standard size, for example 48*48. In a specific embodiment, when step S230 is executed, the image data of each region is fed into the neural network model in turn and the regions are detected one after another to obtain their respective detection results.
In a specific embodiment, detection can determine whether the image to be detected contains the target object by means of classification scores: a classification score is computed for the image data in at least one stage of the N-stage cascaded convolutional neural network of the neural network model, the computed classification score of at least one stage is compared with at least one predetermined score threshold, and whether the image data contains the target object is determined from at least one comparison result. As an example, if the classification score is smaller than the predetermined threshold, the judgment is that the region does not contain the target object. Specifically, referring to Fig. 4, after step S220 the method further includes:
Step S250: compress the image data to the size of the input region of the n-th stage convolutional neural network, where n is a positive integer and 1 ≤ n ≤ N-1. For example, the image data of the image to be detected obtained in step S220 is compressed to the size of the input region of the stage-1 convolutional neural network (for example 12*12).
Executing step S230 may then include:
Step S231: feed the compressed image data into the n-th stage convolutional neural network. For example, the image data of the image to be detected that step S250 compressed to the size of the input region of the stage-1 convolutional neural network (for example 12*12) is fed into the stage-1 convolutional neural network.
Step S241: compute a first classification score for the compressed image data with the n-th stage convolutional neural network. For example, after the compressed image data is fed into the stage-1 convolutional neural network, a first classification score is obtained; in this embodiment, the classification score cls score 1 of this stage can be output by the convolutional layer that forms the last layer of the stage-1 network. In a specific embodiment, a certain element of the feature vector of each stage characterizes the classification score of that stage; for example, an element of the feature vector A of the stage-1 network characterizes the classification score cls score 1. It should be noted that, in a specific embodiment, when n is for example 2, the first classification score is the classification score cls score 2 computed by the stage-2 network.
Step S242: judge whether the first classification score is smaller than the first predetermined score threshold. In a specific embodiment, the first predetermined score threshold can be determined empirically. If step S242 judges that the first classification score is smaller than the first predetermined score threshold, the result is that the image data does not contain the target object.
In an optional embodiment, if step S242 judges that the first classification score is greater than or equal to the first predetermined score threshold, the method further includes:
Step S243: compress the image data to the size of the input region of the (n+1)-th stage convolutional neural network. For example, when n = 1, if the stage-1 classification score cls score 1 is not smaller than the predetermined threshold of that stage, the image data is compressed to the size of the input region of the stage-2 convolutional neural network (for example 24*24); when n = 2, if the stage-2 classification score cls score 2 is not smaller than the predetermined threshold of that stage, the image data is compressed to the size of the input region of the stage-3 convolutional neural network (for example 48*48).
Step S244: feed the compressed image data into the (n+1)-th stage convolutional neural network.
Step S245: compute a second classification score for the compressed image data with the (n+1)-th stage convolutional neural network. After the compressed image data is fed into the corresponding stage, the classification score of that stage is obtained: when n = 1, the second classification score is the classification score cls score 2 computed by the stage-2 network; when n = 2, the second classification score is the classification score cls score 3 computed by the stage-3 network.
Step S246: judge whether the second classification score is smaller than the second predetermined score threshold. In a specific embodiment, the second predetermined score threshold can be determined empirically. If step S246 judges that the second classification score is smaller than the second predetermined score threshold, the result is that the image data does not contain the target object.
In a preferred embodiment, when step S246 is executed, if the second classification score is greater than or equal to the second predetermined score threshold and n = N-1, the image data is judged to contain the target object.
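Put together, steps S250 and S231 to S246 amount to an early-exit cascade over each candidate region. The sketch below assumes three stages with the example sizes above; the threshold values and the per-stage scoring callables (stage_score_fns) are placeholders for illustration, not something the text specifies.

```python
import cv2
import torch

STAGE_SIZES = (12, 24, 48)
STAGE_THRESHOLDS = (0.3, 0.5, 0.7)  # illustrative predetermined score thresholds, one per stage

@torch.no_grad()
def cascade_detect(region, stage_score_fns):
    """Run one candidate region through the cascade.

    `stage_score_fns` is a list of callables, one per stage, each mapping a resized
    region to a classification score; the region is rejected as soon as any stage's
    score falls below that stage's predetermined threshold."""
    for size, threshold, score_fn in zip(STAGE_SIZES, STAGE_THRESHOLDS, stage_score_fns):
        resized = cv2.resize(region, (size, size))
        if score_fn(resized) < threshold:
            return False   # judged not to contain the target object
    return True            # survived every stage: judged to contain the target object
```

Because most regions of a picture are background, the small first stage rejects the bulk of them cheaply, and only a few regions ever reach the larger, finer stages.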
In a preferred embodiment, after the image data is judged to contain the target object, the method can further include: outputting characteristic information of the image data containing the target object. In a specific embodiment, the position and size of the bounding box of the region to be detected can be fine-tuned according to the predicted adjustment amount. After all regions to be detected of a picture have been detected, the regions predicted to contain objects are selected and, for example, non-maximum suppression is used to remove regions with high overlap. The detection method thereby predicts the positions and sizes of the target objects in the picture.
This embodiment also discloses an image detection apparatus based on a cascaded convolutional neural network. Referring to Fig. 5, which is a schematic block diagram of the structure of the image detection apparatus based on a cascaded convolutional neural network, the detection apparatus includes the cascaded convolutional neural network training apparatus 1 of the embodiment above, an image data acquisition unit 2 and a detection unit 3, in which:
the image data acquisition unit 2 is configured to obtain the image data of the image to be detected, and the detection unit is configured to feed the image data of the image to be detected as the input of the neural network model built by the training apparatus above, to detect the image to be detected, and to obtain the detection result of the image to be detected.
In an optional embodiment, the image detection apparatus based on a cascaded convolutional neural network further includes a region division unit, configured to divide the image data into multiple regions and obtain the image data of each region; the detection unit is configured to feed the image data of each region in turn as the input of the neural network model, detect the image data of each region, and obtain a detection result for each region.
In a preferred embodiment, the detection unit includes: a classification score sub-unit, configured to compute a classification score for the image data in at least one stage of the N-stage cascaded convolutional neural network of the neural network model; and a comparison unit, configured to compare the computed classification score of at least one stage with at least one predetermined score threshold and to determine from at least one comparison result whether the image data contains the target object.
In a preferred embodiment, the image detection apparatus based on a cascaded convolutional neural network further includes: a compression unit, configured to compress the image data to the size of the input region of the n-th stage convolutional neural network, where n is a positive integer and 1 ≤ n ≤ N-1, the detection unit being configured to feed the compressed image data into the n-th stage convolutional neural network; a first classification score unit, configured to compute a first classification score for the compressed image data with the n-th stage convolutional neural network; and a judging unit, configured to judge that the image data does not contain the target object if the first classification score is smaller than the first predetermined score threshold.
This embodiment also discloses an image detection system based on a cascaded convolutional neural network, comprising:
an image acquisition device, configured to obtain the image data of the image to be detected; a memory, configured to store a program; and a processor, configured to receive the image data of the image to be detected and to execute the program so as to carry out the operations of the detection method above.
This embodiment also discloses a cascaded convolutional neural network, comprising: N cascaded stages of convolutional neural networks, configured to receive the image data of N input regions of different sizes and to be trained on, or to perform detection on, the image data of the N different-sized input regions respectively, N being an integer greater than or equal to 2; at least one training result output by the stages is associated, and the associated training result is propagated back to every stage.
With the image detection method, apparatus and system based on a cascaded convolutional neural network provided by this embodiment, the training result is propagated back to every stage during network training and the parameters of every stage are adjusted, so that the cascaded convolutional neural network can reach a global optimum of its network parameters during training. When this network is then used for detection, a better detection result can be obtained while the detection speed is improved.
Referring now to Fig. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing the terminal device or server of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 602 or executable instructions loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604, and an input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flow charts may be implemented as a computer software program. For example, an embodiment of the disclosure includes a computer program product comprising instructions tangibly embodied on a machine-readable medium, for example instructions for dividing an image into image blocks to obtain an image block set, instructions for generating at least one first image block group from the image blocks in the image block set, instructions for training a convolutional neural network with the at least one first image block group, instructions for classifying the image blocks in the image block set with the first convolutional neural network to obtain at least one second image block group, and so on. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the functions defined in the method of the present application are performed.
It should be understood by those skilled in the art that the embodiments of the present invention may be provided as a method, a system or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical memory and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, the above embodiments are merely examples given for clarity of description and are not intended to limit the embodiments. For those of ordinary skill in the art, other variations or changes in different forms can be made on the basis of the above description. It is neither necessary nor possible to exhaust all embodiments here, and obvious variations or changes derived therefrom still fall within the protection scope of the present invention.

Claims (22)

1. A cascaded convolutional neural network training method, characterized by comprising:
processing image data of at least a local region of an image to be learned into image data of input regions of N different sizes respectively, N being an integer greater than or equal to 2;
taking the image data of the N input regions respectively as inputs of the convolutional neural networks of every level in N cascaded levels of convolutional neural networks, and training the convolutional neural networks of every level, wherein each level of convolutional neural network in the N cascaded levels corresponds to one of the N input regions; and
associating at least one training result output by each level of the convolutional neural networks respectively, and propagating the associated training result back to the convolutional neural networks of every level to adjust the parameters of the neural networks of every level.
2. The cascaded convolutional neural network training method according to claim 1, characterized in that training the convolutional neural networks of every level comprises:
obtaining a feature vector of the level-1 convolutional neural network according to the output of the last layer of the level-1 convolutional neural network; and
obtaining a feature vector of the level-n convolutional neural network according to the output of the last layer of the level-n convolutional neural network and the feature vector of the level-(n-1) convolutional neural network, wherein n is a positive integer and 1 < n ≤ N.
3. The cascaded convolutional neural network training method according to claim 2, characterized in that:
the last layer of the level-1 convolutional neural network is a convolutional layer; and
the last layer of each of the level-2 to level-N convolutional neural networks is a fully connected layer.
4. The cascaded convolutional neural network training method according to any one of claims 1 to 3, characterized in that:
training the convolutional neural networks of every level comprises: at least computing the same-level loss of each level of the convolutional neural networks respectively; and
associating at least one training result output by each level of the convolutional neural networks respectively comprises: at least computing a weighted sum of the same-level losses output by the convolutional neural networks of every level to obtain a global loss of the N cascaded levels of convolutional neural networks.
5. The cascaded convolutional neural network training method according to claim 4, characterized in that:
at least computing the same-level loss of each level of the convolutional neural networks respectively comprises: computing the same-level loss function and the bounding-box regression ground-truth loss of each level of the convolutional neural networks respectively; and
at least computing a weighted sum of the same-level losses output by the convolutional neural networks of every level to obtain the global loss of the N cascaded levels of convolutional neural networks comprises: computing a weighted sum of the same-level loss functions and/or the bounding-box regression ground-truth losses output by the convolutional neural networks of every level to obtain the global loss of the N cascaded levels of convolutional neural networks.
6. The cascaded convolutional neural network training method according to any one of claims 1 to 3, characterized in that the size of the input region of the level-i convolutional neural network is smaller than the size of the input region of the level-j convolutional neural network, wherein i and j are positive integers and 1 ≤ i < j ≤ N.
7. An image detection method based on a cascaded convolutional neural network, characterized by comprising:
obtaining image data of an image to be detected; and
taking the image data of the image to be detected as the input of a neural network model established by the training method according to any one of claims 1 to 6, detecting the image to be detected, and obtaining a detection result of the image to be detected.
8. The image detection method based on a cascaded convolutional neural network according to claim 7, characterized in that, after obtaining the image data of the image to be detected, the method further comprises:
dividing the image data into a plurality of regions and obtaining image data of each region;
wherein taking the image data of the image to be detected as the input of the neural network model, detecting the image to be detected, and obtaining the detection result of the image to be detected comprises: taking the image data of each region in turn as the input of the neural network model, detecting the image data of each region, and obtaining a detection result of each region.
9. The image detection method based on a cascaded convolutional neural network according to claim 7 or 8, characterized in that detecting the image to be detected comprises:
computing a classification score for the image data in at least one level of the N cascaded levels of convolutional neural networks of the neural network model; and
comparing the computed classification score of the at least one level of convolutional neural network with at least one predetermined score, and determining whether the image data contains a target object according to at least one comparison result.
10. The image detection method based on a cascaded convolutional neural network according to claim 8, characterized in that, after obtaining the image data of the image to be detected, the method further comprises:
compressing the size of the image data to a size that meets the input region of the level-m convolutional neural network, wherein m is a positive integer and 1 ≤ m ≤ N-1;
wherein taking the image data of the image to be detected as the input of the neural network model, detecting the image to be detected, and obtaining the detection result of the image to be detected comprises: inputting the compressed image data into the level-m convolutional neural network;
computing a first classification score for the compressed image data by the level-m convolutional neural network; and
determining that the image data does not contain a target object if the first classification score is smaller than a first predetermined score.
11. The image detection method based on a cascaded convolutional neural network according to claim 10, characterized in that, if the first classification score is greater than or equal to the first predetermined score, the method further comprises:
compressing the size of the image data to a size that meets the input region of the level-(m+1) convolutional neural network;
inputting the compressed image data into the level-(m+1) convolutional neural network;
computing a second classification score for the compressed image data by the level-(m+1) convolutional neural network; and
determining that the image data does not contain the target object if the second classification score is smaller than a second predetermined score.
12. The image detection method based on a cascaded convolutional neural network according to claim 11, characterized in that, if the second classification score is greater than or equal to the second predetermined score and m = N-1, it is determined that the image data contains the target object.
13. The image detection method based on a cascaded convolutional neural network according to claim 12, characterized in that, after determining that the image data contains the target object, the method further comprises:
outputting feature information of the image data containing the target object.
14. A cascaded convolutional neural network training apparatus, characterized by comprising:
a learning data acquisition unit, configured to process image data of at least a local region of an image to be learned into image data of input regions of N different sizes respectively, N being an integer greater than or equal to 2;
a training unit, configured to take the image data of the N input regions respectively as inputs of the convolutional neural networks of every level in N cascaded levels of convolutional neural networks and to train the convolutional neural networks of every level, wherein each level of convolutional neural network in the N cascaded levels corresponds to one of the N input regions; and
a back-propagation unit, configured to associate at least one training result output by each level of the convolutional neural networks respectively and to propagate the associated training result back to the convolutional neural networks of every level to adjust the parameters of the neural networks of every level.
15. The cascaded convolutional neural network training apparatus according to claim 14, characterized in that the training unit comprises:
a first vector unit, configured to obtain a feature vector of the level-1 convolutional neural network according to the output of the last layer of the level-1 convolutional neural network; and
a second vector unit, configured to obtain a feature vector of the level-n convolutional neural network according to the output of the last layer of the level-n convolutional neural network and the feature vector of the level-(n-1) convolutional neural network, wherein n is a positive integer and 1 < n ≤ N.
16. The cascaded convolutional neural network training apparatus according to claim 14 or 15, characterized in that the training unit comprises:
a same-level loss computing unit, configured to at least compute the same-level loss of each level of the convolutional neural networks respectively; and
a global loss computing unit, configured to at least compute a weighted sum of the same-level losses output by the convolutional neural networks of every level to obtain a global loss of the N cascaded levels of convolutional neural networks.
17. An image detection apparatus based on a cascaded convolutional neural network, characterized by comprising:
an image data acquisition unit, configured to obtain image data of an image to be detected; and
a detection unit, configured to take the image data of the image to be detected as the input of a neural network model established by the training apparatus according to any one of claims 14 to 16, to detect the image to be detected, and to obtain a detection result of the image to be detected.
18. The image detection apparatus based on a cascaded convolutional neural network according to claim 17, characterized by further comprising:
a region division unit, configured to divide the image data into a plurality of regions and to obtain image data of each region;
wherein the detection unit is configured to take the image data of each region in turn as the input of the neural network model, to detect the image data of each region, and to obtain a detection result of each region.
19. The image detection apparatus based on a cascaded convolutional neural network according to claim 17 or 18, characterized in that the detection unit comprises:
a classification score subunit, configured to compute a classification score for the image data in at least one level of the N cascaded levels of convolutional neural networks of the neural network model; and
a comparison unit, configured to compare the computed classification score of the at least one level of convolutional neural network with at least one predetermined score and to determine whether the image data contains a target object according to at least one comparison result.
20. The image detection apparatus based on a cascaded convolutional neural network according to claim 17, characterized by further comprising:
a compression unit, configured to compress the size of the image data to a size that meets the input region of the level-m convolutional neural network, wherein m is a positive integer and 1 ≤ m ≤ N-1;
wherein the detection unit is configured to input the compressed image data into the level-m convolutional neural network;
a first classification score subunit, configured to compute a first classification score for the compressed image data by the level-m convolutional neural network; and
a judging unit, configured to determine that the image data does not contain a target object if the first classification score is smaller than a first predetermined score.
21. A cascaded convolutional neural network training system, characterized by comprising:
an image acquisition device, configured to obtain image data of an image to be learned;
a memory, configured to store a program; and
a processor, configured to receive the image data of the image to be learned and to execute the program so as to implement the operations in the method according to any one of claims 1 to 6.
22. An image detection system based on a cascaded convolutional neural network, characterized by comprising:
an image acquisition device, configured to obtain image data of an image to be detected;
a memory, configured to store a program; and
a processor, configured to receive the image data of the image to be detected and to execute the program so as to implement the operations in the method according to any one of claims 7 to 13.
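As a companion to the training sketch earlier in this document, the following minimal sketch illustrates the detection flow of claims 7 to 13, reusing the same three hypothetical networks; the per-level input sizes and predetermined scores are invented for illustration and do not come from the patent. A candidate region is compressed to the current level's input size, that level computes a classification score, and the region is rejected as soon as a score falls below that level's predetermined score; only a region that passes every level is reported as containing the target object, and its feature information is returned.

import torch
import torch.nn.functional as F

SIZES = (12, 24, 48)          # assumed per-level input sizes
SCORES = (0.5, 0.6, 0.7)      # assumed predetermined score for each level

@torch.no_grad()
def detect_region(region, nets=(net1, net2, net3)):
    # region: a (3, H, W) tensor for one candidate area of the image to be detected.
    prev_feat = None
    for level, (net, size, threshold) in enumerate(zip(nets, SIZES, SCORES), start=1):
        # Compress the region to the size expected by this level's input area.
        x = F.interpolate(region.unsqueeze(0), size=(size, size),
                          mode='bilinear', align_corners=False)
        feat, logits = net(x) if level == 1 else net(x, prev_feat)
        score = logits.softmax(dim=1)[0, 1].item()   # classification score
        if score < threshold:
            return None        # early rejection: no target object in this region
        prev_feat = feat
    # Every level passed: the region contains the target object;
    # return the last level's feature information.
    return feat

Because most candidate regions are rejected by the small first level, only a few of them ever reach the larger later levels, which is consistent with the improved detection speed described in the specification.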
CN201610439342.2A 2016-06-17 2016-06-17 Concatenated convolutional neural network training and image detecting method, apparatus and system Active CN106096670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610439342.2A CN106096670B (en) 2016-06-17 2016-06-17 Concatenated convolutional neural network training and image detecting method, apparatus and system

Publications (2)

Publication Number Publication Date
CN106096670A CN106096670A (en) 2016-11-09
CN106096670B true CN106096670B (en) 2019-07-30

Family

ID=57236547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610439342.2A Active CN106096670B (en) 2016-06-17 2016-06-17 Concatenated convolutional neural network training and image detecting method, apparatus and system

Country Status (1)

Country Link
CN (1) CN106096670B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122234B (en) * 2016-11-29 2021-05-04 北京市商汤科技开发有限公司 Convolutional neural network training and video processing method and device and electronic equipment
US10331975B2 (en) * 2016-11-29 2019-06-25 Google Llc Training and/or using neural network models to generate intermediary output of a spectral image
CN106651877B (en) * 2016-12-20 2020-06-02 北京旷视科技有限公司 Instance partitioning method and device
CN108242046B (en) * 2016-12-27 2022-02-18 阿里巴巴集团控股有限公司 Picture processing method and related equipment
CN106845406A (en) * 2017-01-20 2017-06-13 深圳英飞拓科技股份有限公司 Head and shoulder detection method and device based on multitask concatenated convolutional neutral net
CN108629354B (en) * 2017-03-17 2020-08-04 杭州海康威视数字技术股份有限公司 Target detection method and device
CN108305240B (en) * 2017-05-22 2020-04-28 腾讯科技(深圳)有限公司 Image quality detection method and device
CN108229284B (en) * 2017-05-26 2021-04-09 北京市商汤科技开发有限公司 Sight tracking and training method and device, system, electronic equipment and storage medium
US11741354B2 (en) 2017-08-25 2023-08-29 Ford Global Technologies, Llc Shared processing with deep neural networks
CN107578453B (en) * 2017-10-18 2019-11-01 北京旷视科技有限公司 Compressed image processing method, apparatus, electronic equipment and computer-readable medium
CN109903350B (en) * 2017-12-07 2021-08-06 上海寒武纪信息科技有限公司 Image compression method and related device
US10599978B2 (en) 2017-11-03 2020-03-24 International Business Machines Corporation Weighted cascading convolutional neural networks
CN110310287B (en) * 2018-03-22 2022-04-19 北京连心医疗科技有限公司 Automatic organ-at-risk delineation method, equipment and storage medium based on neural network
CN108510083B (en) * 2018-03-29 2021-05-14 国信优易数据股份有限公司 Neural network model compression method and device
CN108830209B (en) * 2018-06-08 2021-12-17 西安电子科技大学 Remote sensing image road extraction method based on generation countermeasure network
CN109460704B (en) * 2018-09-18 2020-09-15 厦门瑞为信息技术有限公司 Fatigue detection method and system based on deep learning and computer equipment
CN109583501B (en) * 2018-11-30 2021-05-07 广州市百果园信息技术有限公司 Method, device, equipment and medium for generating image classification and classification recognition model
CN109829371B (en) * 2018-12-26 2022-04-26 深圳云天励飞技术有限公司 Face detection method and device
CN109620269B (en) * 2019-01-28 2021-10-22 锦图计算技术(深圳)有限公司 Fatigue detection method, device, equipment and readable storage medium
CN109871791A (en) * 2019-01-31 2019-06-11 北京字节跳动网络技术有限公司 Image processing method and device
US10885625B2 (en) 2019-05-10 2021-01-05 Advanced New Technologies Co., Ltd. Recognizing damage through image analysis
CN110569703B (en) * 2019-05-10 2020-09-01 阿里巴巴集团控股有限公司 Computer-implemented method and device for identifying damage from picture
CN110276411B (en) * 2019-06-28 2022-11-18 腾讯科技(深圳)有限公司 Image classification method, device, equipment, storage medium and medical electronic equipment
CN111291716B (en) * 2020-02-28 2024-01-05 深圳市瑞图生物技术有限公司 Sperm cell identification method, sperm cell identification device, computer equipment and storage medium
CN111507408B (en) * 2020-04-17 2022-11-04 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112784985A (en) * 2021-01-29 2021-05-11 北京百度网讯科技有限公司 Training method and device of neural network model, and image recognition method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9418319B2 (en) * 2014-11-21 2016-08-16 Adobe Systems Incorporated Object detection using cascaded convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102946613A (en) * 2012-10-10 2013-02-27 北京邮电大学 Method for measuring QoE
CN104077613A (en) * 2014-07-16 2014-10-01 电子科技大学 Crowd density estimation method based on cascaded multilevel convolution neural network
CN104657748A (en) * 2015-02-06 2015-05-27 中国石油大学(华东) Vehicle type recognition method based on convolutional neural network
CN104951668A (en) * 2015-04-07 2015-09-30 上海大学 Method for predicting protein association graphs on basis of cascade neural network structures
CN105404865A (en) * 2015-11-16 2016-03-16 杭州电子科技大学 Probability state restricted Boltzmann machine cascade based face detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A convolutional neural network cascade for face detection; Haoxiang Li et al.; 2015 IEEE Conference on Computer Vision and Pattern Recognition; 2015-06-30; pp. 5325-5334
Single-frame image super-resolution reconstruction method based on multi-layer convolutional neural network learning; Liu Na, Li Cuihua; China Sciencepaper (中国科技论文); 2015-01-31; Vol. 10, No. 2; pp. 201-206

Also Published As

Publication number Publication date
CN106096670A (en) 2016-11-09

Similar Documents

Publication Publication Date Title
CN106096670B (en) Concatenated convolutional neural network training and image detecting method, apparatus and system
CN109685152B (en) Image target detection method based on DC-SPP-YOLO
CN111914944B (en) Object detection method and system based on dynamic sample selection and loss consistency
CN111126472A (en) Improved target detection method based on SSD
CN109635685A (en) Target object 3D detection method, device, medium and equipment
CN108898610A (en) A kind of object contour extraction method based on mask-RCNN
CN106599789A (en) Video class identification method and device, data processing device and electronic device
CN107481188A (en) A kind of image super-resolution reconstructing method
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN109101897A (en) Object detection method, system and the relevant device of underwater robot
CN109543701A (en) Vision significance method for detecting area and device
CN107529650A (en) Network model construction and closed loop detection method, corresponding device and computer equipment
CN111640089A (en) Defect detection method and device based on feature map center point
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
JP2020098587A (en) Object Shape Regression Using Wasserstein Distance
CN111274981B (en) Target detection network construction method and device and target detection method
CN114782311B (en) CENTERNET improvement-based multi-scale defect target detection method and system
CN108875587A (en) Target distribution detection method and equipment
CN113160117A (en) Three-dimensional point cloud target detection method under automatic driving scene
CN117788957B (en) Deep learning-based qualification image classification method and system
CN115587964A (en) Entropy screening-based pseudo label cross consistency change detection method
CN110472673B (en) Parameter adjustment method, fundus image processing device, fundus image processing medium and fundus image processing apparatus
CN110189318A (en) Pulmonary nodule detection method and system with semantic feature score
CN111445025B (en) Method and device for determining hyper-parameters of business model
CN115953330B (en) Texture optimization method, device, equipment and storage medium for virtual scene image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20190710

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: SHENZHEN SHANGTANG SCIENCE & TECHNOLOGY CO., LTD.

Address before: Room 710-712, 7th floor, No. 1 Courtyard, Zhongguancun East Road, Haidian District, Beijing

Applicant before: Shang Tang Science and Technology Development Co., Ltd. of Beijing