CN107968962A - Video generation method based on deep learning for two non-adjacent frames - Google Patents

Video generation method based on deep learning for two non-adjacent frames

Info

Publication number
CN107968962A
Authority
CN
China
Prior art keywords
frames
image
generator
video
non-adjacent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711343243.5A
Other languages
Chinese (zh)
Other versions
CN107968962B (en)
Inventor
温世平
刘威威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201711343243.5A priority Critical patent/CN107968962B/en
Publication of CN107968962A publication Critical patent/CN107968962A/en
Application granted granted Critical
Publication of CN107968962B publication Critical patent/CN107968962B/en
Expired - Fee Related
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video generation method based on deep learning for two non-adjacent frames, belonging to the fields of adversarial learning and video generation. The method comprises: performing linear interpolation on the two non-adjacent frames to obtain N input images; feeding the N input images into a first generator to obtain the N blurry video frames lying between the two non-adjacent frames; feeding the N video frames into a trained second generator to obtain N new, sharp video frames; and concatenating the two non-adjacent frames with the N new video frames to generate a video. A first deep autoencoding convolutional network is built entirely from convolutional layers and trained adversarially to obtain the trained first generator, and a second deep autoencoding convolutional network is built from fully convolutional layers with skip connections and trained adversarially to obtain the trained second generator. The video generated by the present invention is of good quality and long duration.

Description

Video generation method based on deep learning for two non-adjacent frames
Technical field
The invention belongs to the fields of adversarial learning and video generation, and more particularly relates to a video generation method based on deep learning for two non-adjacent frames.
Background art
Video generation and prediction have long been difficult problems in the field of computer vision: traditional, non-deep-learning algorithms struggle to generate continuous, high-quality video. Yet video generation and prediction can be used in many fields, such as behavior analysis, intelligent surveillance, video prediction, and animation production.
The basic theory of deep learning was proposed as early as the 1980s by Yann LeCun et al., but the hardware of the time could not meet its computational requirements, so the development of artificial intelligence was slow. With the improvement of hardware and the rise of deep learning, features learned by convolutional neural networks have been widely adopted in place of hand-engineered features. This approach overcomes the difficulty of designing algorithms by hand as in conventional methods: a neural network is built, its parameters are optimized by algorithms such as gradient descent, and the network is thereby fitted to a very good nonlinear function that takes the place of a manually designed algorithm.
Conventional deep-learning-based video generation methods mainly predict the next frame or the next several frames of a video, or predict motion. Typically, one or more still frames are fed to a network, the subsequent frames serve as the prediction target, and the neural network is trained to map from past frames to future frames; once the network has learned a reasonably good mapping, it can be given several video frames and will output the appearance of future frames. However, the predicted video is often rather blurry, especially when predicting long sequences, and the predictable video length is also very limited: often only a few blurry frames can be predicted. These difficulties severely limit the application of video prediction and generation. In addition, given a target whose next movement is unknown, many motions of this target are possible, so the corresponding video generation result has infinitely many solutions. For us humans, when we see a person in a video smiling, the probability that they will next embrace is high; but a neural network cannot understand such long-range temporal information and contextual information. The second difficulty is that it is hard to produce image sequences of good quality: most generated results are very blurry, longer image sequences can hardly be produced, and only short-term motion analysis and the like can be done, which makes such generation very difficult to apply to animation production and short-video generation.
It can be seen from the above that the prior art suffers from the technical problem that generated or predicted video is of poor quality and short duration.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the present invention provides a video generation method based on deep learning for two non-adjacent frames, thereby solving the technical problem in the prior art that generated or predicted video is of poor quality and short duration.
To achieve the above object, the present invention provides a video generation method based on deep learning for two non-adjacent frames, comprising:
(1) performing linear interpolation on the two non-adjacent frames to obtain N input images, and feeding the N input images into a trained first generator to obtain the N video frames lying between the two non-adjacent frames;
(2) feeding the N video frames into a trained second generator to obtain N new video frames, and concatenating the two non-adjacent frames with the N new video frames to generate a video.
The training of the first generator comprises: building a first deep autoencoding convolutional network entirely from convolutional layers, and training the first deep autoencoding convolutional network adversarially to obtain the trained first generator. The training of the second generator comprises: building a second deep autoencoding convolutional network from fully convolutional layers with skip connections, and training the second deep autoencoding convolutional network adversarially to obtain the trained second generator.
Further, the training of the first generator comprises:
(S1) building a first deep autoencoding convolutional network entirely from convolutional layers, and obtaining from a sample video two non-adjacent sample images and the N real frames lying between the two non-adjacent sample images;
(S2) performing linear interpolation on the two non-adjacent sample images to obtain N sample input images and feeding them into the first deep autoencoding convolutional network, training the first deep autoencoding convolutional network with the objective of minimizing the loss function to obtain N first training images, and feeding the N first training images and the N real frames into a discriminator to obtain a first discrimination result;
(S3) when the first discrimination result is greater than a threshold, repeating step (S2); when the first discrimination result is less than or equal to the threshold, obtaining the trained first generator.
Further, the training of the second generator comprises:
(T1) building a second deep autoencoding convolutional network from fully convolutional layers with skip connections;
(T2) feeding the N first training images into the second deep autoencoding convolutional network, training the second deep autoencoding convolutional network with the objective of minimizing the loss function to obtain N second training images, and feeding the N second training images and the N real frames into the discriminator to obtain a second discrimination result;
(T3) when the second discrimination result is greater than the threshold, repeating step (T2); when the second discrimination result is less than or equal to the threshold, obtaining the trained second generator.
The present invention generates continuous video from two non-adjacent frames, replacing methods that predict the next frame from the previous frame. To improve generation quality, a structure of twin cascaded generators is used. The twin generators have different tasks and different network structures: the first generator is responsible for learning motion features from the interpolated input frames, and the second generator improves image quality on the basis of the first. Cascading the two generators yields high-quality video generation results, and the whole can be trained end to end. A new loss function, the normalized product correlation loss, is designed and used during training to improve the quality of the generated results.
Further, a ReLU nonlinear function is arranged after every convolutional layer in the first deep autoencoding convolutional network and the second deep autoencoding convolutional network.
Further, the discriminator comprises 6 convolutional layers and one fully connected layer, and a normalization operation and a ReLU nonlinear function are arranged in sequence after every convolutional layer.
Further, the loss function is:
Loss = λ1·L_adv + λ2·L_mse + λ3·L_gdl + λ4·L_npcl
where Loss is the loss function; L_adv is the adversarial loss function and λ1 its weight; L_mse is the mean squared error loss function and λ2 its weight; L_gdl is the gradient loss function and λ3 its weight; L_npcl is the normalized product correlation loss function and λ4 its weight.
In general, compared with the prior art, the above technical scheme conceived by the present invention achieves the following beneficial effects:
(1) The present invention uses two non-adjacent frames as the generator input, so that the second frame serves as a constraint on the video generation; this significantly reduces the dimensionality of the solution space and makes generation much easier, while the use of adversarial training is better suited to image generation. In addition, a generation network of two cascaded generators is used: different generators are responsible for different tasks and have different network structures, so the results generated by the two generators are of higher quality and more video frames are generated.
(2) The present invention adopts adversarial training, with the generator and discriminator forming an adversarial network; the combination of an adversarial network with adversarial training is better suited to image generation. Four loss functions are used, namely the adversarial loss function, the mean squared error loss function, the gradient loss function, and the normalized product correlation loss function, which penalize the generated result from different aspects so that it bears a very strong similarity to the real result.
(3) Compared with previous methods, the present invention can generate longer video sequences while ensuring the quality of the generated video. It can be widely applied in the fields of action prediction, video compression, and video generation.
Brief description of the drawings
Fig. 1 is a flowchart of a video generation method based on deep learning for two non-adjacent frames provided by an embodiment of the present invention;
Fig. 2(a) is a first simulation result provided by an embodiment of the present invention;
Fig. 2(b) is a second simulation result provided by an embodiment of the present invention;
Fig. 2(c) is a third simulation result provided by an embodiment of the present invention;
Fig. 2(d) is a fourth simulation result provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely serve to illustrate the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the invention described below may be combined with each other as long as they do not conflict.
As shown in Fig. 1, a video generation method based on deep learning for two non-adjacent frames comprises:
(1) performing linear interpolation on the two non-adjacent frames to obtain N input images, and feeding the N input images into a trained first generator to obtain the N video frames lying between the two non-adjacent frames;
(2) feeding the N video frames into a trained second generator to obtain N new video frames, and concatenating the two non-adjacent frames with the N new video frames to generate a video.
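For concreteness, the two steps above can be sketched as a short PyTorch pipeline. This is an illustrative sketch only: gen1 and gen2 stand for the trained first and second generators (module sketches are given further below), and n = 10 matches the embodiment described later.

```python
import torch

def generate_video(frame_a, frame_b, gen1, gen2, n=10):
    """Generate the n intermediate frames between two non-adjacent frames.

    frame_a, frame_b: tensors of shape (3, H, W) with values in [0, 1].
    gen1, gen2: the trained first and second generators.
    """
    # Step (1): linear interpolation yields n coarse input frames,
    # using n uniformly spaced values of r strictly between 0 and 1.
    rs = torch.linspace(0, 1, n + 2)[1:-1]
    interp = torch.stack([(1 - r) * frame_a + r * frame_b for r in rs])

    with torch.no_grad():
        coarse = gen1(interp)    # blurry intermediate frames
        refined = gen2(coarse)   # step (2): sharpened intermediate frames

    # Concatenate the two given frames with the generated frames into a video.
    return torch.cat([frame_a.unsqueeze(0), refined, frame_b.unsqueeze(0)], dim=0)
```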
The training of the first generator comprises:
(S1) A first deep autoencoding convolutional network is built entirely from convolutional layers, as shown in Table 1. No pooling or normalization layers are used: the network is built from convolutional layers throughout, with a ReLU activation function after each layer to increase the nonlinearity of the network. To avoid the influence of random noise we adopt an autoencoder-style network structure, which on the one hand increases the symmetry of the topology of the generator model and on the other hand also improves the stability of the overall network.
Table 1
The first deep autoencoding convolutional network is as follows:
First layer: convolutional layer, kernel size 5×5, output feature maps 64, stride 1;
Second layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 2;
Third layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 1;
Fourth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
Fifth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Sixth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Seventh layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Eighth layer: convolutional layer, kernel size 3×3, output feature maps 512, stride 1;
Ninth layer: convolutional layer, kernel size 3×3, output feature maps 512, stride 1;
Tenth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Eleventh layer: transposed convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
Twelfth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Thirteenth layer: transposed convolutional layer, kernel size 4×4, output feature maps 64, stride 2;
Fourteenth layer: convolutional layer, kernel size 3×3, output feature maps 3, stride 1.
In the first deep autoencoding convolutional network, multiple convolutional layers are used mainly so that the generator can more accurately learn the motion information of the target in the video, in preparation for the subsequent generation.
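For illustration, the fourteen layers listed above can be written as a PyTorch module as follows. The layer sizes follow the list; the padding and output-padding values are assumptions chosen so that the stride-2 layers halve the spatial resolution and the transposed layers restore it.

```python
import torch.nn as nn

def conv(in_c, out_c, k, s):
    # every convolutional layer is followed by a ReLU, as described above
    return nn.Sequential(nn.Conv2d(in_c, out_c, k, s, padding=k // 2), nn.ReLU())

class Generator1(nn.Module):
    """First deep autoencoding convolutional network (layers as in Table 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            conv(3, 64, 5, 1),                       # layer 1
            conv(64, 128, 3, 2),                     # layer 2 (downsample)
            conv(128, 128, 3, 1),                    # layer 3
            conv(128, 256, 3, 2),                    # layer 4 (downsample)
            conv(256, 256, 3, 1),                    # layers 5-7
            conv(256, 256, 3, 1),
            conv(256, 256, 3, 1),
            conv(256, 512, 3, 1),                    # layer 8
            conv(512, 512, 3, 1),                    # layer 9
            conv(512, 256, 3, 1),                    # layer 10
            nn.ConvTranspose2d(256, 256, 3, 2, 1, output_padding=1),  # layer 11 (upsample)
            nn.ReLU(),
            conv(256, 256, 3, 1),                    # layer 12
            nn.ConvTranspose2d(256, 64, 4, 2, 1),    # layer 13 (upsample)
            nn.ReLU(),
            conv(64, 3, 3, 1),                       # layer 14: 3 output channels
        )

    def forward(self, x):
        return self.net(x)
```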
Secondly, since the adversarial training method requires a generator and a discriminator, we built a discriminator network to judge the output of the generator. In the discriminator, each convolutional layer is followed by a batch normalization operation and then a ReLU nonlinear function to strengthen the nonlinearity of the network. Because the discriminator outputs a judgment of real versus fake images, the last layer of the network is a fully connected layer. Its network structure is as follows:
First layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 2;
Second layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Third layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
Fourth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Fifth layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 2;
Sixth layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 1;
Seventh layer: fully connected layer, output neurons 1.
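A corresponding sketch of the discriminator follows. The final Sigmoid is an assumption made so that the output lies in (0, 1) as described later, and LazyLinear is used because the size of the flattened feature map depends on the input resolution, which is not fixed here.

```python
import torch.nn as nn

def d_block(in_c, out_c, s):
    # each discriminator convolution is followed by BatchNorm and ReLU
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, s, padding=1),
        nn.BatchNorm2d(out_c),
        nn.ReLU(),
    )

class Discriminator(nn.Module):
    """Six convolutional layers plus one fully connected layer."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            d_block(3, 128, 2),    # layer 1
            d_block(128, 256, 1),  # layer 2
            d_block(256, 256, 2),  # layer 3
            d_block(256, 256, 1),  # layer 4
            d_block(256, 128, 2),  # layer 5
            d_block(128, 128, 1),  # layer 6
        )
        # LazyLinear infers its input size from the (resolution-dependent)
        # flattened feature map on the first forward pass.
        self.fc = nn.Sequential(nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid())

    def forward(self, x):
        return self.fc(self.features(x))  # a score in (0, 1)
```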
Two non-adjacent sample images, together with the N real frames lying between them, are obtained from a sample video;
(S2) Linear interpolation is applied to the two non-adjacent sample images to obtain N sample input images, which are fed into the first deep autoencoding convolutional network; the first deep autoencoding convolutional network is trained with the objective of minimizing the loss function to obtain N first training images, and the N first training images and the N real frames are fed into the discriminator to obtain a first discrimination result;
(S3) When the first discrimination result is greater than a threshold, step (S2) is repeated; when the first discrimination result is less than or equal to the threshold, the trained first generator is obtained.
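Steps (S2)-(S3) amount to a standard adversarial training loop; a simplified sketch follows. The optimizer, learning rates, and the exact form of the stopping test are illustrative assumptions, and combined_loss refers to the combined loss function sketched further below.

```python
import torch

def train_generator1(gen1, disc, loader, combined_loss, threshold, epochs=50):
    """Adversarial training of the first generator, following steps (S1)-(S3).

    loader yields (interp, real): the N interpolated sample input images and
    the N real frames between the two non-adjacent sample images. The Adam
    optimizer and learning rates are illustrative assumptions.
    """
    opt_g = torch.optim.Adam(gen1.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
    bce = torch.nn.BCELoss()

    for _ in range(epochs):
        for interp, real in loader:
            fake = gen1(interp)

            # Train the discriminator to separate real frames from generated ones.
            opt_d.zero_grad()
            d_real, d_fake = disc(real), disc(fake.detach())
            d_loss = bce(d_real, torch.ones_like(d_real)) + \
                     bce(d_fake, torch.zeros_like(d_fake))
            d_loss.backward()
            opt_d.step()

            # Step (S2): train the generator by minimizing the combined loss.
            opt_g.zero_grad()
            g_loss = combined_loss(fake, real, disc(fake))
            g_loss.backward()
            opt_g.step()

        # Step (S3): take the discrimination result as the discriminator's
        # estimate that the generated frames are fake (larger means worse);
        # stop once it falls to the threshold, otherwise repeat (S2).
        if 1.0 - d_fake.mean().item() <= threshold:
            break
    return gen1
```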
The training of the second generator comprises:
(T1) A second deep autoencoding convolutional network is built from fully convolutional layers with skip connections, as shown in Table 2.
Table 2
Unlike the first generator, skip connections are used: the feature maps obtained by earlier convolutional layers are concatenated with the feature maps obtained by later layers, and together they serve as the input of the next convolutional layer. The advantage is that the network can synthesize image features more easily; combined with adversarial training, the output images share more structural information with the real images.
The structure of the second deep autoencoding convolutional network is as follows:
First layer: convolutional layer, kernel size 3×3, output feature maps 128, stride 1;
Second layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Third layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
Fourth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Fifth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
Sixth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Seventh layer: transposed convolutional layer, kernel size 3×3, output feature maps 256, stride 2;
The feature maps obtained by the seventh layer are concatenated with the 256 feature maps obtained by the fourth layer to give 512 feature maps, which serve as the input of the eighth layer.
Eighth layer: convolutional layer, kernel size 3×3, output feature maps 512, stride 1;
Ninth layer: convolutional layer, kernel size 3×3, output feature maps 512, stride 2;
The feature maps obtained by the ninth layer are concatenated with the 256 feature maps obtained by the second layer to give 768 feature maps, which serve as the input of the tenth layer.
Tenth layer: convolutional layer, kernel size 3×3, output feature maps 256, stride 1;
Eleventh layer: convolutional layer, kernel size 3×3, output feature maps 3, stride 1.
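A sketch of this second generator with its two skip connections follows. One assumption is flagged in the comments: for the feature maps to align spatially at the second concatenation, the ninth layer is implemented as a transposed (upsampling) convolution, which the flattened layer list above does not state explicitly.

```python
import torch
import torch.nn as nn

class Generator2(nn.Module):
    """Second deep autoencoding convolutional network with two skip connections."""
    def __init__(self):
        super().__init__()
        def c(i, o, s):   # convolution + ReLU, 3x3 kernel
            return nn.Sequential(nn.Conv2d(i, o, 3, s, 1), nn.ReLU())
        def up(i, o):     # stride-2 transposed convolution + ReLU (upsampling)
            return nn.Sequential(
                nn.ConvTranspose2d(i, o, 3, 2, 1, output_padding=1), nn.ReLU())
        self.l1, self.l2 = c(3, 128, 1), c(128, 256, 1)
        self.l3, self.l4 = c(256, 256, 2), c(256, 256, 1)
        self.l5, self.l6 = c(256, 256, 2), c(256, 256, 1)
        self.l7 = up(256, 256)       # layer 7: upsample back to half resolution
        self.l8 = c(512, 512, 1)     # input: layer 7 output ++ layer 4 output
        self.l9 = up(512, 512)       # assumed transposed so sizes match layer 2
        self.l10 = c(768, 256, 1)    # input: layer 9 output ++ layer 2 output
        self.l11 = nn.Conv2d(256, 3, 3, 1, 1)

    def forward(self, x):
        f2 = self.l2(self.l1(x))
        f4 = self.l4(self.l3(f2))
        f7 = self.l7(self.l6(self.l5(f4)))
        f8 = self.l8(torch.cat([f7, f4], dim=1))              # 256 + 256 = 512 maps
        f10 = self.l10(torch.cat([self.l9(f8), f2], dim=1))   # 512 + 256 = 768 maps
        return self.l11(f10)
```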
(T2) The N first training images are fed into the second deep autoencoding convolutional network; the second deep autoencoding convolutional network is trained with the objective of minimizing the loss function to obtain N second training images, and the N second training images and the N real frames are fed into the discriminator to obtain a second discrimination result;
(T3) When the second discrimination result is greater than the threshold, step (T2) is repeated; when the second discrimination result is less than or equal to the threshold, the trained second generator is obtained.
The adversarial loss function takes the cross-entropy form:
L_adv = E[log D(X)] + E[log(1 − D(G(X̂)))]
where L denotes a loss function and the subscript adv stands for adversarial. The right-hand side is the cross-entropy formula: E denotes expectation, D denotes the discriminator in our method, and G the generator; together G and D form the generative adversarial network. Moreover, since our purpose is video generation, real video frames are needed as reference data to meet the training requirement: X denotes the real video frames (more than two in number), and the missing intermediate part is generated from the two given frames. To keep input and output consistent, frames X̂ equal in number to X are obtained from the two given frames by weighting; the purpose is to let the generator G produce, from X̂, frames similar to X, thereby completing the generation process. Because the deep learning approach is used, G and D are neural networks and can each be represented by a nonlinear function, so the D and G in the formula can both be regarded as functions whose bracketed arguments are the input data, X and X̂ respectively.
A result obtained with the adversarial loss alone merely has a certain similarity to the real image in pixel distribution, but is not necessarily similar in image structure. To ensure similarity in the latter respect, we use the mean squared error loss and the gradient loss to strengthen the similarity between the output result and the real image. These two loss functions take the following forms:
The mean squared error loss is the squared two-norm of the difference of the two inputs Y and X:
L_mse(X, Y) = ||X − Y||₂²
The gradient loss is:
L_gdl(X, Ŷ) = Σ_{i,j} ( | |X_{i,j} − X_{i−1,j}| − |Ŷ_{i,j} − Ŷ_{i−1,j}| |^α + | |X_{i,j−1} − X_{i,j}| − |Ŷ_{i,j−1} − Ŷ_{i,j}| |^α )
In the present invention p and α are both set to 2. X_{i,j} and Ŷ_{i,j} both denote images input to the function; since an image is composed of pixels it can mathematically be regarded as a matrix, and i, j are the matrix indices. This function mainly takes differences between adjacent pixels of an image, takes a norm, and then takes the difference of the norms. Intuitively, the formula above equals 0 when Ŷ is the same as X and is nonzero when they differ. Ŷ is the image we generate, i.e. G(X̂), so we want Ŷ to be as close to X as possible.
With the twin generator network coordinated with the three loss functions above we can already obtain very clean results, but some differences remain in image contrast. We therefore use a further normalized product correlation loss to penalize the contrast of the output result. Its form is as follows:
L_npcl(X, Ŷ) = −log( Σ_{i=1..M} Σ_{j=1..N} X_{i,j}·Ŷ_{i,j} / sqrt( (Σ_{i,j} X_{i,j}²)·(Σ_{i,j} Ŷ_{i,j}²) ) )
where X denotes the input image in matrix form, and M, N denote the numbers of rows and columns of the matrix. The normalized product correlation ranges between 0 and 1, with values closer to 1 indicating more similar images; to turn it into the form of a loss function we take its logarithm and add a minus sign, so that an output closer to 0 indicates greater image correlation, a form that better suits a loss function. After building the neural networks and choosing the loss functions, the next step is to train the networks. After 50 epochs of training, the network already has the ability to generate the missing intermediate frames from two frames, and the generated results are of high quality. The combined loss function is as follows:
Loss = λ1·L_adv + λ2·L_mse + λ3·L_gdl + λ4·L_npcl
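The four terms can be sketched as follows. The weights λ1-λ4 are hyperparameters whose values are not fixed here (the defaults below are placeholders), and the adversarial term is written in its generator-side cross-entropy form.

```python
import torch
import torch.nn.functional as F

def gdl_loss(x, y, alpha=2):
    """Gradient difference loss: compare absolute differences of adjacent pixels."""
    gx_h, gy_h = (x[..., 1:, :] - x[..., :-1, :]).abs(), (y[..., 1:, :] - y[..., :-1, :]).abs()
    gx_w, gy_w = (x[..., :, 1:] - x[..., :, :-1]).abs(), (y[..., :, 1:] - y[..., :, :-1]).abs()
    return ((gx_h - gy_h).abs() ** alpha).mean() + ((gx_w - gy_w).abs() ** alpha).mean()

def npcl_loss(x, y, eps=1e-8):
    """Normalized product correlation loss: -log of the normalized correlation.

    Assumes pixel values in [0, 1], so the correlation lies in (0, 1].
    """
    num = (x * y).sum()
    den = torch.sqrt((x ** 2).sum() * (y ** 2).sum()) + eps
    return -torch.log(num / den + eps)

def combined_loss(fake, real, d_fake, lambdas=(0.05, 1.0, 1.0, 1.0)):
    """Loss = λ1·L_adv + λ2·L_mse + λ3·L_gdl + λ4·L_npcl (weights are illustrative)."""
    l_adv = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    l_mse = F.mse_loss(fake, real)
    l1, l2, l3, l4 = lambdas
    return l1 * l_adv + l2 * l_mse + l3 * gdl_loss(fake, real) + l4 * npcl_loss(fake, real)
```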
Given two video frames as the input of the deep convolutional generation network of this method, linear interpolation (sampling) is first applied to the two images to obtain ten images, according to the following formula:
(1 − r)·X0 + r·X_{n+1}
where r takes 10 uniformly spaced values between 0 and 1, which yields ten input images. These ten images serve as the input of the first generator, which applies its convolutional layers and outputs ten new images Y′. Y′ and the real images X together serve as the input of discriminator D1, which outputs a discrimination result y1 ∈ (0, 1); y1 represents the discriminator's evaluation of the first generator's output, with larger values indicating worse results, and the generator continually adjusts itself according to y1 to produce better results. The output of the first generator then serves as the input of the second generator, which applies its convolutional layers to obtain a new result Y; Y and the real images X together serve as the input of discriminator D2, which outputs a discrimination result y2 ∈ (0, 1); y2 represents the discriminator's evaluation of the second generator's output, again with larger values indicating worse results, and the generator continually adjusts itself according to y2 to produce better results. The input X is then replaced and this process is repeated until the network has the ability to generate multiple real images from two images. At that point the discriminators are no longer needed; the two generator networks alone complete the generation task. Following the procedure demonstrated in Fig. 2, two frames are fed to the network; after the computation of the two generators, the network generates 10 new video frames, and these 12 frames are concatenated to form a video. Parts of the generation results obtained by this method are shown in Fig. 2(a), Fig. 2(b), Fig. 2(c) and Fig. 2(d). The number of frames to be generated can be controlled; here we choose to generate ten images. Judging from the results, the algorithm studied in the present invention not only generates lifelike, clear and coherent video frames, but can also generate or predict more frames, and can be widely applied in fields such as animation production, video generation, video frame interpolation, and video compression and decompression, with broad application value.
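As a usage illustration of the procedure just described, using the modules sketched above (the checkpoint file names are hypothetical):

```python
import torch

# Build the cascade from the sketched modules and load trained weights.
gen1, gen2 = Generator1().eval(), Generator2().eval()
gen1.load_state_dict(torch.load("gen1.pth"))  # hypothetical checkpoint names
gen2.load_state_dict(torch.load("gen2.pth"))

# Stand-ins for the two non-adjacent frames (3-channel images in [0, 1]).
frame_a, frame_b = torch.rand(3, 128, 128), torch.rand(3, 128, 128)

video = generate_video(frame_a, frame_b, gen1, gen2, n=10)
print(video.shape)  # torch.Size([12, 3, 128, 128]): a 12-frame video
```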
In fact, video generation has a very large solution space, which means it is difficult for a neural network to find a suitable solution in such a huge space; without suitable constraint information it is very difficult to generate coherent video sequences, and the quality of what is generated is also poor. The present invention proposes to use two frames (X1, Xk) separated in time to generate the intermediate motion-process images (X2, ..., Xk−1). We use the image Xk as part of the input to constrain the solution of the video generation: Xk describes the future motion state of the target in X1, so for the generation task Xk is a constraint on the motion generation, and the output of the network can be made as close to Xk as possible. On the other hand, we also use the adversarial network as the training model, which serves as a kind of adversarial constraint, making the samples generated by the adversarial network as similar to the input images as possible. In addition, to solve the second problem, we use adversarial training and a combined loss composed of several different loss functions to guarantee good generation quality, and we use the gray-scale cross-correlation as a new loss function to enhance the sharpness of the generated results. And instead of the conventional generative network with only one generator, we use two generators in series as a cascaded generator: the first generator is mainly used to learn the motion information of the target in the video through adversarial training, without expecting high generation quality, while the second generator improves the quality of the generated video on the basis of the first. Compared with other methods, the video generated by this method is very close to real video, and the length of the generated video far exceeds that of conventional methods.
It will be readily understood by those skilled in the art that the foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent substitution and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (6)

  1. A video generation method based on deep learning for two non-adjacent frames, characterized by comprising:
    (1) performing linear interpolation on the two non-adjacent frames to obtain N input images, and feeding the N input images into a trained first generator to obtain the N video frames lying between the two non-adjacent frames;
    (2) feeding the N video frames into a trained second generator to obtain N new video frames, and concatenating the two non-adjacent frames with the N new video frames to generate a video;
    the training of the first generator comprising: building a first deep autoencoding convolutional network entirely from convolutional layers, and training the first deep autoencoding convolutional network adversarially to obtain the trained first generator; the training of the second generator comprising: building a second deep autoencoding convolutional network from fully convolutional layers with skip connections, and training the second deep autoencoding convolutional network adversarially to obtain the trained second generator.
  2. The video generation method based on deep learning for two non-adjacent frames according to claim 1, characterized in that the training of the first generator comprises:
    (S1) building a first deep autoencoding convolutional network entirely from convolutional layers, and obtaining from a sample video two non-adjacent sample images and the N real frames lying between the two non-adjacent sample images;
    (S2) performing linear interpolation on the two non-adjacent sample images to obtain N sample input images and feeding them into the first deep autoencoding convolutional network, training the first deep autoencoding convolutional network with the objective of minimizing the loss function to obtain N first training images, and feeding the N first training images and the N real frames into a discriminator to obtain a first discrimination result;
    (S3) when the first discrimination result is greater than a threshold, repeating step (S2); when the first discrimination result is less than or equal to the threshold, obtaining the trained first generator.
  3. The video generation method based on deep learning for two non-adjacent frames according to claim 2, characterized in that the training of the second generator comprises:
    (T1) building a second deep autoencoding convolutional network from fully convolutional layers with skip connections;
    (T2) feeding the N first training images into the second deep autoencoding convolutional network, training the second deep autoencoding convolutional network with the objective of minimizing the loss function to obtain N second training images, and feeding the N second training images and the N real frames into the discriminator to obtain a second discrimination result;
    (T3) when the second discrimination result is greater than the threshold, repeating step (T2); when the second discrimination result is less than or equal to the threshold, obtaining the trained second generator.
  4. The video generation method based on deep learning for two non-adjacent frames according to any one of claims 1 to 3, characterized in that a ReLU nonlinear function is arranged after every convolutional layer in the first deep autoencoding convolutional network and the second deep autoencoding convolutional network.
  5. The video generation method based on deep learning for two non-adjacent frames according to claim 2 or 3, characterized in that the discriminator comprises 6 convolutional layers and one fully connected layer, and a normalization operation and a ReLU nonlinear function are arranged in sequence after every convolutional layer.
  6. The video generation method based on deep learning for two non-adjacent frames according to claim 2 or 3, characterized in that the loss function is:
    Loss = λ1·L_adv + λ2·L_mse + λ3·L_gdl + λ4·L_npcl
    where Loss is the loss function; L_adv is the adversarial loss function and λ1 its weight; L_mse is the mean squared error loss function and λ2 its weight; L_gdl is the gradient loss function and λ3 its weight; L_npcl is the normalized product correlation loss function and λ4 its weight.
CN201711343243.5A 2017-12-12 2017-12-12 Video generation method based on deep learning for two non-adjacent frames Expired - Fee Related CN107968962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711343243.5A CN107968962B (en) 2017-12-12 2017-12-12 Video generation method based on deep learning for two non-adjacent frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711343243.5A CN107968962B (en) 2017-12-12 2017-12-12 Video generation method based on deep learning for two non-adjacent frames

Publications (2)

Publication Number Publication Date
CN107968962A (en) 2018-04-27
CN107968962B (en) 2019-08-09

Family

ID=61994443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711343243.5A Expired - Fee Related 2017-12-12 2017-12-12 Video generation method based on deep learning for two non-adjacent frames

Country Status (1)

Country Link
CN (1) CN107968962B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170169357A1 (en) * 2015-12-15 2017-06-15 Deep Instinct Ltd. Methods and systems for data traffic analysis
CN105354565A (en) * 2015-12-23 2016-02-24 北京市商汤科技开发有限公司 Full convolution network based facial feature positioning and distinguishing method and system
US20170278135A1 (en) * 2016-02-18 2017-09-28 Fitroom, Inc. Image recognition artificial intelligence system for ecommerce
US20170337464A1 (en) * 2016-05-20 2017-11-23 Google Inc. Progressive neural networks
CN106127702A (en) * 2016-06-17 2016-11-16 兰州理工大学 A kind of image mist elimination algorithm based on degree of depth study
CN106296692A (en) * 2016-08-11 2017-01-04 深圳市未来媒体技术研究院 Image significance detection method based on antagonism network
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
CN107220600A (en) * 2017-05-17 2017-09-29 清华大学深圳研究生院 A kind of Picture Generation Method and generation confrontation network based on deep learning
CN107330444A (en) * 2017-05-27 2017-11-07 苏州科技大学 A kind of image autotext mask method based on generation confrontation network
CN107273936A (en) * 2017-07-07 2017-10-20 广东工业大学 A kind of GAN image processing methods and system
CN107463951A (en) * 2017-07-19 2017-12-12 清华大学 A kind of method and device for improving deep learning model robustness

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615073A (en) * 2018-04-28 2018-10-02 北京京东金融科技控股有限公司 Image processing method and device, computer readable storage medium, electronic equipment
CN110473147A (en) * 2018-05-09 2019-11-19 腾讯科技(深圳)有限公司 A kind of video deblurring method and device
CN108665432A (en) * 2018-05-18 2018-10-16 百年金海科技有限公司 A kind of single image to the fog method based on generation confrontation network
CN108805188A (en) * 2018-05-29 2018-11-13 徐州工程学院 A kind of feature based recalibration generates the image classification method of confrontation network
CN108805188B (en) * 2018-05-29 2020-08-21 徐州工程学院 Image classification method for generating countermeasure network based on feature recalibration
CN108875818A (en) * 2018-06-06 2018-11-23 西安交通大学 Based on variation from code machine and confrontation network integration zero sample image classification method
CN109325931A (en) * 2018-08-22 2019-02-12 中北大学 Based on the multi-modality images fusion method for generating confrontation network and super-resolution network
CN110879962B (en) * 2018-09-05 2023-09-22 斯特拉德视觉公司 Method and device for optimizing CNN parameters by utilizing multiple video frames
CN110879962A (en) * 2018-09-05 2020-03-13 斯特拉德视觉公司 Method and apparatus for optimizing CNN parameters using multiple video frames
CN109218629A (en) * 2018-09-14 2019-01-15 三星电子(中国)研发中心 Video generation method, storage medium and device
CN109218629B (en) * 2018-09-14 2021-02-05 三星电子(中国)研发中心 Video generation method, storage medium and device
CN109151575B (en) * 2018-10-16 2021-12-14 Oppo广东移动通信有限公司 Multimedia data processing method and device and computer readable storage medium
CN109151575A (en) * 2018-10-16 2019-01-04 Oppo广东移动通信有限公司 Multimedia data processing method and device, computer readable storage medium
CN109544652A (en) * 2018-10-18 2019-03-29 江苏大学 Add to weigh imaging method based on the nuclear magnetic resonance that depth generates confrontation neural network
CN109492764A (en) * 2018-10-24 2019-03-19 平安科技(深圳)有限公司 Training method, relevant device and the medium of production confrontation network
CN109492764B (en) * 2018-10-24 2024-07-26 平安科技(深圳)有限公司 Training method, related equipment and medium for generating countermeasure network
WO2020082572A1 (en) * 2018-10-24 2020-04-30 平安科技(深圳)有限公司 Training method of generative adversarial network, related device, and medium
CN109360436A (en) * 2018-11-02 2019-02-19 Oppo广东移动通信有限公司 A kind of video generation method, terminal and storage medium
WO2020097795A1 (en) * 2018-11-13 2020-05-22 北京比特大陆科技有限公司 Image processing method, apparatus and device, and storage medium and program product
CN109993820A (en) * 2019-03-29 2019-07-09 合肥工业大学 A kind of animated video automatic generation method and its device
CN109993820B (en) * 2019-03-29 2022-09-13 合肥工业大学 Automatic animation video generation method and device
CN110047118A (en) * 2019-04-08 2019-07-23 腾讯科技(深圳)有限公司 Video generation method, device, computer equipment and storage medium
CN110047118B (en) * 2019-04-08 2023-06-27 腾讯科技(深圳)有限公司 Video generation method, device, computer equipment and storage medium
CN110070612A (en) * 2019-04-25 2019-07-30 东北大学 A kind of CT image layer interpolation method based on generation confrontation network
CN110070612B (en) * 2019-04-25 2023-09-22 东北大学 CT image interlayer interpolation method based on generation countermeasure network
CN110310351B (en) * 2019-07-04 2023-07-21 北京信息科技大学 Sketch-based three-dimensional human skeleton animation automatic generation method
CN110310351A (en) * 2019-07-04 2019-10-08 北京信息科技大学 A kind of 3 D human body skeleton cartoon automatic generation method based on sketch
CN110852970A (en) * 2019-11-08 2020-02-28 南京工程学院 Underwater robot image enhancement method for generating countermeasure network based on depth convolution
CN111476868B (en) * 2020-04-07 2023-06-23 哈尔滨工业大学 Animation generation model training and animation generation method and device based on deep learning
CN111476868A (en) * 2020-04-07 2020-07-31 哈尔滨工业大学 Animation generation model training and animation generation method and device based on deep learning
CN111696049A (en) * 2020-05-07 2020-09-22 中国海洋大学 Deep learning-based underwater distorted image reconstruction method
CN112995433A (en) * 2021-02-08 2021-06-18 北京影谱科技股份有限公司 Time sequence video generation method and device, computing equipment and storage medium
CN113222964A (en) * 2021-05-27 2021-08-06 推想医疗科技股份有限公司 Method and device for generating coronary artery central line extraction model
CN113674185A (en) * 2021-07-29 2021-11-19 昆明理工大学 Weighted average image generation method based on fusion of multiple image generation technologies
CN113674185B (en) * 2021-07-29 2023-12-08 昆明理工大学 Weighted average image generation method based on fusion of multiple image generation technologies

Also Published As

Publication number Publication date
CN107968962B (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN107968962A (en) A kind of video generation method of the non-conterminous image of two frames based on deep learning
CN110378844A (en) Motion blur method is gone based on the multiple dimensioned Image Blind for generating confrontation network is recycled
CN111105352B (en) Super-resolution image reconstruction method, system, computer equipment and storage medium
CN110097178A (en) It is a kind of paid attention to based on entropy neural network model compression and accelerated method
CN109741247A (en) A kind of portrait-cartoon generation method neural network based
CN102156875A (en) Image super-resolution reconstruction method based on multitask KSVD (K singular value decomposition) dictionary learning
CN108711182A (en) Render processing method, device and mobile terminal device
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
CN109146061A (en) The treating method and apparatus of neural network model
CN115880158B (en) Blind image super-resolution reconstruction method and system based on variation self-coding
CN115471423A (en) Point cloud denoising method based on generation countermeasure network and self-attention mechanism
CN109922346A (en) A kind of convolutional neural networks for the reconstruct of compressed sensing picture signal
CN110533579A (en) Based on the video style conversion method from coding structure and gradient order-preserving
Wu et al. Ganhead: Towards generative animatable neural head avatars
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
CN112380764B (en) Gas scene end-to-end rapid reconstruction method under limited view
Shariff et al. Artificial (or) fake human face generator using generative adversarial network (GAN) machine learning model
CN113096015B (en) Image super-resolution reconstruction method based on progressive perception and ultra-lightweight network
CN110223224A (en) A kind of Image Super-resolution realization algorithm based on information filtering network
CN108769674A (en) A kind of video estimation method based on adaptive stratification motion modeling
CN108470208A (en) It is a kind of based on be originally generated confrontation network model grouping convolution method
CN113129237B (en) Depth image deblurring method based on multi-scale fusion coding network
CN110009568A (en) The generator construction method of language of the Manchus image super-resolution rebuilding
Zhang et al. Research on image super-resolution reconstruction based on deep learning
Zou et al. EDCNN: a novel network for image denoising

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190809

Termination date: 20191212