CN110366029A - Method, system and electronic device for inserting image frames into a video - Google Patents

Method, system and electronic device for inserting image frames into a video

Info

Publication number
CN110366029A
CN110366029A CN201910600097.2A CN201910600097A
Authority
CN
China
Prior art keywords
video
network
image
human body
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910600097.2A
Other languages
Chinese (zh)
Other versions
CN110366029B (en)
Inventor
张昱航
任宏帅
叶可江
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201910600097.2A priority Critical patent/CN110366029B/en
Publication of CN110366029A publication Critical patent/CN110366029A/en
Application granted granted Critical
Publication of CN110366029B publication Critical patent/CN110366029B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/647Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64784Data processing by the network
    • H04N21/64792Controlling the complexity of the content stream, e.g. by dropping packets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

This application relates to a method, system and electronic device for inserting image frames into a video. The method includes: selecting m feature maps containing a pedestrian from before the video-loss time and m from after the video-recovery time, and collecting a set number of human pose points from each feature map; feeding all human pose points into an AlexNet network, which predicts the pedestrian pose of the images to be restored using cubic polynomial fitting combined with cubic spline interpolation; feeding the pose points of the feature maps before the video-loss time into an LSTM network to obtain its pedestrian pose prediction for the images to be restored; generating the restored images from the pose predictions of the AlexNet and LSTM networks, computing each restored image's insertion position in the video, and inserting the restored images at the corresponding positions. The application improves the precision of existing algorithms.

Description

Method, system and electronic device for inserting image frames into a video
Technical field
This application belongs to the technical field of video frame interpolation, and in particular relates to a method, system and electronic device for inserting image frames into a video.
Background technique
With the growing popularity of short-video entertainment applications, China currently sees a huge demand for transmitting and playing short videos. A problem that comes with this is frame loss during transmission, caused by network issues and other factors. Lost frames make the video appear visually discontinuous to the viewer, which not only hurts the viewing experience but may also destroy key information and reduce the video's usable value.
When facing lost frames, the prior art usually tries to improve the quality of the transmission process so that uploads and downloads lose as few frames as possible, but it is helpless against frames that have already been lost. Little recent top-tier academic work addresses this problem, mainly because it is a new demand created by industry; existing work in the field only predicts the next few frames from past video frames and almost never restores a few lost frames in the middle of a continuous video. The literature [Walker J, Marino K, Gupta A, et al. The pose knows: Video forecasting by generating pose futures [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 3332-3341.] shows that, for predicting possible next-frame behavior, the VAE method [Kingma D P, Welling M. Auto-encoding variational bayes [J]. arXiv preprint arXiv:1312.6114, 2013.] can be used to predict pedestrian pose, but these pose predictions only target future frames; they do not use such predictions to complete the missing frames between two related video segments.
Summary of the invention
This application provides a method, system and electronic device for inserting image frames into a video, aiming to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, this application provides the following technical solutions:
A method for inserting image frames into a video, comprising the following steps:
Step a: selecting m feature maps containing a pedestrian from before the video-loss time and m from after the video-recovery time, and collecting a set number of human pose points from each feature map;
Step b: feeding all human pose points into an AlexNet network, which predicts the pedestrian pose of the images to be restored using cubic polynomial fitting combined with cubic spline interpolation;
Step c: feeding the human pose points of the feature maps before the video-loss time into an LSTM network to obtain its pedestrian pose prediction for the images to be restored;
Step d: generating the images to be restored from the pose predictions of the AlexNet and LSTM networks, computing each restored image's insertion position in the video, and inserting the restored images at the corresponding positions.
The technical solution adopted in embodiments of this application further includes: assuming 17 human pose points are collected from each feature map, in step b the pedestrian pose prediction of the AlexNet network specifically comprises:
Step b1: treating the 17 human pose points as 17 IDs and fitting a regression curve for each using cubic polynomial fitting; the pose point of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into a cubic polynomial gives:

y = ax^3 + bx^2 + cx + d

Computer fitting yields 17 tuples (a_i, b_i, c_i, d_i), forming 17 fitted cubic polynomials y = f(x); plotting y = f(x) on an image produces 17 curves whose horizontal and vertical coordinates are the position of each pose point;
Step b2: recovering, by cubic spline interpolation, the image coordinate points between the two frames on either side of the video-loss interval, obtaining the pedestrian pose prediction of the images to be restored.
The technical solution adopted in embodiments of this application further includes: in step c, the input structure of the LSTM network is:

[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})

and the pose prediction of the next frame to be restored is:

p̂_{t+1} = W^T h_t

In the formulas above, W^T is the weight trained by the neural network and h_t, c_t are internal parameters of the LSTM structure; the loss function of the LSTM network is denoted object2 = Loss(LSTM).
The technical solution adopted in embodiments of this application further includes: after step c, defining an objective function and optimizing the AlexNet and LSTM networks according to it:

object_final = object1 + object2 + |object1 - object2|

In the formula above, the term |object1 - object2| forces the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close to each other as possible.
The technical solution adopted in embodiments of this application further includes: in step d, inserting the images to be restored at the corresponding positions in the video specifically comprises: the optimized AlexNet and LSTM networks each yield a group of restored images with the same number of frames, each frame containing 17 human pose points; the 17 pose points of each restored frame are matched with each other by ID, and the average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network is taken to obtain the insertion position of each restored frame, after which all restored images are inserted at the corresponding positions. The position calculation formula is:

location_ID = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)
Another technical solution adopted in embodiments of this application is a system for inserting image frames into a video, comprising:
a feature-map selection module, for selecting m feature maps containing a pedestrian from before the video-loss time and m from after the video-recovery time;
a pose-point collection module, for collecting a set number of human pose points from each feature map;
an AlexNet network prediction module, for feeding all human pose points into an AlexNet network, which predicts the pedestrian pose of the images to be restored using cubic polynomial fitting combined with cubic spline interpolation;
an LSTM network prediction module, for feeding the human pose points of the feature maps before the video-loss time into an LSTM network to obtain its pedestrian pose prediction for the images to be restored;
an image insertion module, for generating the images to be restored from the pose predictions of the AlexNet and LSTM networks, computing each restored image's insertion position in the video, and inserting the restored images at the corresponding positions.
The technical solution adopted in embodiments of this application further includes: assuming 17 human pose points are collected from each feature map, the pedestrian pose prediction of the AlexNet network prediction module is specifically:
treating the 17 human pose points as 17 IDs and fitting a regression curve for each using cubic polynomial fitting; the pose point of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into a cubic polynomial gives:

y = ax^3 + bx^2 + cx + d

Computer fitting yields 17 tuples (a_i, b_i, c_i, d_i), forming 17 fitted cubic polynomials y = f(x); plotting y = f(x) on an image produces 17 curves whose horizontal and vertical coordinates are the position of each pose point;
recovering, by cubic spline interpolation, the image coordinate points between the two frames on either side of the video-loss interval, obtaining the pedestrian pose prediction of the images to be restored.
The technical solution adopted in embodiments of this application further includes: the input structure of the LSTM network is:

[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})

and the pose prediction of the next frame to be restored is:

p̂_{t+1} = W^T h_t

In the formulas above, W^T is the weight trained by the neural network and h_t, c_t are internal parameters of the LSTM structure; the loss function of the LSTM network is denoted object2 = Loss(LSTM).
The technical solution adopted in embodiments of this application further includes a network optimization module, which defines an objective function and optimizes the AlexNet and LSTM networks according to it:

object_final = object1 + object2 + |object1 - object2|

In the formula above, the term |object1 - object2| forces the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close to each other as possible.
The technical solution adopted in embodiments of this application further includes: the image insertion module inserting the images to be restored at the corresponding positions in the video specifically comprises: the optimized AlexNet and LSTM networks each yield a group of restored images with the same number of frames, each frame containing 17 human pose points; the 17 pose points of each restored frame are matched with each other by ID, and the average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network is taken to obtain the insertion position of each restored frame, after which all restored images are inserted at the corresponding positions. The position calculation formula is:

location_ID = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)
A further technical solution adopted in embodiments of this application is an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the above method for inserting image frames into a video:
Step a: selecting m feature maps containing a pedestrian from before the video-loss time and m from after the video-recovery time, and collecting a set number of human pose points from each feature map;
Step b: feeding all human pose points into an AlexNet network, which predicts the pedestrian pose of the images to be restored using cubic polynomial fitting combined with cubic spline interpolation;
Step c: feeding the human pose points of the feature maps before the video-loss time into an LSTM network to obtain its pedestrian pose prediction for the images to be restored;
Step d: generating the images to be restored from the pose predictions of the AlexNet and LSTM networks, computing each restored image's insertion position in the video, and inserting the restored images at the corresponding positions.
Compared with the prior art, the beneficial effect of the embodiments of this application is: the method, system and electronic device for inserting image frames into a video combine an AlexNet network with an LSTM network in a scheme where the two predictions reinforce each other. The cubic-spline prediction based on the frames before and after the gap compensates for the LSTM's inability to learn from the samples after the interruption, effectively resolving the inaccuracy of a single LSTM prediction and giving viewers a better viewing experience. Meanwhile, the dual-network design of this application effectively improves the precision of existing algorithms, is highly extensible, and can complete more complex prediction tasks by swapping in other convolutional neural networks.
Brief description of the drawings
Fig. 1 is a flow chart of the method for inserting image frames into a video according to an embodiment of this application;
Fig. 2 is a structural diagram of the system for inserting image frames into a video according to an embodiment of this application;
Fig. 3 is a structural diagram of the hardware device for the method for inserting image frames into a video provided by an embodiment of this application.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of this application clearer, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and are not intended to limit it.
For the technical problems in the prior art, this application proposes a method that uses a long short-term memory network (LSTM) for memory-based prediction and in-video frame insertion. Through convolutional neural networks, the method turns frame insertion into a learning process, so that the network continuously learns the relevant parameter weights and automatically generates image frames to insert into a video that is continuous but has a gap in the middle, making the video visually continuous and giving viewers a better viewing experience.
This application only performs target prediction and insertion when human poses are detected (for the human pose detection method see [Fang H S, Xie S, Tai Y W, et al. Rmpe: Regional multi-person pose estimation [C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2334-2343.]). Specifically, Fig. 1 is a flow chart of the method for inserting image frames into a video according to an embodiment of this application. The method comprises the following steps:
Step 100: selecting m frames from before the video-loss time and m from after the video-recovery time as the feature maps for generating the images to be restored;
In step 100, suppose the video is lost at time t = p and recovers at time t = q, with time measured in seconds (s). The video-loss duration is then miss = |p - q|. Since smooth video generally runs at 24 frames per second, the total number of frames to restore is Total Frames = miss * 24. Selecting m frames before the loss time t = p and m frames after the recovery time t = q gives an input of Input Frames = 2 * m.
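The frame-count bookkeeping of step 100 can be sketched as follows; the function and variable names are illustrative, not from the patent:

```python
def frame_counts(p: float, q: float, m: int, fps: int = 24):
    """Number of frames to restore and number of input feature maps.

    p, q: loss and recovery times in seconds; m: frames sampled on each side.
    """
    miss = abs(p - q)                 # video-loss duration in seconds
    total_frames = int(miss * fps)    # Total Frames = miss * 24
    input_frames = 2 * m              # Input Frames = 2 * m
    return total_frames, input_frames

# A 2-second gap at 24 fps with m = 5 context frames per side:
print(frame_counts(10.0, 12.0, m=5))  # (48, 10)
```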
Step 200: collecting a set number of human pose points from each feature map;
In step 200, this application selects 17 body keypoints as human pose points, as shown in Table 1 below (it should be understood that the number and position of the pose points can be set according to the actual application):
Table 1: Human body keypoint specification
Step 300: feeding the collected human pose points into an AlexNet network [Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks [C]// Advances in neural information processing systems. 2012: 1097-1105.], which predicts the pedestrian pose of the images to be restored using cubic polynomial fitting combined with cubic spline interpolation;
In step 300, since each feature map contains 17 human pose points, there are 2*17*m points in total as the input of the AlexNet network. The pedestrian pose prediction of the AlexNet network specifically comprises:
Step 301: first, treating the 17 human pose points as 17 IDs (ID = 1, 2, 3, ..., 17) and fitting a regression curve for each using cubic polynomial fitting; the pose point of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into a cubic polynomial gives:

y = ax^3 + bx^2 + cx + d  (1)

In the formula above, a ≠ 0 and b, c, d are constants. Computer fitting (MATLAB) yields 17 tuples (a_i, b_i, c_i, d_i), finally forming 17 fitted cubic polynomials y = f(x). Plotting y = f(x) on an image produces 17 curves whose horizontal and vertical coordinates are the corresponding position of each pose point.
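Step 301 can be sketched with `numpy.polyfit` in place of the MATLAB fitting the patent mentions; the per-ID trajectory below is made-up illustrative data, and the same fit would be repeated for each of the 17 IDs:

```python
import numpy as np

# Illustrative (x, y) positions of one pose-point ID across the 2*m context frames.
xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ys = np.array([2.1, 3.9, 6.2, 7.8, 9.1, 9.6])

# Fit y = a*x^3 + b*x^2 + c*x + d  (formula (1)).
a, b, c, d = np.polyfit(xs, ys, deg=3)
f = np.poly1d([a, b, c, d])

# The fitted curve can then be evaluated at any x to place the pose point.
print(float(f(3.5)))
```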
Step 302: recovering, by cubic spline interpolation, the image coordinate points between the two frame positions t = p and t = q, obtaining the pedestrian pose prediction of the images to be restored. The cubic spline interpolation is as follows:

Nodes: x_0 < x_1 < ... < x_{n-1} < x_n (n = 17)
Function values: y_i = f(x_i)
s(x) satisfies s(x_i) = y_i (i = 0, 1, 2, ..., 16), piecewise:

s(x) = s_k(x), x ∈ [x_{k-1}, x_k] (k = 1, 2, ..., 17)  (3)

In formula (3), s_k(x) is the piece defined on [x_{k-1}, x_k] (k = 1, 2, ..., 17), with s_k(x_{k-1}) = y_{k-1} and s_k(x_k) = y_k.

From the smoothness conditions of the interpolating spline:

s_k(x_k) = s_{k+1}(x_k), s'_k(x_k) = s'_{k+1}(x_k), s''_k(x_k) = s''_{k+1}(x_k), k = 1, ..., n-1  (4)

Each s_k(x) is a cubic polynomial of the form in (1) with four undetermined coefficients a_i, b_i, c_i, d_i, giving 4n coefficients in all, while conditions (3) and (4) supply only 2n + 2(n - 1) = 4n - 2 equations, which is not enough to solve; initial and boundary conditions are therefore added to build 4n equations. These two equations are:

s'(x_0) = f'_0  (5)
s'(x_n) = f'_n  (6)

The resulting system is solved by the Thomas algorithm (tridiagonal matrix elimination). This yields the curve of the images to be restored; zooming in on the curve between x_p and x_q, randomly sampling Total Frames points, and performing the same operation for every ID point gives the 17 sample points of each frame to be restored. These 17 sample points constitute the pedestrian pose prediction of each restored frame.
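Step 302's interpolation can be sketched with SciPy's cubic spline solver rather than hand-rolling the tridiagonal system; the node values below are illustrative, and the natural boundary condition stands in for the s'(x_0), s'(x_n) conditions of the patent:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Known x-positions of one pose-point ID in the context frames before t=p
# and after t=q: frame indices 0..5 and 54..59 with a 48-frame gap between.
t_known = np.array([0, 1, 2, 3, 4, 5, 54, 55, 56, 57, 58, 59], dtype=float)
x_known = np.concatenate([np.linspace(10, 15, 6), np.linspace(64, 69, 6)])

spline = CubicSpline(t_known, x_known, bc_type="natural")

# Sample the 48 missing frames; repeat per coordinate and per ID.
t_missing = np.arange(6, 54, dtype=float)
x_missing = spline(t_missing)
print(x_missing.shape)  # (48,)
```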
In training the convolutional neural network, this application measures the experimental error of the network with the mean squared error, as shown below:

object1 = (1/N) Σ_{i=1}^{N} ‖(x_i, y_i) − (x̄_i, ȳ_i)‖²  (7)

In formula (7), (x̄_i, ȳ_i) denotes the true coordinate in the training set. Through continuous training, AlexNet adjusts suitable spline interpolation coefficients to achieve a good prediction effect.
Step 400: using an LSTM (long short-term memory network) on the human pose points collected from the feature maps before the video-loss time t = p to obtain the pedestrian pose prediction of the images to be restored;
In step 400, the input structure of the LSTM is:

[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})  (8)

and the pose prediction of the next frame to be restored is:

p̂_{t+1} = W^T h_t  (9)

In formulas (8) and (9), W^T is the weight trained by the neural network and h_t, c_t are internal parameters of the LSTM structure. The loss function of the LSTM network can simply be denoted object2 = Loss(LSTM).
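Formulas (8) and (9) can be sketched as a single LSTM cell step followed by a linear read-out. The plain NumPy implementation below uses random weights and a made-up hidden size, so it only illustrates the recurrence, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 34            # input: 17 pose points, (x, y) each
H = 64            # hidden size (illustrative)

# Stacked gate weights [W_f; W_i; W_o; W_g] applied to [p_t, h_{t-1}].
W = rng.standard_normal((4 * H, D + H)) * 0.1
b = np.zeros(4 * H)
W_out = rng.standard_normal((H, D)) * 0.1   # read-out weight W of formula (9)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(p_t, h_prev, c_prev):
    """[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})  -- formula (8)."""
    z = W @ np.concatenate([p_t, h_prev]) + b
    f, i, o, g = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

# Run the context poses through the cell, then predict the next pose (formula (9)).
h, c = np.zeros(H), np.zeros(H)
for p_t in rng.standard_normal((5, D)):     # 5 illustrative context frames
    h, c = lstm_step(p_t, h, c)
p_next = W_out.T @ h                        # p̂_{t+1} = W^T h_t
print(p_next.shape)  # (34,)
```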
Step 500: defining an objective function and optimizing the AlexNet and LSTM networks according to it;
In step 500, the AlexNet and LSTM networks each generate Total Frames pedestrian pose predictions. The prediction of the AlexNet network is based on the data distribution both before the video-loss time and after the video-recovery time, while the prediction of the LSTM network is based only on the features before the video-loss time. To let the two networks complement and exploit each other, this application defines the following objective function to optimize the AlexNet and LSTM networks:

object_final = object1 + object2 + |object1 - object2|  (10)

In formula (10), the term |object1 - object2| forces the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close as possible, which is needed to finally generate a good restoration result.
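The combined objective of formula (10) is a one-liner; the sketch below shows how the |object1 − object2| term adds a penalty only when the two networks' losses disagree:

```python
def object_final(object1: float, object2: float) -> float:
    # Formula (10): sum of both losses plus their absolute disagreement.
    return object1 + object2 + abs(object1 - object2)

# Equal losses add no extra penalty; unequal losses do.
print(object_final(0.5, 0.5))            # 1.0
print(round(object_final(0.2, 0.8), 6))  # 1.6
```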
Step 600: outputting the final predictions of the images to be restored from the optimized AlexNet and LSTM networks respectively, and inserting the restored images at the corresponding positions in the video;
In step 600, after optimizing the AlexNet and LSTM networks, two groups of Total Frames restored images are obtained, each frame containing 17 human pose points. The 17 pose points of each frame are matched with each other by ID, and the average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network gives the insertion position of each restored frame; all restored images are then inserted at the corresponding positions, completing the insertion of the lost image frames in the entire video. The position calculation formula is as follows:

location_ID = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)  (11)
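The per-ID averaging of the two networks' predictions in step 600 can be sketched as follows (the point arrays are illustrative):

```python
import numpy as np

# Predicted pose points for one restored frame: rows are the 17 IDs, columns (x, y).
alexnet_pts = np.array([[100.0, 200.0]] * 17)   # (x_i, y_i) from the AlexNet network
lstm_pts = np.array([[104.0, 196.0]] * 17)      # (x̂_i, ŷ_i) from the LSTM network

# Element-wise average, matched by ID.
final_pts = (alexnet_pts + lstm_pts) / 2.0
print(final_pts[0])  # [102. 198.]
```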
Fig. 2 is a structural diagram of the system for inserting image frames into a video according to an embodiment of this application. The system comprises a feature-map selection module, a pose-point collection module, an AlexNet network prediction module, an LSTM network prediction module, a network optimization module, and an image insertion module.
Feature-map selection module: selects m frames before the video-loss time and m frames after the video-recovery time as the feature maps for generating the images to be restored. In this embodiment, suppose the video is lost at time t = p and recovers at time t = q, with time in seconds (s); the video-loss duration is then miss = |p - q|. Since smooth video generally runs at 24 frames per second, the total number of frames to restore is Total Frames = miss * 24. Selecting m frames before t = p and m frames after t = q gives an input of Input Frames = 2 * m.
Pose-point collection module: collects a set number of human pose points from each feature map. This application selects 17 body keypoints as human pose points, as shown in Table 1 below (it should be understood that the number and position of the pose points can be set according to the actual application):
Table 1: Human body keypoint specification
Alex Net neural network forecast module: collected human body attitude point is inputted Alex Net network, Alex Net network is carried out using pedestrian's posture that the method that cubic polynomial fitting is combined with cubic spline interpolation treats restored image Prediction;Wherein, since every characteristic pattern respectively includes 17 human body attitude points, then just there is 2*17*m point as Alex in total The input of Net network.Alex Net neural network forecast module specifically includes:
Cubic polynomial fitting unit: for by 17 human body attitudes o'clock as 17 ID (ID=1,2,3 ... 17) points A regression curve is not determined using cubic polynomial approximating method;Wherein, the corresponding human body attitude point of each ID is in image In all have a coordinate, indicate are as follows: locationID=(xi,yi), i=ID, a series of (x that each ID is formedi,yi) Cubic polynomial is substituted into, is obtained:
y = ax³ + bx² + cx + d (1)
By computer fitting (e.g. MATLAB), 17 groups of (a_i, b_i, c_i, d_i) are obtained, ultimately forming the 17 fitted cubic polynomials y=f(x). Plotting each y=f(x) on an image generates 17 curves, whose horizontal and vertical coordinates represent the position of the corresponding posture point.
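The per-ID cubic fit can be sketched with NumPy's `polyfit` in place of MATLAB; the helper name `fit_cubics` and the data layout are our assumptions:

```python
# Fit one cubic y = a*x^3 + b*x^2 + c*x + d per posture-point ID, as in (1).
import numpy as np

def fit_cubics(tracks):
    """tracks: dict mapping posture-point ID -> list of (x, y) image coordinates.
    Returns dict ID -> (a, b, c, d), the fitted cubic's coefficients."""
    coeffs = {}
    for pid, pts in tracks.items():
        xs = np.array([p[0] for p in pts], dtype=float)
        ys = np.array([p[1] for p in pts], dtype=float)
        coeffs[pid] = tuple(np.polyfit(xs, ys, deg=3))  # highest power first
    return coeffs

# A noise-free cubic track is recovered up to floating-point error:
xs = np.arange(6, dtype=float)
track = {1: [(x, 2 * x**3 - x**2 + 3 * x + 4) for x in xs]}
print(np.round(fit_cubics(track)[1], 6))  # approximately [2, -1, 3, 4]
```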
Cubic spline interpolation unit: restores, by cubic spline interpolation, the image coordinate points interrupted between the two frame positions t=p and t=q, obtaining the pedestrian posture prediction results of the images to be restored. The cubic spline interpolation is set up as follows:
There are 17 nodes in total: x_0 < x_1 < … < x_{n-1} < x_n (n = 17)
Function values: y_i = f(x_i)
S(x) satisfies S(x_i) = y_i (i = 0, 1, 2, …, 16)

S(x) = s_k(x), x ∈ [x_{k-1}, x_k], k = 1, 2, …, 17 (3)

In formula (3), s_k(x) is the spline segment on [x_{k-1}, x_k] (k = 1, 2, …, 17); at the same time, s_k(x_{k-1}) = y_{k-1} and s_k(x_k) = y_k.
Further equations are obtained from the boundary conditions of the interpolating-spline method: the first and second derivatives of adjacent segments agree at each interior node.
Each s_k(x) is a cubic polynomial of the form determined in (1), with four undetermined coefficients a_i, b_i, c_i, d_i, giving 4n coefficients in total. However, only 2n + 2(n-1) = 4n-2 equations have been derived so far, which is not enough to solve the system; two initial and boundary conditions are therefore added to construct 4n equations.
The two equations are respectively:

s′(x_0) = f′_0 (5)

s′(x_n) = f′_n (6)
The above system is solved by the chasing method (the Thomas algorithm for tridiagonal matrices); in this way the curves of the images to be restored are obtained. The curves near x_p and x_q are magnified and Total Frames points are randomly sampled from them. Performing the same operation for each ID point yields the 17 sample points of each frame to be restored, and these 17 sample points constitute the pedestrian posture prediction result of that frame.
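A hedged sketch of this interpolation step using SciPy's `CubicSpline`, which solves the same tridiagonal spline system internally (the function name and data layout are our assumptions):

```python
# Bridge the lost interval (x_p, x_q) for one posture-point ID with a cubic
# spline through the known samples, then draw Total Frames points inside it.
import numpy as np
from scipy.interpolate import CubicSpline

def bridge_gap(xs, ys, x_p, x_q, total_frames):
    """xs, ys: known coordinates of one ID, sorted by x, spanning the gap.
    Returns total_frames (x, y) samples strictly inside (x_p, x_q)."""
    spline = CubicSpline(xs, ys)  # 'not-a-knot' boundary conditions by default
    x_new = np.linspace(x_p, x_q, total_frames + 2)[1:-1]
    return x_new, spline(x_new)

xs = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0])  # samples with a gap in (2, 5)
ys = xs ** 2                                    # toy trajectory
x_new, y_new = bridge_gap(xs, ys, 2.0, 5.0, 72)
print(len(x_new))  # 72
```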
During training of the convolutional neural network, the present application measures the experimental error of the network by the mean square error, as shown in the following formula:

object1 = (1/N) Σ_{i=1}^{N} [(x_i − x̃_i)² + (y_i − ỹ_i)²] (7)

In formula (7), (x̃_i, ỹ_i) denote the true coordinates in the training set; through the continuous training of AlexNet, suitable interpolating-spline coefficients are adjusted to achieve a good prediction effect.
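The mean-square-error criterion can be sketched as follows (the coordinate layout is our assumption):

```python
# Mean square error between predicted and true (x, y) posture points, as the
# experimental-error measure described above.
import numpy as np

def mse_loss(pred, true):
    """pred, true: arrays of shape (N, 2) holding (x, y) coordinates."""
    return float(np.mean((np.asarray(pred) - np.asarray(true)) ** 2))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
true = np.array([[1.0, 2.0], [3.0, 6.0]])
print(mse_loss(pred, true))  # 1.0
```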
LSTM network prediction module: obtains pedestrian posture prediction results for the images to be restored using an LSTM (long short-term memory) network, based on the human body posture points acquired from the characteristic patterns before the video-loss time t=p. The input structure of the LSTM network is:
[ht,ct]=LSTM (pt,ht-1,ct-1) (8)
The posture prediction of the next frame to be restored is then:

p̂_{t+1} = W^T·h_t (9)
In formulas (8) and (9), W^T denotes the weights trained by the network, and h_t, c_t are the intrinsic parameters (hidden and cell states) of the LSTM structure. The loss function of the LSTM network can be written simply as object2 = Loss(LSTM).
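A hedged PyTorch sketch of formulas (8) and (9): an LSTM consumes the m pre-loss pose frames and a linear readout plays the role of W^T (the module name, hidden size, and tensor shapes are our assumptions, not the patent's exact architecture):

```python
import torch
import torch.nn as nn

class PosePredictor(nn.Module):
    def __init__(self, n_points=17, hidden=128):
        super().__init__()
        # [h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1}), as in formula (8)
        self.lstm = nn.LSTM(input_size=n_points * 2, hidden_size=hidden,
                            batch_first=True)
        # next-frame readout in the role of W^T, as in formula (9)
        self.readout = nn.Linear(hidden, n_points * 2)

    def forward(self, poses):
        # poses: (batch, m, 17 * 2), the m frames before the video-loss time
        out, (h_t, c_t) = self.lstm(poses)
        return self.readout(out[:, -1])  # predicted pose of the next frame

pred = PosePredictor()(torch.randn(4, 5, 34))  # batch of 4, m = 5 input frames
print(tuple(pred.shape))  # (4, 34)
```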
Network optimization module: defines an objective function and optimizes the AlexNet network and the LSTM network according to it. The AlexNet network and the LSTM network each generate pedestrian posture prediction results for the Total Frames frames; the prediction generated by the AlexNet network is based on the data distribution both before the video-loss time and after the video-recovery time, whereas the prediction generated by the LSTM network is based only on the features before the video-loss time. To let the two networks learn from each other's strengths, the present application defines the following objective function to optimize the AlexNet network and the LSTM network:
objectfinal=object1+object2+|object1-object2| (10)
In formula (10), the term |object1 − object2| drives the pedestrian posture predictions generated by the AlexNet network and the LSTM network to be as close as possible, which is necessary to finally produce a good restoration result.
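Formula (10) itself is a one-liner; a sketch with illustrative loss values:

```python
# object_final = object1 + object2 + |object1 - object2|, formula (10):
# the absolute-difference term penalizes disagreement between the two networks.
def combined_objective(object1, object2):
    return object1 + object2 + abs(object1 - object2)

print(combined_objective(2.0, 5.0))  # 10.0
```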
Image insertion module: takes the final prediction results for the images to be restored output by the optimized AlexNet network and LSTM network, and inserts the restored images into the video at the corresponding positions. Through the optimization, the AlexNet network and the LSTM network each yield a group of Total Frames restored frames, every frame containing 17 human body posture points that are matched across the two groups by their IDs. The average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network gives the insertion position of each frame to be restored; all restored images are then inserted at the corresponding positions, completing the insertion of the lost image frames in the entire video. The position calculation formula is:

location_i = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)
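The ID-wise averaging of the two networks' outputs can be sketched as follows (the function name and array layout are our assumptions):

```python
# Average the AlexNet and LSTM predictions per posture-point ID to obtain the
# final insertion position of each frame to be restored.
import numpy as np

def fuse_predictions(alexnet_pts, lstm_pts):
    """Both inputs: shape (17, 2) arrays of (x, y), rows matched by ID.
    Returns the element-wise average used as the final position."""
    return (np.asarray(alexnet_pts, float) + np.asarray(lstm_pts, float)) / 2.0

a = np.full((17, 2), [10.0, 20.0])
b = np.full((17, 2), [14.0, 24.0])
print(fuse_predictions(a, b)[0])  # [12. 22.]
```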
Fig. 3 is a schematic structural diagram of the hardware device for the method of inserting image frames into a video provided by the embodiments of the present application. As shown in Fig. 3, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system and output system may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 3.
The memory, as a non-transient computer-readable storage medium, can be used to store non-transient software programs and non-transient computer-executable programs and modules. The processor runs the non-transient software programs, instructions and modules stored in the memory, thereby executing the various functional applications and data processing of the electronic equipment and implementing the processing method of the above method embodiments.
The memory may include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required for at least one function, and the data storage area can store data, etc. In addition, the memory may include high-speed random access memory and may also include non-transient memory, for example at least one disk memory, flash memory device, or other non-transient solid-state memory. In some embodiments, the memory optionally includes memories located remotely from the processor, and these remote memories can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system can receive input numeric or character information and generate signal input. The output system may include display devices such as a display screen.
One or more of the modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: select m frames containing a pedestrian each before the video-loss time and after the video-recovery time as characteristic patterns, and collect a set number of human body posture points from each characteristic pattern;
Step b: input all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
Step c: input the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
Step d: obtain the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculate the insertion positions of those images in the video, and insert them into the video at the corresponding positions.
The above product can execute the method provided by the embodiments of the present application, and has the corresponding functional modules and beneficial effects for executing the method. For technical details not described in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The embodiments of the present application provide a non-transient (non-volatile) computer storage medium storing computer-executable instructions, which can perform the following operations:
Step a: select m frames containing a pedestrian each before the video-loss time and after the video-recovery time as characteristic patterns, and collect a set number of human body posture points from each characteristic pattern;
Step b: input all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
Step c: input the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
Step d: obtain the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculate the insertion positions of those images in the video, and insert them into the video at the corresponding positions.
The embodiments of the present application provide a computer program product including a computer program stored on a non-transient computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform the following operations:
Step a: select m frames containing a pedestrian each before the video-loss time and after the video-recovery time as characteristic patterns, and collect a set number of human body posture points from each characteristic pattern;
Step b: input all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
Step c: input the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
Step d: obtain the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculate the insertion positions of those images in the video, and insert them into the video at the corresponding positions.
By combining the AlexNet network with the LSTM network, the method, system and electronic equipment for inserting image frames into a video of the embodiments of the present application adopt a scheme in which two predictions mutually promote each other: the cubic-spline prediction is based on the video frames both before and after the interruption, which compensates for the deficiency that the LSTM, based only on the video before the interruption, cannot learn the known samples again. This effectively resolves the problem of inaccurate prediction by a single LSTM and brings a better viewing experience to video viewers. Meanwhile, the dual-network design of the present application effectively improves the precision of existing algorithms and has high extensibility; the convolutional neural network can also be replaced to complete more complex prediction tasks.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method of inserting image frames into a video, characterized by comprising the following steps:
Step a: select m frames containing a pedestrian each before the video-loss time and after the video-recovery time as characteristic patterns, and collect a set number of human body posture points from each characteristic pattern;
Step b: input all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
Step c: input the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
Step d: obtain the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculate the insertion positions of those images in the video, and insert them into the video at the corresponding positions.
2. The method of inserting image frames into a video according to claim 1, characterized in that, assuming 17 human body posture points are collected from each characteristic pattern, the pedestrian posture prediction method of the AlexNet network in step b specifically includes:
Step b1: treat the 17 human body posture points as 17 IDs and use cubic polynomial fitting to determine a regression curve for each; the posture point corresponding to each ID has a coordinate in the image, expressed as location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into the cubic polynomial gives:
y = ax³ + bx² + cx + d
17 groups of (a_i, b_i, c_i, d_i) are obtained by computer fitting, forming the 17 fitted cubic polynomials y = f(x); plotting each y = f(x) on an image generates 17 curves, whose horizontal and vertical coordinates represent the position of the corresponding human body posture point;
Step b2: restore, by cubic spline interpolation, the image coordinate points between the two frames before and after the video-loss time, obtaining the pedestrian posture prediction results of the images to be restored.
3. The method of inserting image frames into a video according to claim 1, characterized in that, in step c, the input structure of the LSTM network is:
[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})
The posture prediction of the next frame to be restored is then:
p̂_{t+1} = W^T·h_t
In the above formulas, W^T denotes the weights trained by the network, and h_t, c_t are the intrinsic parameters of the LSTM structure; the loss function of the LSTM network is expressed as object2 = Loss(LSTM).
4. The method of inserting image frames into a video according to any one of claims 1 to 3, characterized in that, after step c, the method further includes: defining an objective function and optimizing the AlexNet network and the LSTM network according to it:
object_final = object1 + object2 + |object1 - object2|
In the above formula, the term |object1 - object2| indicates that the pedestrian posture predictions generated by the AlexNet network and the LSTM network are made as close as possible.
5. The method of inserting image frames into a video according to claim 4, characterized in that inserting the images to be restored into the video at the corresponding positions in step d specifically includes: the optimized AlexNet network and LSTM network each yield a group of restored images with the same number of frames, every restored frame containing 17 human body posture points matched across the two groups by their IDs; the average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network gives the insertion position of each frame to be restored, and all restored images are inserted at the corresponding positions; the position calculation formula is:
location_i = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)
6. A system for inserting image frames into a video, characterized by comprising:
a characteristic pattern selecting module for selecting m frames containing a pedestrian each before the video-loss time and after the video-recovery time;
a posture point acquisition module for collecting a set number of human body posture points from each characteristic pattern;
an AlexNet network prediction module for inputting all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
an LSTM network prediction module for inputting the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
an image insertion module for obtaining the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculating the insertion positions of those images in the video, and inserting them into the video at the corresponding positions.
7. The system of inserting image frames into a video according to claim 6, characterized in that, assuming 17 human body posture points are collected from each characteristic pattern, the pedestrian posture prediction method of the AlexNet network prediction module is specifically:
treat the 17 human body posture points as 17 IDs and use cubic polynomial fitting to determine a regression curve for each; the posture point corresponding to each ID has a coordinate in the image, expressed as location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into the cubic polynomial gives:
y = ax³ + bx² + cx + d
17 groups of (a_i, b_i, c_i, d_i) are obtained by computer fitting, forming the 17 fitted cubic polynomials y = f(x); plotting each y = f(x) on an image generates 17 curves, whose horizontal and vertical coordinates represent the position of the corresponding human body posture point;
restore, by cubic spline interpolation, the image coordinate points between the two frames before and after the video-loss time, obtaining the pedestrian posture prediction results of the images to be restored.
8. The system of inserting image frames into a video according to claim 6, characterized in that the input structure of the LSTM network is:
[h_t, c_t] = LSTM(p_t, h_{t-1}, c_{t-1})
The posture prediction of the next frame to be restored is then:
p̂_{t+1} = W^T·h_t
In the above formulas, W^T denotes the weights trained by the network, and h_t, c_t are the intrinsic parameters of the LSTM structure; the loss function of the LSTM network is expressed as object2 = Loss(LSTM).
9. The system of inserting image frames into a video according to any one of claims 6 to 8, characterized by further comprising a network optimization module for defining an objective function and optimizing the AlexNet network and the LSTM network according to it:
object_final = object1 + object2 + |object1 - object2|
In the above formula, the term |object1 - object2| indicates that the pedestrian posture predictions generated by the AlexNet network and the LSTM network are made as close as possible.
10. The system of inserting image frames into a video according to claim 9, characterized in that the image insertion module inserting the images to be restored into the video at the corresponding positions specifically includes: the optimized AlexNet network and LSTM network each yield a group of restored images with the same number of frames, every restored frame containing 17 human body posture points matched across the two groups by their IDs; the average of the (x_i, y_i) passed in from the AlexNet network and the (x̂_i, ŷ_i) passed in from the LSTM network gives the insertion position of each frame to be restored, and all restored images are inserted at the corresponding positions; the position calculation formula is:
location_i = ((x_i + x̂_i)/2, (y_i + ŷ_i)/2)
11. An electronic equipment, comprising:
at least one processor; and
a memory communicatively connected with the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following operations of the method of inserting image frames into a video according to any one of claims 1 to 5:
Step a: select m frames containing a pedestrian each before the video-loss time and after the video-recovery time as characteristic patterns, and collect a set number of human body posture points from each characteristic pattern;
Step b: input all human body posture points into the AlexNet network, which predicts the pedestrian posture of the images to be restored using a combination of cubic polynomial fitting and cubic spline interpolation;
Step c: input the human body posture points corresponding to the characteristic patterns before the video-loss time into the LSTM network to obtain the pedestrian posture prediction results of the images to be restored;
Step d: obtain the images to be restored according to the pedestrian posture prediction results of the AlexNet network and the LSTM network, calculate the insertion positions of those images in the video, and insert them into the video at the corresponding positions.
CN201910600097.2A 2019-07-04 2019-07-04 Method and system for inserting image frame between videos and electronic equipment Active CN110366029B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910600097.2A CN110366029B (en) 2019-07-04 2019-07-04 Method and system for inserting image frame between videos and electronic equipment


Publications (2)

Publication Number Publication Date
CN110366029A true CN110366029A (en) 2019-10-22
CN110366029B CN110366029B (en) 2021-08-24

Family

ID=68217860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910600097.2A Active CN110366029B (en) 2019-07-04 2019-07-04 Method and system for inserting image frame between videos and electronic equipment

Country Status (1)

Country Link
CN (1) CN110366029B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101471073A (en) * 2007-12-27 2009-07-01 华为技术有限公司 Package loss compensation method, apparatus and system based on frequency domain
KR101452635B1 (en) * 2013-06-03 2014-10-22 충북대학교 산학협력단 Method for packet loss concealment using LMS predictor, and thereof recording medium
CN104751851A (en) * 2013-12-30 2015-07-01 联芯科技有限公司 Before and after combined estimation based frame loss error hiding method and system
CN106919360A (en) * 2017-04-18 2017-07-04 珠海全志科技股份有限公司 A kind of head pose compensation method and device
US20180095076A1 (en) * 2013-03-15 2018-04-05 Carnegie Mellon University Linked Peptide Fluorogenic Biosensors
CN108111860A (en) * 2018-01-11 2018-06-01 安徽优思天成智能科技有限公司 Video sequence lost frames prediction restoration methods based on depth residual error network
CN108615027A (en) * 2018-05-11 2018-10-02 常州大学 A method of video crowd is counted based on shot and long term memory-Weighted Neural Network
US20190124403A1 (en) * 2017-10-20 2019-04-25 Fmr Llc Integrated Intelligent Overlay for Media Content Streams


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JACOB WALKER: ""The Pose Knows: Video Forecasting by Generating Pose Futures"", 《IEEE》 *
ZHANG Shun (张顺): "Development of Deep Convolutional Neural Networks and Their Applications in Computer Vision", 《计算机学报》 (Chinese Journal of Computers) *
ZHENG Yuanpan (郑远攀): "A Survey of Research on the Application of Deep Learning in Image Recognition", 《计算机工程与应用》 (Computer Engineering and Applications) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112530342A (en) * 2020-05-26 2021-03-19 友达光电股份有限公司 Display method
TWI729826B (en) * 2020-05-26 2021-06-01 友達光電股份有限公司 Display method
CN112530342B (en) * 2020-05-26 2023-04-25 友达光电股份有限公司 Display method
CN112884830A (en) * 2021-01-21 2021-06-01 浙江大华技术股份有限公司 Target frame determining method and device
CN112884830B (en) * 2021-01-21 2024-03-29 浙江大华技术股份有限公司 Target frame determining method and device

Also Published As

Publication number Publication date
CN110366029B (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
Weber et al. Imagination-augmented agents for deep reinforcement learning
Seo et al. Reinforcement learning with action-free pre-training from videos
CN107527091B (en) Data processing method and device
CN107515909B (en) Video recommendation method and system
US10026017B2 (en) Scene labeling of RGB-D data with interactive option
EP3451241A1 (en) Device and method for performing training of convolutional neural network
CN111444878A (en) Video classification method and device and computer readable storage medium
KR20200018283A (en) Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network
CN108111860B (en) Video sequence lost frame prediction recovery method based on depth residual error network
TW202215303A (en) Processing images using self-attention based neural networks
CN110366029A (en) Method, system and the electronic equipment of picture frame are inserted between a kind of video
Voulodimos et al. Improving multi-camera activity recognition by employing neural network based readjustment
US11907335B2 (en) System and method for facilitating autonomous target selection
WO2021090535A1 (en) Information processing device and information processing method
CN113469289A (en) Video self-supervision characterization learning method and device, computer equipment and medium
CN113112536A (en) Image processing model training method, image processing method and device
WO2020225247A1 (en) Unsupervised learning of object keypoint locations in images through temporal transport or spatio-temporal transport
CA3204121A1 (en) Recommendation with neighbor-aware hyperbolic embedding
CN110083761B (en) Data distribution method, system and storage medium based on content popularity
CN110288444A Method and system for realizing user's associated recommendation
CN116189277A (en) Training method and device, gesture recognition method, electronic equipment and storage medium
CN116982080A (en) Methods, systems, and computer media for scene adaptive future depth prediction in monocular video
CN107622498A (en) Image penetration management method, apparatus and computing device based on scene cut
CN113902639A (en) Image processing method, image processing device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant