CN110366029A - Method, system and electronic device for inserting image frames into a video - Google Patents
Method, system and electronic device for inserting image frames into a video
- Publication number
- CN110366029A CN110366029A CN201910600097.2A CN201910600097A CN110366029A CN 110366029 A CN110366029 A CN 110366029A CN 201910600097 A CN201910600097 A CN 201910600097A CN 110366029 A CN110366029 A CN 110366029A
- Authority
- CN
- China
- Prior art keywords
- video
- network
- image
- human body
- lstm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/647—Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
- H04N21/64784—Data processing by the network
- H04N21/64792—Controlling the complexity of the content stream, e.g. by dropping packets
Abstract
This application relates to a method, system and electronic device for inserting image frames into a video. The method includes: selecting m feature maps containing a pedestrian from before the time of video loss and m from after the video recovery time, and collecting a set number of human pose keypoints from each feature map; feeding all human pose keypoints into an AlexNet network, which predicts the pedestrian pose of the images to be restored using the method of cubic polynomial fitting combined with cubic spline interpolation; feeding the human pose keypoints of the feature maps from before the time of video loss into an LSTM network to obtain a pedestrian pose prediction for the images to be restored; obtaining the images to be restored from the pedestrian pose predictions of the AlexNet and LSTM networks, computing the insertion position of each restored image in the video, and inserting the restored images at the corresponding positions in the video. The application improves the accuracy of existing algorithms.
Description
Technical field
The application belongs to the technical field of video frame interpolation, and in particular relates to a method, system and electronic device for inserting image frames into a video.
Background technology
With the growing popularity of short entertainment videos, there is currently a large demand in China for the transmission and playback of short videos. A problem that comes with this is frame loss during video transmission caused by network issues and other factors. Such frame loss gives viewers of the video a visually discontinuous experience, which not only harms the viewing experience but may also reduce the usable value of the video when key information is lost.
When facing lost frames, the prior art usually improves the quality of the transmission process so that uploads or downloads lose as few frames as possible, but it is helpless against frames that have already been lost. In recent years, top academic research on this problem has been scarce, mainly because it is a new demand arising from industry; existing related work mostly predicts the next few frames from past video frames, and hardly any of it restores a few lost frames inside a continuous video. The document [Walker J, Marino K, Gupta A, et al. The pose knows: Video forecasting by generating pose futures[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 3332-3341.] shows that for predicting possible next-frame behavior, the VAE method [Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013.] can be used for pedestrian pose prediction, but these pose predictions only forecast future frames; they do not use such predictions to complete the missing frames between two related video segments.
Summary of the invention
This application provides a method, system and electronic device for inserting image frames into a video, intended to solve, at least to a certain extent, one of the above technical problems in the prior art.
To solve the above problems, this application provides the following technical solutions:
A method for inserting image frames into a video, comprising the following steps:
Step a: selecting m feature maps containing a pedestrian from before the time of video loss and m from after the video recovery time, and collecting a set number of human pose keypoints from each feature map;
Step b: feeding all human pose keypoints into an AlexNet network, which predicts the pedestrian pose of the images to be restored using the method of cubic polynomial fitting combined with cubic spline interpolation;
Step c: feeding the human pose keypoints of the feature maps from before the time of video loss into an LSTM network to obtain a pedestrian pose prediction for the images to be restored;
Step d: obtaining the images to be restored from the pedestrian pose predictions of the AlexNet and LSTM networks, computing the insertion position of each restored image in the video, and inserting the restored images at the corresponding positions in the video.
The technical solution adopted by the embodiment of the present application further includes: assuming that 17 human pose keypoints are collected from each feature map, in step b the pedestrian pose prediction method of the AlexNet network specifically includes:
Step b1: treating the 17 human pose keypoints as 17 IDs, a regression curve is determined for each via cubic polynomial fitting. The pose keypoint of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into the cubic polynomial gives:
y = ax^3 + bx^2 + cx + d
Computer fitting yields 17 groups of (a_i, b_i, c_i, d_i), forming 17 fitted cubic polynomials y = f(x). Plotting y = f(x) on an image generates 17 curves, whose horizontal and vertical coordinates are the position representations of the respective pose keypoints.
Step b2: recovering, by cubic spline interpolation, the image coordinate points between the two frames before and after the time of video loss, obtaining the pedestrian pose prediction of the images to be restored.
The technical solution adopted by the embodiment of the present application further includes: in step c, the input structure of the LSTM network is:
[h_t, c_t] = LSTM(p_t, h_(t-1), c_(t-1))
and the pose prediction of the next frame to be restored is p_(t+1) = W^T h_t. In the above formulas, W^T denotes the weight learned by neural network training, and h_t, c_t are the internal parameters of the LSTM structure; the loss function of the LSTM network is denoted object2 = Loss(LSTM).
The technical solution adopted by the embodiment of the present application further includes: after step c, defining an objective function and optimizing the AlexNet and LSTM networks according to it:
object_final = object1 + object2 + |object1 - object2|
In the above formula, the term |object1 - object2| pushes the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close as possible.
The technical solution adopted by the embodiment of the present application further includes: in step d, inserting the images to be restored at the corresponding positions in the video specifically includes: the optimized AlexNet and LSTM networks each yield a group of restored images with the same frame count, each frame containing 17 human pose keypoints; the 17 keypoints of each restored frame are matched to each other by ID, and the average of the (x_i, y_i) passed from the AlexNet network and the corresponding coordinates passed from the LSTM network is taken, giving the insertion position of each restored frame; all restored images are then inserted at the corresponding positions. The position is computed as the per-keypoint coordinate average of the two networks' predictions.
Another technical solution adopted by the embodiment of the present application is: a system for inserting image frames into a video, comprising:
A feature map selection module: for selecting m feature maps containing a pedestrian from before the time of video loss and m from after the video recovery time;
A pose point collection module: for collecting a set number of human pose keypoints from each feature map;
An AlexNet network prediction module: for feeding all human pose keypoints into an AlexNet network, which predicts the pedestrian pose of the images to be restored using the method of cubic polynomial fitting combined with cubic spline interpolation;
An LSTM network prediction module: for feeding the human pose keypoints of the feature maps from before the time of video loss into an LSTM network to obtain a pedestrian pose prediction of the images to be restored;
An image insertion module: for obtaining the images to be restored from the pedestrian pose predictions of the AlexNet and LSTM networks, computing the insertion position of each restored image in the video, and inserting the restored images at the corresponding positions in the video.
The technical solution adopted by the embodiment of the present application further includes: assuming that 17 human pose keypoints are collected from each feature map, the pedestrian pose prediction method of the AlexNet network prediction module is specifically:
Treating the 17 human pose keypoints as 17 IDs, a regression curve is determined for each via cubic polynomial fitting. The pose keypoint of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into the cubic polynomial gives:
y = ax^3 + bx^2 + cx + d
Computer fitting yields 17 groups of (a_i, b_i, c_i, d_i), forming 17 fitted cubic polynomials y = f(x). Plotting y = f(x) on an image generates 17 curves, whose horizontal and vertical coordinates are the position representations of the respective pose keypoints.
By cubic spline interpolation, the image coordinate points between the two frames before and after the time of video loss are then recovered, obtaining the pedestrian pose prediction of the images to be restored.
The technical solution adopted by the embodiment of the present application further includes: the input structure of the LSTM network is:
[h_t, c_t] = LSTM(p_t, h_(t-1), c_(t-1))
and the pose prediction of the next frame to be restored is p_(t+1) = W^T h_t. In the above formulas, W^T denotes the weight learned by neural network training, and h_t, c_t are the internal parameters of the LSTM structure; the loss function of the LSTM network is denoted object2 = Loss(LSTM).
The technical solution adopted by the embodiment of the present application further includes a network optimization module, which defines an objective function and optimizes the AlexNet and LSTM networks according to it:
object_final = object1 + object2 + |object1 - object2|
In the above formula, the term |object1 - object2| pushes the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close as possible.
The technical solution adopted by the embodiment of the present application further includes: the image insertion module inserting the images to be restored at the corresponding positions in the video specifically includes: the optimized AlexNet and LSTM networks each yield a group of restored images with the same frame count, each frame containing 17 human pose keypoints; the 17 keypoints of each restored frame are matched to each other by ID, and the average of the (x_i, y_i) passed from the AlexNet network and the corresponding coordinates passed from the LSTM network is taken, giving the insertion position of each restored frame; all restored images are then inserted at the corresponding positions. The position is computed as the per-keypoint coordinate average of the two networks' predictions.
Yet another technical solution adopted by the embodiment of the present application is: an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor is able to perform the following operations of the above method for inserting image frames into a video:
Step a: selecting m feature maps containing a pedestrian from before the time of video loss and m from after the video recovery time, and collecting a set number of human pose keypoints from each feature map;
Step b: feeding all human pose keypoints into an AlexNet network, which predicts the pedestrian pose of the images to be restored using the method of cubic polynomial fitting combined with cubic spline interpolation;
Step c: feeding the human pose keypoints of the feature maps from before the time of video loss into an LSTM network to obtain a pedestrian pose prediction for the images to be restored;
Step d: obtaining the images to be restored from the pedestrian pose predictions of the AlexNet and LSTM networks, computing the insertion position of each restored image in the video, and inserting the restored images at the corresponding positions in the video.
Compared with the prior art, the beneficial effect produced by the embodiment of the present application is: the method, system and electronic device for inserting image frames into a video combine an AlexNet network with an LSTM network in a scheme where the two predictions promote each other. The cubic-spline prediction based on the video frames before and after the gap compensates for the deficiency that the LSTM can learn only from the known samples before the interruption, and effectively solves the inaccuracy of a single LSTM prediction, bringing video viewers a better viewing experience. Meanwhile, the dual network of the application effectively improves the accuracy of existing algorithms, is highly extensible, and can complete more complex prediction tasks by replacing the convolutional neural network.
Description of the drawings
Fig. 1 is a flowchart of the method for inserting image frames into a video according to an embodiment of the present application;
Fig. 2 is a structural schematic diagram of the system for inserting image frames into a video according to an embodiment of the present application;
Fig. 3 is a structural schematic diagram of the hardware device for the method for inserting image frames into a video provided by an embodiment of the present application.
Specific embodiment
In order to make the objects, technical solutions and advantages of the application more clearly understood, the application is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein only explain the application and are not intended to limit it.
For the technical problems existing in the prior art, the present application proposes a method that uses a long short-term memory network (LSTM) for memory-based prediction and in-video frame insertion. Through convolutional neural networks, the method turns the frame-insertion process into a learning process, so that the network continuously learns the relevant parameter weights and automatically generates image frames for insertion into a video that is continuous but interrupted in the middle, completing the video visually and giving viewers a better viewing experience.
The application only performs target prediction and the insertion process in the case where human poses have been detected (for the human pose detection method see [Fang H S, Xie S, Tai Y W, et al. Rmpe: Regional multi-person pose estimation[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017: 2334-2343.]). Specifically, referring to Fig. 1, the flowchart of the method for inserting image frames into a video of an embodiment of the present application, the method includes the following steps:
Step 100: selecting m frames each from before the time of video loss and after the video recovery time as the feature maps for generating the images to be restored;
In step 100, suppose the video is lost at time t=p and recovers at time t=q, with time measured in seconds (s). The duration of video loss is then miss = |p - q|. Since smooth video generally runs at 24 frames per second, the total number of frames to be restored is Total Frames = miss * 24. Selecting m frames each before the loss time t=p and after the recovery time t=q gives an input of Input Frames = 2 * m.
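As an illustrative sketch of the bookkeeping above (the function names `total_frames` and `input_frames` are not from the patent), the frame counts of step 100 can be computed as:

```python
FPS = 24  # the patent assumes smooth video at 24 frames per second

def total_frames(p: float, q: float) -> int:
    """Number of frames to restore for a gap from t=p to t=q seconds."""
    miss = abs(p - q)          # duration of video loss, in seconds
    return int(miss * FPS)     # Total Frames = miss * 24

def input_frames(m: int) -> int:
    """m context frames before the loss plus m after the recovery."""
    return 2 * m               # Input Frames = 2 * m

# A loss from t=2s to t=5s at 24 fps needs 72 restored frames;
# with m = 5 context frames per side, the networks see 10 input frames.
print(total_frames(2.0, 5.0))  # 72
print(input_frames(5))         # 10
```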
Step 200: collecting a set number of human pose keypoints from each feature map;
In step 200, the application selects 17 human body nodes as the pose keypoints, as shown in Table 1 below (it should be understood that the number and positions of the pose keypoints can be set according to the practical application):
Table 1: Human body node specification
Step 300: feeding the collected human pose keypoints into the AlexNet network [Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]// Advances in neural information processing systems. 2012: 1097-1105.]; the AlexNet network predicts the pedestrian pose of the images to be restored using the method of cubic polynomial fitting combined with cubic spline interpolation;
In step 300, since each feature map contains 17 human pose keypoints, there are 2*17*m points in total as the input of the AlexNet network. The pedestrian pose prediction method of the AlexNet network specifically includes:
Step 301: first, treating the 17 human pose keypoints as 17 IDs (ID = 1, 2, 3, ..., 17), a regression curve is determined for each via cubic polynomial fitting. The pose keypoint of each ID has a coordinate in the image, denoted location_ID = (x_i, y_i), i = ID, and substituting the series of (x_i, y_i) formed by each ID into the cubic polynomial gives:
y = ax^3 + bx^2 + cx + d    (1)
In the above formula, a ≠ 0 and b, c, d are constants. Computer fitting (MATLAB) yields 17 groups of (a_i, b_i, c_i, d_i), ultimately forming 17 fitted cubic polynomials y = f(x). Plotting y = f(x) on an image generates 17 curves, whose horizontal and vertical coordinates are the respective position representations.
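A minimal sketch of the fit in step 301 using NumPy rather than the MATLAB fitting the patent mentions (the helper name `fit_keypoint_track` is an assumption): for one keypoint ID, a cubic y = ax^3 + bx^2 + cx + d is fitted to its observed positions.

```python
import numpy as np

def fit_keypoint_track(xs, ys):
    """Fit y = a*x**3 + b*x**2 + c*x + d to one keypoint's (x, y) samples."""
    a, b, c, d = np.polyfit(xs, ys, deg=3)   # np.polyfit returns highest degree first
    return a, b, c, d

# Synthetic track that follows y = 2x^3 - x + 4 exactly, so the fit is checkable.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2 * xs**3 - xs + 4
coeffs = np.array(fit_keypoint_track(xs, ys))
assert np.allclose(coeffs, [2.0, 0.0, -1.0, 4.0], atol=1e-6)
```

In the patent's setting this fit would be repeated once per ID, giving the 17 groups of (a_i, b_i, c_i, d_i).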
Step 302: recovering, by cubic spline interpolation, the image coordinate points between the two frame positions t=p and t=q of the interruption, obtaining the pedestrian pose prediction of the images to be restored. The cubic spline interpolation is as follows:
There are 17 nodes in total: x_0 < x_1 < ... < x_(n-1) < x_n (n = 17),
with function values y_i = f(x_i),
and s(x) satisfies s(x_i) = y_i (i = 0, 1, 2, ..., 16).
In formula (3), s_k(x) is the cubic piece on [x_(k-1), x_k] (k = 1, 2, ..., 17), and at the same time s_k(x_(k-1)) = y_(k-1), s_k(x_k) = y_k.
According to the boundary conditions of the spline interpolation method, it can be obtained that each s_k(x) is a cubic polynomial of the form determined in (1), with four undetermined coefficients a_i, b_i, c_i, d_i, so there are 4n coefficients in total; but only 2n + 2(n-1) = 4n-2 equations have been derived so far, which is not enough to solve the system. Initial and boundary conditions are therefore added to construct 4n equations.
These two equations are respectively:
s'(x_0) = f'_0    (5)
s'(x_n) = f'_n    (6)
The above system is solved by the chasing method (a tridiagonal matrix algorithm). In this way, the curve of the images to be restored is obtained; zooming in on the curve near x_p and x_q, Total Frames points are randomly sampled, and the same operation is applied to each ID point, yielding the 17 sample points of each frame to be restored. These 17 sample points constitute the pedestrian pose prediction result of each restored frame.
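Step 302 can be sketched with SciPy's cubic spline, which solves the same kind of tridiagonal system internally as the chasing method above (the helper name `recover_gap` and the frame indices are illustrative, not from the patent):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def recover_gap(known_t, known_y, gap_t):
    """Interpolate one keypoint coordinate across the missing frame indices."""
    spline = CubicSpline(known_t, known_y)  # builds and solves the spline system
    return spline(gap_t)

# Frames 0-4 and 10-14 are known; frames 5-9 were lost with the video.
known_t = np.array([0, 1, 2, 3, 4, 10, 11, 12, 13, 14], dtype=float)
known_y = 0.5 * known_t + 3.0            # keypoint moving linearly, for checkability
gap_t = np.arange(5, 10, dtype=float)
recovered = recover_gap(known_t, known_y, gap_t)
# A cubic spline reproduces linear motion exactly, so the gap is filled perfectly.
assert np.allclose(recovered, 0.5 * gap_t + 3.0, atol=1e-8)
```

Running this once per keypoint ID yields the 17 sample points of each frame to be restored.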
In the convolutional neural network training, the application uses the mean squared error between the predicted keypoint coordinates and the true coordinates in the training set as the experimental error of the convolutional neural network, as shown in formula (7). Through the continuous training of AlexNet, suitable spline interpolation coefficients are adjusted to achieve a good prediction effect.
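The mean-squared-error training signal of formula (7) can be sketched as follows (the exact normalization in the patent's formula is not visible, so averaging over all keypoint coordinates is an assumption):

```python
import numpy as np

def pose_mse(pred, truth):
    """Mean squared error between predicted and ground-truth keypoints.

    pred, truth: arrays of shape (num_keypoints, 2) holding (x, y) coordinates.
    """
    return float(np.mean((pred - truth) ** 2))

pred = np.array([[1.0, 2.0], [3.0, 4.0]])
truth = np.array([[1.0, 2.0], [3.0, 6.0]])   # one y-coordinate is off by 2
print(pose_mse(pred, truth))  # 1.0  (errors 0,0,0,4 averaged over 4 terms)
```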
Step 400: obtaining the pedestrian pose prediction of the images to be restored using an LSTM (long short-term memory network), based on the human pose keypoints collected from the feature maps before the video loss time t=p;
In step 400, the input structure of the LSTM is:
[h_t, c_t] = LSTM(p_t, h_(t-1), c_(t-1))    (8)
and the pose prediction of the next frame to be restored is:
p_(t+1) = W^T h_t    (9)
In formulas (8) and (9), W^T denotes the weight learned by neural network training, and h_t, c_t are the internal parameters of the LSTM structure. The loss function of the LSTM network can be simply denoted object2 = Loss(LSTM).
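A minimal sketch of the recurrence [h_t, c_t] = LSTM(p_t, h_(t-1), c_(t-1)) with a linear readout for the next pose. All weights here are random placeholders, not the patent's trained parameters, and the hidden size is an arbitrary choice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(p_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell; W maps [p_t, h_prev] to the 4 gates."""
    z = W @ np.concatenate([p_t, h_prev]) + b
    H = h_prev.size
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2 * H])     # forget gate
    g = np.tanh(z[2 * H:3 * H]) # candidate cell state
    o = sigmoid(z[3 * H:])      # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 34, 16                         # 17 keypoints * 2 coordinates in; hidden size 16
W = rng.normal(0, 0.1, (4 * H, D + H))
b = np.zeros(4 * H)
W_out = rng.normal(0, 0.1, (D, H))    # readout: p_(t+1) = W_out @ h_t, as in formula (9)

h, c = np.zeros(H), np.zeros(H)
for pose in rng.normal(0, 1, (5, D)): # feed 5 pre-loss pose frames
    h, c = lstm_step(pose, h, c, W, b)
next_pose = W_out @ h                 # predicted pose of the next missing frame
assert next_pose.shape == (D,)
```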
Step 500: defining an objective function and optimizing the AlexNet and LSTM networks according to it;
In step 500, the AlexNet network and the LSTM network each generate pedestrian pose prediction results for the Total Frames frames. The prediction of the AlexNet network is based on the data distribution both before the video loss time and after the video recovery time, while the prediction generated by the LSTM network is based only on features from before the video loss time. To let the two networks complement each other's weaknesses and make use of each other, the application defines the following objective function to optimize the AlexNet and LSTM networks:
object_final = object1 + object2 + |object1 - object2|    (10)
In formula (10), the term |object1 - object2| pushes the pedestrian pose predictions generated by the AlexNet and LSTM networks to be as close as possible, which is what finally produces a relatively good restoration result.
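The combined objective of formula (10) is straightforward to compute once the two per-network losses are known (the function name is illustrative):

```python
def combined_objective(object1: float, object2: float) -> float:
    """object_final = object1 + object2 + |object1 - object2|.

    The |object1 - object2| term penalizes disagreement between the
    AlexNet-based and the LSTM-based pose predictions.
    """
    return object1 + object2 + abs(object1 - object2)

print(combined_objective(0.5, 0.25))  # 1.0  (0.5 + 0.25 + 0.25)
```

Note the combined loss is symmetric in the two networks, so neither branch is privileged during joint optimization.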
Step 600: outputting the respective final prediction results of the images to be restored from the optimized AlexNet and LSTM networks, and inserting the restored images at the corresponding positions in the video;
In step 600, through the optimization of the AlexNet and LSTM networks, two groups of restored images, each with frame count Total Frames, are obtained, and each frame contains 17 human pose keypoints. The 17 pose keypoints in each frame are matched to each other by ID, and the average of the (x_i, y_i) passed from the AlexNet network and the corresponding coordinates passed from the LSTM network is taken, giving the insertion position of each restored frame; all restored images are then inserted at the corresponding positions, completing the insertion of the lost image frames in the whole video. The position is computed as the per-keypoint coordinate average of the two networks' predictions.
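A sketch of this final fusion step: the 17 keypoints of each restored frame are matched by ID and the two networks' coordinates are averaged (array names and shapes are illustrative assumptions):

```python
import numpy as np

def fuse_predictions(alexnet_pts, lstm_pts):
    """Average two (num_frames, 17, 2) keypoint arrays elementwise by ID."""
    assert alexnet_pts.shape == lstm_pts.shape
    return (alexnet_pts + lstm_pts) / 2.0

a = np.full((3, 17, 2), 10.0)   # AlexNet branch: every keypoint at (10, 10)
b = np.full((3, 17, 2), 20.0)   # LSTM branch: every keypoint at (20, 20)
fused = fuse_predictions(a, b)
assert np.all(fused == 15.0)    # each inserted keypoint lands halfway between
```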
Referring to Fig. 2, be the embodiment of the present application video between be inserted into picture frame system structural schematic diagram.The application is real
Applying the system that picture frame is inserted between the video of example includes that characteristic pattern selecting module, posture point acquisition module, Alex Net network are pre-
It surveys module, LSTM neural network forecast module, network optimization module and image and is inserted into module.
Feature-map selection module: for selecting m frames each before the video-loss time and after the video-recovery time as the feature maps used to generate the to-be-restored images. In the embodiment of the present application, assume the video is lost at time t = p and recovers at time t = q, with time measured in seconds (s); the lost duration is then miss = |p − q|. Since smooth video generally runs at 24 frames per second, the total number of to-be-restored frames is Total Frames = miss * 24. Selecting m frames each before the video-loss time t = p and after the video-recovery time t = q gives an input of Input Frames = 2 * m.
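The frame-count bookkeeping above can be sketched as follows (the function and variable names are ours, not the patent's):

```python
def frame_counts(p, q, m, fps=24):
    """Number of frames to restore and number of input feature frames.

    p, q: loss and recovery times in seconds; m: frames picked on each side.
    The patent assumes smooth video at 24 frames per second.
    """
    miss = abs(p - q)              # lost duration in seconds
    total_frames = miss * fps      # Total Frames = miss * 24
    input_frames = 2 * m           # m frames before t=p plus m frames after t=q
    return total_frames, input_frames
```

For example, a loss at t = 10 s recovered at t = 13 s with m = 5 gives 72 frames to restore from 10 input feature frames.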
Pose-point acquisition module: for acquiring a set number of human-pose points from each feature map. In the present application, 17 human-body nodes are selected as the human-pose points, as shown in Table 1 below (it should be understood that the number and positions of the pose points can be set according to the practical application):
Table 1: human-body node specification
Alex Net network prediction module: for inputting the acquired human-pose points into the Alex Net network, which predicts the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation. Since each feature map contains 17 human-pose points, there are 2 * 17 * m points in total as the input of the Alex Net network. The Alex Net network prediction module specifically includes:
Cubic-polynomial fitting unit: for treating the 17 human-pose points as 17 IDs (ID = 1, 2, 3, …, 17) and determining a regression curve for each by cubic-polynomial fitting. The pose point corresponding to each ID has a coordinate in the image, denoted locationID = (xi, yi), i = ID; the series of (xi, yi) formed by each ID is substituted into the cubic polynomial, giving:
y = ax³ + bx² + cx + d (1)
By computer fitting (MATLAB), 17 groups of (ai, bi, ci, di) are obtained, ultimately forming 17 fitted cubic curves y = f(x). Plotting y = f(x) on an image generates 17 curves, whose abscissa and ordinate respectively represent the position of the corresponding pose point.
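A minimal sketch of the per-ID cubic fit, using NumPy's least-squares polynomial fitting in place of the MATLAB fitting mentioned above; the data layout and names are our assumptions:

```python
import numpy as np

def fit_keypoint_curves(tracks):
    """Fit one cubic y = a*x^3 + b*x^2 + c*x + d per keypoint ID (formula (1)).

    tracks: dict mapping ID -> list of (x, y) coordinates observed for that
    keypoint across the input feature frames.
    Returns dict mapping ID -> (a, b, c, d).
    """
    coeffs = {}
    for kp_id, points in tracks.items():
        xs = np.array([p[0] for p in points], dtype=float)
        ys = np.array([p[1] for p in points], dtype=float)
        a, b, c, d = np.polyfit(xs, ys, deg=3)   # least-squares cubic fit
        coeffs[kp_id] = (a, b, c, d)
    return coeffs
```

With 2m input frames per keypoint there are 2m points per curve, so m ≥ 2 is needed for the four coefficients to be determined.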
Cubic-spline interpolation unit: for restoring, by cubic-spline interpolation, the image coordinate points interrupted between the two frame positions t = p and t = q, obtaining the pedestrian-pose prediction results of the to-be-restored images. The cubic-spline interpolation is as follows:
There are 17 nodes x0 < x1 < … < xn−1 < xn (n = 16), with function values yi = f(xi), and the spline s(x) satisfies s(xi) = yi (i = 0, 1, 2, …, 16).
The spline is written piecewise as s(x) = sk(x) for x ∈ [xk−1, xk] (formula (3)), where k = 1, 2, …, n; each piece satisfies sk(xk−1) = yk−1 and sk(xk) = yk.
Each sk(x) is a cubic polynomial of the form determined in (1), with four undetermined coefficients ak, bk, ck, dk, so there are 4n coefficients in all; the interpolation and continuity conditions of the spline method yield only 2n + 2(n − 1) = 4n − 2 equations, which is not enough to solve the system, so initial and boundary conditions are added to construct 4n equations. The two additional equations are:
s′(x0) = f′0 (5)
s′(xn) = f′n (6)
The resulting tridiagonal system is solved by the chasing (Thomas) method of matrix computation. In this way the curve of the to-be-restored image is obtained: the curve between xp and xq is magnified, and Total Frames points are randomly sampled from it. Performing the same operation for each ID point yields the 17 sample points of every to-be-restored frame, and these 17 sample points constitute the pedestrian-pose prediction result of that frame.
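The chasing (Thomas) method for the spline's tridiagonal system can be sketched as follows. For simplicity this sketch uses natural end conditions (second derivative zero at both ends); the clamped conditions (5) and (6) would only change the first and last rows of the system. All names are illustrative:

```python
import numpy as np

def thomas(a, b, c, d):
    """Chasing (Thomas) method for a tridiagonal system.
    a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
    (last entry unused), d: right-hand side."""
    n = len(b)
    cp = np.zeros(n)
    dp = np.zeros(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                        # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):               # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def natural_cubic_spline(xs, ys):
    """Build a natural cubic spline through the nodes (xs, ys) and return a
    callable s(x). The unknowns are the second derivatives M_i at the nodes;
    the interior equations form the tridiagonal system solved by thomas()."""
    n = len(xs) - 1                  # number of intervals
    h = np.diff(xs)
    a = np.zeros(n - 1)
    b = np.zeros(n - 1)
    c = np.zeros(n - 1)
    d = np.zeros(n - 1)
    for i in range(1, n):            # one equation per interior node
        a[i - 1] = h[i - 1]
        b[i - 1] = 2.0 * (h[i - 1] + h[i])
        c[i - 1] = h[i]
        d[i - 1] = 6.0 * ((ys[i + 1] - ys[i]) / h[i]
                          - (ys[i] - ys[i - 1]) / h[i - 1])
    M = np.zeros(n + 1)              # natural ends: M_0 = M_n = 0
    if n > 1:
        M[1:n] = thomas(a, b, c, d)

    def s(x):
        k = int(np.clip(np.searchsorted(xs, x) - 1, 0, n - 1))
        t = h[k]
        return (M[k] * (xs[k + 1] - x) ** 3 / (6 * t)
                + M[k + 1] * (x - xs[k]) ** 3 / (6 * t)
                + (ys[k] / t - M[k] * t / 6) * (xs[k + 1] - x)
                + (ys[k + 1] / t - M[k + 1] * t / 6) * (x - xs[k]))
    return s
```

With the clamped conditions (5) and (6), the first and last equations of the system would instead fix s′(x0) = f′0 and s′(xn) = f′n; the interior rows are identical.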
In the convolutional-neural-network training, the present application measures the experimental error of the network by the mean squared error between the predicted and true coordinates, as shown in the following formula:
object1 = (1/N) Σi=1..N [(x̂i − xi)² + (ŷi − yi)²] (7)
In formula (7), (xi, yi) denotes a true coordinate in the training set and (x̂i, ŷi) the corresponding prediction; through the continual training of the Alex Net network, suitable spline-interpolation coefficients are adjusted to reach a good prediction effect.
LSTM network prediction module: for obtaining the pedestrian-pose prediction results of the to-be-restored images with an LSTM (long short-term memory) network, based on the human-pose points acquired from the feature maps before the video-loss time t = p. The input structure of the LSTM network is:
[ht, ct] = LSTM(pt, ht−1, ct−1) (8)
The pose prediction of the next to-be-restored frame is then:
p̂t+1 = WT ht (9)
In formulas (8) and (9), WT denotes the weights learned by network training, and ht, ct are the internal parameters of the LSTM structure. The loss function of the LSTM network can be expressed simply as object2 = Loss(LSTM).
Network optimization module: for constructing an objective function and optimizing the Alex Net network and the LSTM network according to the objective function. The Alex Net network and the LSTM network each generate pedestrian-pose predictions for Total Frames frames: the prediction of the Alex Net network is based on the data distribution both before the video-loss time and after the video-recovery time, while the prediction of the LSTM network is based only on the features before the video-loss time. To let the two networks learn from each other's strengths and offset each other's weaknesses, the present application defines the following objective function to optimize the Alex Net network and the LSTM network:
objectfinal = object1 + object2 + |object1−object2| (10)
In formula (10), the term |object1−object2| forces the pedestrian-pose predictions generated by the Alex Net network and the LSTM network to be as close to each other as possible, so that a good restoration result can finally be produced.
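The joint objective of formula (10) is straightforward to compute; a sketch with an assumed mean-squared-error loss for each network (names are illustrative):

```python
import numpy as np

def mse(pred, true):
    """Mean squared coordinate error, in the spirit of formula (7)."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.sum((pred - true) ** 2, axis=-1)))

def joint_objective(object1, object2):
    """Formula (10): object_final = object1 + object2 + |object1 - object2|."""
    return object1 + object2 + abs(object1 - object2)
```

Note that object1 + object2 + |object1 − object2| equals twice the larger of the two losses, so minimizing it always puts pressure on whichever network is currently worse.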
Image insertion module: for outputting the final prediction results for the to-be-restored images from the optimized Alex Net network and LSTM network respectively, and inserting the to-be-restored images into the video at the corresponding positions. Through the optimization of the Alex Net network and the LSTM network, two groups of to-be-restored images, each Total Frames long, are obtained; every frame contains 17 human-pose points. The 17 pose points in each frame are matched to each other by their IDs, and the average of the (xi, yi) passed in from the Alex Net network and the (x′i, y′i) passed in from the LSTM network is taken to obtain the insertion position of each to-be-restored frame; all to-be-restored images are then inserted at the corresponding positions, completing the insertion of the lost image frames in the entire video. The position calculation formula is:
(x̄i, ȳi) = ((xi + x′i)/2, (yi + y′i)/2), i = 1, 2, …, 17 (11)
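The ID-matched averaging of the two networks' outputs can be sketched as (the array layout is our assumption):

```python
import numpy as np

def insertion_positions(alex_pts, lstm_pts):
    """Average the two networks' keypoint predictions, matched by ID.

    alex_pts, lstm_pts: arrays of shape (total_frames, 17, 2), where the
    second axis indexes the keypoint ID and the last axis holds (x, y).
    """
    alex_pts = np.asarray(alex_pts, dtype=float)
    lstm_pts = np.asarray(lstm_pts, dtype=float)
    assert alex_pts.shape == lstm_pts.shape
    return (alex_pts + lstm_pts) / 2.0
```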
Fig. 3 is a schematic structural diagram of the hardware device for the method of inserting image frames into a video provided by the embodiments of the present application. As shown in Fig. 3, the device includes one or more processors and a memory. Taking one processor as an example, the device may also include an input system and an output system.
The processor, memory, input system and output system may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 3.
The memory, as a non-transient computer-readable storage medium, can be used to store non-transient software programs and non-transient computer-executable programs and modules. By running the non-transient software programs, instructions and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, thereby implementing the processing method of the above method embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area can store an operating system and an application required by at least one function, and the data storage area can store data, etc. In addition, the memory may include high-speed random-access memory and may also include non-transient memory, for example at least one magnetic-disk memory, flash-memory device, or other non-transient solid-state memory. In some embodiments, the memory optionally includes memory located remotely from the processor; these remote memories can be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input system can receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps, and acquiring a set number of human-pose points from each feature map;
Step b: inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
Step c: inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
Step d: obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
The above product can perform the method provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
An embodiment of the present application provides a non-transient (non-volatile) computer storage medium storing computer-executable instructions, the computer-executable instructions being able to perform the following operations:
Step a: selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps, and acquiring a set number of human-pose points from each feature map;
Step b: inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
Step c: inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
Step d: obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
An embodiment of the present application provides a computer program product comprising a computer program stored on a non-transient computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the following operations:
Step a: selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps, and acquiring a set number of human-pose points from each feature map;
Step b: inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
Step c: inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
Step d: obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
By using the Alex Net network combined with the LSTM network, which first predict separately and then mutually promote each other, the method, system and electronic device for inserting image frames into a video of the embodiments of the present application predict the lost frames with cubic-spline interpolation based on the video frames both before and after the loss, while compensating for the LSTM's shortcoming of not re-learning known samples from the video after the interruption. This effectively solves the problem of inaccurate prediction by a single LSTM and brings a better viewing experience to video viewers. Meanwhile, the dual-network design of the application effectively improves the precision of existing algorithms and is highly extensible: more complex prediction tasks can be completed by replacing the convolutional neural network.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the present application is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A method for inserting image frames into a video, comprising the following steps:
Step a: selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps, and acquiring a set number of human-pose points from each feature map;
Step b: inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
Step c: inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
Step d: obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
2. The method for inserting image frames into a video according to claim 1, wherein, assuming 17 human-pose points are acquired from each feature map, the pedestrian-pose prediction method of the Alex Net network in step b specifically comprises:
Step b1: treating the 17 human-pose points as 17 IDs and determining a regression curve for each by cubic-polynomial fitting; the pose point corresponding to each ID has a coordinate in the image, denoted locationID = (xi, yi), i = ID, and the series of (xi, yi) formed by each ID is substituted into the cubic polynomial, giving:
y = ax³ + bx² + cx + d
17 groups of (ai, bi, ci, di) are obtained by computer fitting, forming 17 fitted cubic curves y = f(x); plotting y = f(x) on an image generates 17 curves, whose abscissa and ordinate respectively represent the position of each human-pose point;
Step b2: restoring, by cubic-spline interpolation, the image coordinate points between the two frames before and after the video-loss time, to obtain the pedestrian-pose prediction results of the to-be-restored images.
3. The method for inserting image frames into a video according to claim 1, wherein in step c the input structure of the LSTM network is:
[ht, ct] = LSTM(pt, ht−1, ct−1)
and the pose prediction of the next to-be-restored frame is then:
p̂t+1 = WT ht
In the above formulas, WT denotes the weights learned by network training and ht, ct are the internal parameters of the LSTM structure; the loss function of the LSTM network is expressed as object2 = Loss(LSTM).
4. The method for inserting image frames into a video according to any one of claims 1 to 3, further comprising, after step c: constructing an objective function and optimizing the Alex Net network and the LSTM network according to the objective function:
objectfinal = object1 + object2 + |object1−object2|
In the above formula, the term |object1−object2| forces the pedestrian-pose predictions generated by the Alex Net network and the LSTM network to be as close as possible.
5. The method for inserting image frames into a video according to claim 4, wherein inserting the to-be-restored images into the video at the corresponding positions in step d specifically comprises: obtaining, from the optimized Alex Net network and LSTM network, two groups of to-be-restored images with the same number of frames, each to-be-restored frame containing 17 human-pose points; matching the 17 pose points in each frame to each other by their IDs; taking the average of the (xi, yi) passed in from the Alex Net network and the (x′i, y′i) passed in from the LSTM network to obtain the insertion position of each to-be-restored frame; and inserting all to-be-restored images at the corresponding positions. The position calculation formula is:
(x̄i, ȳi) = ((xi + x′i)/2, (yi + y′i)/2), i = 1, 2, …, 17
6. A system for inserting image frames into a video, comprising:
a feature-map selection module for selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps;
a pose-point acquisition module for acquiring a set number of human-pose points from each feature map;
an Alex Net network prediction module for inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
an LSTM network prediction module for inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
an image insertion module for obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
7. The system for inserting image frames into a video according to claim 6, wherein, assuming 17 human-pose points are acquired from each feature map, the pedestrian-pose prediction method of the Alex Net network prediction module is specifically:
treating the 17 human-pose points as 17 IDs and determining a regression curve for each by cubic-polynomial fitting; the pose point corresponding to each ID has a coordinate in the image, denoted locationID = (xi, yi), i = ID, and the series of (xi, yi) formed by each ID is substituted into the cubic polynomial, giving:
y = ax³ + bx² + cx + d
17 groups of (ai, bi, ci, di) are obtained by computer fitting, forming 17 fitted cubic curves y = f(x); plotting y = f(x) on an image generates 17 curves, whose abscissa and ordinate respectively represent the position of each human-pose point;
and restoring, by cubic-spline interpolation, the image coordinate points between the two frames before and after the video-loss time, to obtain the pedestrian-pose prediction results of the to-be-restored images.
8. The system for inserting image frames into a video according to claim 6, wherein the input structure of the LSTM network is:
[ht, ct] = LSTM(pt, ht−1, ct−1)
and the pose prediction of the next to-be-restored frame is then:
p̂t+1 = WT ht
In the above formulas, WT denotes the weights learned by network training and ht, ct are the internal parameters of the LSTM structure; the loss function of the LSTM network is expressed as object2 = Loss(LSTM).
9. The system for inserting image frames into a video according to any one of claims 6 to 8, further comprising a network optimization module for constructing an objective function and optimizing the Alex Net network and the LSTM network according to the objective function:
objectfinal = object1 + object2 + |object1−object2|
In the above formula, the term |object1−object2| forces the pedestrian-pose predictions generated by the Alex Net network and the LSTM network to be as close as possible.
10. The system for inserting image frames into a video according to claim 9, wherein the image insertion module inserting the to-be-restored images into the video at the corresponding positions specifically comprises: obtaining, from the optimized Alex Net network and LSTM network, two groups of to-be-restored images with the same number of frames, each to-be-restored frame containing 17 human-pose points; matching the 17 pose points in each frame to each other by their IDs; taking the average of the (xi, yi) passed in from the Alex Net network and the (x′i, y′i) passed in from the LSTM network to obtain the insertion position of each to-be-restored frame; and inserting all to-be-restored images at the corresponding positions. The position calculation formula is:
(x̄i, ȳi) = ((xi + x′i)/2, (yi + y′i)/2), i = 1, 2, …, 17
11. An electronic device, comprising:
at least one processor; and
a memory in communication connection with the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the following operations of the method for inserting image frames into a video according to any one of claims 1 to 5:
Step a: selecting m frames containing pedestrians before the video-loss time and m frames after the video-recovery time as feature maps, and acquiring a set number of human-pose points from each feature map;
Step b: inputting all human-pose points into an Alex Net network, the Alex Net network predicting the pedestrian poses of the to-be-restored images using cubic-polynomial fitting combined with cubic-spline interpolation;
Step c: inputting the human-pose points corresponding to the feature maps before the video-loss time into an LSTM network to obtain its pedestrian-pose prediction results for the to-be-restored images;
Step d: obtaining the to-be-restored images from the pedestrian-pose prediction results of the Alex Net network and the LSTM network, calculating the insertion positions of the to-be-restored images in the video, and inserting the to-be-restored images into the video at the corresponding positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910600097.2A CN110366029B (en) | 2019-07-04 | 2019-07-04 | Method and system for inserting image frame between videos and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110366029A true CN110366029A (en) | 2019-10-22 |
CN110366029B CN110366029B (en) | 2021-08-24 |
Family
ID=68217860
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101471073A (en) * | 2007-12-27 | 2009-07-01 | 华为技术有限公司 | Package loss compensation method, apparatus and system based on frequency domain |
KR101452635B1 (en) * | 2013-06-03 | 2014-10-22 | 충북대학교 산학협력단 | Method for packet loss concealment using LMS predictor, and thereof recording medium |
CN104751851A (en) * | 2013-12-30 | 2015-07-01 | 联芯科技有限公司 | Before and after combined estimation based frame loss error hiding method and system |
CN106919360A (en) * | 2017-04-18 | 2017-07-04 | 珠海全志科技股份有限公司 | A kind of head pose compensation method and device |
US20180095076A1 (en) * | 2013-03-15 | 2018-04-05 | Carnegie Mellon University | Linked Peptide Fluorogenic Biosensors |
CN108111860A (en) * | 2018-01-11 | 2018-06-01 | 安徽优思天成智能科技有限公司 | Video sequence lost frames prediction restoration methods based on depth residual error network |
CN108615027A (en) * | 2018-05-11 | 2018-10-02 | 常州大学 | A method of video crowd is counted based on shot and long term memory-Weighted Neural Network |
US20190124403A1 (en) * | 2017-10-20 | 2019-04-25 | Fmr Llc | Integrated Intelligent Overlay for Media Content Streams |
Non-Patent Citations (3)
Title |
---|
JACOB WALKER: "The Pose Knows: Video Forecasting by Generating Pose Futures", IEEE *
ZHANG Shun: "Development of deep convolutional neural networks and their applications in computer vision", Chinese Journal of Computers (《计算机学报》) *
ZHENG Yuanpan: "A survey of the application of deep learning in image recognition", Computer Engineering and Applications (《计算机工程与应用》) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112530342A (en) * | 2020-05-26 | 2021-03-19 | 友达光电股份有限公司 | Display method |
TWI729826B (en) * | 2020-05-26 | 2021-06-01 | 友達光電股份有限公司 | Display method |
CN112530342B (en) * | 2020-05-26 | 2023-04-25 | 友达光电股份有限公司 | Display method |
CN112884830A (en) * | 2021-01-21 | 2021-06-01 | 浙江大华技术股份有限公司 | Target frame determining method and device |
CN112884830B (en) * | 2021-01-21 | 2024-03-29 | 浙江大华技术股份有限公司 | Target frame determining method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108416327B (en) | Target detection method and device, computer equipment and readable storage medium | |
Weber et al. | Imagination-augmented agents for deep reinforcement learning | |
Seo et al. | Reinforcement learning with action-free pre-training from videos | |
CN107527091B (en) | Data processing method and device | |
CN107515909B (en) | Video recommendation method and system | |
US10026017B2 (en) | Scene labeling of RGB-D data with interactive option | |
EP3451241A1 (en) | Device and method for performing training of convolutional neural network | |
CN111444878A (en) | Video classification method and device and computer readable storage medium | |
KR20200018283A (en) | Method for training a convolutional recurrent neural network and for semantic segmentation of inputted video using the trained convolutional recurrent neural network | |
CN108111860B (en) | Video sequence lost frame prediction recovery method based on depth residual error network | |
TW202215303A (en) | Processing images using self-attention based neural networks | |
CN110366029A (en) | Method, system and the electronic equipment of picture frame are inserted between a kind of video | |
Voulodimos et al. | Improving multi-camera activity recognition by employing neural network based readjustment | |
US11907335B2 (en) | System and method for facilitating autonomous target selection | |
WO2021090535A1 (en) | Information processing device and information processing method | |
CN113469289A (en) | Video self-supervision characterization learning method and device, computer equipment and medium | |
CN113112536A (en) | Image processing model training method, image processing method and device | |
WO2020225247A1 (en) | Unsupervised learning of object keypoint locations in images through temporal transport or spatio-temporal transport | |
CA3204121A1 (en) | Recommendation with neighbor-aware hyperbolic embedding | |
CN110083761B (en) | Data distribution method, system and storage medium based on content popularity | |
CN110288444A (en) | Realize the method and system of user's associated recommendation | |
CN116189277A (en) | Training method and device, gesture recognition method, electronic equipment and storage medium | |
CN116982080A (en) | Methods, systems, and computer media for scene adaptive future depth prediction in monocular video | |
CN107622498A (en) | Image penetration management method, apparatus and computing device based on scene cut | |
CN113902639A (en) | Image processing method, image processing device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||