Realistic animation generation method based on the Kinect motion-sensing camera
Technical field
The invention belongs to the fields of computer vision and computer graphics, and in particular relates to a method that uses a single Kinect motion-sensing camera to generate highly realistic videos of new actions.
Background technology
Research on new methods of animation generation has long been active, and many solutions have been proposed. These methods fall broadly into three classes. The first, and earliest, uses skeletons and model rendering: a model of a simple scene is reconstructed and rendered by computer to obtain a new animated video sequence. This works well for rigid objects but cannot produce highly realistic video for non-rigid objects. The second class uses physical modeling to synthesize, offline, a small number of high-precision, highly realistic three-dimensional models, and then uses these models as samples to synthesize three-dimensional models with lifelike geometric detail under various human motions. However, these methods offer very limited control over garment structure and material parameters, so the garment structure and material of the generated motion cannot fully match any particular real garment. The third class is based on video textures: frames are extracted from an existing video sequence and recombined to produce a new video sequence. This yields fairly realistic new animation for simple scenes but cannot handle more complex ones. The Max Planck Institute for Informatics (Feng Xu, Yebin Liu, Carsten Stoll, James Tompkin, Gaurav Bharaj, Qionghai Dai, Hans-Peter Seidel, Jan Kautz, Christian Theobalt. Video-based characters: creating new human performances from a multi-view video database. ACM Transactions on Graphics (TOG), 2011, 30(4): 32.) used a multi-view camera array for data acquisition and combined models with video textures to generate highly realistic new animated video sequences. That scheme, however, is costly and places high demands on the experimental conditions, which hinders industrial adoption.
The appearance of the Kinect motion-sensing camera (Zhengyou Zhang. Microsoft Kinect sensor and its effect. IEEE MultiMedia, 2012, 19(2): 4-10.) made realistic animation on inexpensive equipment possible. Microsoft released the Kinect in November 2010, and it quickly became a worldwide focus of consumer electronics. It captures color, depth, skeletal motion, and voice information in real time, and research and applications built on it have grown rapidly. Tong et al. (Tong J, Zhou J, Liu L, et al. Scanning 3D full human bodies using Kinects. IEEE Transactions on Visualization and Computer Graphics, 2012, 18(4): 643-650.) used Kinects to reconstruct the human body in three dimensions. Zollhöfer et al. (Zollhöfer M, Martinek M, Greiner G, et al. Automatic reconstruction of personalized avatars from 3D face scans. Computer Animation and Virtual Worlds, 2011, 22(2-3): 195-202.) achieved Kinect-based reconstruction of face models of real people. In 2014 the second-generation Kinect was released with considerably improved accuracy. Kinect v2.0 allows more accurate model reconstruction and can store, in real time, more realistic 1080p color images together with depth maps with a precision of 1 mm, which provides a good data basis for realistic animation generation based on Kinect v2.0.
Summary of the invention
To overcome the deficiencies of the prior art, the invention provides a method that both keeps the generated new video sequences lifelike and credible and removes the difficulty of demanding experimental equipment. To this end, the technical scheme adopted by the invention is a realistic animation generation method based on the Kinect motion-sensing camera. A single Kinect v2.0 is used to acquire a static three-dimensional model of a person together with color, depth, and skeleton information. A markerless motion capture method yields the skeletal motion information of every frame and the corresponding deformed three-dimensional model. After the user specifies a new action, a video of this person performing the new action is generated from the established data set by the proposed texture synthesis method, which is based on sparse-representation low-rank matrix completion. The method specifically comprises the following steps:
1) Reconstruct a static three-dimensional model of the person with a Kinect Fusion-based method, and manually embed a skeleton in this model, which serves as the reference-pose model;
2) Database construction: use a single Kinect v2.0 camera to capture the color video, the depth video, and the corresponding skeleton sequence while the person performs various basic actions; use an IK (inverse kinematics) algorithm to compute the motion parameters of each skeleton joint in every frame; use linear blend skinning to compute the skinning weights and deform the reference-pose model into the three-dimensional model corresponding to each frame; correct the deformed model with the depth information to make it more accurate. The data set needs to be built only once, after which video sequences of new actions of this person can be generated from it;
3) The user specifies the skeleton sequence of a new action; the topology of this skeleton is identical to that of the skeleton in the data set;
4) For every frame of the new target action, find similar candidate frames in the data set;
5) Use the moving least squares method to deform the human-body part of each retrieved frame to obtain an initial estimate, then refine it by weighted low-rank matrix completion interpolation;
6) Assemble the final images obtained in step 5) into the required new video sequence.
The database construction scheme specifically comprises the following steps:
2-1) Color video, depth video, and skeleton information are captured and stored in real time with the Kinect v2.0. The Kinect v2.0 frame rate is set to 30 fps, the color image resolution is 1920 × 1080, and the depth image resolution is 512 × 424. Capturing the skeleton sequence includes storing the three-dimensional spatial position of each joint and its mapping to the color and depth data. The basic actions captured include walking, running, striding, waving, stretching, front kicks, side kicks, chest expansion, side bends, and the like;
2-2) Using the three-dimensional information of the captured skeleton in each frame, feed the three-dimensional position of each joint into the IK algorithm to obtain the rotation-translation matrix of each joint, including both its global and its local motion matrices;
2-3) Use linear blend skinning to compute the skinning weights and deform the reference-pose model into the three-dimensional model corresponding to each frame;
2-4) Use a sparse-representation-based non-rigid surface alignment method together with the depth information to further deform the three-dimensional model obtained in step 2-3), so that it agrees with the depth data and is more accurate.
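As an illustration of step 2-3) only, the following Python sketch shows linear blend skinning in its standard form: each vertex of the reference-pose model is transformed by every bone and the results are blended with the skinning weights. The function name, array layouts, and the use of 4 × 4 homogeneous bone transforms are assumptions made for this sketch; the specification does not state how the weights or transforms are stored.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, weights, bone_transforms):
    """Deform reference-pose vertices with linear blend skinning.

    rest_vertices   : (V, 3) vertex positions of the reference-pose model
    weights         : (V, B) skinning weights, each row summing to 1
    bone_transforms : (B, 4, 4) homogeneous transforms taking each bone
                      from the reference pose to its pose in this frame
    (All names and layouts here are hypothetical.)
    """
    num_vertices = rest_vertices.shape[0]
    # Append a homogeneous coordinate of 1 to every vertex: (V, 4)
    homo = np.hstack([rest_vertices, np.ones((num_vertices, 1))])
    # Position of every vertex under every bone transform: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homo)
    # Weighted blend of the per-bone positions: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]
```

A vertex weighted equally between a fixed bone and a bone translated by 2 units therefore moves by 1 unit, which is the blending behaviour that step 2-4) then corrects against the depth data.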
The similar-frame retrieval method specifically comprises the following steps:
4-1) First align the target skeleton and the database skeletons in the same world coordinate system and rotate the skeletons so that they face the same direction; then retrieve the candidate frames by minimizing the following energy function:
[Formula (1), the retrieval energy function, is not reproduced in this text.]
where F is the unknown candidate frame, N(F) is the number of frames in the candidate sequence, I(F) is the frame number in the original database sequence, and M is the number of frames in the target sequence. The skeleton joint vectors that appear in formula (1), written d and q, are those of frame i and frame i-1 of the database sequence and of frame i and frame i-1 of the target sequence; α and β are the weights that balance the two constraints. The skeleton similarity is defined as follows:
[Formula (2), the skeleton similarity, is not reproduced in this text.]
where m and n each stand for d or q in formula (1), $S_j$ is the position of the j-th skeleton joint, J is the number of joints, and $\sigma_j$ is the variance of the position of joint j over the database. In formula (1), the first term is a spatial constraint that ensures the candidate frame is similar to the target frame; the following two terms are constraints that ensure temporal continuity and avoid jitter.
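Formulas (1) and (2) are not legible in this text, so the following Python sketch only illustrates the idea of the retrieval under stated assumptions. It assumes that the skeleton similarity is a variance-normalized mean squared joint distance, and it replaces the two temporal terms of formula (1) by a single smoothness penalty between consecutively chosen database frames, weighted by a hypothetical `alpha`. The minimization is done by dynamic programming, which is a choice made here and not one stated in the specification.

```python
import numpy as np

def skeleton_distance(m, n, sigma):
    """Assumed similarity: mean over joints of |S_j(m) - S_j(n)|^2 / sigma_j.

    m, n  : (J, 3) joint positions of two skeletons
    sigma : (J,) per-joint position variance over the database
    """
    return float(np.mean(np.sum((m - n) ** 2, axis=1) / sigma))

def retrieve_candidates(db, target, sigma, alpha=0.5):
    """Pick one database frame per target frame by dynamic programming.

    db     : (N, J, 3) aligned database skeletons
    target : (M, J, 3) aligned target skeletons
    Returns a list of M database frame indices.
    """
    num_db, num_target = len(db), len(target)
    # spatial[i, k]: distance between target frame i and database frame k
    diff = target[:, None] - db[None]
    spatial = np.mean(np.sum(diff ** 2, axis=3) / sigma, axis=2)
    # temporal[p, c]: penalty for choosing frame c right after frame p
    dd = db[:, None] - db[None]
    temporal = np.mean(np.sum(dd ** 2, axis=3) / sigma, axis=2)

    cost = spatial[0].copy()
    back = np.zeros((num_target, num_db), dtype=int)
    for i in range(1, num_target):
        total = cost[:, None] + alpha * temporal      # indexed [prev, cur]
        back[i] = np.argmin(total, axis=0)
        cost = total[back[i], np.arange(num_db)] + spatial[i]

    # Trace the cheapest path backwards from the last target frame.
    path = [int(np.argmin(cost))]
    for i in range(num_target - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return path[::-1]
```

The spatial term pulls each chosen frame toward its target pose, while the temporal penalty discourages jumps between unrelated database frames, which is the jitter the text aims to avoid.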
The target frame sequence is synthesized with a recently popular sparse-representation method, low-rank matrix completion, which can both repair missing image regions and remove noise. The steps are as follows:
5-1) Assume that the three-dimensional surface model of every frame, in the database and in the target sequence alike, is obtained by deforming the same three-dimensional model, so that all of them share the same vertices and topological connectivity. Segment the three-dimensional model into 16 parts in total and, through the mapping, divide the person image of the target frame and of the retrieved frame into the same 16 parts. Compute the corresponding pixels of the target frame and the retrieved frame and, guided by these corresponding points, deform the retrieved frame with the moving least squares method;
5-2) The initial result obtained in step 5-1) may have missing information, because a retrieved frame cannot contain all the information of the target frame. These regions are interpolated by sparse-representation-based matrix completion, which yields each final frame image; the frames are then assembled into the new video sequence. In this matrix completion method, priorities are first computed along the edge of the missing region in the current incomplete image; the priority ordering takes into account the texture and depth information of each block. The blocks are then filled in priority order. For a block to be filled, the K blocks most similar to it are found. Let the pixel being repaired, $P_i$, lie in the $w \times w$ block $B_0$, and denote the K similar blocks by $B_1, B_2, B_3, \ldots, B_K$; stacking these blocks, one block per row, gives a matrix D. The sparse-representation-based matrix completion is expressed as the following optimization problem:
$$\min_{A,E}\ \operatorname{rank}(A) + \lambda \lVert W \circ E \rVert_0 \quad \text{s.t.} \quad P_\Omega(D) = P_\Omega(A + E) \tag{3}$$
where A is the original matrix to be recovered, E is the sparse error (noise) matrix, D is the observation matrix, W is a weight matrix that accounts for the similarity of the similar blocks, and λ weights the noise term; rank(A) denotes the rank of A, $\lVert\cdot\rVert_0$ denotes the 0-norm of a matrix (its number of nonzero elements), "$\circ$" denotes element-wise multiplication of two matrices, Ω is the index set of the known elements, and $P_\Omega$ is the operator that projects onto Ω. This optimization problem is NP-hard (non-deterministic polynomial-time hard), so rank minimization is replaced by the nuclear norm $\lVert\cdot\rVert_*$ and the 0-norm by the matrix 1-norm $\lVert\cdot\rVert_1$, which converts the problem into:
$$\min_{A,E}\ \lVert A \rVert_* + \lambda \lVert W \circ E \rVert_1 \quad \text{s.t.} \quad P_\Omega(D) = P_\Omega(A + E) \tag{4}$$
Once the problem is solved and A is obtained, the first row of A, which corresponds to the current block $B_0$, is taken out and reshaped into a $w \times w$ pixel block; this is the current block as recovered by the sparse-representation-based matrix completion. Pixel blocks at other positions are handled in the same way.
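The specification does not name a solver for problem (4). As one possible illustration, the Python sketch below uses an inexact augmented Lagrangian iteration of the kind commonly applied to robust PCA with missing entries: singular value thresholding updates A, weighted soft thresholding updates E, and unknown entries are given zero weight so that E absorbs them. The function names, the default λ, and the step-size schedule are assumptions of this sketch, not values taken from the source.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage; tau may be a scalar or an array."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink the singular values of X by tau (nuclear-norm proximal step)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def weighted_low_rank_fill(D, known, W=None, lam=None, n_iter=500, tol=1e-7):
    """Approximately solve  min ||A||_* + lam * ||W o E||_1
    subject to A + E = D on the known entries.

    D     : (n_blocks, w*w) matrix with one vectorized block per row
    known : boolean mask of the same shape (True where D is observed)
    W     : optional non-negative weights (default: all ones)
    """
    D = np.where(known, D, 0.0).astype(float)
    if W is None:
        W = np.ones_like(D)
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))       # common robust-PCA default
    W_eff = np.where(known, W, 0.0)              # unknown entries cost nothing
    norm_D = max(np.linalg.norm(D), 1e-12)
    mu = 1.25 / max(np.linalg.norm(D, 2), 1e-12)
    mu_max = mu * 1e7
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    Y = np.zeros_like(D)                          # Lagrange multipliers
    for _ in range(n_iter):
        A = singular_value_threshold(D - E + Y / mu, 1.0 / mu)
        E = soft_threshold(D - A + Y / mu, lam * W_eff / mu)
        residual = D - A - E
        Y = Y + mu * residual
        mu = min(mu * 1.2, mu_max)
        if np.linalg.norm(residual) <= tol * norm_D:
            break
    return A, np.where(known, E, 0.0)
```

With D built from one current block and its K similar blocks, the repaired block would then be read back as `A[0].reshape(w, w)`, matching the extraction step described above.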
Compared with the prior art, the technical features and effects of the invention are as follows:
The method avoids the need for a multi-camera capture studio, as well as the poor realism and limited range of actions of existing new-animation generation methods. A single Kinect v2.0 camera is used for acquisition to build a base database, and retrieval and matching, deformation, and repair then generate highly realistic new video sequences of arbitrary actions. The method has the following features:
1. The equipment is inexpensive and the experimental conditions are simple, so the method is easy to implement.
2. The realism is strong; fairly lifelike results are obtained even for fine texture changes.
3. The database actions need to be captured only once; from this data set, highly realistic new video sequences of arbitrarily edited actions can be generated.
The invention can use a single Kinect v2.0 for three-dimensional reconstruction and data acquisition, is simple to implement, and gives satisfactory results. The proposed method also extends well: using several Kinect v2.0 devices adds information to the database and allows flexible changes of viewpoint.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flowchart of the realistic animation generation method based on the Kinect v2.0 motion-sensing camera according to an embodiment of the invention;
Fig. 2 shows results of generating new-action video sequences with the proposed method according to an embodiment of the invention;
Fig. 3 shows one frame of a video synthesized by placing the results of three different new actions into a captured background video.
Embodiment
The invention aims to overcome the deficiencies of existing new-animation generation schemes by providing a method that both keeps the generated new video sequences lifelike and credible and removes the difficulty of demanding experimental equipment. A single Kinect v2.0 serves as the database acquisition device; the data are processed by motion tracking and related steps to build a database of basic actions, from which realistic video of new actions is generated by a texture synthesis method based on sparse representation.
The invention proposes a realistic animation generation method based on the Kinect v2.0 motion-sensing camera, which is described in detail below with reference to the drawings and embodiments:
The invention uses a single Kinect v2.0 camera to acquire a static three-dimensional model of a person together with color, depth, and skeleton information. A markerless motion capture method yields the skeletal motion information of every frame and the corresponding deformed three-dimensional model. After the user specifies a new action, a video of this person performing the new action is generated from the established data set by the proposed texture synthesis method, which is based on sparse-representation low-rank matrix completion. Fig. 1 shows the flowchart of the realistic animation generation method based on the Kinect v2.0 motion-sensing camera according to an embodiment of the invention, which comprises the following steps:
1) Reconstruct a static three-dimensional model of the person with a method based on Kinect Fusion (the real-time static three-dimensional reconstruction project officially released for Kinect), and manually embed a skeleton in this model (the reference-pose model);
2) Database construction: use a single Kinect v2.0 camera to capture the color video, the depth video, and the corresponding skeleton sequence while the person performs various basic actions; use an IK (inverse kinematics) algorithm to compute the motion parameters of each skeleton joint in every frame; use linear blend skinning to compute the skinning weights and deform the reference-pose model into the three-dimensional model corresponding to each frame; correct the deformed model with the depth information to make it more accurate. The data set needs to be built only once, after which video sequences of new actions of this person can be generated from it;
3) The user specifies the skeleton sequence of a new action. The topology of this skeleton is identical to that of the skeleton in the data set;
4) For every frame of the new target action, find similar candidate frames in the data set;
5) Use the moving least squares method to deform the human-body part of each retrieved frame to obtain an initial estimate, then refine it by weighted low-rank matrix completion interpolation;
6) Assemble the final images obtained in step 5) into the required new video sequence.
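To make the deformation in step 5) more concrete, the following Python sketch implements the affine variant of moving least squares deformation for 2-D points, driven by pairs of corresponding control points. The specification does not say which moving least squares variant is used, so the affine form, the inverse-distance weights, and all names below are assumptions of this sketch. It maps coordinates only; resampling the image pixels at the mapped positions is left out.

```python
import numpy as np

def mls_affine_warp(points, src_ctrl, dst_ctrl, alpha=1.0, eps=1e-8):
    """Map 2-D points with affine moving least squares.

    points   : (P, 2) positions to deform
    src_ctrl : (C, 2) control points in the retrieved frame
    dst_ctrl : (C, 2) corresponding control points in the target frame
    Needs at least three non-collinear control points.
    """
    points = np.asarray(points, dtype=float)
    src = np.asarray(src_ctrl, dtype=float)
    dst = np.asarray(dst_ctrl, dtype=float)
    out = np.empty_like(points)
    for k, v in enumerate(points):
        # Weights fall off with distance, so nearby correspondences dominate.
        d2 = np.sum((src - v) ** 2, axis=1)
        w = 1.0 / (d2 ** alpha + eps)
        p_star = w @ src / w.sum()                # weighted source centroid
        q_star = w @ dst / w.sum()                # weighted target centroid
        p_hat = src - p_star
        q_hat = dst - q_star
        # Weighted least squares for the 2x2 matrix M that minimizes
        # sum_i w_i * |p_hat_i @ M - q_hat_i|^2 at this point.
        lhs = (p_hat * w[:, None]).T @ p_hat
        rhs = (p_hat * w[:, None]).T @ q_hat
        M = np.linalg.solve(lhs, rhs)
        out[k] = (v - p_star) @ M + q_star
    return out
```

If the correspondences happen to be related by one global affine map, this warp reproduces that map exactly; otherwise it blends local affine fits smoothly across the body part.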
The database construction scheme specifically comprises the following steps:
2-1) Color video, depth video, and skeleton information are captured and stored in real time with the Kinect v2.0. The Kinect v2.0 frame rate is set to 30 fps, the color image resolution is 1920 × 1080, and the depth image resolution is 512 × 424. Capturing the skeleton sequence includes storing the three-dimensional spatial position of each joint and its mapping to the color and depth data. The basic actions captured include walking, running, striding, waving, stretching, front kicks, side kicks, chest expansion, side bends, and the like;
2-2) Using the three-dimensional information of the captured skeleton in each frame, feed the three-dimensional position of each joint into the IK algorithm to obtain the rotation-translation matrix of each joint, including both its global and its local motion matrices;
2-3) Use linear blend skinning to compute the skinning weights and deform the reference-pose model into the three-dimensional model corresponding to each frame;
2-4) Use a sparse-representation-based non-rigid surface alignment method together with the depth information to further deform the three-dimensional model obtained in step 2-3), so that it agrees with the depth data and is more accurate.
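Before a model can be compared with the depth data in step 2-4), each depth image has to be turned into three-dimensional points. The Python sketch below shows the usual pinhole back-projection for this purpose. It is a sketch under assumptions: depth is taken to be stored in millimetres with 0 marking invalid pixels, and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical inputs that would come from the calibration of the actual device.

```python
import numpy as np

def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image to camera-space points in metres.

    depth_mm       : (H, W) depth image in millimetres, 0 = no measurement
    fx, fy, cx, cy : pinhole intrinsics of the depth camera (assumed known)
    Returns an (H, W, 3) array; invalid pixels become NaN.
    """
    height, width = depth_mm.shape
    # Pixel column index u and row index v for every pixel.
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    z = depth_mm.astype(float) / 1000.0
    z[z == 0] = np.nan                  # mark missing depth as invalid
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

For a 512 × 424 depth frame this gives one point per pixel, which a surface alignment step can then match against the vertices of the skinned model.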
The similar-frame retrieval method specifically comprises the following steps:
4-1) First align the target skeleton and the database skeletons in the same world coordinate system and rotate the skeletons so that they face the same direction; then retrieve the candidate frames by minimizing the following energy function:
[Formula (1), the retrieval energy function, is not reproduced in this text.]
where F is the unknown candidate frame, N(F) is the number of frames in the candidate sequence, I(F) is the frame number in the original database sequence, and M is the number of frames in the target sequence. The skeleton joint vectors that appear in formula (1), written d and q, are those of frame i and frame i-1 of the database sequence and of frame i and frame i-1 of the target sequence; α and β are the weights that balance the two constraints. The skeleton similarity is defined as follows:
[Formula (2), the skeleton similarity, is not reproduced in this text.]
where m and n each stand for d or q in formula (1), $S_j$ is the position of the j-th skeleton joint, J is the number of joints, and $\sigma_j$ is the variance of the position of joint j over the database. In formula (1), the first term is a spatial constraint that ensures the candidate frame is similar to the target frame; the following two terms are constraints that ensure temporal continuity and avoid jitter.
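The alignment that precedes the retrieval in step 4-1) can be pictured with the following Python sketch. It moves a chosen root joint to the origin and rotates the skeleton about the vertical axis until the line between the hips points along +x, so that all skeletons face the same way. The choice of the y axis as vertical and the joint indices `root`, `left_hip`, and `right_hip` are hypothetical; the specification only states that the skeletons are aligned and rotated to a common direction.

```python
import numpy as np

def normalize_skeleton(joints, root=0, left_hip=1, right_hip=2):
    """Centre a skeleton on its root joint and remove its heading.

    joints : (J, 3) joint positions with y as the vertical axis (assumed)
    Returns the (J, 3) normalized joint positions.
    """
    joints = np.asarray(joints, dtype=float)
    centered = joints - joints[root]
    hip_axis = centered[right_hip] - centered[left_hip]
    # Heading of the hip axis measured in the horizontal x-z plane.
    theta = np.arctan2(hip_axis[2], hip_axis[0])
    c, s = np.cos(theta), np.sin(theta)
    # Rotation about y that sends the direction (cos t, 0, sin t) to +x.
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return centered @ R.T
```

After this normalization, two skeletons that differ only by where the person stands and which way the person faces become identical, so the similarity measure compares pose alone.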
The target frame sequence is synthesized with a recently popular sparse-representation method: low-rank matrix completion. This method can not only repair the missing parts of an image but also remove noise. The specific method comprises the following steps:
5-1) Assume that the three-dimensional surface model of every frame, in the database and in the target sequence alike, is obtained by deforming the same three-dimensional model, so that all of them share the same vertices and topological connectivity. Segment the three-dimensional model into 16 parts in total and, through the mapping, divide the person image of the target frame and of the retrieved frame into the same 16 parts. Compute the corresponding pixels of the target frame and the retrieved frame and, guided by these corresponding points, deform the retrieved frame with the moving least squares method;
5-2) The initial result obtained in step 5-1) may have missing information, because a retrieved frame cannot contain all the information of the target frame. We interpolate these regions by sparse-representation-based matrix completion, which yields each final frame image; the frames are then assembled into the new video sequence. In this matrix completion method, priorities are first computed along the edge of the missing region in the current incomplete image; the priority ordering takes into account the texture and depth information of each block. The blocks are then filled in priority order. For a block to be filled, the K blocks most similar to it are found. Let the pixel being repaired, $P_i$, lie in the $w \times w$ block $B_0$, and denote the K similar blocks by $B_1, B_2, B_3, \ldots, B_K$; stacking these blocks, one block per row, gives a matrix D. The sparse-representation-based matrix completion is expressed as the following optimization problem:
$$\min_{A,E}\ \operatorname{rank}(A) + \lambda \lVert W \circ E \rVert_0 \quad \text{s.t.} \quad P_\Omega(D) = P_\Omega(A + E) \tag{3}$$
where A is the original matrix to be recovered, E is the sparse error (noise) matrix, D is the observation matrix, W is a weight matrix that accounts for the similarity of the similar blocks, and λ weights the noise term; rank(A) denotes the rank of A, $\lVert\cdot\rVert_0$ denotes the 0-norm of a matrix (its number of nonzero elements), "$\circ$" denotes element-wise multiplication of two matrices, Ω is the index set of the known elements, and $P_\Omega$ is the operator that projects onto Ω. This optimization problem is NP-hard (non-deterministic polynomial-time hard), so rank minimization is replaced by the nuclear norm $\lVert\cdot\rVert_*$ and the 0-norm by the matrix 1-norm $\lVert\cdot\rVert_1$, and the problem can be converted into:
$$\min_{A,E}\ \lVert A \rVert_* + \lambda \lVert W \circ E \rVert_1 \quad \text{s.t.} \quad P_\Omega(D) = P_\Omega(A + E) \tag{4}$$
Once the problem is solved and A is obtained, the first row of A, which corresponds to the current block $B_0$, is taken out and reshaped into a $w \times w$ pixel block; this is the current block as recovered by the sparse-representation-based matrix completion. Pixel blocks at other positions are handled in the same way.
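The construction of the matrix D described in step 5-2) can be sketched in Python as follows. The sketch searches a single-channel image for the K fully known blocks that best match the known pixels of the current block, using the sum of squared differences, and stacks the vectorized blocks with the current block first. The similarity measure, the exhaustive search, the grayscale input, and every name below are assumptions made for illustration; the priority computation from texture and depth is not covered.

```python
import numpy as np

def build_block_matrix(image, known, top_left, w, K, stride=1):
    """Stack the current block and its K most similar blocks into D.

    image    : (H, W) float image with holes
    known    : (H, W) boolean mask, True where the pixel is valid
    top_left : (row, col) of the w x w block B_0 to be repaired
    Returns (D, mask), both of shape (number of blocks, w * w).
    """
    r0, c0 = top_left
    target = image[r0:r0 + w, c0:c0 + w]
    target_known = known[r0:r0 + w, c0:c0 + w]
    height, width = image.shape
    candidates = []
    for r in range(0, height - w + 1, stride):
        for c in range(0, width - w + 1, stride):
            # Skip B_0 itself and any block that still contains holes.
            if (r, c) == (r0, c0) or not known[r:r + w, c:c + w].all():
                continue
            block = image[r:r + w, c:c + w]
            # Compare only where the current block is known.
            ssd = float(np.sum((block - target)[target_known] ** 2))
            candidates.append((ssd, r, c))
    candidates.sort()
    best = candidates[:K]
    rows = [np.where(target_known, target, 0.0).ravel()]
    rows += [image[r:r + w, c:c + w].ravel() for _, r, c in best]
    mask_rows = [target_known.ravel()] + [np.ones(w * w, dtype=bool)] * len(best)
    return np.stack(rows), np.stack(mask_rows)
```

The returned mask plays the role of the index set Ω in problem (4): it is False exactly at the pixels of the current block that still have to be filled.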