CN103745462A

CN103745462A - Human body mouth shape video reconfiguration system and reconfiguration method

Info

Publication number: CN103745462A
Application number: CN201310745441.XA
Authority: CN
Inventors: 孟濬; 黄吉羊; 刘琼
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2014-04-23
Anticipated expiration: 2033-12-27
Also published as: CN103745462B

Abstract

The invention provides a human mouth-shape video reconstruction system and a corresponding method based on the dynamic time evolution of an annular elastic space. The method comprises four steps: information input, preprocessing, mouth-shape reconstruction, and video output. It employs two approaches, a correlation inversion method and a logic correction method. The method and system can invert input mouth-shape information onto a single-frame image to generate a reconstructed human mouth-shape video. They can also correct input mouth-shape information on a video composed of multiple frames to generate a reconstructed human mouth-shape video. Compared with traditional mouth-shape reconstruction methods and systems, they are accurate and efficient and require no database. They also save storage space and make mouth shapes more flexible to change. Preferably, all units of the system are integrated into one intelligent terminal. Examples include smartphones, tablet computers (such as the iPad), handheld computers, and intelligent handheld game consoles.

Description

Human mouth-shape video reconstruction system and reconstruction method

Technical field

The present invention relates to the field of video image processing. In particular, it relates to a human mouth-shape video reconstruction system and reconstruction method based on the dynamic time evolution of an annular elastic space.

Background

As computer technology has developed and matured, facial modeling and animation have attracted growing attention as a distinct branch of computer graphics. Changing the human mouth shape in videos and images is an especially widely used application. Many applications need to reconstruct the mouth shape of a person in an existing video or image. Examples include generating a sequence of mouth movements from a single still image, or modifying the mouth shapes in an existing video.

Existing methods generally analyze and process a large amount of video and image data to build a mouth-shape database. They then retrieve the relevant information from that database for each specific task. Such approaches can transform the human mouth shape in videos and images fairly accurately, but their limitations are obvious. First, they depend on a huge mouth-shape database built in advance. This requires an enormous amount of sample data and gives poor portability. Second, the algorithms involve extensive computation and are highly complex, which further limits their range of application.

Summary of the invention

The present invention addresses these deficiencies of the prior art. It provides a high-precision, highly portable human mouth-shape video reconstruction method and system that can evolve a single-frame image of a target object into a video according to the required mouth shapes. It can also modify and invert a video composed of multiple frames of the target object.

Traditional mouth-shape transformation techniques depend on a huge mouth-shape database. This database contains a speech library and the corresponding mouth-shape images, which are retrieved during transformation. The database occupies a large amount of storage. It also cannot construct new mouth shapes on its own, so in practice it cannot handle transformations involving mouth shapes it does not contain. Unlike traditional mouth-shape transformation systems, the system of the present invention requires no such database and can reconstruct human mouth-shape videos quickly and accurately.

The technical solution adopted by the present invention is as follows:

A human mouth-shape video reconstruction method comprises the following four steps:

(1) Information input: human body information and mouth-shape information are read in through an input port. The human body information is either a single-frame image of the target object or a video composed of multiple frames of the target object. The mouth-shape information is any one or any combination of text, sound, images, and video;

(2) Preprocessing: the mouth-shape information read through the input port is recognized and converted, and the converted result is shown in real time on a display module. The human body information read through the input port is analyzed to locate and lock the position of the mouth;

(3) Mouth-shape reconstruction: the human mouth-shape video is reconstructed from the preprocessed mouth-shape information and human body information. This uses the dynamic time evolution method based on the annular elastic space;

(4) Video output: the reconstructed human mouth-shape video is output through an output port.
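The four steps can be sketched as a simple pipeline. This is a minimal illustration, not the patented implementation. All function names are hypothetical. The "recognition" step is stood in for by splitting text into words, the mouth region by a fixed box, and the annular-elastic-space deformation by a text label. The sketch only shows how the steps hand data to each other. It also shows how a single image is reused for every output frame, while a video is paired frame by frame.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of the mouth region


def read_input(body_frames: List[str], mouth_info: str) -> Dict:
    """Step (1): read body information (one or more frames) and mouth-shape information."""
    return {"body": list(body_frames), "mouth_info": mouth_info}


def preprocess(data: Dict) -> Dict:
    """Step (2): 'recognize' the mouth-shape information and lock the mouth region.

    Word splitting and the fixed box are placeholders (assumptions) for the
    recognition and face-analysis steps, which the source does not specify.
    """
    data["units"] = data["mouth_info"].lower().split()
    mouth_box: Box = (40, 60, 32, 16)  # hypothetical fixed mouth location
    data["mouth_box"] = mouth_box
    return data


def reconstruct(data: Dict) -> List[str]:
    """Step (3): produce one output frame per mouth-shape unit.

    A single image is reused for every unit; a video is paired by proportional
    frame index. The real deformation is represented only by a text label.
    """
    body, units = data["body"], data["units"]
    frames = []
    for k, unit in enumerate(units):
        src = body[0] if len(body) == 1 else body[k * len(body) // len(units)]
        frames.append(f"{src}+mouth[{unit}]")
    return frames


def write_output(frames: List[str]) -> List[str]:
    """Step (4): hand the reconstructed frames to the output port (here: return them)."""
    return frames


video = write_output(reconstruct(preprocess(read_input(["face.png"], "Ni Hao Ma"))))
print(video)  # ['face.png+mouth[ni]', 'face.png+mouth[hao]', 'face.png+mouth[ma]']
```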

The flowchart of the technical solution of the present invention is shown in Figure 1.

In step (3), mouth-shape reconstruction is based on the dynamic time evolution of the annular elastic space. The annular elastic space is a planar space in which both the order of points and the distance between points are defined. It has the following four properties:

1. For any two points P₁ and P₂ in the annular elastic space, the distance between them is variable.

2. For any two points P₁ and P₂ in the annular elastic space, their order is strictly invariant. That is, take any point P₃ in the space distinct from P₁ and P₂; the clockwise (or counterclockwise) order of the three points does not change under any transformation.

3. Any point P in the annular elastic space can be acted on by a force F of magnitude f at an angle α to the horizontal axis. This changes the position of P: relative to its original position, P is displaced along the direction at angle α to the horizontal axis.

4. When a point P in the annular elastic space is acted on by a force F, the force also affects other points in the space. Each such point behaves as if it were acted on by a force at angle α′ to the horizontal axis with magnitude f′; this is called correlation. The position of the point relative to P determines α′, and its distance from P determines f′. When the distance between the point and P exceeds an effective range R, the point is considered unaffected by the correlation of F.
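Properties 2 to 4 can be made concrete with a small numerical sketch. It models the annular elastic space as an ordered ring of planar points. The source does not give formulas for f′ or α′, so the sketch assumes several things, all labeled in the code. It uses a linear fall-off f′ = f·(1 − d/R) inside the effective range R. It keeps α′ = α, and it makes displacement proportional to force. The `order_preserved` check tests property 2 by comparing the orientation of every point triple before and after a deformation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def apply_force(points: List[Point], idx: int, alpha: float, f: float,
                R: float, compliance: float = 1.0) -> List[Point]:
    """Apply force F (angle alpha in radians, magnitude f) at points[idx].

    Property 3: the loaded point moves along alpha by compliance * f.
    Property 4 (correlation): another point at distance d < R receives an
    induced force f' = f * (1 - d / R); points with d >= R are unaffected.
    Assumptions (not from the source): linear fall-off, alpha' = alpha,
    and displacement proportional to force.
    """
    px, py = points[idx]
    moved = []
    for j, (x, y) in enumerate(points):
        d = math.hypot(x - px, y - py)
        if j == idx:
            fj = f
        elif d < R:
            fj = f * (1.0 - d / R)
        else:
            fj = 0.0
        moved.append((x + compliance * fj * math.cos(alpha),
                      y + compliance * fj * math.sin(alpha)))
    return moved


def orientation(p1: Point, p2: Point, p3: Point) -> int:
    """Sign of the cross product: +1 counterclockwise, -1 clockwise, 0 collinear."""
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    return (cross > 0) - (cross < 0)


def order_preserved(before: List[Point], after: List[Point]) -> bool:
    """Property 2: every triple of points keeps its clockwise/counterclockwise order."""
    n = len(before)
    return all(orientation(before[a], before[b], before[c]) ==
               orientation(after[a], after[b], after[c])
               for a in range(n) for b in range(a + 1, n) for c in range(b + 1, n))


# Example: an 8-point ring (unit circle) pulled outward at point 0.
ring = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
moved = apply_force(ring, idx=0, alpha=0.0, f=0.2, R=1.0)
```

In this example, point 0 moves by the full 0.2. Its two neighbours, about 0.77 away, move by a smaller correlated amount. Points farther than R = 1.0 stay where they are, and the ring keeps its cyclic order.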

A schematic diagram of the annular elastic space is shown in Figure 2.

Changes in mouth shape are produced by the orbicularis oris muscle of the lips, which is innervated by the buccal branch of the facial nerve. The mouth shape can therefore be studied with the annular elastic space model described above. When the mouth shape changes at time t, n points P₁, P₂, …, Pₙ in the annular elastic space can be regarded as acted on by forces F₁, F₂, …, Fₙ respectively. The combined action of these n forces causes local displacement, rotation, or stretching of the space, which produces the change in mouth shape.

In step (3), the processing module first identifies the position of the mouth in the video or image and how it changes over the time series. It then builds the corresponding annular elastic space model and extracts the forces acting on each region of the model at each time t. The human body information is then used to build a new annular elastic space model. Applying the extracted forces at the corresponding times and positions on this new model completes the reconstruction of the human mouth-shape video.

The corresponding positions can be determined from the four contour lines of the mouth and the feature points on them. To ensure transformation accuracy, each contour line should carry at least three feature points in practice, as shown in Figure 3. These corresponding positions are determined through the correlation property of the annular elastic space.
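The force-extraction-and-transfer step can be sketched as follows. This is a minimal illustration under stated assumptions. The four contour names are hypothetical labels for the four mouth contour lines. Feature points are matched by index. The force on each point is taken to be its displacement between two consecutive frames, which assumes unit compliance. The optional `scale` factor is an added assumption to compensate for different mouth sizes. The correlated spreading to neighbouring points (property 4) is omitted here for brevity.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Force = Tuple[float, float]          # (angle alpha in radians, magnitude f)
Contours = Dict[str, List[Point]]    # contour name -> ordered feature points

# Hypothetical names for the four mouth contour lines.
CONTOUR_NAMES = ("upper_outer", "upper_inner", "lower_inner", "lower_outer")
MIN_POINTS = 3  # the source asks for at least three feature points per contour


def validate(model: Contours) -> None:
    """Reject models whose contours carry fewer than three feature points."""
    for name in CONTOUR_NAMES:
        if len(model.get(name, [])) < MIN_POINTS:
            raise ValueError(f"contour {name!r} needs >= {MIN_POINTS} feature points")


def extract_forces(frame_t: Contours, frame_next: Contours) -> Dict[str, List[Force]]:
    """Force on each feature point between two frames, assuming unit compliance."""
    return {
        name: [(math.atan2(y1 - y0, x1 - x0), math.hypot(x1 - x0, y1 - y0))
               for (x0, y0), (x1, y1) in zip(frame_t[name], frame_next[name])]
        for name in CONTOUR_NAMES
    }


def apply_forces(target: Contours, forces: Dict[str, List[Force]],
                 scale: float = 1.0) -> Contours:
    """Apply extracted forces to the index-matched feature points of the target model."""
    return {
        name: [(x + scale * f * math.cos(a), y + scale * f * math.sin(a))
               for (x, y), (a, f) in zip(target[name], forces[name])]
        for name in CONTOUR_NAMES
    }


def example_mouth(open_px: float) -> Contours:
    """Toy mouth with three feature points per contour; y grows downward."""
    xs = (0.0, 10.0, 20.0)
    return {
        "upper_outer": [(x, -2.0) for x in xs],
        "upper_inner": [(x, -1.0) for x in xs],
        "lower_inner": [(x, 1.0 + open_px) for x in xs],
        "lower_outer": [(x, 2.0 + open_px) for x in xs],
    }


# Source opens its mouth by 2 units; the target (half-size mouth) opens by 1.
src0, src1 = example_mouth(0.0), example_mouth(2.0)
validate(src0)
target_next = apply_forces(example_mouth(0.0), extract_forces(src0, src1), scale=0.5)
```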

Preferably, the dynamic time evolution method based on the annular elastic space in step (3) is the correlation inversion method. A live person, acting as the synchronization subject, imitates the mouth-shape information shown on the display module. The real-time acquisition module records this performance as a simulation video. The simulation video is then matched with the human body information that has been read in, on the basis of the annular elastic space, which completes the reconstruction of the human mouth-shape video.

As shown in Figure 4, the synchronization subject simulates the mouth-shape information to be reconstructed on site, and the process is captured as a simulation video. An annular elastic space model is built from this simulation video and then analyzed. This allows the mouth-shape information to be reproduced accurately and efficiently on the human body information of the target object, so that the mouth shape is reconstructed at the target object's mouth. The flow of this method is shown in Figure 6.

Specifically, the synchronization subject simulates the mouth shapes according to the mouth-shape information shown on the display module. For example, the subject reads aloud a displayed passage or imitates several displayed mouth-shape pictures. Meanwhile, the processing module controls the real-time acquisition module to capture the simulation video of the synchronization subject as the basis for mouth-shape reconstruction.

After capture, the processing module divides the simulation video evenly into n frames at a rate of N frames per second. For a sample mouth-shape video of duration T seconds, n = TN, and the frames correspond to times t₁, t₂, …, tₙ. The processing module locates the mouth shape in each frame. It then establishes a linked correspondence between the contour and feature points of this mouth shape and those of the mouth shape in the human body information that has been read in.

The frame rate N can be chosen according to actual conditions, but it should satisfy the sampling theorem so that the segmented frames reflect the mouth-shape information to be reconstructed. A higher segmentation frequency gives higher reconstruction accuracy but also higher complexity. A lower segmentation frequency reduces complexity but also accuracy.

When the human body information read in step (1) is a single-frame image, the linked correspondence means that the mouth-shape feature points in every frame of the simulation video are mapped onto that single image. When it is a video composed of multiple frames, the mouth-shape feature points in each frame of the simulation video are mapped onto the corresponding frame of the human body information video.

The corresponding frame is determined as follows. The frames extracted from the human body information video and from the simulation video are numbered. If the two videos have the same number of frames, corresponding frames are those with the same number. Otherwise, corresponding frames are those at the same proportional position in their respective sequences. When the simulation video has more frames than the human body information video, the surplus frames are discarded proportionally. When it has fewer, the missing frames are filled in proportionally by interpolation, and the intermediate mouth shapes are constructed by the dynamic time evolution of the annular elastic space.

Once the linked correspondence of the mouth shapes is established, the change in mouth shape from frame i to frame (i+1) of the simulation video is analyzed. This yields the force acting on each feature point of the corresponding annular elastic space model at time t = i/N seconds. Applying these forces to the annular elastic space model of the human body information reconstructs the mouth-shape information at time t = i/N seconds. When every frame of the new video has been reconstructed, the reconstructed human mouth-shape video is obtained.
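The frame bookkeeping of the correlation inversion method can be sketched as follows. The sketch covers n = TN segmentation, frame times t = i/N, a sampling-theorem check, and proportional frame correspondence with dropping or interpolation. Function names are hypothetical. The proportional mapping aligns the first and last frames of both videos, and surplus frames are "dropped" by snapping to the nearest existing frame; both choices are assumptions. Linear interpolation of feature points stands in for the dynamic time evolution that the source uses to build intermediate mouth shapes. The maximum mouth-motion frequency passed to the Nyquist check is a user choice, not something the source specifies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def frame_count(duration_s: float, fps: int) -> int:
    """n = T * N: number of frames when a T-second video is cut at N frames/s."""
    return round(duration_s * fps)


def frame_times(n: int, fps: int) -> List[float]:
    """Time of the i-th segmented frame, t = i / N seconds (i = 0 .. n-1)."""
    return [i / fps for i in range(n)]


def satisfies_sampling_theorem(fps: int, max_motion_hz: float) -> bool:
    """Nyquist check: N must exceed twice the highest mouth-motion frequency."""
    return fps > 2 * max_motion_hz


def corresponding_positions(n_sim: int, n_body: int) -> List[float]:
    """Position in the simulation video matching each body-video frame by proportion."""
    if n_body == 1:
        return [0.0]
    scale = (n_sim - 1) / (n_body - 1)
    positions = [j * scale for j in range(n_body)]
    if n_sim >= n_body:
        # Surplus simulation frames: snap to the nearest frame; unselected frames are dropped.
        return [float(int(p + 0.5)) for p in positions]
    # Too few simulation frames: fractional positions mark frames to interpolate.
    return positions


def interpolate_points(frames: Sequence[Sequence[Point]], pos: float) -> List[Point]:
    """Feature points at a (possibly fractional) frame position, linearly interpolated."""
    i = int(pos)
    w = pos - i
    if w == 0.0 or i + 1 >= len(frames):
        return list(frames[i])
    return [((1 - w) * x0 + w * x1, (1 - w) * y0 + w * y1)
            for (x0, y0), (x1, y1) in zip(frames[i], frames[i + 1])]
```

For example, a 2-second simulation video cut at 25 frames per second gives n = 50 frames. Mapping a 3-frame simulation onto a 5-frame body video produces positions 0, 0.5, 1, 1.5, 2. The half-integer positions are the frames that would be interpolated.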

Preferably, the dynamic time evolution method based on the annular elastic space in step (3) is the logic correction method. This method does not rely on a live performance. Instead, the mouth-shape primitive module is called directly according to the required mouth-shape information to construct a mouth-shape state template manually. The missing transition states are then generated by the dynamic time evolution of the annular elastic space to complete the video reconstruction.

As shown in Figure 5, this method requires no on-site simulation by a synchronization subject. A mouth-shape state template is generated from the human body information and the mouth-shape information by manually calling mouth-shape primitives. An annular elastic space model is then built and evolved to generate the mouth-shape video of the target object, which reconstructs the video at the target object's mouth. The flow of this method is shown in Figure 7.

A mouth-shape primitive is a model of a basic state of the human mouth. Examples are the mouth shapes for the pinyin vowels "a" (mouth open), "o" (lips rounded), and "i" (lips spread in a grin). All transitional mouth-shape states can be generated from these primitives by the dynamic time evolution of the annular elastic space. A transitional mouth-shape state is a state produced while one primitive transforms into another. For example, between the closed-mouth primitive and the primitive for the pinyin "a", the transitional states are the mouth shapes produced as the mouth gradually opens.

Specifically, the display module first shows the mouth-shape information to be reconstructed. Then n basic mouth shapes that meet the requirement can be selected manually from the mouth-shape library. These are used to correct, by correlation, the mouth shapes in frames at specific positions, which simulates and constructs a time-series-based mouth-shape state template.

When the human body information read in step (1) is a single-frame image, all information in the template other than the mouth shape is extended from that single image. When it is a video composed of multiple frames, the information other than the mouth shape is kept consistent with the video. Here, the information other than the mouth shape means everything in the image or video outside the mouth. This includes other parts of the body (such as the nose, eyes, cheeks, torso, and limbs) and the person's surroundings. For example, blinking, body sway, and other people passing in the background are all regarded as changes in the information outside the mouth.

After the template is built, the region around the mouth in the human body information is correlated on the basis of the annular elastic space. As a result, changes of the mouth exert a corresponding influence on the surrounding region, and the corresponding annular elastic space model is constructed. Analyzing the change from the i-th primitive to the (i+1)-th primitive then yields the force acting on each point of the annular elastic space model for stage i. Extending the action of these forces over a longer time series yields all the transitional mouth shapes between the two stages. When all (n-1) transitions have been reconstructed, the reconstruction of the human mouth-shape video is complete.
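The logic correction method chains primitives and fills in the transitions between them. The sketch below illustrates that chaining. The primitive coordinates are invented for illustration: four feature points per primitive, namely left corner, top, right corner, and bottom. The "force" between consecutive primitives is taken as the per-point displacement. Spreading it over `steps` time steps is done linearly, which is a simplifying assumption standing in for the dynamic time evolution described above. A template of n primitives yields (n-1) transitions, matching the text.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical primitives: (left corner, top, right corner, bottom) lip points.
PRIMITIVES: Dict[str, List[Point]] = {
    "closed": [(-10.0, 0.0), (0.0, 1.0), (10.0, 0.0), (0.0, -1.0)],
    "a":      [(-9.0, 0.0), (0.0, 5.0), (9.0, 0.0), (0.0, -7.0)],    # mouth open
    "o":      [(-5.0, 0.0), (0.0, 4.0), (5.0, 0.0), (0.0, -4.0)],    # lips rounded
    "i":      [(-12.0, 0.0), (0.0, 1.5), (12.0, 0.0), (0.0, -1.5)],  # lips spread
}


def transition(start: List[Point], end: List[Point], steps: int) -> List[List[Point]]:
    """Intermediate states strictly between start and end.

    The per-point 'force' (end - start) is spread linearly over `steps`
    time steps; linear spreading is an assumption, not the source's method.
    """
    states = []
    for k in range(1, steps):
        w = k / steps
        states.append([(x0 + w * (x1 - x0), y0 + w * (y1 - y0))
                       for (x0, y0), (x1, y1) in zip(start, end)])
    return states


def build_sequence(template: List[str], steps: int) -> List[List[Point]]:
    """Key states from the template plus generated transitions.

    A template of n primitives produces (n - 1) transitions and
    1 + (n - 1) * steps frames in total.
    """
    frames = [PRIMITIVES[template[0]]]
    for a, b in zip(template, template[1:]):
        frames.extend(transition(PRIMITIVES[a], PRIMITIVES[b], steps))
        frames.append(PRIMITIVES[b])
    return frames


sequence = build_sequence(["closed", "a", "closed"], steps=4)
```

With 3 primitives and 4 steps per transition, the sequence has 9 frames. Frame 2 is the half-open state between "closed" and "a".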

For the correlation inversion method, the present invention further provides a human mouth-shape video reconstruction system. It comprises an input port, an output port, a processing module, a display module, and a real-time acquisition module, wherein:

The input port is used to read in human body information and mouth-shape information. The human body information is either a single-frame image of the target object or a video composed of multiple frames. The mouth-shape information is any one or any combination of text, sound, images, and video;

The output port is used to output the reconstructed human mouth-shape video;

The display module is used to display, in real time, the mouth-shape information read through the input port;

The processing module is used to convert the mouth-shape information read through the input port and then to reconstruct the human mouth-shape video from the human body information;

The real-time acquisition module is used to capture video of the synchronization subject in real time during reconstruction by the correlation inversion method.

The connections among the modules are shown in Figure 8. Four links connect the modules: the input port and the processing module, the processing module and the output port, the processing module and the real-time acquisition module, and the processing module and the display module. Each link may be wired or wireless to ensure effective data transmission. Depending on actual needs, all links may be wired, all may be wireless, or some may be wired and the rest wireless.

The processing module is a terminal with video image processing and information analysis capabilities, and may be a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, compute, analyze, and process it, and communicate with other terminals. Intelligent terminals include but are not limited to desktop computers, notebook computers, and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers, and intelligent handheld devices. A digital chip is a purpose-designed chip built with integrated-circuit technology. It can compute, analyze, and process information and can control other devices through expansion. Digital chips include but are not limited to single-chip microcomputers, ARM processors, DSPs, and FPGAs.

The real-time acquisition module is any one or any combination of a video camera, a still camera, a webcam, a digital imaging device, and an intelligent terminal with a camera function.

The display module is any one or any combination of a monitor, a display screen, a projector, and an intelligent terminal.

Specifically, the synchronization subject simulates the mouth shapes according to the mouth-shape information shown on the display module. For example, the subject reads aloud a displayed passage or imitates several displayed mouth-shape pictures. Meanwhile, the processing module controls the real-time acquisition module to capture the simulation video of the synchronization subject as the basis for mouth-shape reconstruction.

After capture, the processing module divides the simulation video evenly into n frames at a rate of N frames per second. For a sample mouth-shape video of duration T seconds, n = TN, and the frames correspond to times t₁, t₂, …, tₙ. The processing module locates the mouth shape in each frame. It then establishes a linked correspondence between the contour and feature points of this mouth shape and those of the mouth shape in the human body information that has been read in.

The segmentation frequency can be chosen according to actual conditions, but it should satisfy the sampling theorem so that the segmented frames reflect the mouth-shape information to be reconstructed. A higher segmentation frequency gives higher reconstruction accuracy but also higher complexity. A lower segmentation frequency reduces complexity but also accuracy.

When the human body information read through the input port is a single-frame image, the linked correspondence means that the mouth-shape feature points in every frame of the simulation video are mapped onto that single image. When it is a video composed of multiple frames, the mouth-shape feature points in each frame of the simulation video are mapped onto the corresponding frame of the human body information video.

The corresponding frame is determined as follows. The frames extracted from the human body information video and from the simulation video are numbered. If the two videos have the same number of frames, corresponding frames are those with the same number. Otherwise, corresponding frames are those at the same proportional position in their respective sequences. When the simulation video has more frames than the human body information video, the surplus frames are discarded proportionally. When it has fewer, the missing frames are filled in proportionally by interpolation, and the intermediate mouth shapes are constructed by the dynamic time evolution of the annular elastic space.

Once the linked correspondence of the mouth shapes is established, the change in mouth shape from frame i to frame (i+1) of the simulation video is analyzed. This yields the force acting on each feature point of the corresponding annular elastic space model at time t = i/N seconds. Applying these forces to the annular elastic space model of the human body information reconstructs the mouth-shape information at time t = i/N seconds. When every frame of the new video has been reconstructed, the reconstructed human mouth-shape video is obtained.

For the logic correction method, the present invention further provides a human mouth-shape video reconstruction system. It comprises an input port, an output port, a processing module, a display module, and a mouth-shape primitive module, wherein:

The mouth-shape primitive module stores the basic mouth-shape primitives. These are called during reconstruction by the logic correction method to construct the mouth-shape state template manually.

A mouth-shape primitive is a model of a basic state of the human mouth. Examples are the mouth shapes for the pinyin vowels "a" (mouth open), "o" (lips rounded), and "i" (lips spread in a grin). All transitional mouth-shape states can be generated from these primitives by the dynamic time evolution of the annular elastic space. A transitional mouth-shape state is a state produced while one primitive transforms into another. For example, between the closed-mouth primitive and the primitive for the pinyin "a", the transitional states are the mouth shapes produced as the mouth gradually opens.

The connections among the modules are shown in Figure 9. Four links connect the modules: the input port and the processing module, the processing module and the output port, the processing module and the mouth-shape primitive module, and the processing module and the display module. Each link may be wired or wireless to ensure effective data transmission. Depending on actual needs, all links may be wired, all may be wireless, or some may be wired and the rest wireless.

The processing module is a terminal with video image processing and information analysis capabilities, and may be a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, compute, analyze, and process it, and communicate with other terminals. Intelligent terminals include but are not limited to desktop computers, notebook computers, and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers, and intelligent handheld devices. A digital chip is a purpose-designed chip built with integrated-circuit technology. It can compute, analyze, and process information and can control other devices through expansion. Digital chips include but are not limited to single-chip microcomputers, ARM processors, DSPs, and FPGAs.

The mouth-shape primitive module stores the basic mouth-shape models. These are called during reconstruction by the logic correction method to construct the mouth-shape state template manually.

Traditional mouth-shape transformation techniques depend on a huge mouth-shape database. This database contains a speech library and the corresponding mouth-shape images, which are retrieved during transformation. The database occupies a large amount of storage. It also cannot construct new mouth shapes on its own, so in practice it cannot handle transformations involving mouth shapes it does not contain. Unlike traditional mouth-shape transformation systems, the system of the present invention requires no such database and can reconstruct human mouth-shape videos quickly and accurately.

Preferably, the mouth-shape video reconstruction system of the present invention is a desktop computer, notebook computer, or mobile intelligent terminal with a camera function. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers, and intelligent handheld devices.

Specifically, the system may be a single such device. In that case, the device's communication and data transmission module serves as the input and output ports of the system. Its processing core serves as the processing module and its camera as the real-time acquisition module. Its display screen serves as the display module and its storage unit as the mouth-shape primitive module.

The system may also be a combination of desktop computers, notebook computers, or mobile intelligent terminals with camera functions. For example, the camera and display screen of a mobile intelligent terminal may serve as the real-time acquisition module and the display module. Meanwhile, the communication module, processing core, and storage unit of a notebook computer may serve as the input/output ports, the processing module, and the mouth-shape primitive module, and so on.

Preferably, the method based on annular elastic space dynamic temporal evolution in step (3) is a logic correction method. It does not rely on a live human performance. Instead, mouth-shape primitives are called from the mouth-shape primitive module according to the required mouth-shape information to build a mouth-shape state template manually, and the missing transition states are then generated by dynamic temporal evolution in the annular elastic space to complete the video reconstruction. The flow is shown in Fig. 7. A mouth-shape primitive is a model of a basic human mouth shape, for example the "a" mouth shape in pinyin (mouth open), the "o" mouth shape (lips rounded) or the "i" mouth shape (lips spread); all transitional mouth-shape states can be generated from the primitives by dynamic temporal evolution in the annular elastic space. A transitional mouth-shape state is a mouth shape produced while one primitive transforms into another. For example, going from the closed-mouth primitive to the primitive for the pinyin "a", the transitional states are the mouth shapes during the gradual opening of the mouth. Specifically, when the display module shows the mouth-shape information to be reconstructed, n basic mouth shapes that meet the requirement can be chosen manually from the mouth-shape library and used to correct the mouth shapes in frames at specific positions, thereby simulating and constructing a time-series-based mouth-shape state template.

When the human body information read from the input port is a single-frame image, all information outside the mouth in the template is extended from that single image; when it is a video composed of multiple frames, all information outside the mouth in the template is kept consistent with the video. Information outside the mouth means everything in the image or video other than the mouth, including other parts of the body (such as the nose, eyes, cheeks, torso and limbs) and the person's surroundings; for example, blinking, swaying of the body and people passing in the background are all treated as changes in information outside the mouth. After the template is built, the positions surrounding the mouth in the human body information are associated through the annular elastic space, so that a change of the mouth exerts a corresponding effect on the surrounding region, and the corresponding annular elastic space model is constructed. Analyzing the change from the i-th primitive to the (i+1)-th primitive then yields the force acting on each point of the annular elastic space model for the i-th stage; spreading this force over a longer time series yields all the transitional mouth shapes between these two stages. Once all (n-1) transitions have been reconstructed, the reconstruction of the human mouth-shape video is complete.
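One way to picture the annular elastic space model described above is as a closed ring of springs around the mouth: the mouth feature points are driven to their new positions, and every other ring node settles at spring equilibrium, so the mouth's movement spreads smoothly into the surrounding region. The sketch below is only an illustration under stated assumptions: a single ring, identical unit-stiffness springs between ring neighbours, and Gauss-Seidel iteration to reach equilibrium. The patent does not specify the stiffness, topology or solver, and `relax_ring` is a hypothetical name.

```python
def relax_ring(n, driven, iterations=2000):
    """Equilibrium of a closed ring of n nodes joined by identical springs.

    driven maps a node index to a fixed displacement (dx, dy), e.g. a mouth
    feature point moved by the new mouth shape. Every other node is free and,
    at equilibrium, sits at the mean of its two ring neighbours.
    """
    disp = [(0.0, 0.0)] * n
    for i, d in driven.items():
        disp[i] = d
    # Gauss-Seidel sweeps: update free nodes in place until they settle.
    for _ in range(iterations):
        for i in range(n):
            if i in driven:
                continue
            (lx, ly), (rx, ry) = disp[i - 1], disp[(i + 1) % n]  # i - 1 wraps at 0
            disp[i] = ((lx + rx) / 2, (ly + ry) / 2)
    return disp


# Node 0 is pushed by (1, 0); node 4, on the far side of the ring, is held still.
ring = relax_ring(8, {0: (1.0, 0.0), 4: (0.0, 0.0)})
print([round(x, 3) for x, _ in ring])
# [1.0, 0.75, 0.5, 0.25, 0.0, 0.25, 0.5, 0.75]
```

With uniform springs the influence of the moved point decays linearly along each arc of the ring; a real model would likely use non-uniform stiffness and a 2-D mesh rather than a single ring.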

The beneficial effects of the invention are as follows:

(1) The invention can invert the input mouth-shape information onto a single-frame image to generate a reconstructed human mouth-shape video, and it can also correct the input mouth-shape information on a video composed of multiple frames to generate a reconstructed human mouth-shape video, so it has wide applicability.

(2) The invention provides two specific embodiments: the correlation inversion method and the logic correction method. The former, driven by a synchronized live performance, completes human mouth-shape video reconstruction quickly and efficiently; the latter does not rely on a live performance but on manually calling mouth-shape primitives, so the correction can be done offline. Together, the two methods meet the needs of mouth-shape video reconstruction in different situations.

(3) The system hardware configuration is simple and low-cost. On the software side, only ordinary video and image processing software and a small set of mouth-shape primitives are needed, with no additional software deployment. In particular, compared with traditional mouth-shape reconstruction systems, the system of the invention needs no database, which saves storage space while increasing the flexibility of mouth-shape transformation.

(4) More preferably, all units of the system can be integrated into one intelligent terminal, such as a smartphone, tablet computer, palmtop computer or intelligent handheld device, giving the system high portability.

Description of the drawings

Fig. 1 is a flow chart of the method of the invention.

Fig. 2 is a schematic diagram of the annular elastic space.

Fig. 3 is a schematic diagram of the contour lines and feature points used to bring two mouth shapes into positional correspondence in the method of the invention. In the figure, L1 to L4 and L1' to L4' are the contour lines of the two mouth shapes, and P1 to P6 and P1' to P6' are the key points on the respective mouth contours. At least 3 corresponding points are required on each contour line to ensure the accuracy of the transformation.

Fig. 4 is a sketch of the information transformation in the correlation inversion method of the invention.

Fig. 5 is a sketch of the information transformation in the logic correction method of the invention.

Fig. 6 is a flow diagram of the correlation inversion method of the invention.

Fig. 7 is a flow diagram of the logic correction method of the invention.

Fig. 8 is a structural diagram of the system corresponding to the correlation inversion method of the invention.

Fig. 9 is a structural diagram of the system corresponding to the logic correction method of the invention.
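Fig. 3 requires at least 3 corresponding points on each contour line. One plausible reading, which the patent does not state, is that a 2-D affine transform has six unknowns and is fixed exactly by three non-collinear point pairs. The hypothetical helpers below estimate such a map from three correspondences with Cramer's rule and reject collinear input, for which the map is undetermined. OpenCV's `getAffineTransform` does the same job in practice.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve m @ v = rhs for a 3x3 system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; the affine map is undetermined")
    sol = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]  # replace column `col` with the right-hand side
        sol.append(det3(mc) / d)
    return sol


def affine_from_3(src, dst):
    """Affine map (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f
    that sends the three src points onto the three dst points."""
    m = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(m, [p[0] for p in dst])
    d, e, f = solve3(m, [p[1] for p in dst])
    return a, b, c, d, e, f


def apply_affine(t, p):
    a, b, c, d, e, f = t
    x, y = p
    return a * x + b * y + c, d * x + e * y + f


# Three matched points on one contour line (e.g. taken from P1..P6 and P1'..P6').
t = affine_from_3([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 5)])
print(apply_affine(t, (1, 1)))  # (4.0, 5.0)
```

With more than three points per contour, a least-squares fit would be the natural extension and would average out landmark noise.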

Detailed description of the embodiments

To illustrate the human mouth-shape video reconstruction method of the invention in greater detail, the invention is described below with reference to the drawings.

Embodiment 1

As shown in Fig. 6, the mouth-shape reconstruction method of the invention is illustrated by an example in which B serves as the synchronization object and the correlation inversion method is used to reconstruct, from a single photo of target object A, a video of A reading a lecture script aloud. A desktop computer equipped with a camera serves as the reconstruction system: the USB interface serves as the system's input and output ports, the processor as the processing module, the camera as the real-time acquisition module, and the monitor as the display module.

(1) Information reading: the system reads the photo of A through the USB interface as the human body information to be processed, and reads the speech script document as the mouth-shape information to be processed.

(2) Pre-processing: the processor recognizes that the mouth-shape information is in text format; since text is convenient for the correlation inversion method, it is passed directly to the monitor for display. Meanwhile, the processor analyzes the photo of A, identifies and locks the position of A's mouth in the photo, and selects the mouth-shape feature points, namely the two lip corners and the centers of the four lip lines.
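The six feature points chosen in step (2), two lip corners plus the centers of the four lip lines, can be extracted from lip contours once the mouth is located. A minimal sketch, assuming a hypothetical layout in which a landmark detector (not shown) returns four polylines (outer upper, inner upper, inner lower, outer lower lip), each running from the left corner to the right corner. "Center" is taken here as the arc-length midpoint, which is an assumption; `arc_midpoint` and `mouth_feature_points` are illustrative names.

```python
import math


def arc_midpoint(poly):
    """Point halfway along a polyline, measured by arc length."""
    seg = [math.dist(poly[i], poly[i + 1]) for i in range(len(poly) - 1)]
    remaining = sum(seg) / 2
    for (p, q), s in zip(zip(poly, poly[1:]), seg):
        if s > 0 and remaining <= s:
            t = remaining / s  # fraction of this segment still to travel
            return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        remaining -= s
    return tuple(poly[-1])


def mouth_feature_points(lip_lines):
    """Two lip corners, then the centre of each of the four lip lines.

    lip_lines: four polylines (hypothetical order: outer upper, inner upper,
    inner lower, outer lower), each running from the left to the right corner.
    """
    outer_upper = lip_lines[0]
    corners = [tuple(outer_upper[0]), tuple(outer_upper[-1])]
    return corners + [arc_midpoint(line) for line in lip_lines]


lips = [
    [(0, 0), (2, 2), (4, 0)],   # outer upper lip
    [(0, 0), (4, 0)],           # inner upper lip
    [(0, 0), (4, 0)],           # inner lower lip
    [(0, 0), (2, -2), (4, 0)],  # outer lower lip
]
print(mouth_feature_points(lips))
# [(0, 0), (4, 0), (2.0, 2.0), (2.0, 0.0), (2.0, 0.0), (2.0, -2.0)]
```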

(3) Mouth-shape reconstruction: synchronization object B reads the speech script aloud following the text shown on the monitor, mimicking the corresponding mouth shapes. Meanwhile, the camera records B reading the script (duration 1000 seconds); this simulation video is the basis for the mouth-shape reconstruction. After recording, the processor splits B's simulation video at 30 frames per second into 30000 frames, corresponding to times t₁, t₂, ..., t₃₀₀₀₀, locates the mouth in each frame and selects the same mouth feature points, namely the two lip corners and the centers of the four lip lines. Because the human body information, the photo of A, is a single-frame image, the feature points in each of the 30000 frames of B's video are matched to the corresponding feature points in A's photo, the surrounding positions are linked, and a time-series-based annular elastic space model is set up. Then, from the change of B's mouth shape between frame 1 and frame 2 of the simulation video, the force acting on each feature point of the corresponding annular elastic space model at t = 1/30 s can be derived; applying this force to the annular elastic space model of A's photo completes the reconstruction of A's mouth shape at t = 1/30 s. When all 30000 frames have been reconstructed, the reconstructed video of A reading the speech script is obtained.
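The frame bookkeeping in step (3) is simple arithmetic: 1000 s at 30 frames/s gives 30000 frames, and because the change from frame 1 to frame 2 drives A's mouth at t = 1/30 s, frame k falls at (k - 1)/30 s. The sketch below adds one possible way to carry B's frame-to-frame feature motion over to A's photo, scaling it by the ratio of mouth widths so that a wider or narrower mouth moves proportionally. Treating the "force" as a scaled displacement and normalizing by mouth width are assumptions for illustration; the full method would also move the surrounding region through the annular elastic space model. All function names are hypothetical.

```python
FPS = 30  # frame rate used in Embodiment 1


def frame_count(duration_s, fps=FPS):
    return round(duration_s * fps)


def frame_time(k, fps=FPS):
    """Time (s) of the 1-based frame k, with frame 1 at t = 0."""
    return (k - 1) / fps


def mouth_width(points):
    """Distance between the two lip corners (assumed to be points[0], points[1])."""
    (x0, y0), (x1, y1) = points[0], points[1]
    return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5


def retarget(b_frames, a_points):
    """Drive A's feature points with B's frame-to-frame feature motion.

    b_frames: B's feature points for each frame, in the same order as a_points.
    Returns A's feature points for each frame.
    """
    scale = mouth_width(a_points) / mouth_width(b_frames[0])
    a_frames = [list(a_points)]
    for prev, cur in zip(b_frames, b_frames[1:]):
        a_frames.append([(ax + scale * (cx - px), ay + scale * (cy - py))
                         for (ax, ay), (px, py), (cx, cy) in zip(a_frames[-1], prev, cur)])
    return a_frames


# A's mouth is 2 units wide and B's is 4, so B's motion is halved on A.
a0 = [(0.0, 0.0), (2.0, 0.0)]
b = [[(0.0, 0.0), (4.0, 0.0)], [(0.0, -2.0), (4.0, 0.0)]]
print(retarget(b, a0)[1])  # [(0.0, -1.0), (2.0, 0.0)]
```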

(4) Video output: the reconstructed video of A reading the speech script is output through the USB interface.

In this embodiment, a smartphone can also serve as the reconstruction system: the WiFi interface serves as the system's input and output ports, the phone processor as the processing module, the phone camera as the real-time acquisition module, and the phone display screen as the display module.

(1) Information reading: the system reads the photo of A through the WiFi interface as the human body information to be processed, and reads the speech script document as the mouth-shape information to be processed.

(2) Pre-processing: the phone processor recognizes that the mouth-shape information is in text format; since text is convenient for the correlation inversion method, it is passed directly to the display screen for display. Meanwhile, the processor analyzes the photo of A, identifies and locks the position of A's mouth in the photo, and selects the mouth-shape feature points, namely the two lip corners and the centers of the four lip lines.

(3) Mouth-shape reconstruction: synchronization object B reads the speech script aloud following the text shown on the display screen, mimicking the corresponding mouth shapes. Meanwhile, the phone camera records the simulation video of B reading the script, and the phone processor splits it at 30 frames per second into 30000 frames, corresponding to times t₁, t₂, ..., t₃₀₀₀₀, locates the mouth in each frame and selects the same mouth feature points, namely the two lip corners and the centers of the four lip lines. Because the human body information, the photo of A, is a single-frame image, the feature points in each of the 30000 frames of B's video are matched to the corresponding feature points in A's photo, the surrounding positions are linked, and a time-series-based annular elastic space model is set up. Then, from the change of B's mouth shape between frame 1 and frame 2 of the simulation video, the force acting on each feature point of the corresponding annular elastic space model at t = 1/30 s can be derived; applying this force to the annular elastic space model of A's photo completes the reconstruction of A's mouth shape at t = 1/30 s. When all 30000 frames have been reconstructed, the reconstructed video of A reading the speech script is obtained.

(4) Video output: the reconstructed video of A reading the speech script is output through the WiFi interface.

Embodiment 2

As shown in Fig. 7, the mouth-shape reconstruction method of the invention is illustrated below by an example in which the logic correction method is used to modify the mouth shapes in a segment of a video of announcer C; in this embodiment, C is the target object. A smartphone serves as the reconstruction system: the WiFi interface serves as the system's input and output ports, the phone processor as the processing module, the phone display screen as the display module, and the phone's storage unit as the mouth-shape primitive module.

(1) Information reading: the system reads the video of announcer C through the WiFi interface and clips out the part to be revised as the human body information to be processed, and at the same time reads the corrected speech content as the mouth-shape information to be processed.

(2) Pre-processing: the processor recognizes that the mouth-shape information is in speech (audio) format; since a mouth-shape video is convenient for the logic correction method, the speech is converted into the corresponding mouth-shape video and sent to the display screen for display.
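Step (2) turns the corrected speech into mouth shapes for display. A minimal sketch, assuming a speech recognizer (not part of this sketch) has already produced pinyin syllables with start and end times: each syllable becomes a closed / vowel-primitive / closed keyframe triple, using only the three primitives named in the description ("a" open, "o" rounded, "i" spread). The lookup table, the midpoint timing rule and `keyframes_from_syllables` are hypothetical.

```python
# Hypothetical table: only the three primitives named in the description.
PRIMITIVE_FOR_VOWEL = {"a": "open", "o": "rounded", "i": "spread"}


def keyframes_from_syllables(syllables, fps=30):
    """Turn (pinyin, start_s, end_s) syllables into (frame, primitive) keyframes.

    Each syllable becomes closed -> vowel primitive (at its midpoint) -> closed.
    Syllables whose vowel has no primitive in the table are skipped.
    """
    keys = []
    for pinyin, start, end in syllables:
        vowel = next((ch for ch in pinyin if ch in PRIMITIVE_FOR_VOWEL), None)
        if vowel is None:
            continue
        keys.append((round(start * fps), "closed"))
        keys.append((round((start + end) / 2 * fps), PRIMITIVE_FOR_VOWEL[vowel]))
        keys.append((round(end * fps), "closed"))
    return keys


print(keyframes_from_syllables([("a", 0.0, 2.0)]))
# [(0, 'closed'), (30, 'open'), (60, 'closed')]
```

A single "a" yields exactly the three-shape template used in step (3) of this embodiment; continuous speech would need neighbouring closed keyframes merged.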

(3) Mouth-shape reconstruction: when the display screen shows the mouth-shape information to be reconstructed, basic mouth shapes that meet the requirement can be called manually from the mouth-shape primitive module to correct the mouth shapes in frames at specific positions, thereby simulating and constructing a time-series-based mouth-shape state template. Information outside the mouth in the template, here for example the swaying of the person's limbs and changes in the surroundings, must be kept consistent with the video. For example, to reconstruct a speech segment in which the mouth goes from closed, to saying "a", and back to closed, only three mouth shapes need to be written into the frames at the corresponding times: the initial closed mouth, the maximally open mouth while saying "a", and the closed mouth after the sound ends. These serve as the state template for this segment, and the corresponding annular elastic space model is set up. Analyzing the two changes between these three stages yields the force acting on each feature point of the annular elastic space model in each of the two stages; spreading this force over a longer time series yields all the transitional mouth-shape states in the two stages, namely the frames in which the mouth slowly opens and the frames in which it slowly closes. For example, if 30 frames must be built between two mouth-shape primitives to complete the video reconstruction, the analyzed force is divided into 30 parts that act successively on the annular elastic space model, producing 30 transitional mouth-shape states.
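The equal-split rule in the example above, with the analyzed force divided into 30 parts applied one after another, can be sketched as linear subdivision of feature-point displacement between consecutive template keyframes. This assumes that the force maps directly to displacement, which the patent does not state; the function name and the single-point mouth representation are hypothetical.

```python
def transition_frames(keyframes, steps):
    """Generate transitional mouth states between consecutive keyframes.

    keyframes: feature-point lists, one per mouth primitive in the template.
    For each stage the displacement is split into `steps` equal parts that are
    applied one after another, giving `steps` new frames per stage; the last
    frame of a stage reproduces the next keyframe up to rounding.
    """
    frames = [keyframes[0]]
    for start, end in zip(keyframes, keyframes[1:]):
        delta = [((ex - sx) / steps, (ey - sy) / steps)
                 for (sx, sy), (ex, ey) in zip(start, end)]
        current = start
        for _ in range(steps):
            current = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, delta)]
            frames.append(current)
    return frames


# Closed -> widest "a" -> closed, tracked here by the lower-lip centre only.
closed, open_a = [(2.0, 0.0)], [(2.0, -3.0)]
frames = transition_frames([closed, open_a, closed], steps=30)
print(len(frames))  # 61: the first keyframe plus 30 frames for each of the two stages
```

Equal parts give constant-speed motion; an easing curve over the 30 steps would be a natural refinement if lip motion should accelerate and decelerate.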

(4) Video output: the generated video replaces the part clipped out of the original video, and the reconstructed video of announcer C is output through the WiFi interface.
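Step (4) puts the generated frames back in place of the clipped segment. A minimal sketch that treats a video as a list of frames; requiring the generated segment to have the same length as the clipped one is an assumption made here so that the rest of the video, and any soundtrack aligned with it, keeps its timing. The patent only says the generated video covers the clipped part, and `splice` is a hypothetical name.

```python
def splice(original, start, end, generated):
    """Cover frames original[start:end] (the clipped segment) with generated frames."""
    if not 0 <= start <= end <= len(original):
        raise ValueError("clip range lies outside the original video")
    if len(generated) != end - start:
        raise ValueError("generated segment length differs from the clipped segment")
    return original[:start] + list(generated) + original[end:]


print(splice(list(range(8)), 2, 5, ["a", "b", "c"]))
# [0, 1, 'a', 'b', 'c', 5, 6, 7]
```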

It should be understood that those skilled in the art may make various modifications, combinations, sub-combinations and alterations depending on design requirements and other factors, insofar as they fall within the scope of the appended claims and their equivalents.

Claims

1. A human mouth-shape video reconstruction method, characterized in that it comprises the following four steps:

(1) Information reading: reading human body information and mouth-shape information from an input port, wherein the human body information is selected from a single-frame image of a target object or a video of the target object composed of multiple frames, and the mouth-shape information is any one or any combination of text, sound, image and video;

(2) Pre-processing: recognizing and converting the mouth-shape information read from the input port and displaying the converted mouth-shape information in real time on a display module, and analyzing the human body information read from the input port to lock the position of the mouth;

(3) Mouth-shape reconstruction: reconstructing the human mouth-shape video from the pre-processed mouth-shape information and human body information by the method based on annular elastic space dynamic temporal evolution;

(4) Video output: outputting the reconstructed human mouth-shape video through an output port.

2. The human mouth-shape video reconstruction method according to claim 1, characterized in that the method based on annular elastic space dynamic temporal evolution in step (3) is a correlation inversion method, in which a live person serving as the synchronization object mimics the mouth-shape information shown on the display module, a real-time acquisition module records the resulting simulation video, and the simulation video is coupled with the human body information already read on the basis of the annular elastic space, thereby completing the reconstruction of the human mouth-shape video.

3. The human mouth-shape video reconstruction method according to claim 1, characterized in that the method based on annular elastic space dynamic temporal evolution in step (3) is a logic correction method, which does not rely on a live performance but manually calls the mouth-shape primitive module according to the required mouth-shape information to build a mouth-shape state template, and generates the missing transition states to complete the video reconstruction.

4. A human mouth-shape video reconstruction system for the reconstruction method according to claim 2, characterized in that the video reconstruction system comprises an input port, an output port, a processing module, a display module and a real-time acquisition module, wherein:

5. The human mouth-shape video reconstruction system according to claim 4, characterized in that the real-time acquisition module is any one or any combination of a video camera, a still camera, a webcam, a digital imaging device and an intelligent terminal with a camera function.

6. A human mouth-shape video reconstruction system for the reconstruction method according to claim 3, characterized in that the video reconstruction system comprises an input port, an output port, a processing module, a display module and a mouth-shape primitive module, wherein:

7. The human mouth-shape video reconstruction system according to any one of claims 4 to 6, characterized in that the processing module is a terminal capable of video image processing and information analysis.

8. The human mouth-shape video reconstruction system according to any one of claims 4 to 6, characterized in that the display module is any one or any combination of a monitor, a display screen, a projector and an intelligent terminal.

9. The human mouth-shape video reconstruction system according to any one of claims 4 to 6, characterized in that the mouth-shape video reconstruction system is a desktop computer, a notebook computer or a mobile intelligent terminal with a camera function.

10. The human mouth-shape video reconstruction system according to claim 9, characterized in that the mouth-shape video reconstruction system is a smartphone, a tablet computer, a palmtop computer or an intelligent handheld device.