CN103745462A - Human body mouth shape video reconfiguration system and reconfiguration method - Google Patents

Human body mouth shape video reconfiguration system and reconfiguration method

Publication number
CN103745462A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310745441.XA
Other languages
Chinese (zh)
Other versions
CN103745462B (en)
Inventor
孟濬
黄吉羊
刘琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201310745441.XA
Publication of CN103745462A
Application granted
Publication of CN103745462B
Active
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention provides a human body mouth shape video reconfiguration system and a corresponding method based on the dynamic time evolution of an annular elastic space. The method comprises the steps of information reading, preprocessing, mouth shape reconfiguration and video output, and adopts either a correlation inversion method or a logic correction method. With this method and system, a mouth shape that has been read in can be inverted onto a single-frame image to generate a reconfigured human body mouth shape video, and the mouth shape in a video consisting of multiple frames can be corrected according to the mouth shape information read in to generate a reconfigured human body mouth shape video. Compared with traditional mouth shape reconfiguration methods and systems, the method and system are accurate and efficient, need no database, and make changing mouth shapes more flexible while saving storage space. Preferably, all units of the system are integrated into one intelligent terminal, which may be a smartphone, a tablet computer (such as an iPad), a handheld computer, an intelligent handheld game console or the like.

Description

A human body mouth shape video reconstruction system and reconstruction method
Technical field
The present invention relates to the field of video and image processing, and in particular to a human body mouth shape video reconstruction system and reconstruction method based on the dynamic time evolution of an annular elastic space.
Background
With the development and gradual maturing of computer technology, face modeling and animation, a distinct branch of computer graphics, has attracted growing attention, and changing the human mouth shape in videos and images is an especially widely used application. Many situations require reconstructing the mouth shape of a person in an existing video or image: generating a series of mouth movements from a static image, or modifying the mouth shape in an existing video. To achieve this, existing techniques generally analyze and process a large amount of existing video and image information to build a mouth shape database, and then retrieve the relevant information from that database for each specific problem. Although such techniques can transform the mouth shape in a video or image fairly accurately, their limitations are also obvious. On the one hand, they depend on a large mouth shape database built in advance, which requires a huge number of data samples and makes them poorly portable. On the other hand, the algorithms involve a large amount of computation and analysis and are highly complex, which further limits their range of application.
Summary of the invention
In view of the deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a high-precision, highly portable human body mouth shape video reconstruction method and system that, according to the required mouth shape, evolves a single-frame image of the target object into a video, or modifies and inverts a video of the target object composed of multiple frames. Traditional mouth shape transformation techniques depend on a large mouth shape database containing a speech library and the corresponding mouth shape images, which are retrieved during transformation. On the one hand, this database occupies a large amount of storage space; on the other hand, because the database cannot itself construct new mouth shapes, in practice it cannot handle transformations involving mouth shapes that it does not contain. The system of the present invention differs from traditional mouth shape transformation systems: it needs no such database and can complete the video reconstruction of the human mouth shape quickly and accurately.
The technical solution adopted by the present invention is as follows:
A human body mouth shape video reconstruction method comprises the following four steps:
(1) Information reading: human body information and mouth shape information are read from the input port. The human body information is selected from a single-frame image of the target object or a video composed of multiple frames; the mouth shape information is selected from any one or more of text, sound, image and video;
(2) Preprocessing: the mouth shape information read through the input port is recognized and converted, and the converted mouth shape information is shown in real time on the display module; the human body information read through the input port is analyzed and the position of the mouth is located;
(3) Mouth shape reconstruction: the human body mouth shape video is reconstructed from the preprocessed mouth shape information and human body information, using the time evolution method based on annular elastic space dynamics;
(4) Video output: the reconstructed human body mouth shape video is output through the output port.
The flow chart of the technical solution is shown in Figure 1.
In step (3), the mouth shape reconstruction method is based on the dynamic time evolution of an annular elastic space. The annular elastic space is a planar space on which an order of points and a distance are defined, and it has the following four properties:
1. For any two points P1 and P2 in the annular elastic space, the distance between them is variable.
2. For any two points P1 and P2 in the annular elastic space, their order is strictly invariant. That is, for any third point P3 in the annular elastic space distinct from P1 and P2, the clockwise (or counterclockwise) order of the three points does not change under any transformation.
3. Any point P in the annular elastic space can be acted on by a force F of magnitude f at an angle α to the horizontal axis, and its position changes as a result: the point is displaced from its original position along the direction at angle α to the horizontal axis.
4. When any point P in the annular elastic space is acted on by a force F, the force also affects other points in the space, as if each of them were acted on by a force of magnitude f′ at an angle α′ to the horizontal axis; this is called the correlated action. The position of such a point relative to P determines α′, and its distance from P determines f′. When that distance exceeds the influence range R, the point is considered unaffected by the correlated action of F.
A schematic diagram of the annular elastic space is shown in Figure 2.
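The properties listed above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `apply_force` and `orientation`, the linear falloff of the correlated magnitude with distance, and the simplification that the correlated force keeps the same angle as the original force are all assumptions made for illustration, since the text only states that the angle depends on relative position and the magnitude on distance.

```python
import math

def orientation(p1, p2, p3):
    """Return +1 if p1, p2, p3 are in counterclockwise order, -1 if clockwise, 0 if collinear."""
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    return (cross > 0) - (cross < 0)

def apply_force(points, index, alpha, f, radius):
    """Displace points[index] by a force of magnitude f at angle alpha (radians),
    and apply a weaker correlated force to every point within `radius` of it.

    Assumptions (not from the source): displacement equals force magnitude,
    the correlated force keeps the angle alpha, and its magnitude falls off
    linearly to zero at distance `radius` (which must be positive).
    """
    px, py = points[index]
    moved = []
    for x, y in points:
        d = math.hypot(x - px, y - py)
        if d > radius:
            # Outside the influence range R: unaffected (property 4).
            moved.append((x, y))
            continue
        weight = 1.0 - d / radius  # equals 1 at P itself, 0 at the edge of the range
        moved.append((x + weight * f * math.cos(alpha),
                      y + weight * f * math.sin(alpha)))
    return moved
```

With this falloff, a point halfway to the edge of the influence range moves half as far as P itself, and `orientation` can be used to check that a given displacement has not changed the order of three points (property 2).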
Changes of mouth shape are produced by the orbicularis oris muscle of the lips, which is innervated by the buccal branch of the facial nerve, so the mouth shape can be studied by establishing the annular elastic space model described above. When the mouth shape changes at time t, n points P1, P2, ..., Pn on the annular elastic space can be regarded as being acted on by forces F1, F2, ..., Fn respectively; the combined action of these n forces causes local displacement, rotation or stretching of the annular elastic space, producing the change of mouth shape. In step (3), the processing module of the system identifies the position of the mouth shape in the video or image and its change over the time series, establishes the corresponding annular elastic space model, and extracts the forces acting on each region of the model at each time t. Meanwhile, a new annular elastic space model is established from the human body information, and the extracted forces are applied, at the corresponding times, to the corresponding positions on the new model, which completes the human body mouth shape video reconstruction. The corresponding positions can be determined from the four contour lines of the mouth shape and the feature points on them; to guarantee the precision of the transformation, in practice each contour line should carry at least 3 feature points, as shown in Figure 3. The process of determining the corresponding positions is the correlation based on the annular elastic space.
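The extract-then-apply idea in the preceding paragraph can be sketched as follows. This is a simplified illustration under stated assumptions, not the method as claimed: each force is approximated by the displacement of a feature point between two consecutive source frames, feature points are assumed to be already matched by index, and the hypothetical helper `mouth_width` is used to rescale forces when the source and target mouths differ in size.

```python
def extract_forces(frame_a, frame_b):
    """Approximate the force on each feature point as its displacement
    between two consecutive frames (assumption: force ~ displacement)."""
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(frame_a, frame_b)]

def mouth_width(points):
    """Horizontal extent of the mouth feature points (hypothetical size measure)."""
    xs = [x for x, _ in points]
    return max(xs) - min(xs)

def transfer(target_points, forces, scale):
    """Apply the extracted forces to the matching feature points of the target mouth."""
    return [(x + scale * dx, y + scale * dy)
            for (x, y), (dx, dy) in zip(target_points, forces)]

# Toy data: left corner, right corner, upper-lip midpoint, lower-lip midpoint.
source_t0 = [(0, 0), (2, 0), (1, 1), (1, -1)]
source_t1 = [(0, 0), (2, 0), (1, 2), (1, -2)]   # the source mouth opens
target_t0 = [(0, 0), (4, 0), (2, 2), (2, -2)]   # the target mouth is twice as wide

forces = extract_forces(source_t0, source_t1)
scale = mouth_width(target_t0) / mouth_width(source_t0)
target_t1 = transfer(target_t0, forces, scale)
```

A fuller version would use at least three feature points on each of the four contour lines, as the text recommends, and would propagate each force to neighbouring points within the influence range.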
Preferably, the time evolution method based on annular elastic space dynamics in step (3) is the correlation inversion method: a live person, acting as the synchronous subject, mimics the mouth shape information shown on the display module; the real-time acquisition module captures a simulation video; and the simulation video is matched with the human body information already read in, on the basis of the annular elastic space, thereby completing the reconstruction of the human body mouth shape video. As shown in Figure 4, in this method the synchronous subject simulates on site the mouth shape information to be reconstructed, and this process is captured as the simulation video. An annular elastic space model is established from the simulation video and then analyzed and processed, so that the mouth shape information to be reconstructed is reproduced accurately and efficiently on the human body information of the target object, realizing the reconstruction of this mouth shape on the target object's mouth. The flow of this method is shown schematically in Figure 6. Specifically, the synchronous subject simulates the mouth shape according to the mouth shape information shown on the display module, for example by reading aloud a displayed passage or imitating several displayed mouth shape pictures. Meanwhile, the processing module controls the real-time acquisition module to capture the simulation video of the synchronous subject as the basis for mouth shape reconstruction. After capture is complete, the processing module divides the captured simulation video evenly into n frames at a given frame rate N (when the duration of the sample mouth shape video is T seconds, n = TN), corresponding to times t1, t2, ..., tn; it locates the mouth shape in each frame and links the contour and feature points of that mouth shape to the contour and feature points of the mouth shape in the human body information already read in. The frame rate N can be chosen according to actual conditions, provided it satisfies the sampling theorem so that the divided images reflect the mouth shape information to be reconstructed. The higher the division frequency, the higher both the complexity and the precision of the reconstruction; the lower the division frequency, the lower both. When the human body information read in step (1) is a single-frame image, the linked correspondence means that the mouth shape feature points in every frame of the simulation video are all mapped onto the single-frame human body information image. When the human body information read in step (1) is a video composed of multiple frames, the linked correspondence means that the mouth shape feature points in each frame of the simulation video are mapped onto the corresponding frame of the human body information video. The corresponding frame can be determined as follows: the frames divided from the human body information video and from the simulation video are numbered; if the two videos have the same number of frames, the corresponding frame is the frame with the same number; if they do not, the corresponding frame is the frame at the same proportional position within the total. When the simulation video has more frames than the human body information video, the surplus frames are discarded proportionally; when it has fewer, the missing frames are interpolated proportionally, and the intermediate mouth shapes for interpolation are constructed by the dynamic time evolution based on the annular elastic space. After the linked correspondence of the mouth shape is complete, analyzing the change of mouth shape from frame i to frame (i+1) of the simulation video yields the forces acting on each feature point of the corresponding annular elastic space model at time t = (i/N) seconds; applying these forces to the annular elastic space model corresponding to the human body information completes the reconstruction of the mouth shape information at time t = (i/N) seconds. Once every frame of the new video has been reconstructed, the reconstructed human body mouth shape video is obtained.
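The frame bookkeeping described above (n = TN frames, and corresponding frames chosen by proportional position) can be sketched in Python. The function names are hypothetical, and plain linear interpolation between neighbouring simulation frames is used as a stand-in for the dynamic time evolution that the text specifies for constructing intermediate mouth shapes.

```python
def frame_count(duration_s, frame_rate):
    """n = T * N: number of frames for a video of T seconds divided at N frames per second."""
    return int(round(duration_s * frame_rate))

def proportional_position(j, n_body, n_sim):
    """Position in the simulation video (possibly fractional) that sits at the same
    relative position as frame j of the human body information video."""
    if n_body == 1:
        return 0.0
    return j * (n_sim - 1) / (n_body - 1)

def mouth_at(sim_frames, pos):
    """Feature points at a fractional simulation position.
    Assumption: linear interpolation replaces the dynamics-based evolution."""
    lo = int(pos)
    hi = min(lo + 1, len(sim_frames) - 1)
    w = pos - lo
    return [((1 - w) * ax + w * bx, (1 - w) * ay + w * by)
            for (ax, ay), (bx, by) in zip(sim_frames[lo], sim_frames[hi])]

def align(sim_frames, n_body):
    """One mouth shape per frame of the human body information video."""
    n_sim = len(sim_frames)
    return [mouth_at(sim_frames, proportional_position(j, n_body, n_sim))
            for j in range(n_body)]
```

When the simulation video has more frames than the human body information video, `align` samples only the proportionally matching positions, so the surplus frames are skipped; when it has fewer, the fractional positions produce interpolated intermediate shapes.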
Preferably, the time evolution method based on annular elastic space dynamics in step (3) is the logic correction method, which does not rely on a live person's performance: according to the required mouth shape information, the mouth shape primitive module is called directly to build mouth shape state templates manually, and the missing transition states are then generated by the dynamic time evolution based on the annular elastic space to complete the video reconstruction. As shown in Figure 5, this method needs no on-site simulation by a synchronous subject. Instead, on the basis of the human body information and the mouth shape information, mouth shape state templates are generated by manually calling mouth shape primitives, and an annular elastic space model is then established and evolved to generate the mouth shape video of the target object, realizing the video reconstruction of the target object's mouth. The flow of this method is shown schematically in Figure 7. A mouth shape primitive is a model of a basic state of the human mouth shape, for example the pinyin "a" mouth shape (mouth open), the "o" mouth shape (lips pouted) or the "i" mouth shape (grinning); the mouth shape states of all transitions can be generated from the primitives by the dynamic time evolution based on the annular elastic space. A transition mouth shape state is a mouth shape state produced while one mouth shape primitive transforms into another. For example, from the closed-mouth primitive to the primitive for the pinyin "a", the transition states are the mouth shapes formed as the mouth slowly opens. Specifically, when the display module shows the mouth shape information to be reconstructed, n basic mouth shapes that meet the requirement can be chosen manually from the mouth shape library to perform correlated correction of the mouth shape in frames at specific positions, simulating and constructing a time-series-based mouth shape state template. When the human body information read in step (1) is a single-frame image, all information outside the mouth shape in the template is extended from the single-frame image; when the human body information read in step (1) is a video composed of multiple frames, the information outside the mouth shape in the template is consistent with the video. The information outside the mouth shape is all information in the image or video other than the mouth, including the other parts of the body (such as the nose, eyes, cheeks, trunk and limbs) and the person's surroundings. For example, blinking, body swaying, and other people passing behind the person are all regarded as changes in the information outside the mouth. After the mouth shape state templates have been built, the area surrounding the mouth in the human body information is correlated on the basis of the annular elastic space, so that changes of the mouth produce the corresponding effect on the surrounding region, and the corresponding annular elastic space model is constructed. Then, by analyzing the change from the i-th mouth shape primitive to the (i+1)-th, the forces acting on each point of the annular elastic space model in the i-th stage can be obtained; extending the action of these forces over a longer time series yields all the transition mouth shape states between the two stages. When all (n-1) transition states have been reconstructed, the reconstruction of the human body mouth shape video is complete.
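The logic correction method can be illustrated with a small Python sketch that builds a sequence from n chosen primitives and fills in the (n-1) transitions between them. Everything concrete here is an assumption for illustration: the `PRIMITIVES` table and its coordinates are invented toy data (four feature points instead of four full contour lines), and evenly spaced linear blending stands in for extending the extracted forces over a longer time series.

```python
# Toy primitives: (left corner, upper-lip midpoint, right corner, lower-lip midpoint).
# The coordinates are hypothetical and are not taken from the source.
PRIMITIVES = {
    "closed": [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.0, 0.0)],
    "a":      [(-1.0, 0.0), (0.0, 0.6), (1.0, 0.0), (0.0, -0.9)],
    "o":      [(-0.5, 0.0), (0.0, 0.5), (0.5, 0.0), (0.0, -0.5)],
}

def transition(start, end, steps):
    """Generate `steps` intermediate mouth shapes between two primitives.
    Assumption: the per-point force is constant, so the points move linearly."""
    states = []
    for s in range(1, steps + 1):
        w = s / (steps + 1)
        states.append([(ax + w * (bx - ax), ay + w * (by - ay))
                       for (ax, ay), (bx, by) in zip(start, end)])
    return states

def build_sequence(names, steps):
    """Mouth shape state template: the n named primitives in order,
    with `steps` transition states inserted between each consecutive pair."""
    keys = [PRIMITIVES[name] for name in names]
    sequence = [keys[0]]
    for start, end in zip(keys, keys[1:]):
        sequence.extend(transition(start, end, steps))
        sequence.append(end)
    return sequence

# Closed mouth, then "a", then "o", with 3 transition states per pair.
sequence = build_sequence(["closed", "a", "o"], steps=3)
```

For n primitives and k transition states per pair, the resulting sequence has n + (n-1)k mouth shapes; only the small primitive table needs to be stored, which is the storage advantage the text claims over a full mouth shape database.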
For the correlation inversion method, the present invention also provides a human body mouth shape video reconstruction system comprising an input port, an output port, a processing module, a display module and a real-time acquisition module, wherein:
The input port is used to read in human body information and mouth shape information; the human body information is selected from a single-frame image of the target object or a video composed of multiple frames, and the mouth shape information is selected from any one or more of text, sound, image and video;
The output port is used to output the reconstructed human body mouth shape video;
The display module is used to show in real time the mouth shape information read through the input port;
The processing module is used to convert the mouth shape information read through the input port and then to reconstruct the human body mouth shape video on the basis of the human body information;
The real-time acquisition module is used to capture video of the synchronous subject in real time during reconstruction by the correlation inversion method.
The connections between the modules are shown in Figure 8. The connections between the input port and the processing module, the processing module and the output port, the processing module and the real-time acquisition module, and the processing module and the display module may each be wired or wireless, so as to ensure effective data transmission. According to actual needs, all connections may be wired, all may be wireless, or some may be wired and the others wireless.
The processing module is a terminal with video/image processing and information analysis capability, and may be selected from a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, perform computation, analysis and processing, and communicate with other terminals, including but not limited to desktop computers, notebook computers and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), handheld computers and intelligent handheld devices. A digital chip is a chip that is designed and manufactured with integrated electronic technology, can perform computation, analysis and processing, and can control other devices through expansion, including but not limited to microcontrollers, ARM, DSP, FPGA, etc.
The real-time acquisition module is selected from any one or more of a video camera, a camera, digital imaging equipment, and an intelligent terminal with a camera function.
The display module is selected from any one or more of a display, a display screen, a projector and an intelligent terminal.
Specifically, the synchronous subject simulates the mouth shape according to the mouth shape information shown on the display module, for example by reading aloud a displayed passage or imitating several displayed mouth shape pictures. Meanwhile, the processing module controls the real-time acquisition module to capture the simulation video of the synchronous subject as the basis for mouth shape reconstruction. After capture is complete, the processing module divides the captured simulation video evenly into n frames at a given frame rate N (when the duration of the sample mouth shape video is T seconds, n = TN), corresponding to times t1, t2, ..., tn; it locates the mouth shape in each frame and links the contour and feature points of that mouth shape to the contour and feature points of the mouth shape in the human body information already read in. The division frequency can be chosen according to actual conditions, provided it satisfies the sampling theorem so that the divided images reflect the mouth shape information to be reconstructed. The higher the division frequency, the higher both the complexity and the precision of the reconstruction; the lower the division frequency, the lower both. When the human body information read through the input port is a single-frame image, the linked correspondence means that the mouth shape feature points in every frame of the simulation video are all mapped onto the single-frame human body information image. When the human body information read through the input port is a video composed of multiple frames, the linked correspondence means that the mouth shape feature points in each frame of the simulation video are mapped onto the corresponding frame of the human body information video. The corresponding frame can be determined as follows: the frames divided from the human body information video and from the simulation video are numbered; if the two videos have the same number of frames, the corresponding frame is the frame with the same number; if they do not, the corresponding frame is the frame at the same proportional position within the total. When the simulation video has more frames than the human body information video, the surplus frames are discarded proportionally; when it has fewer, the missing frames are interpolated proportionally, and the intermediate mouth shapes for interpolation are constructed by the dynamic time evolution based on the annular elastic space. After the linked correspondence of the mouth shape is complete, analyzing the change of mouth shape from frame i to frame (i+1) of the simulation video yields the forces acting on each feature point of the corresponding annular elastic space model at time t = (i/N) seconds; applying these forces to the annular elastic space model corresponding to the human body information completes the reconstruction of the mouth shape information at time t = (i/N) seconds. Once every frame of the new video has been reconstructed, the reconstructed human body mouth shape video is obtained.
For the logic correction method, the present invention also provides a human body mouth shape video reconstruction system comprising an input port, an output port, a processing module, a display module and a mouth shape primitive module, wherein:
The input port is used to read in human body information and mouth shape information; the human body information is selected from a single-frame image of the target object or a video composed of multiple frames, and the mouth shape information is selected from any one or more of text, sound, image and video;
The output port is used to output the reconstructed human body mouth shape video;
The display module is used to show in real time the mouth shape information read through the input port;
The processing module is used to convert the mouth shape information read through the input port and then to reconstruct the human body mouth shape video on the basis of the human body information;
The mouth shape primitive module is used to store the basic mouth shape primitives, which are called during reconstruction by the logic correction method to build mouth shape state templates manually.
A mouth shape primitive is a model of a basic state of the human mouth shape, for example the pinyin "a" mouth shape (mouth open), the "o" mouth shape (lips pouted) or the "i" mouth shape (grinning); the mouth shape states of all transitions can be generated from the primitives by the dynamic time evolution based on the annular elastic space. A transition mouth shape state is a mouth shape state produced while one mouth shape primitive transforms into another. For example, from the closed-mouth primitive to the primitive for the pinyin "a", the transition states are the mouth shapes formed as the mouth slowly opens.
The connections between the modules are shown in Figure 9. The connections between the input port and the processing module, the processing module and the output port, the processing module and the mouth shape primitive module, and the processing module and the display module may each be wired or wireless, so as to ensure effective data transmission. According to actual needs, all connections may be wired, all may be wireless, or some may be wired and the others wireless.
The processing module is a terminal with video/image processing and information analysis capability, and may be selected from a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, perform computation, analysis and processing, and communicate with other terminals, including but not limited to desktop computers, notebook computers and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), handheld computers and intelligent handheld devices. A digital chip is a chip that is designed and manufactured with integrated electronic technology, can perform computation, analysis and processing, and can control other devices through expansion, including but not limited to microcontrollers, ARM, DSP, FPGA, etc.
The display module is selected from any one or more of a display, a display screen, a projector and an intelligent terminal.
The mouth shape primitive module is used to store the basic mouth shape models, which are called during reconstruction by the logic correction method to build mouth shape state templates manually. Traditional mouth shape transformation techniques depend on a large mouth shape database containing a speech library and the corresponding mouth shape images, which are retrieved during transformation. On the one hand, this database occupies a large amount of storage space; on the other hand, because the database cannot itself construct new mouth shapes, in practice it cannot handle transformations involving mouth shapes that it does not contain. The system of the present invention differs from traditional mouth shape transformation systems: it needs no such database and can complete the video reconstruction of the human mouth shape quickly and accurately.
Preferably, the mouth shape video reconstruction system of the present invention may be a desktop computer, notebook computer or mobile intelligent terminal with a camera function. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), handheld computers and smart handheld devices. Specifically, the system may consist of just one desktop computer with a camera function, one notebook computer with a camera function, or one mobile intelligent terminal with a camera function. In that case the device's communication and data transmission module serves as the input and output ports of the system, the processing core as the processing module, the camera as the real-time acquisition module, the display screen as the display module, and the storage unit as the mouth shape primitive module. The system may also be a combination of desktop computers, notebook computers or mobile intelligent terminals with a camera function. For example, the camera and display screen of a mobile intelligent terminal with a camera function may serve as the real-time acquisition module and display module respectively, while the communication module, processing core and storage unit of a notebook computer serve as the input/output ports, processing module and mouth shape primitive module of the system respectively, and so on.
Preferably, the temporal evolution method based on annular elastic space dynamics in step (3) is the logic correction method. It does not rely on a live human performance; instead, the mouth shape primitive module is called directly according to the required mouth shape information to build a mouth shape state template manually, and the missing transition states are then generated by dynamic temporal evolution based on the annular elastic space to complete the video reconstruction. Its flow is shown schematically in Fig. 7. A mouth shape primitive is a model of a basic state of the human mouth shape, for example the "a" mouth shape (mouth open), the "o" mouth shape (lips pursed) or the "i" mouth shape (lips spread) in pinyin; all transitional mouth shape states can be generated from the primitives by dynamic temporal evolution based on the annular elastic space. A transitional mouth shape state is a mouth shape state produced while one primitive changes into another. For example, between the closed-mouth primitive and the primitive for pronouncing the pinyin "a", the transitional states are the mouth shapes that occur as the mouth slowly opens. Specifically, when the display module shows the mouth shape information to be reconstructed, n basic mouth shapes that meet the requirement can be chosen manually from the mouth shape library and used to correct the mouth shapes in the frames at specific positions, thereby constructing a time-series-based mouth shape state template. When the human body information read in through the input port is a single-frame image, all information in the template other than the mouth shape is extended from that single-frame image; when the human body information is a video composed of multiple frames, the information in the template other than the mouth shape is kept consistent with the video. Information other than the mouth shape means all information in the image or video outside the mouth region, including the other parts of the body (such as the nose, eyes, cheeks, trunk and limbs) and the person's surroundings. For example, blinking, swaying of the body, or other people passing behind the person all count as changes in the information outside the mouth region. After the state template has been built, the positions surrounding the mouth in the human body information are linked on the basis of the annular elastic space, so that changes of the mouth produce corresponding effects on the surrounding region, and the corresponding annular elastic space model is constructed. Then, by analyzing the change from the i-th mouth shape primitive to the (i+1)-th, the force acting on each point of the annular elastic space model in the i-th stage can be obtained; extending the effect of this force over a longer time series yields all the transitional mouth shape states between these two stages. When all (n-1) transitions have been reconstructed, the reconstruction of the human mouth shape video is complete.
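The linking of the region around the mouth can be pictured with a small sketch. This section does not give the force law of the annular elastic space, so the code below assumes, purely for illustration, that a mouth displacement is passed on at full strength inside an inner radius and fades linearly to zero at an outer ring. The function names and the radii are hypothetical.

```python
import math


def ring_weight(distance, inner_radius, outer_radius):
    """Share of the mouth displacement felt at a given distance (assumed linear falloff)."""
    if distance <= inner_radius:
        return 1.0
    if distance >= outer_radius:
        return 0.0
    return (outer_radius - distance) / (outer_radius - inner_radius)


def displace_periphery(point, center, mouth_shift, inner_radius, outer_radius):
    """Move a point near the mouth by a distance-weighted share of the mouth's shift."""
    weight = ring_weight(math.dist(point, center), inner_radius, outer_radius)
    return (point[0] + weight * mouth_shift[0], point[1] + weight * mouth_shift[1])


# A point halfway between the inner and outer rings receives half of the shift.
print(displace_periphery((20.0, 0.0), (0.0, 0.0), (0.0, 4.0), 10.0, 30.0))  # (20.0, 2.0)
```

Points beyond the outer ring are left untouched, which matches the requirement that information outside the mouth region stays consistent with the source image or video.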
The beneficial effects of the present invention are:
(1) The invention can invert the read-in mouth shape information onto a single-frame image to generate a reconstructed human mouth shape video, and can also correct the mouth shape in a multi-frame video according to the read-in mouth shape information to generate a reconstructed human mouth shape video, so it is widely applicable.
(2) The invention has two specific implementations, correlation inversion and logic correction. The former completes the reconstruction of the human mouth shape video quickly and efficiently through the synchronized performance of a live person; the latter requires mouth shape primitives to be called manually but does not depend on a live performance, so correction can be done offline. Together the two methods meet the needs of mouth shape video reconstruction in different situations.
(3) The system hardware is simple to configure and low in cost. On the software side, only ordinary video and image processing software and a small set of mouth shape primitives are needed, with no additional software deployment. In particular, compared with traditional mouth shape reconstruction systems, the system needs no database, which saves space and also makes mouth shape conversion more flexible.
(4) More preferably, all units of the system can be integrated into a single intelligent terminal, which may be a smartphone, tablet computer, handheld computer or smart handheld device, so the system is highly portable.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a schematic diagram of the annular elastic space.
Fig. 3 is a schematic diagram of the contour lines and feature points used when mouth shape positions are matched in the method of the invention. In the figure, L1 to L4 and L1' to L4' are the contour lines of the two mouth shapes, and P1 to P6 and P1' to P6' are the key points on the contour lines of the two mouth shapes. Every contour line must have at least 3 corresponding points to ensure the accuracy of the conversion.
Fig. 4 is a sketch of the information conversion in the correlation inversion method of the invention.
Fig. 5 is a sketch of the information conversion in the logic correction method of the invention.
Fig. 6 is a schematic flow chart of the correlation inversion method of the invention.
Fig. 7 is a schematic flow chart of the logic correction method of the invention.
Fig. 8 is a structural diagram of the system corresponding to the correlation inversion method of the invention.
Fig. 9 is a structural diagram of the system corresponding to the logic correction method of the invention.
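The Fig. 3 requirement that every contour line have at least 3 corresponding points can be checked mechanically before any conversion is attempted. The sketch below is one possible check. The assignment of key points P1 to P6 to contour lines L1 to L4 is a hypothetical example (P1 and P4 are taken to be the mouth corners shared by all four lines), since the figure itself is not reproduced here.

```python
def lines_lacking_points(contours, matched_points, min_points=3):
    """Return the contour lines that have fewer than `min_points` matched key points.

    contours:       contour line name -> names of the key points lying on it
    matched_points: key points that have a counterpart on the other mouth shape
    """
    return [line for line, points in contours.items()
            if sum(p in matched_points for p in points) < min_points]


# Hypothetical layout: both corners plus one center point per lip line.
CONTOURS = {
    "L1": ["P1", "P2", "P4"],
    "L2": ["P1", "P3", "P4"],
    "L3": ["P1", "P5", "P4"],
    "L4": ["P1", "P6", "P4"],
}

print(lines_lacking_points(CONTOURS, {"P1", "P2", "P3", "P4", "P5", "P6"}))  # []
print(lines_lacking_points(CONTOURS, {"P1", "P2", "P3", "P4", "P5"}))        # ['L4']
```

An empty result means the two mouth shapes can be matched line by line; any line that is returned needs more corresponding points first.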
Detailed description of the embodiments
To explain the human mouth shape video reconstruction method of the present invention in more detail, the invention is described below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 6, the mouth shape reconstruction method of the invention is illustrated by an example in which B serves as the synchronization object and the correlation inversion method is used to reconstruct, from one photo of target object A, a video of A reading a speech script aloud. Here a desktop computer equipped with a camera serves as the reconstruction system: the USB interface serves as the input and output ports of the system, the processor as the processing module, the camera as the real-time acquisition module, and the monitor as the display module.
(1) Information reading: the system reads in the photo of A through the USB interface as the human body information to be processed, and reads in the speech script document as the mouth shape information to be processed.
(2) Preprocessing: the processor identifies the mouth shape information as text. Because text is convenient for the correlation inversion method, the text-format mouth shape information is passed directly to the display. Meanwhile, the processor analyzes the photo of A, identifies and locks the position of A's mouth in the photo, and selects the mouth shape feature points, such as the two mouth corners and the centers of the four lip lines.
(3) Mouth shape reconstruction: synchronization object B mimics the mouth shapes according to the text shown on the display, that is, reads out the content of the speech script. Meanwhile, the camera captures the video of B reading the script (duration 1000 seconds); this simulated video serves as the basis for mouth shape reconstruction. After capture, the processor splits the simulated video of B into 30000 frames at 30 frames per second, corresponding to times t_1, t_2, ..., t_30000, locates the mouth shape in each frame, and selects the same feature points, namely the two mouth corners and the centers of the four lip lines. Because the human body information (the photo of A) is a single-frame image, each of the 30000 frames of B's simulated video is matched with the corresponding feature points in the photo of A, the peripheral positions are linked, and a time-series-based annular elastic space model is established. Then, by analyzing the change in mouth shape from frame 1 to frame 2 of B's simulated video, the force acting on each feature point of the corresponding annular elastic space model at time t = 1/30 second is obtained; applying this force to the annular elastic space model corresponding to A's photo completes the reconstruction of A's mouth shape at t = 1/30 second. When all 30000 frames have been reconstructed, the video of A reading the speech script is obtained.
(4) Video output: the reconstructed video of A reading the speech script is output through the USB interface.
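The frame arithmetic in step (3) can be verified with a few lines. The duration and frame rate come from the embodiment; the rule that the step from frame k to frame k+1 falls at time k/30 second is an assumption generalized from the single case stated above (frame 1 to frame 2 at t = 1/30 second), and the function names are hypothetical.

```python
from fractions import Fraction

FPS = 30            # frames per second, from the embodiment
DURATION_S = 1000   # length of B's simulated video in seconds, from the embodiment


def frame_count(duration_s, fps):
    """Number of frames obtained when a video is split at a fixed frame rate."""
    return duration_s * fps


def step_time(k, fps):
    """Time of the step from frame k to frame k+1, assumed to be k / fps seconds."""
    return Fraction(k, fps)


print(frame_count(DURATION_S, FPS))  # 30000
print(step_time(1, FPS))             # 1/30
```

Exact fractions are used so that the time of each step is not affected by floating-point rounding.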
In this embodiment, a smartphone may also be used as the reconstruction system, in which: the Wi-Fi interface serves as the input and output ports of the system, the phone processor as the processing module, the phone camera as the real-time acquisition module, and the phone screen as the display module.
(1) Information reading: the system reads in the photo of A through the Wi-Fi interface as the human body information to be processed, and reads in the speech script document as the mouth shape information to be processed.
(2) Preprocessing: the phone processor identifies the mouth shape information as text. Because text is convenient for the correlation inversion method, the text-format mouth shape information is passed directly to the screen for display. Meanwhile, the processor analyzes the photo of A, identifies and locks the position of A's mouth in the photo, and selects the mouth shape feature points, such as the two mouth corners and the centers of the four lip lines.
(3) Mouth shape reconstruction: synchronization object B mimics the mouth shapes according to the text shown on the screen, that is, reads out the content of the speech script. Meanwhile, the phone camera captures the simulated video of B, which is split into 30000 frames at 30 frames per second, corresponding to times t_1, t_2, ..., t_30000; the mouth shape is located in each frame and the same feature points are selected, namely the two mouth corners and the centers of the four lip lines. Because the human body information (the photo of A) is a single-frame image, each of the 30000 frames of B's simulated video is matched with the corresponding feature points in the photo of A, the peripheral positions are linked, and a time-series-based annular elastic space model is established. Then, by analyzing the change in mouth shape from frame 1 to frame 2 of B's simulated video, the force acting on each feature point of the corresponding annular elastic space model at time t = 1/30 second is obtained; applying this force to the annular elastic space model corresponding to A's photo completes the reconstruction of A's mouth shape at t = 1/30 second. When all 30000 frames have been reconstructed, the video of A reading the speech script is obtained.
(4) Video output: the reconstructed video of A reading the speech script is output through the Wi-Fi interface.
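The core of the correlation inversion method in Embodiment 1, taking the frame-to-frame change of B's feature points and applying it to the matching feature points of A, can be sketched as follows. The sketch uses plain point displacements as a stand-in for the force effect, because this section does not spell out the dynamics of the annular elastic space. The function names and the `scale` parameter (a hypothetical allowance for the two mouths differing in size) are assumptions.

```python
def frame_displacements(prev_points, next_points):
    """Change of each feature point between two consecutive frames of B's video."""
    return [(nx - px, ny - py)
            for (px, py), (nx, ny) in zip(prev_points, next_points)]


def apply_displacements(points, displacements, scale=1.0):
    """Apply (optionally scaled) displacements to A's matching feature points."""
    return [(x + scale * dx, y + scale * dy)
            for (x, y), (dx, dy) in zip(points, displacements)]


def reconstruct(target_points, driver_frames, scale=1.0):
    """Drive A's feature points, frame by frame, with the changes seen in B's video."""
    current = list(target_points)
    frames = [current]
    for prev_points, next_points in zip(driver_frames, driver_frames[1:]):
        current = apply_displacements(
            current, frame_displacements(prev_points, next_points), scale)
        frames.append(current)
    return frames


# Two feature points only, for brevity: B opens and then closes the mouth.
a_photo = [(0.0, 0.0), (2.0, 0.0)]
b_video = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 1.0), (4.0, -1.0)],
    [(0.0, 0.0), (4.0, 0.0)],
]
for frame in reconstruct(a_photo, b_video, scale=0.5):
    print(frame)
```

With 30000 driver frames the same loop yields 30000 sets of feature points for A, one per output frame; the surrounding image regions would still have to be warped accordingly.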
Embodiment 2
As shown in Fig. 7, the mouth shape reconstruction method of the invention is illustrated below by an example in which the logic correction method is used to correct the mouth shape in a segment of a video of announcer C; in this embodiment C is the target object. Here a smartphone serves as the reconstruction system: the Wi-Fi interface serves as the input and output ports of the system, the phone processor as the processing module, the phone screen as the display module, and the phone's storage unit as the mouth shape primitive module.
(1) Information reading: the system reads in the video of announcer C through the Wi-Fi interface and clips out the part to be corrected as the human body information to be processed; at the same time it reads in the voice correction content as the mouth shape information to be processed.
(2) Preprocessing: the processor identifies the mouth shape information as audio. Because this is convenient for the logic correction method, the audio-format mouth shape information is converted into the corresponding mouth shape video and passed to the display screen for display.
(3) Mouth shape reconstruction: when the display screen shows the mouth shape information to be reconstructed, basic mouth shapes that meet the requirement can be called manually from the mouth shape primitive module to correct the mouth shapes in the frames at specific positions, thereby constructing a time-series-based mouth shape state template. The information in the template other than the mouth shape, such as swaying of the person's limbs or changes in the surroundings, must stay consistent with the video. For example, suppose the segment to be reconstructed goes from a closed mouth, to pronouncing "a", and back to a closed mouth. Only three mouth shapes need to be written into the frames at the corresponding times: the initial closed mouth, the mouth at its widest while pronouncing "a", and the closed mouth after the pronunciation ends. These serve as the state template for the segment, and the corresponding annular elastic space model is established. Analyzing the two changes between the three stages of the model gives the force acting on each feature point of the annular elastic space model in each of the two stages; extending the effect of the force over a longer time series yields all the transitional mouth shape states in the two stages, namely a number of frames in which the mouth slowly opens and a number of frames in which it slowly closes. For example, if 30 frames are needed between two mouth shape primitives to complete the video reconstruction, the effect of the analyzed force is divided into 30 parts, which are applied in turn to the annular elastic space model to produce 30 transitional mouth shape states.
(4) Video output: the generated video replaces the clipped part of the original video, and the reconstructed video of announcer C is output through the Wi-Fi interface.
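The 30-part division described in step (3) of Embodiment 2 can be sketched as below. The text says only that the force effect is divided into 30 parts and applied in turn; the sketch assumes the parts are equal and again uses feature-point displacement as a stand-in for the force effect. The function name and the coordinates of the two primitives are hypothetical.

```python
def transition_states(start_points, end_points, n_frames=30):
    """Split the total change into n_frames equal parts and apply them one after another."""
    steps = [((ex - sx) / n_frames, (ey - sy) / n_frames)
             for (sx, sy), (ex, ey) in zip(start_points, end_points)]
    states = []
    current = list(start_points)
    for _ in range(n_frames):
        current = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(current, steps)]
        states.append(current)
    return states


# Illustrative primitives: upper and lower lip centers, mouth closed and mouth open for "a".
closed = [(0.0, 0.2), (0.0, -0.2)]
open_a = [(0.0, 0.6), (0.0, -0.6)]

opening = transition_states(closed, open_a, n_frames=30)
closing = transition_states(open_a, closed, n_frames=30)
print(len(opening), len(closing))  # 30 30
```

The closed, open, closed template of the embodiment therefore expands into 60 transitional frames, and the last state of each segment coincides (up to rounding) with the primitive it leads to.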
It should be understood that those skilled in the art may make various modifications, combinations, sub-combinations and alterations according to design requirements and other factors, insofar as they fall within the scope of the appended claims and their equivalents.

Claims (10)

1. A human mouth shape video reconstruction method, characterized by comprising the following four steps:
(1) Information reading: human body information and mouth shape information are read in through an input port, the human body information being selected from a single-frame image of the target object or a video composed of multiple frames, and the mouth shape information being any one or more selected from text, sound, image and video;
(2) Preprocessing: the mouth shape information read in through the input port is identified and converted, and the converted mouth shape information is displayed in real time on a display module; the human body information read in through the input port is analyzed and the position of the mouth is locked;
(3) Mouth shape reconstruction: human mouth shape video reconstruction is carried out from the preprocessed mouth shape information and human body information by a temporal evolution method based on annular elastic space dynamics;
(4) Video output: the reconstructed human mouth shape video is output through an output port.
2. The human mouth shape video reconstruction method according to claim 1, characterized in that the temporal evolution method based on annular elastic space dynamics in step (3) is the correlation inversion method: a live person acting as the synchronization object mimics the mouth shape information shown on the display module, a real-time acquisition module captures the simulated video, and the simulated video is matched with the read-in human body information on the basis of the annular elastic space, thereby completing the reconstruction of the human mouth shape video.
3. The human mouth shape video reconstruction method according to claim 1, characterized in that the temporal evolution method based on annular elastic space dynamics in step (3) is the logic correction method: without relying on a live human performance, a mouth shape primitive module is called directly according to the required mouth shape information to build a mouth shape state template manually, and the missing transition states are generated to complete the video reconstruction.
4. A human mouth shape video reconstruction system for the reconstruction method according to claim 2, characterized in that the video reconstruction system comprises an input port, an output port, a processing module, a display module and a real-time acquisition module, wherein:
the input port is used to read in human body information and mouth shape information, the human body information being selected from a single-frame image of the target object or a video composed of multiple frames, and the mouth shape information being any one or more selected from text, sound, image and video;
the output port is used to output the reconstructed human mouth shape video;
the display module is used to display in real time the mouth shape information read in through the input port;
the processing module is used to convert the mouth shape information read in through the input port and then to reconstruct the human mouth shape video on the basis of the human body information;
the real-time acquisition module is used to capture video of the synchronization object in real time while reconstruction is performed with the correlation inversion method.
5. The human mouth shape video reconstruction system according to claim 4, characterized in that the real-time acquisition module is any one or more selected from video cameras, still cameras, webcams, digital imaging devices and intelligent terminals with a camera function.
6. A human mouth shape video reconstruction system for the reconstruction method according to claim 3, characterized in that the video reconstruction system comprises an input port, an output port, a processing module, a display module and a mouth shape primitive module, wherein:
the input port is used to read in human body information and mouth shape information, the human body information being selected from a single-frame image of the target object or a video composed of multiple frames, and the mouth shape information being any one or more selected from text, sound, image and video;
the output port is used to output the reconstructed human mouth shape video;
the display module is used to display in real time the mouth shape information read in through the input port;
the processing module is used to convert the mouth shape information read in through the input port and then to reconstruct the human mouth shape video on the basis of the human body information;
the mouth shape primitive module is used to store basic mouth shape primitives, which are called up to build a mouth shape state template manually while reconstruction is performed with the logic correction method.
7. The human mouth shape video reconstruction system according to any one of claims 4 to 6, characterized in that the processing module is a terminal capable of video and image processing and information analysis.
8. The human mouth shape video reconstruction system according to any one of claims 4 to 6, characterized in that the display module is any one or more selected from displays, display screens, projectors and intelligent terminals.
9. The human mouth shape video reconstruction system according to any one of claims 4 to 6, characterized in that the mouth shape video reconstruction system is a desktop computer, notebook computer or mobile intelligent terminal with a camera function.
10. The human mouth shape video reconstruction system according to claim 9, characterized in that the mouth shape video reconstruction system is a smartphone, tablet computer, handheld computer or smart handheld device.
CN201310745441.XA 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method Active CN103745462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310745441.XA CN103745462B (en) 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310745441.XA CN103745462B (en) 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method

Publications (2)

Publication Number Publication Date
CN103745462A true CN103745462A (en) 2014-04-23
CN103745462B CN103745462B (en) 2016-11-02

Family

ID=50502477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310745441.XA Active CN103745462B (en) 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method

Country Status (1)

Country Link
CN (1) CN103745462B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298961A (en) * 2014-06-30 2015-01-21 中国传媒大学 Mouth-movement-identification-based video marshalling method
CN108831463A (en) * 2018-06-28 2018-11-16 广州华多网络科技有限公司 Lip reading synthetic method, device, electronic equipment and storage medium
CN109168067A (en) * 2018-11-02 2019-01-08 深圳Tcl新技术有限公司 Video timing correction method, correction terminal and computer readable storage medium
CN114554267A (en) * 2022-02-22 2022-05-27 上海艾融软件股份有限公司 Audio and video synchronization method and device based on digital twin technology

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
US20100332229A1 (en) * 2009-06-30 2010-12-30 Sony Corporation Apparatus control based on visual lip share recognition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IAIN MATTHEWS ET AL.: "《Extraction of Visual Features for Lipreading》", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
LI GANG ET AL.: "《Orthogonal transform description of lip contours in vision-driven speech synthesis ***》", 《Optics and Precision Engineering》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298961A (en) * 2014-06-30 2015-01-21 中国传媒大学 Mouth-movement-identification-based video marshalling method
CN104298961B (en) * 2014-06-30 2018-02-16 中国传媒大学 Video method of combination based on Mouth-Shape Recognition
CN108831463A (en) * 2018-06-28 2018-11-16 广州华多网络科技有限公司 Lip reading synthetic method, device, electronic equipment and storage medium
CN108831463B (en) * 2018-06-28 2021-11-12 广州方硅信息技术有限公司 Lip language synthesis method and device, electronic equipment and storage medium
CN109168067A (en) * 2018-11-02 2019-01-08 深圳Tcl新技术有限公司 Video timing correction method, correction terminal and computer readable storage medium
CN114554267A (en) * 2022-02-22 2022-05-27 上海艾融软件股份有限公司 Audio and video synchronization method and device based on digital twin technology
CN114554267B (en) * 2022-02-22 2024-04-02 上海艾融软件股份有限公司 Audio and video synchronization method and device based on digital twin technology

Also Published As

Publication number Publication date
CN103745462B (en) 2016-11-02

Similar Documents

Publication Publication Date Title
CN110531860B (en) Animation image driving method and device based on artificial intelligence
CN110163054A (en) A kind of face three-dimensional image generating method and device
CN112669417B (en) Virtual image generation method and device, storage medium and electronic equipment
CN103745423B (en) A kind of shape of the mouth as one speaks teaching system and teaching method
CN110853614A (en) Virtual object mouth shape driving method and device and terminal equipment
CN110415701A (en) The recognition methods of lip reading and its device
CN110751708A (en) Method and system for driving face animation in real time through voice
CN103745462A (en) Human body mouth shape video reconfiguration system and reconfiguration method
CN111261177A (en) Voice conversion method, electronic device and computer readable storage medium
CN113421547A (en) Voice processing method and related equipment
CN110379411A (en) For the phoneme synthesizing method and device of target speaker
CN114697759B (en) Virtual image video generation method and system, electronic device and storage medium
CN115049016A (en) Model driving method and device based on emotion recognition
CN114529785B (en) Model training method, video generating method and device, equipment and medium
CN115953521A (en) Remote digital human rendering method, device and system
CN112652037A (en) Method for drawing real-time music frequency spectrum vector graph
CN114170648A (en) Video generation method and device, electronic equipment and storage medium
CN111933154B (en) Method, equipment and computer readable storage medium for recognizing fake voice
CN111105487B (en) Face synthesis method and device in virtual teacher system
CN104484034A (en) Gesture motion element transition frame positioning method based on gesture recognition
CN114898018A (en) Animation generation method and device for digital object, electronic equipment and storage medium
CN114626458A (en) High-voltage rear part identification method and device, storage medium and terminal
CN114494542A (en) Character driving animation method and system based on convolutional neural network
CN112258392A (en) Super-resolution image training method, device, medium and equipment
CN117456063B (en) Face driving method and device based on voice, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant