CN115065844A - Self-adaptive adjustment method for action rhythm of anchor limb - Google Patents
- Publication number: CN115065844A
- Application number: CN202210568788.0A
- Authority
- CN
- China
- Prior art keywords
- video
- audio
- matched
- video frame
- clip
- Prior art date
- Legal status: Granted (the status listed is an assumption and is not a legal conclusion)
Classifications
- H04N 21/242 — Synchronization processes, e.g. processing of PCR [Program Clock References]
- G10L 25/57 — Speech or voice analysis specially adapted for comparison or discrimination, for processing of video signals
- G10L 25/78 — Detection of presence or absence of voice signals
- H04N 21/233 — Processing of audio elementary streams (server middleware)
- H04N 21/23418 — Analysing video streams, e.g. detecting features or characteristics
- H04N 21/4302 — Content synchronisation processes, e.g. decoder synchronisation
- H04N 21/4307 — Synchronising the rendering of multiple content streams or additional data on devices
- H04N 21/439 — Processing of audio elementary streams (client middleware)
- H04N 21/44 — Processing of video elementary streams, e.g. splicing or rendering scenes
- H04N 21/8456 — Structuring of content by decomposing it in the time domain, e.g. into time segments
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention relates to an adaptive adjustment method for an anchor's body-movement rhythm. The method adopts a new audio/video processing strategy: it first obtains each audio segment in a target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames, and synchronizes the audio segment with the corresponding position in the target video; in this way, synchronization between the target audio and the target video is finally achieved. During execution of the method, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
Description
Technical Field
The invention relates to an adaptive adjustment method for an anchor's body-movement rhythm, and belongs to the technical field of audio/video synthesis.
Background
At present, many AI-based algorithms change the mouth shape of a person in a video to match a given voice, achieving lip-sound synchronization. However, prior-art algorithms can only change the person's mouth shape; they cannot change the person's body movements (including head movements). The rendered character therefore behaves unnaturally: for example, the person shows no body movement while speaking, or shows many body movements while silent.
Some prior-art methods, such as the paper "Motion Representations for organized analysis", can change the body movements of persons in a video, but they can only map the movements of a person in video A onto video B and cannot automatically adapt the body movements to the speaker's voice. Moreover, the videos synthesized by these methods generally suffer from artifacts such as "ghosting", so the visual effect is unacceptable.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive adjustment method for an anchor's body-movement rhythm that, by adopting a new design strategy, achieves audio/video synchronization efficiently and accurately and improves audio/video processing efficiency.
To solve this technical problem, the invention adopts the following technical scheme: an adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video whose duration is greater than or equal to that of the target audio, comprising the following steps:
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
Step E. For the video segment of the video to be matched that starts at that starting video frame and spans the duration of the audio segment to be matched, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple, update the video to be matched, and go to step F.
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
As a preferred technical scheme of the invention: in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames of the target video are thereby obtained.
As a preferred technical scheme of the invention: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched.
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is then the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
As a preferred technical scheme of the invention: the preset multiple of the duration of the audio segment to be matched in step D is 0.7 to 1.3.
As a preferred technical scheme of the invention: the value of n is 5.
Compared with the prior art, the adaptive adjustment method for an anchor's body-movement rhythm designed by the invention has the following technical effects:
The method adopts a new audio/video processing strategy: it first obtains each audio segment in the target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames and synchronizes the audio segment with the corresponding position in the target video, finally achieving synchronization between the target audio and the target video. During execution, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the adaptive adjustment method for an anchor's body-movement rhythm designed by the invention;
fig. 2 is a schematic diagram of an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention designs an adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video whose duration is greater than or equal to that of the target audio. As shown in fig. 1, the following steps A to G are executed in practical application.
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
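As a rough illustration of step A, the sketch below splits an audio sample stream into voiced segments using a simple per-window energy threshold. This stands in for a real VAD: the patent does not name a specific VAD implementation, and the function name, window size, and threshold here are all assumptions for illustration.

```python
def split_audio_segments(samples, window=160, threshold=0.01):
    """Return (start, end) sample indices of non-silent segments.

    A voiced window is one whose mean energy exceeds `threshold`;
    consecutive voiced windows are merged into one segment.
    """
    segments = []
    start = None
    for w in range(0, len(samples), window):
        chunk = samples[w:w + window]
        energy = sum(x * x for x in chunk) / max(len(chunk), 1)
        voiced = energy > threshold
        if voiced and start is None:
            start = w                      # segment begins
        elif not voiced and start is not None:
            segments.append((start, w))    # segment ends at this window
            start = None
    if start is not None:                  # audio ends while voiced
        segments.append((start, len(samples)))
    return segments
```

The silence segments of step A are simply the gaps between the returned voiced segments.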
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
In practical application, in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames are thereby obtained.
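The patent specifies only that the PD value of frame i is a "perceptual difference" between frame i and frame i+n; the exact metric is not given. As a minimal stand-in, the sketch below uses the mean absolute pixel difference — both the function name and the choice of metric are assumptions.

```python
def pd_values(frames, n=5):
    """Compute one PD value per frame for frames 1..I-n.

    frames: list of frames, each a flat list of pixel intensities.
    PD[i] is the mean absolute pixel difference between frame i
    and frame i+n (a stand-in for "perceptual difference").
    """
    pds = []
    for i in range(len(frames) - n):
        a, b = frames[i], frames[i + n]
        pds.append(sum(abs(x - y) for x, y in zip(a, b)) / len(a))
    return pds
```

A low PD value marks a frame that changes little n frames later, i.e. a relatively still moment suitable for splicing.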
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
In practical application, step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched.
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, for example 0.7 to 1.3 times that duration, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
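Steps D3 to D5 can be sketched as follows over a shared array of per-frame PD values, with each candidate clip given as a (first, last) frame-index range. The clip representation and helper names are assumptions, and the sketch further assumes the reference frame lies at least n frames after the clip start and at least n frames before the clip end, so every indexed PD value exists.

```python
def clip_cost(pd, first, last, n):
    """Cost of one candidate clip (steps D3, D4-1..D4-5).

    pd: list of PD values indexed by frame; [first, last]: clip range.
    Returns (reference_frame, cost).
    """
    # Step D3 / D4-1: reference frame = frame with minimum PD in the clip.
    j_ref = min(range(first, last + 1), key=lambda k: pd[k])
    # Step D4-2: costA = minimum PD from frame j_ref+n to the last frame J.
    cost_a = min(pd[k] for k in range(j_ref + n, last + 1))
    # Steps D4-3/D4-4: PD[j+n] - PD[j-n] for j from j_ref until j+n == J.
    diffs = [pd[j + n] - pd[j - n] for j in range(j_ref, last - n + 1)]
    cost_b = max(diffs)                 # step D4-5
    return j_ref, cost_a + cost_b


def pick_start_frame(pd, clips, n):
    """Step D5: reference frame of the candidate clip with minimum cost."""
    best = min((clip_cost(pd, f, l, n) for f, l in clips), key=lambda c: c[1])
    return best[0]
```

Intuitively, costA favors clips that contain another still frame after the reference frame, and costB penalizes clips where motion ramps up sharply.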
Step E. For the video segment of the video to be matched that starts at the starting video frame corresponding to the audio segment to be matched and spans the duration of that audio segment, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple (e.g. the corresponding 0.7 to 1.3 times), update the video to be matched, and go to step F.
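Step E's frame insertion or deletion amounts to retiming a frame sequence to a target length. The patent does not specify how frames are chosen for duplication or removal; the sketch below uses nearest-index resampling as one plausible assumption.

```python
def retime(frames, target_len):
    """Stretch or squeeze `frames` to exactly `target_len` frames.

    Nearest-index resampling: duplicates frames when stretching
    (target_len > len(frames)) and drops frames when squeezing.
    """
    if target_len <= 0 or not frames:
        return []
    step = len(frames) / target_len
    return [frames[min(int(i * step), len(frames) - 1)]
            for i in range(target_len)]
```

For example, retiming 4 frames to 8 duplicates every frame once, while retiming 6 frames to 3 keeps every second frame.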
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
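The overall loop of steps C to G can be sketched as below, with `find_start` and `retime` standing in for the minimum-cost search (step D) and the frame insertion/deletion (step E). Audio segments are represented here simply as frame counts, which is a simplification of the actual duration bookkeeping.

```python
def synchronize(audio_segments, video, find_start, retime):
    """Walk the audio segments in order and pair each with a video clip.

    audio_segments: list of segment lengths in frames.
    video: list of video frames.
    find_start(remaining, seg_len): start index in the remaining video (step D).
    retime(clip, seg_len): clip adjusted to the segment length (step E).
    Returns one retimed video clip per audio segment.
    """
    out = []
    remaining = list(video)
    for seg_len in audio_segments:               # step C: next unsynchronized segment
        start = find_start(remaining, seg_len)   # step D: minimum-cost start frame
        clip = remaining[start:start + seg_len]
        out.append(retime(clip, seg_len))        # step E: insert/delete frames
        remaining = remaining[start + seg_len:]  # steps F/G: keep tail past breakpoint
    return out
```

Each iteration consumes the video up to the breakpoint frame, so later segments can only match later parts of the video, mirroring the update in step G.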
In practical application, for example with n equal to 5, VAD is applied to the target audio to obtain its silence segments. As shown in fig. 2, the short segments at the top are the silence segments; the wave line at the bottom, distributed along the abscissa, is the target video; the dotted lines between the top silence segments and the bottom target video indicate the adjustment range of the interval between each audio segment and its corresponding video segment; and the solid lines indicate the correspondence between the head and tail positions of the audio segments and the corresponding video frames.
This technical scheme designs an adaptive adjustment method for an anchor's body-movement rhythm that adopts a new audio/video processing strategy: it first obtains each audio segment in the target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames and synchronizes the audio segment with the corresponding position in the target video, finally achieving synchronization between the target audio and the target video. During execution, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (5)
1. An adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video, wherein the duration of the target video is greater than or equal to the duration of the target audio, characterized by comprising the following steps:
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
Step E. For the video segment of the video to be matched that starts at that starting video frame and spans the duration of the audio segment to be matched, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple, update the video to be matched, and go to step F.
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
2. The adaptive adjustment method for an anchor's body-movement rhythm according to claim 1, characterized in that: in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames of the target video are thereby obtained.
3. The adaptive adjustment method for an anchor's body-movement rhythm according to claim 1, characterized in that: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched;
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
4. The adaptive adjustment method for the rhythm of the anchor limb actions according to claim 1, characterized in that: the preset time length in step D is 0.7 to 1.3 times the time length of the audio clip to be matched.
5. The adaptive adjustment method for the rhythm of the anchor limb actions according to claim 1, characterized in that: said n is equal to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210568788.0A CN115065844B (en) | 2022-05-24 | 2022-05-24 | Self-adaptive adjustment method for motion rhythm of anchor limb |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115065844A true CN115065844A (en) | 2022-09-16 |
CN115065844B CN115065844B (en) | 2023-09-12 |
Family
ID=83198183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210568788.0A Active CN115065844B (en) | 2022-05-24 | 2022-05-24 | Self-adaptive adjustment method for motion rhythm of anchor limb |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115065844B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170142458A1 (en) * | 2015-11-16 | 2017-05-18 | Goji Watanabe | System and method for online collaboration of synchronized audio and video data from multiple users through an online browser |
CN109416842A (en) * | 2016-05-02 | 2019-03-01 | 华纳兄弟娱乐公司 | Geometric match in virtual reality and augmented reality |
US20190373237A1 (en) * | 2017-01-26 | 2019-12-05 | D-Box Technologies Inc. | Capturing and synchronizing motion with recorded audio/video |
CN111225237A (en) * | 2020-04-23 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Sound and picture matching method of video, related device and storage medium |
WO2020228473A1 (en) * | 2019-05-14 | 2020-11-19 | Goodix Technology (Hk) Company Limited | Method and system for speaker loudness control |
CN112437336A (en) * | 2020-11-19 | 2021-03-02 | 维沃移动通信有限公司 | Audio and video playing method and device, electronic equipment and storage medium |
CN113825005A (en) * | 2021-09-30 | 2021-12-21 | 北京跳悦智能科技有限公司 | Face video and audio synchronization method and system based on joint training |
CN113902818A (en) * | 2021-09-13 | 2022-01-07 | 上海科技大学 | Voice-driven human body action generation method based on implicit coding enhancement |
CN113992979A (en) * | 2021-10-27 | 2022-01-28 | 北京跳悦智能科技有限公司 | Video expansion method and system and computer equipment |
WO2022100262A1 (en) * | 2020-11-12 | 2022-05-19 | 海信视像科技股份有限公司 | Display device, human body posture detection method, and application |
Non-Patent Citations (1)
Title |
---|
Kong Qingyue; Chang Shufeng: "3G Videophone Evaluation Method Based on Human Audio-Visual Physiological Characteristics", Coal Technology, no. 04 *
Also Published As
Publication number | Publication date |
---|---|
CN115065844B (en) | 2023-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105282591B (en) | The synchronization of independent output stream | |
US7848924B2 (en) | Method, apparatus and computer program product for providing voice conversion using temporal dynamic features | |
US20080259085A1 (en) | Method for Animating an Image Using Speech Data | |
WO2005059892A3 (en) | Virtual voiceprint system and method for generating voiceprints | |
JP2009537037A (en) | Method for switching from a first adaptive data processing version to a second adaptive data processing version | |
WO2021196646A1 (en) | Interactive object driving method and apparatus, device, and storage medium | |
CN113383384A (en) | Real-time generation of speech animation | |
US9913033B2 (en) | Synchronization of independent output streams | |
WO2021213008A1 (en) | Video sound and picture matching method, related device and storage medium | |
CN106057220B (en) | High-frequency extension method of audio signal and audio player | |
US11763813B2 (en) | Methods and systems for reducing latency in automated assistant interactions | |
CN109413475A (en) | Method of adjustment, device and the server of subtitle in a kind of video | |
CN113704390A (en) | Interaction method and device of virtual objects, computer readable medium and electronic equipment | |
CN115065844A (en) | Self-adaptive adjustment method for action rhythm of anchor limb | |
CN108290289B (en) | Method and system for synchronizing vibro-kinetic effects with virtual reality sessions | |
CN110491366B (en) | Audio smoothing method and device, computer equipment and storage medium | |
US7418388B2 (en) | Voice synthesizing method using independent sampling frequencies and apparatus therefor | |
US20220020196A1 (en) | System and method for voice driven lip syncing and head reenactment | |
CN106469559B (en) | Voice data adjusting method and device | |
CN109360588A (en) | A kind of mobile device-based audio-frequency processing method and device | |
JP2006243215A (en) | Data generating device for articulatory parameter interpolation, speech synthesizing device, and computer program | |
CN110310639B (en) | Interactive expression implementation method and terminal | |
GB2423905A (en) | Animated messaging | |
CN112398912A (en) | Voice signal acceleration method and device, computer equipment and storage medium | |
CN111063339A (en) | Intelligent interaction method, device, equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||