CN115065844A - Self-adaptive adjustment method for action rhythm of anchor limb - Google Patents

Self-adaptive adjustment method for action rhythm of anchor limb

Info

Publication number
CN115065844A
CN115065844A (application CN202210568788.0A)
Authority
CN
China
Prior art keywords
video
audio
matched
video frame
clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210568788.0A
Other languages
Chinese (zh)
Other versions
CN115065844B (en)
Inventor
包英泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tiaoyue Intelligent Technology Co ltd
Original Assignee
Beijing Tiaoyue Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tiaoyue Intelligent Technology Co ltd filed Critical Beijing Tiaoyue Intelligent Technology Co ltd
Priority to CN202210568788.0A priority Critical patent/CN115065844B/en
Publication of CN115065844A publication Critical patent/CN115065844A/en
Application granted granted Critical
Publication of CN115065844B publication Critical patent/CN115065844B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242 Synchronization processes, e.g. processing of PCR [Program Clock References]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to an adaptive adjustment method for the action rhythm of an anchor's limbs, which adopts a brand-new audio and video processing strategy: each audio segment in a target audio is first obtained; then, for each audio segment in turn, the starting video frame corresponding to that audio segment is searched for in a target video based on the perceptual difference values between video frames, and the audio segment is synchronized with the corresponding position in the target video, so that synchronization between the target audio and the target video is finally realized. In executing the designed method, the correspondence between each audio segment and the target video can be found accurately, synchronization between the two is completed, and the efficiency of audio and video synthesis is effectively improved.

Description

Self-adaptive adjustment method for action rhythm of anchor limb
Technical Field
The invention relates to an adaptive adjustment method for the action rhythm of an anchor's limbs, belonging to the technical field of audio and video synthesis.
Background
At present, many AI-based algorithms change the mouth shape of a person in a video according to a given sound so as to synchronize the mouth shape with the sound. However, these prior-art algorithms can only change the mouth shape of the person in the video and cannot change the person's limb actions (including head movement), so the presented video character behaves unnaturally; for example, the character shows no limb actions while speaking, or shows many limb actions while not speaking.
Some prior-art methods, such as the paper "Motion Representations for Articulated Animation", can change the limb actions of persons in a video, but they can only map the actions of a person in video A into video B and cannot automatically change the limb actions according to the speaker's voice; moreover, the videos synthesized by these methods generally suffer from "ghosting" and similar artifacts, and their visual effect is not acceptable.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive adjustment method for the action rhythm of an anchor's limbs which, by adopting a brand-new design strategy, can realize audio and video synchronization efficiently and accurately and improve audio and video processing efficiency.
In order to solve the above technical problem, the invention adopts the following technical scheme: the invention designs an adaptive adjustment method for the action rhythm of an anchor's limbs, used for realizing synchronization between a target audio and a target video, wherein the duration of the target video is greater than or equal to the duration of the target audio, and the method comprises the following steps:
step A, applying a VAD technique to the target audio to obtain the starting time and ending time of each silence segment in the target audio, thereby obtaining each audio segment of the target audio in sequence, and then proceeding to step B;
step B, for the 1st to the (I-n)th video frames in the target video, obtaining the PD value corresponding to each video frame by means of perceptual difference, then initializing the target video as the video to be matched, and proceeding to step C; where I denotes the length of the target video and n is a preset integer greater than 1;
step C, among the audio segments of the target audio that have not yet been synchronized, selecting the first one in sequence as the audio segment to be matched, and proceeding to step D;
step D, according to a duration equal to a preset multiple of the duration of the audio segment to be matched, determining by a minimum-cost method the starting video frame in the video to be matched corresponding to the audio segment to be matched, and then proceeding to step E;
step E, for the video segment in the video to be matched that starts from the starting video frame corresponding to the audio segment to be matched and corresponds to the duration of the audio segment to be matched, performing a corresponding frame-supplementing or frame-deleting operation according to the preset multiple of that duration, updating the video to be matched, and then proceeding to step F;
step F, based on the duration equal to the preset multiple of the duration of the audio segment to be matched in step D, cutting from the target audio an audio passage containing the audio segment to be matched, synchronizing the cut audio passage onto the video to be matched according to the starting video frame corresponding to the audio segment to be matched, obtaining the video frame in the video to be matched that corresponds to the end of the cut audio passage as the breakpoint video frame of the video to be matched, and proceeding to step G;
step G, judging whether any audio segment of the target audio remains unsynchronized; if so, updating the portion of the video to be matched from the breakpoint video frame to the end as the new video to be matched, and returning to step C; otherwise, synchronization between the target audio and the target video is completed.
As a preferred technical scheme of the invention: in step B, for each of the 1st to the (I-n)th video frames in the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I-n}, thereby obtaining the PD values of the 1st to the (I-n)th video frames in the target video.
As a preferred technical scheme of the invention: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched corresponding to the audio segment to be matched;
step D1, obtaining the duration t equal to the preset multiple of the duration of the audio segment to be matched, and proceeding to step D2;
step D2, obtaining a duration T exceeding the duration t; within the video segment of duration T starting from the first video frame of the video to be matched, cutting out video clips each of duration t starting from that first video frame at a preset interval, and then proceeding to step D3;
step D3, for each video clip, obtaining the video frame with the minimum PD value within the clip as the reference video frame of that clip, thereby obtaining a reference video frame for every video clip, and then proceeding to step D4;
step D4, executing the following steps D4-1 to D4-5 for each video clip to obtain the cost value corresponding to that clip, and then proceeding to step D5;
step D4-1, initializing j to the reference video frame of the video clip, and then proceeding to step D4-2;
step D4-2, obtaining the minimum PD value among the (j+n)th to the Jth video frames of the video clip as costA of the clip, and then proceeding to step D4-3; where the Jth video frame is the last video frame of the video clip;
step D4-3, obtaining the difference result of subtracting the PD value of the (j-n)th video frame from the PD value of the (j+n)th video frame of the video clip, and proceeding to step D4-4;
step D4-4, judging whether j+n equals J; if so, proceeding to step D4-5; otherwise, incrementing j by 1 and returning to step D4-3;
step D4-5, obtaining the maximum value among the difference results as costB of the video clip, and obtaining the cost value of the clip as the sum of costA and costB;
step D5, according to the cost values of the video clips, selecting the reference video frame of the video clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
As a preferred technical scheme of the invention: and D, presetting the time length of the audio clip to be matched in the step D to be 0.7-1.3 times.
As a preferred technical scheme of the invention: said n is equal to 5.
Compared with the prior art, the self-adaptive adjustment method for the action rhythm of the anchor limb has the following technical effects:
the invention designs a self-adaptive adjustment method of the movement rhythm of the anchor limb, which adopts a brand-new audio and video processing strategy, firstly obtains each audio segment in a target audio, then sequentially searches an initial video frame corresponding to the audio segment in a target video according to each audio segment and based on the consideration of the perception difference value between video frames, and accordingly completes the synchronization of the audio segment and the corresponding position on the target video, and finally realizes the synchronization between the target audio and the target video; in the execution of the design method, the corresponding relation between each audio clip and the target video can be accurately found, the synchronization among the audio clips and the target video is completed, and the efficiency of audio and video synthesis processing is effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the adaptive adjustment method for the action rhythm of an anchor's limbs designed by the invention;
fig. 2 is a schematic diagram of an embodiment of the present invention.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention designs an adaptive adjustment method for the action rhythm of an anchor's limbs, used for realizing synchronization between a target audio and a target video, wherein the duration of the target video is greater than or equal to the duration of the target audio; as shown in fig. 1, in practical application, the following steps A to G are specifically executed.
Step A: a VAD (voice activity detection) technique is applied to the target audio to obtain the starting time and ending time of each silence segment in the target audio, so that each audio segment of the target audio is obtained in sequence; the method then proceeds to step B.
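For illustration only, the following Python sketch shows one possible realization of step A, with a simple energy-based silence detector standing in for a concrete VAD implementation; the frame length, energy threshold and minimum silence length are assumptions, not values specified by the invention.

import numpy as np

def split_audio_segments(samples, sample_rate, frame_ms=30,
                         energy_thresh=1e-4, min_silence_ms=300):
    """Return (silence_segments, audio_segments) as lists of (start_s, end_s)."""
    samples = np.asarray(samples, dtype=np.float32)
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Per-frame mean energy decides whether a frame counts as silent.
    silent = [float(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2)) < energy_thresh
              for i in range(n_frames)]

    silence_segments, start = [], None
    for i, is_silent in enumerate(silent + [False]):      # sentinel flushes the last run
        if is_silent and start is None:
            start = i
        elif not is_silent and start is not None:
            if (i - start) * frame_ms >= min_silence_ms:  # keep only sufficiently long pauses
                silence_segments.append((start * frame_len / sample_rate,
                                         i * frame_len / sample_rate))
            start = None

    # The audio segments are the spans between consecutive silence segments.
    audio_segments, prev_end = [], 0.0
    total = len(samples) / sample_rate
    for s_start, s_end in silence_segments:
        if s_start > prev_end:
            audio_segments.append((prev_end, s_start))
        prev_end = s_end
    if prev_end < total:
        audio_segments.append((prev_end, total))
    return silence_segments, audio_segments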
Step B: for the 1st to the (I-n)th video frames in the target video, the PD (perceptual difference) value corresponding to each video frame is obtained; the target video is then initialized as the video to be matched, and the method proceeds to step C; where I denotes the length (number of frames) of the target video and n is a preset integer greater than 1.
In practical application, in step B, for each of the 1st to the (I-n)th video frames in the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I-n}, thereby obtaining the PD values of the 1st to the (I-n)th video frames in the target video.
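As an illustration of step B, the sketch below computes PD values using a mean absolute grayscale difference between frame i and frame i+n; the invention does not fix a concrete perceptual-difference metric, so this particular metric is only an assumed stand-in.

import numpy as np

def pd_values(frames, n):
    """frames: list of H x W (x3) arrays; returns pd[i] comparing frame i with frame i+n,
    for i = 0 .. len(frames)-n-1 (0-based, i.e. frames 1 .. I-n of the text)."""
    def to_gray(f):
        f = np.asarray(f, dtype=np.float32)
        return f.mean(axis=2) if f.ndim == 3 else f
    pds = []
    for i in range(len(frames) - n):
        a, b = to_gray(frames[i]), to_gray(frames[i + n])
        pds.append(float(np.abs(a - b).mean()))   # small PD: frames i and i+n look alike
    return pds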
Step C: among the audio segments of the target audio that have not yet been synchronized, the first one in sequence is selected as the audio segment to be matched, and the method proceeds to step D.
Step D: according to a duration equal to a preset multiple of the duration of the audio segment to be matched, the starting video frame in the video to be matched corresponding to the audio segment to be matched is determined by a minimum-cost method; the method then proceeds to step E.
In practical application, step D is specifically designed to execute the following steps D1 to D5 to determine the starting video frame in the video to be matched corresponding to the audio segment to be matched.
Step D1: the duration t equal to the preset multiple of the duration of the audio segment to be matched is obtained, the preset multiple being, for example, 0.7 to 1.3 times; the method then proceeds to step D2.
Step D2: a duration T exceeding the duration t is obtained; within the video segment of duration T starting from the first video frame of the video to be matched, video clips each of duration t are cut out starting from that first video frame at a preset interval, and the method then proceeds to step D3.
Step D3: for each video clip, the video frame with the minimum PD value within the clip is taken as the reference video frame of that clip, so that a reference video frame is obtained for every video clip; the method then proceeds to step D4.
Step D4: the following steps D4-1 to D4-5 are executed for each video clip to obtain the cost value corresponding to that clip; the method then proceeds to step D5.
Step D4-1: j is initialized to the reference video frame of the video clip, and the method proceeds to step D4-2.
Step D4-2: the minimum PD value among the (j+n)th to the Jth video frames of the video clip is obtained as costA of the clip, and the method proceeds to step D4-3; here the Jth video frame is the last video frame of the video clip.
Step D4-3: the difference obtained by subtracting the PD value of the (j-n)th video frame from the PD value of the (j+n)th video frame of the video clip is computed, and the method proceeds to step D4-4.
Step D4-4: it is judged whether j+n equals J; if so, the method proceeds to step D4-5; otherwise, j is incremented by 1 and the method returns to step D4-3.
Step D4-5: the maximum value among the difference results is taken as costB of the video clip, and the cost value of the clip is obtained as the sum of costA and costB.
Step D5: according to the cost values of the video clips, the reference video frame of the video clip with the minimum cost value is selected as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
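A minimal sketch of the minimum-cost search of steps D1 to D5 may look as follows; the handling of the frame rate, the extra duration T, the sampling interval of candidate clips and the 0-based indexing are assumptions made so that the example is runnable, not details taken from the disclosure.

def select_start_frame(pd, fps, audio_dur, start_frame,
                       n=5, multiple=1.0, extra_s=1.0, step_frames=5):
    """pd: PD values of the whole target video (pd[i] compares frame i with frame i+n).
    start_frame: first frame of the current video to be matched.
    Returns the reference frame chosen as the starting video frame (step D5)."""
    t_frames = max(1, int(round(audio_dur * multiple * fps)))  # step D1: duration t in frames
    T_frames = t_frames + int(round(extra_s * fps))            # step D2: a duration T exceeding t

    candidates = []
    # Candidate clips of length t are taken every step_frames within the first T frames.
    for s in range(start_frame, start_frame + T_frames - t_frames + 1, step_frames):
        last = s + t_frames - 1                                # frame J, last frame of this clip
        if last >= len(pd):
            break
        # Step D3: reference frame = frame with the minimum PD value inside the clip.
        ref = min(range(s, last + 1), key=lambda i: pd[i])
        # Step D4-2: costA = minimum PD among frames ref+n .. J.
        tail = [pd[i] for i in range(ref + n, last + 1)]
        costA = min(tail) if tail else pd[ref]
        # Steps D4-3 to D4-5: costB = max of PD[j+n] - PD[j-n] for j = ref .. J-n.
        diffs = [pd[j + n] - pd[j - n]
                 for j in range(ref, last - n + 1) if j - n >= 0]
        costB = max(diffs) if diffs else 0.0
        candidates.append((costA + costB, ref))

    # Step D5: the reference frame of the lowest-cost clip is the starting video frame.
    return min(candidates)[1] if candidates else start_frame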
Step E: for the video segment in the video to be matched that starts from the starting video frame corresponding to the audio segment to be matched and corresponds to the duration of the audio segment to be matched, a corresponding frame-supplementing or frame-deleting operation is performed according to the preset multiple of that duration, for example the corresponding 0.7 to 1.3 times, and the video to be matched is updated; the method then proceeds to step F.
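The frame-supplementing or frame-deleting operation of step E can be illustrated by the following sketch, which simply duplicates or drops frames by nearest-neighbour index mapping so that the matched video segment plays back over exactly the duration of the audio segment; a production system would more likely use frame interpolation, so this simplification is an assumption.

import numpy as np

def retime_segment(frames, target_len):
    """Stretch or compress a list of frames to exactly target_len frames by
    nearest-neighbour index mapping (frames are duplicated or dropped)."""
    if not frames or target_len <= 0:
        return []
    idx = np.linspace(0, len(frames) - 1, num=target_len)
    return [frames[int(round(i))] for i in idx]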
Step F: based on the duration equal to the preset multiple of the duration of the audio segment to be matched in step D, an audio passage containing the audio segment to be matched is cut from the target audio and synchronized onto the video to be matched according to the starting video frame corresponding to the audio segment to be matched; the video frame in the video to be matched that corresponds to the end of the cut audio passage is obtained and taken as the breakpoint video frame of the video to be matched, and the method proceeds to step G.
Step G: it is judged whether any audio segment of the target audio remains unsynchronized; if so, the portion of the video to be matched from the breakpoint video frame to the end is updated as the new video to be matched, and the method returns to step C; otherwise, synchronization between the target audio and the target video is completed.
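Steps C to G form a loop over the audio segments. The sketch below wires together the helper functions sketched above (pd_values and select_start_frame, both hypothetical names) and only records, per audio segment, the video frame span it was aligned to; audio muxing, frame retiming and rendering are left out, and the numeric defaults are assumptions.

def align_audio_to_video(audio_segments, pd, fps, total_frames, n=5, multiple=1.0):
    """audio_segments: (start_s, end_s) pairs from step A, in order.
    Returns, per audio segment, the video frame span it was aligned to."""
    alignments = []
    cursor = 0                                        # first frame of the video to be matched
    for seg_start, seg_end in audio_segments:         # step C: take the segments in sequence
        audio_dur = seg_end - seg_start
        # Step D: starting video frame found by the minimum-cost search sketched above.
        start = select_start_frame(pd, fps, audio_dur, cursor, n=n, multiple=multiple)
        # Steps E and F: the matched video span is retimed to audio_dur seconds and the
        # audio segment is laid over it; here only the frame span is recorded.
        end = min(start + int(round(audio_dur * fps)), total_frames)   # breakpoint frame
        alignments.append({"audio": (seg_start, seg_end), "video_frames": (start, end)})
        cursor = end                                  # step G: continue after the breakpoint
    return alignments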
In practical application, for example with n equal to 5, after the VAD technique is applied to the target audio, each silence segment in the target audio is obtained. As shown in fig. 2, the short segments at the top are the silence segments, the waveform at the bottom distributed along the abscissa is the target video, the dotted lines between the top silence segments and the bottom target video indicate the adjustment range of the interval between each audio segment and the corresponding video segment, and the solid lines indicate the correspondence between the head and tail positions of the audio segments and the corresponding video frames.
The above technical scheme designs an adaptive adjustment method for the action rhythm of an anchor's limbs which adopts a brand-new audio and video processing strategy: each audio segment in the target audio is first obtained; then, for each audio segment in turn, the starting video frame corresponding to that audio segment is searched for in the target video based on the perceptual difference values between video frames, and the audio segment is synchronized with the corresponding position in the target video, so that synchronization between the target audio and the target video is finally realized. In executing the designed method, the correspondence between each audio segment and the target video can be found accurately, synchronization between the two is completed, and the efficiency of audio and video synthesis is effectively improved.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (5)

1. An adaptive adjustment method for the action rhythm of an anchor's limbs, used for realizing synchronization between a target audio and a target video, the duration of the target video being greater than or equal to the duration of the target audio, characterized by comprising the following steps:
step A, applying a VAD technique to the target audio to obtain the starting time and ending time of each silence segment in the target audio, thereby obtaining each audio segment of the target audio in sequence, and then proceeding to step B;
step B, for the 1st to the (I-n)th video frames in the target video, obtaining the PD value corresponding to each video frame by means of perceptual difference, then initializing the target video as the video to be matched, and proceeding to step C; where I denotes the length of the target video and n is a preset integer greater than 1;
step C, among the audio segments of the target audio that have not yet been synchronized, selecting the first one in sequence as the audio segment to be matched, and proceeding to step D;
step D, according to a duration equal to a preset multiple of the duration of the audio segment to be matched, determining by a minimum-cost method the starting video frame in the video to be matched corresponding to the audio segment to be matched, and then proceeding to step E;
step E, for the video segment in the video to be matched that starts from the starting video frame corresponding to the audio segment to be matched and corresponds to the duration of the audio segment to be matched, performing a corresponding frame-supplementing or frame-deleting operation according to the preset multiple of that duration, updating the video to be matched, and then proceeding to step F;
step F, based on the duration equal to the preset multiple of the duration of the audio segment to be matched in step D, cutting from the target audio an audio passage containing the audio segment to be matched, synchronizing the cut audio passage onto the video to be matched according to the starting video frame corresponding to the audio segment to be matched, obtaining the video frame in the video to be matched that corresponds to the end of the cut audio passage as the breakpoint video frame of the video to be matched, and proceeding to step G;
step G, judging whether any audio segment of the target audio remains unsynchronized; if so, updating the portion of the video to be matched from the breakpoint video frame to the end as the new video to be matched, and returning to step C; otherwise, synchronization between the target audio and the target video is completed.
2. The adaptive adjustment method for the action rhythm of an anchor's limbs according to claim 1, characterized in that: in step B, for each of the 1st to the (I-n)th video frames in the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I-n}, thereby obtaining the PD values of the 1st to the (I-n)th video frames in the target video.
3. The adaptive adjustment method for the action rhythm of an anchor's limbs according to claim 1, characterized in that: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched corresponding to the audio segment to be matched;
step D1, obtaining the duration t equal to the preset multiple of the duration of the audio segment to be matched, and proceeding to step D2;
step D2, obtaining a duration T exceeding the duration t; within the video segment of duration T starting from the first video frame of the video to be matched, cutting out video clips each of duration t starting from that first video frame at a preset interval, and then proceeding to step D3;
step D3, for each video clip, obtaining the video frame with the minimum PD value within the clip as the reference video frame of that clip, thereby obtaining a reference video frame for every video clip, and then proceeding to step D4;
step D4, executing the following steps D4-1 to D4-5 for each video clip to obtain the cost value corresponding to that clip, and then proceeding to step D5;
step D4-1, initializing j to the reference video frame of the video clip, and then proceeding to step D4-2;
step D4-2, obtaining the minimum PD value among the (j+n)th to the Jth video frames of the video clip as costA of the clip, and then proceeding to step D4-3; where the Jth video frame is the last video frame of the video clip;
step D4-3, obtaining the difference result of subtracting the PD value of the (j-n)th video frame from the PD value of the (j+n)th video frame of the video clip, and proceeding to step D4-4;
step D4-4, judging whether j+n equals J; if so, proceeding to step D4-5; otherwise, incrementing j by 1 and returning to step D4-3;
step D4-5, obtaining the maximum value among the difference results as costB of the video clip, and obtaining the cost value of the clip as the sum of costA and costB;
step D5, according to the cost values of the video clips, selecting the reference video frame of the video clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
4. The adaptive adjustment method for the action rhythm of an anchor's limbs according to claim 1, characterized in that: the preset multiple of the duration of the audio segment to be matched in step D is 0.7 to 1.3 times.
5. The adaptive adjustment method for the action rhythm of an anchor's limbs according to claim 1, characterized in that: said n is equal to 5.
CN202210568788.0A 2022-05-24 2022-05-24 Self-adaptive adjustment method for motion rhythm of anchor limb Active CN115065844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210568788.0A CN115065844B (en) 2022-05-24 2022-05-24 Self-adaptive adjustment method for motion rhythm of anchor limb

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210568788.0A CN115065844B (en) 2022-05-24 2022-05-24 Self-adaptive adjustment method for motion rhythm of anchor limb

Publications (2)

Publication Number Publication Date
CN115065844A true CN115065844A (en) 2022-09-16
CN115065844B CN115065844B (en) 2023-09-12

Family

ID=83198183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210568788.0A Active CN115065844B (en) 2022-05-24 2022-05-24 Self-adaptive adjustment method for motion rhythm of anchor limb

Country Status (1)

Country Link
CN (1) CN115065844B (en)

Citations (10)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170142458A1 (en) * 2015-11-16 2017-05-18 Goji Watanabe System and method for online collaboration of synchronized audio and video data from multiple users through an online browser
CN109416842A (en) * 2016-05-02 2019-03-01 华纳兄弟娱乐公司 Geometric match in virtual reality and augmented reality
US20190373237A1 (en) * 2017-01-26 2019-12-05 D-Box Technologies Inc. Capturing and synchronizing motion with recorded audio/video
WO2020228473A1 (en) * 2019-05-14 2020-11-19 Goodix Technology (Hk) Company Limited Method and system for speaker loudness control
CN111225237A (en) * 2020-04-23 2020-06-02 腾讯科技(深圳)有限公司 Sound and picture matching method of video, related device and storage medium
WO2022100262A1 (en) * 2020-11-12 2022-05-19 海信视像科技股份有限公司 Display device, human body posture detection method, and application
CN112437336A (en) * 2020-11-19 2021-03-02 维沃移动通信有限公司 Audio and video playing method and device, electronic equipment and storage medium
CN113902818A (en) * 2021-09-13 2022-01-07 上海科技大学 Voice-driven human body action generation method based on implicit coding enhancement
CN113825005A (en) * 2021-09-30 2021-12-21 北京跳悦智能科技有限公司 Face video and audio synchronization method and system based on joint training
CN113992979A (en) * 2021-10-27 2022-01-28 北京跳悦智能科技有限公司 Video expansion method and system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孔庆月; 常淑凤: "A 3G videophone evaluation method based on human audio-visual physiological characteristics", Coal Technology (煤炭技术), no. 04 *

Also Published As

Publication number Publication date
CN115065844B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN105282591B (en) The synchronization of independent output stream
US7848924B2 (en) Method, apparatus and computer program product for providing voice conversion using temporal dynamic features
US20080259085A1 (en) Method for Animating an Image Using Speech Data
WO2005059892A3 (en) Virtual voiceprint system and method for generating voiceprints
JP2009537037A (en) Method for switching from a first adaptive data processing version to a second adaptive data processing version
WO2021196646A1 (en) Interactive object driving method and apparatus, device, and storage medium
CN113383384A (en) Real-time generation of speech animation
US9913033B2 (en) Synchronization of independent output streams
WO2021213008A1 (en) Video sound and picture matching method, related device and storage medium
CN106057220B (en) High-frequency extension method of audio signal and audio player
US11763813B2 (en) Methods and systems for reducing latency in automated assistant interactions
CN109413475A (en) Method of adjustment, device and the server of subtitle in a kind of video
CN113704390A (en) Interaction method and device of virtual objects, computer readable medium and electronic equipment
CN115065844A (en) Self-adaptive adjustment method for action rhythm of anchor limb
CN108290289B (en) Method and system for synchronizing vibro-kinetic effects with virtual reality sessions
CN110491366B (en) Audio smoothing method and device, computer equipment and storage medium
US7418388B2 (en) Voice synthesizing method using independent sampling frequencies and apparatus therefor
US20220020196A1 (en) System and method for voice driven lip syncing and head reenactment
CN106469559B (en) Voice data adjusting method and device
CN109360588A (en) A kind of mobile device-based audio-frequency processing method and device
JP2006243215A (en) Data generating device for articulatory parameter interpolation, speech synthesizing device, and computer program
CN110310639B (en) Interactive expression implementation method and terminal
GB2423905A (en) Animated messaging
CN112398912A (en) Voice signal acceleration method and device, computer equipment and storage medium
CN111063339A (en) Intelligent interaction method, device, equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant