CN115065844A - Self-adaptive adjustment method for action rhythm of anchor limb - Google Patents
- Publication number: CN115065844A
- Application number: CN202210568788.0A
- Authority
- CN
- China
- Prior art keywords
- video
- audio
- matched
- video frame
- clip
- Prior art date
- Legal status: Granted (the status listed is an assumption and is not a legal conclusion)
Classifications
- H04N 21/242 — Synchronization processes, e.g. processing of PCR [Program Clock References]
- G10L 25/57 — Speech or voice analysis specially adapted for comparison or discrimination, for processing of video signals
- G10L 25/78 — Detection of presence or absence of voice signals
- H04N 21/233 — Processing of audio elementary streams (server middleware)
- H04N 21/23418 — Analysing video streams, e.g. detecting features or characteristics
- H04N 21/4302 — Content synchronisation processes, e.g. decoder synchronisation
- H04N 21/4307 — Synchronising the rendering of multiple content streams or additional data on devices
- H04N 21/439 — Processing of audio elementary streams (client middleware)
- H04N 21/44 — Processing of video elementary streams, e.g. splicing or rendering scenes
- H04N 21/8456 — Structuring of content by decomposing it in the time domain, e.g. into time segments
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention relates to an adaptive adjustment method for an anchor's body-movement rhythm. The method adopts a new audio/video processing strategy: it first obtains each audio segment in a target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames, and synchronizes the audio segment with the corresponding position in the target video; in this way, synchronization between the target audio and the target video is finally achieved. During execution of the method, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
Description
Technical Field
The invention relates to an adaptive adjustment method for an anchor's body-movement rhythm, and belongs to the technical field of audio/video synthesis.
Background
At present, many AI-based algorithms change the mouth shape of a person in a video to match a given voice, achieving lip-sound synchronization. However, prior-art algorithms can only change the person's mouth shape; they cannot change the person's body movements (including head movements). The rendered character therefore behaves unnaturally: for example, the person shows no body movement while speaking, or shows many body movements while silent.
Some prior-art methods, such as the paper "Motion Representations for organized analysis", can change the body movements of persons in a video, but they can only map the movements of a person in video A onto video B and cannot automatically adapt the body movements to the speaker's voice. Moreover, the videos synthesized by these methods generally suffer from artifacts such as "ghosting", so the visual effect is unacceptable.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an adaptive adjustment method for an anchor's body-movement rhythm that, by adopting a new design strategy, achieves audio/video synchronization efficiently and accurately and improves audio/video processing efficiency.
To solve this technical problem, the invention adopts the following technical scheme: an adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video whose duration is greater than or equal to that of the target audio, comprising the following steps:
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
Step E. For the video segment of the video to be matched that starts at that starting video frame and spans the duration of the audio segment to be matched, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple, update the video to be matched, and go to step F.
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
As a preferred technical scheme of the invention: in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames of the target video are thereby obtained.
As a preferred technical scheme of the invention: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched.
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is then the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
As a preferred technical scheme of the invention: the preset multiple of the duration of the audio segment to be matched in step D is 0.7 to 1.3.
As a preferred technical scheme of the invention: the value of n is 5.
Compared with the prior art, the adaptive adjustment method for an anchor's body-movement rhythm designed by the invention has the following technical effects:
The method adopts a new audio/video processing strategy: it first obtains each audio segment in the target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames and synchronizes the audio segment with the corresponding position in the target video, finally achieving synchronization between the target audio and the target video. During execution, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
Drawings
FIG. 1 is a schematic flow chart of the adaptive adjustment method for an anchor's body-movement rhythm designed by the invention;
fig. 2 is a schematic diagram of an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The invention designs an adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video whose duration is greater than or equal to that of the target audio. As shown in fig. 1, the following steps A to G are executed in practical application.
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
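As a rough illustration of step A, the sketch below splits an audio sample stream into voiced segments using a simple per-window energy threshold. This stands in for a real VAD: the patent does not name a specific VAD implementation, and the function name, window size, and threshold here are all assumptions for illustration.

```python
def split_audio_segments(samples, window=160, threshold=0.01):
    """Return (start, end) sample indices of non-silent segments.

    A voiced window is one whose mean energy exceeds `threshold`;
    consecutive voiced windows are merged into one segment.
    """
    segments = []
    start = None
    for w in range(0, len(samples), window):
        chunk = samples[w:w + window]
        energy = sum(x * x for x in chunk) / max(len(chunk), 1)
        voiced = energy > threshold
        if voiced and start is None:
            start = w                      # segment begins
        elif not voiced and start is not None:
            segments.append((start, w))    # segment ends at this window
            start = None
    if start is not None:                  # audio ends while voiced
        segments.append((start, len(samples)))
    return segments
```

The silence segments of step A are simply the gaps between the returned voiced segments.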
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
In practical application, in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames are thereby obtained.
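The patent specifies only that the PD value of frame i is a "perceptual difference" between frame i and frame i+n; the exact metric is not given. As a minimal stand-in, the sketch below uses the mean absolute pixel difference — both the function name and the choice of metric are assumptions.

```python
def pd_values(frames, n=5):
    """Compute one PD value per frame for frames 1..I-n.

    frames: list of frames, each a flat list of pixel intensities.
    PD[i] is the mean absolute pixel difference between frame i
    and frame i+n (a stand-in for "perceptual difference").
    """
    pds = []
    for i in range(len(frames) - n):
        a, b = frames[i], frames[i + n]
        pds.append(sum(abs(x - y) for x, y in zip(a, b)) / len(a))
    return pds
```

A low PD value marks a frame that changes little n frames later, i.e. a relatively still moment suitable for splicing.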
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
In practical application, step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched.
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, for example 0.7 to 1.3 times that duration, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
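Steps D3 to D5 can be sketched as follows over a shared array of per-frame PD values, with each candidate clip given as a (first, last) frame-index range. The clip representation and helper names are assumptions, and the sketch further assumes the reference frame lies at least n frames after the clip start and at least n frames before the clip end, so every indexed PD value exists.

```python
def clip_cost(pd, first, last, n):
    """Cost of one candidate clip (steps D3, D4-1..D4-5).

    pd: list of PD values indexed by frame; [first, last]: clip range.
    Returns (reference_frame, cost).
    """
    # Step D3 / D4-1: reference frame = frame with minimum PD in the clip.
    j_ref = min(range(first, last + 1), key=lambda k: pd[k])
    # Step D4-2: costA = minimum PD from frame j_ref+n to the last frame J.
    cost_a = min(pd[k] for k in range(j_ref + n, last + 1))
    # Steps D4-3/D4-4: PD[j+n] - PD[j-n] for j from j_ref until j+n == J.
    diffs = [pd[j + n] - pd[j - n] for j in range(j_ref, last - n + 1)]
    cost_b = max(diffs)                 # step D4-5
    return j_ref, cost_a + cost_b


def pick_start_frame(pd, clips, n):
    """Step D5: reference frame of the candidate clip with minimum cost."""
    best = min((clip_cost(pd, f, l, n) for f, l in clips), key=lambda c: c[1])
    return best[0]
```

Intuitively, costA favors clips that contain another still frame after the reference frame, and costB penalizes clips where motion ramps up sharply.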
Step E. For the video segment of the video to be matched that starts at the starting video frame corresponding to the audio segment to be matched and spans the duration of that audio segment, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple (e.g. the corresponding 0.7 to 1.3 times), update the video to be matched, and go to step F.
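Step E's frame insertion or deletion amounts to retiming a frame sequence to a target length. The patent does not specify how frames are chosen for duplication or removal; the sketch below uses nearest-index resampling as one plausible assumption.

```python
def retime(frames, target_len):
    """Stretch or squeeze `frames` to exactly `target_len` frames.

    Nearest-index resampling: duplicates frames when stretching
    (target_len > len(frames)) and drops frames when squeezing.
    """
    if target_len <= 0 or not frames:
        return []
    step = len(frames) / target_len
    return [frames[min(int(i * step), len(frames) - 1)]
            for i in range(target_len)]
```

For example, retiming 4 frames to 8 duplicates every frame once, while retiming 6 frames to 3 keeps every second frame.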
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
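The overall loop of steps C to G can be sketched as below, with `find_start` and `retime` standing in for the minimum-cost search (step D) and the frame insertion/deletion (step E). Audio segments are represented here simply as frame counts, which is a simplification of the actual duration bookkeeping.

```python
def synchronize(audio_segments, video, find_start, retime):
    """Walk the audio segments in order and pair each with a video clip.

    audio_segments: list of segment lengths in frames.
    video: list of video frames.
    find_start(remaining, seg_len): start index in the remaining video (step D).
    retime(clip, seg_len): clip adjusted to the segment length (step E).
    Returns one retimed video clip per audio segment.
    """
    out = []
    remaining = list(video)
    for seg_len in audio_segments:               # step C: next unsynchronized segment
        start = find_start(remaining, seg_len)   # step D: minimum-cost start frame
        clip = remaining[start:start + seg_len]
        out.append(retime(clip, seg_len))        # step E: insert/delete frames
        remaining = remaining[start + seg_len:]  # steps F/G: keep tail past breakpoint
    return out
```

Each iteration consumes the video up to the breakpoint frame, so later segments can only match later parts of the video, mirroring the update in step G.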
In practical application, for example with n equal to 5, VAD is applied to the target audio to obtain its silence segments. As shown in fig. 2, the short segments at the top are the silence segments; the wave line at the bottom, distributed along the abscissa, is the target video; the dotted lines between the top silence segments and the bottom target video indicate the adjustment range of the interval between each audio segment and its corresponding video segment; and the solid lines indicate the correspondence between the head and tail positions of the audio segments and the corresponding video frames.
This technical scheme designs an adaptive adjustment method for an anchor's body-movement rhythm that adopts a new audio/video processing strategy: it first obtains each audio segment in the target audio; then, for each audio segment in turn, it searches the target video for a starting video frame based on the perceptual difference values between video frames and synchronizes the audio segment with the corresponding position in the target video, finally achieving synchronization between the target audio and the target video. During execution, the correspondence between each audio segment and the target video can be found accurately, synchronization between the audio segments and the target video is completed, and the efficiency of audio/video synthesis is effectively improved.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (5)
1. An adaptive adjustment method for an anchor's body-movement rhythm, used to synchronize a target audio with a target video, wherein the duration of the target video is greater than or equal to the duration of the target audio, characterized by comprising the following steps:
Step A. Apply VAD (voice activity detection) to the target audio to obtain the start time and end time of each silence segment, thereby obtaining the audio segments of the target audio in order; then go to step B.
Step B. For the 1st to (I−n)th video frames of the target video, obtain the PD (perceptual difference) value corresponding to each frame, then initialize the target video as the video to be matched and go to step C; here I denotes the length of the target video in frames, and n is a preset integer greater than 1.
Step C. Among the audio segments of the target audio that have not yet been synchronized, select the first one in order as the audio segment to be matched, and go to step D.
Step D. Using a minimum-cost method over a duration equal to a preset multiple of the duration of the audio segment to be matched, determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched; then go to step E.
Step E. For the video segment of the video to be matched that starts at that starting video frame and spans the duration of the audio segment to be matched, perform the corresponding frame-insertion or frame-deletion operation according to the preset multiple, update the video to be matched, and go to step F.
Step F. Based on the duration of the preset multiple of the duration of the audio segment to be matched in step D, cut out from the target audio the portion containing the audio segment to be matched; synchronize the cut audio onto the video to be matched, aligned at the corresponding starting video frame; take the video frame of the video to be matched that corresponds to the end of the cut audio as the breakpoint video frame; then go to step G.
Step G. Judge whether any audio segment of the target audio remains unsynchronized. If so, update the video to be matched to the segment running from the breakpoint video frame to its end, and return to step C; otherwise, the synchronization between the target audio and the target video is complete.
2. The adaptive adjustment method for an anchor's body-movement rhythm according to claim 1, characterized in that: in step B, for each of the 1st to (I−n)th video frames of the target video, the perceptual difference value between the ith video frame and the (i+n)th video frame is obtained and taken as the PD value of the ith video frame, where i ∈ {1, …, I−n}; the PD values of the 1st to (I−n)th video frames of the target video are thereby obtained.
3. The adaptive adjustment method for an anchor's body-movement rhythm according to claim 1, characterized in that: step D executes the following steps D1 to D5 to determine the starting video frame in the video to be matched that corresponds to the audio segment to be matched;
Step D1. Obtain the duration t equal to the preset multiple of the duration of the audio segment to be matched, and go to step D2.
Step D2. Obtain a duration T that exceeds t; within the segment of duration T starting from the first frame of the video to be matched, cut out candidate video clips of duration t each, starting at positions spaced by a preset interval; then go to step D3.
Step D3. For each candidate video clip, take the video frame with the minimum PD value in the clip as the reference video frame of that clip; then go to step D4.
Step D4. For each candidate video clip, execute steps D4-1 to D4-5 below to obtain the clip's cost value; then go to step D5.
Step D4-1. Initialize j to the index of the clip's reference video frame, and go to step D4-2.
Step D4-2. Take the minimum PD value from the (j+n)th to the Jth video frame of the clip as the clip's costA, and go to step D4-3; here the Jth video frame is the last video frame of the clip.
Step D4-3. Compute the difference obtained by subtracting the PD value of the (j−n)th video frame from the PD value of the (j+n)th video frame of the clip, and go to step D4-4.
Step D4-4. Judge whether j+n equals J; if so, go to step D4-5; otherwise increment j by 1 and return to step D4-3.
Step D4-5. Take the maximum of the differences as the clip's costB; the clip's cost value is the sum of costA and costB.
Step D5. Among the cost values of all candidate clips, select the reference video frame of the clip with the minimum cost value as the starting video frame in the video to be matched corresponding to the audio segment to be matched.
4. The adaptive adjustment method for the rhythm of the anchor limb actions according to claim 1, characterized in that: the preset time length in step D is 0.7 to 1.3 times the time length of the audio clip to be matched.
5. The adaptive adjustment method for the rhythm of the anchor limb actions according to claim 1, characterized in that: said n is equal to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210568788.0A CN115065844B (en) | 2022-05-24 | 2022-05-24 | Self-adaptive adjustment method for motion rhythm of anchor limb |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115065844A true CN115065844A (en) | 2022-09-16 |
CN115065844B CN115065844B (en) | 2023-09-12 |
Family
ID=83198183
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210568788.0A Active CN115065844B (en) | 2022-05-24 | 2022-05-24 | Self-adaptive adjustment method for motion rhythm of anchor limb |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115065844B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170142458A1 (en) * | 2015-11-16 | 2017-05-18 | Goji Watanabe | System and method for online collaboration of synchronized audio and video data from multiple users through an online browser |
CN109416842A (en) * | 2016-05-02 | 2019-03-01 | 华纳兄弟娱乐公司 | Geometric match in virtual reality and augmented reality |
US20190373237A1 (en) * | 2017-01-26 | 2019-12-05 | D-Box Technologies Inc. | Capturing and synchronizing motion with recorded audio/video |
CN111225237A (en) * | 2020-04-23 | 2020-06-02 | 腾讯科技(深圳)有限公司 | Sound and picture matching method of video, related device and storage medium |
WO2020228473A1 (en) * | 2019-05-14 | 2020-11-19 | Goodix Technology (Hk) Company Limited | Method and system for speaker loudness control |
CN112437336A (en) * | 2020-11-19 | 2021-03-02 | 维沃移动通信有限公司 | Audio and video playing method and device, electronic equipment and storage medium |
CN113825005A (en) * | 2021-09-30 | 2021-12-21 | 北京跳悦智能科技有限公司 | Face video and audio synchronization method and system based on joint training |
CN113902818A (en) * | 2021-09-13 | 2022-01-07 | 上海科技大学 | Voice-driven human body action generation method based on implicit coding enhancement |
CN113992979A (en) * | 2021-10-27 | 2022-01-28 | 北京跳悦智能科技有限公司 | Video expansion method and system and computer equipment |
WO2022100262A1 (en) * | 2020-11-12 | 2022-05-19 | 海信视像科技股份有限公司 | Display device, human body posture detection method, and application |
Non-Patent Citations (1)
Title |
---|
Kong Qingyue; Chang Shufeng: "3G Videophone Evaluation Method Based on Human Audio-Visual Physiological Characteristics", Coal Technology, no. 04 *
Also Published As
Publication number | Publication date |
---|---|
CN115065844B (en) | 2023-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105282591B (en) | The synchronization of independent output stream | |
US7848924B2 (en) | Method, apparatus and computer program product for providing voice conversion using temporal dynamic features | |
US20080259085A1 (en) | Method for Animating an Image Using Speech Data | |
WO2005059892A3 (en) | Virtual voiceprint system and method for generating voiceprints | |
JP2009537037A (en) | Method for switching from a first adaptive data processing version to a second adaptive data processing version | |
WO2021196646A1 (en) | Interactive object driving method and apparatus, device, and storage medium | |
CN113383384A (en) | Real-time generation of speech animation | |
US9913033B2 (en) | Synchronization of independent output streams | |
WO2021213008A1 (en) | Video sound and picture matching method, related device and storage medium | |
CN106057220B (en) | High-frequency extension method of audio signal and audio player | |
US11763813B2 (en) | Methods and systems for reducing latency in automated assistant interactions | |
CN109413475A (en) | Method of adjustment, device and the server of subtitle in a kind of video | |
CN113704390A (en) | Interaction method and device of virtual objects, computer readable medium and electronic equipment | |
CN115065844A (en) | Self-adaptive adjustment method for action rhythm of anchor limb | |
CN108290289B (en) | Method and system for synchronizing vibro-kinetic effects with virtual reality sessions | |
CN110491366B (en) | Audio smoothing method and device, computer equipment and storage medium | |
US7418388B2 (en) | Voice synthesizing method using independent sampling frequencies and apparatus therefor | |
US20220020196A1 (en) | System and method for voice driven lip syncing and head reenactment | |
CN106469559B (en) | Voice data adjusting method and device | |
CN109360588A (en) | A kind of mobile device-based audio-frequency processing method and device | |
JP2006243215A (en) | Data generating device for articulatory parameter interpolation, speech synthesizing device, and computer program | |
CN110310639B (en) | Interactive expression implementation method and terminal | |
GB2423905A (en) | Animated messaging | |
CN112398912A (en) | Voice signal acceleration method and device, computer equipment and storage medium | |
CN111063339A (en) | Intelligent interaction method, device, equipment and computer readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||