CN110602550A - Video processing method, electronic equipment and storage medium - Google Patents

Video processing method, electronic equipment and storage medium Download PDF

Info

Publication number
CN110602550A
CN110602550A (application CN201910733269.3A)
Authority
CN
China
Prior art keywords
video
music
frames
frame
marking
Prior art date
Legal status
Pending
Application number
CN201910733269.3A
Other languages
Chinese (zh)
Inventor
颜伟婷
杜欧杰
吴嘉旭
李立锋
陈国仕
李鹏
Current Assignee
MIGU Culture Technology Co Ltd
MIGU Animation Co Ltd
MIGU Comic Co Ltd
Original Assignee
MIGU Culture Technology Co Ltd
MIGU Animation Co Ltd
Priority date
Filing date
Publication date
Application filed by MIGU Culture Technology Co Ltd, MIGU Animation Co Ltd filed Critical MIGU Culture Technology Co Ltd
Priority to CN201910733269.3A priority Critical patent/CN110602550A/en
Publication of CN110602550A publication Critical patent/CN110602550A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/637Control signals issued by the client directed to the server or network components
    • H04N21/6373Control signals issued by the client directed to the server or network components for rate control, e.g. request to the server to modify its transmission rate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8455Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The embodiment of the invention relates to the field of electronic information and discloses a video processing method, an electronic device and a storage medium. The video processing method comprises the following steps: performing image analysis on a video to be processed to obtain the characteristic actions in the video, and acquiring the music beats of each piece of music; matching the beats of the characteristic actions in the video against the music beats of each piece of music to obtain the music whose beats match the characteristic actions in the video; and taking the matched music as the score of the video and synthesizing it with the video. By automatically analysing and matching the audio and the video, the method greatly reduces the time consumed by the video production process while improving the quality of the produced video, saves time cost, and improves the user experience.

Description

Video processing method, electronic equipment and storage medium
Technical Field
The present invention relates to the field of electronic information, and in particular, to a video processing method, an electronic device, and a storage medium.
Background
With the development of electronic information technology, more and more people make short videos to record their life experiences for themselves or others to watch. One production method combines a recorded video with background music afterwards; in the other, the user first selects a piece of music and then records a video following the music's rhythm, hitting the beat points during recording.
However, the inventors of the present invention found that with either production method the user must manually align the video with the music beats, which consumes considerable production time and easily leads to inaccurate beat alignment during post-production.
Disclosure of Invention
The embodiment of the invention aims to provide a video processing method, an electronic device and a storage medium that solve the problems of long production time, high cost and inaccurate beat alignment in short-video production, and that improve the quality of the produced video and the user experience.
To solve the above technical problem, an embodiment of the present invention provides a video processing method, including: performing image analysis on a video to be processed to obtain the characteristic actions in the video, and acquiring the music beats of each piece of music; matching the beats of the characteristic actions in the video against the music beats of each piece of music to obtain the music whose beats match the characteristic actions in the video; and taking the matched music as the score of the video and synthesizing it with the video.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the video processing method.
The embodiment of the invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the video processing method described above.
Compared with the prior art, the main differences and effects of the embodiments are as follows. In the process of producing a short video, image analysis is performed on the video to be synthesized to obtain the characteristic actions corresponding to beat points in the video, and the music beat of each piece of music is obtained. The music beats are matched against the characteristic-action beats, and the score is determined according to their matching degree; the audio and the video are then synthesized into the finished short video. Because the system automatically analyses and matches the video and the music and automatically selects suitable music as the score, the production time required for the short video is reduced and the match between the score and the video is improved, which in turn improves the quality of the short video and the user experience.
In addition, performing image analysis on the video to be processed to obtain each characteristic action in the video specifically comprises: marking the video frames containing characteristic actions to obtain video marker frames. Acquiring the music beat of each piece of music comprises: marking, on the audio track, the positions corresponding to the beat points in the music that meet a preset rule, to obtain audio marker frames. Matching the beats of the characteristic actions in the video against the music beats of each piece of music comprises: calculating the similarity between the distances between adjacent audio marker frames in the music and the distances between adjacent video marker frames in the video, and taking the music whose calculated similarity is greater than a preset threshold as the music that matches the beats of the characteristic actions in the video. The video frames and audio frames corresponding to beats are marked first, and the matching degree between music and video is judged from the similarity of the distances between adjacent marks, so the music that matches the video can be screened out accurately and quickly.
In addition, before the beats of the characteristic actions in the video are matched against the music beats of each piece of music, the method further comprises: if the number of video marker frames is less than a preset number, calculating the picture similarity of adjacent key frames in the video and marking either key frame of each pair whose picture similarity is less than a preset value, to obtain supplementary video marker frames. Supplementing the video marker frames improves the effect of matching music to the video.
In addition, synthesizing the matched music, as the score of the video, with the video comprises: aligning the video marker frames of the video with the audio marker frames of the matched music on a time axis, and locating the target video marker frames whose coincidence with the audio marker frames is lower than a preset threshold; and adjusting the video clip to which each target video marker frame belongs, so that the coincidence of the adjusted target video marker frame with the audio marker frame reaches the preset threshold. The adjustment comprises one of the following, or any combination thereof: changing the speed of part of the video frames, inserting video frames into the clip to which the target video marker frame belongs, or deleting video frames from that clip. Because insufficient coincidence between marks on the time axis must be considered when synthesizing the video and the audio, marks whose coincidence does not meet the requirement are adjusted accordingly, which improves the quality of the synthesized result.
In addition, adjusting the video clip according to the target video marker frame comprises: acquiring the video type of the video; determining the adjustment mode corresponding to that video type; and adjusting the video clip to which the target video marker frame belongs in the determined mode. Because videos come in different types, applying a type-specific adjustment scheme makes the adjustment of the affected video frames more effective, and gives the adjusted video and the audio a better synthesis result.
In addition, the video types include a dance type, a scenario type, a landscape type, and a pet type; the adjustment mode corresponding to the dance type comprises the following steps: according to the position of a target video marking frame on a time axis, regulating the speed of a video frame corresponding to highlight moments in a video; the adjustment mode corresponding to the scenario type comprises the following steps: according to the position of the target video mark frame on a time axis, selectively inserting or deleting the video frame in the video clip to which the target video mark frame belongs; the adjustment mode corresponding to the landscape type comprises the following steps: according to the position of a target video marking frame on a time axis, carrying out speed regulation on a video frame of scene switching in a video; the adjustment mode corresponding to the pet type comprises the following steps: and repeatedly playing the video frames corresponding to the highlight moments in the video according to the positions of the target video mark frames on the time axis.
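The type-to-adjustment mapping above can be sketched as a simple dispatch table. This is a minimal illustration, not the patent's implementation: the handler names are hypothetical placeholders, and a real handler would edit the clip around each target video marker frame.

```python
# Hypothetical handlers for the four video types described in the text.
# Each one receives the clip and the poorly aligned target marker frames;
# here they simply return the clip unchanged as placeholders.
def speed_adjust_highlights(clip, marks):   # dance: retime highlight frames
    return clip

def insert_or_delete_frames(clip, marks):   # scenario: pad or trim the segment
    return clip

def speed_adjust_scene_cuts(clip, marks):   # landscape: retime scene switches
    return clip

def replay_highlights(clip, marks):         # pet: replay highlight frames
    return clip

# Dispatch table mirroring the type-specific adjustment modes above.
ADJUSTMENTS = {
    "dance": speed_adjust_highlights,
    "scenario": insert_or_delete_frames,
    "landscape": speed_adjust_scene_cuts,
    "pet": replay_highlights,
}

def adjust_clip(video_type, clip, target_marks):
    """Apply the adjustment mode corresponding to the video's type."""
    return ADJUSTMENTS[video_type](clip, target_marks)
```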
In addition, before the matched music is used as the score of the video and synthesized with it, the method further comprises: judging whether the number of video marker frames and the number of audio marker frames of the matched music reach a preset number matching degree. If the number matching degree is satisfied, the matched music is synthesized with the video as its score. If it is not satisfied, the picture similarity of adjacent key frames in the video is calculated, and either key frame of each pair whose picture similarity is less than a preset value is marked to obtain supplementary video marker frames; the video marker frames aligned on the time axis then include these supplementary video marker frames.
In addition, whether the number of video marker frames and the number of audio marker frames of the matched music reach the preset number matching degree is judged through the following formula:
where N is the number of audio marker frames, M is the number of video marker frames containing characteristic actions, and T_h is the number matching degree.
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
Fig. 1 is a flow chart of a video processing method according to a first embodiment of the present invention;
fig. 2 is a schematic diagram of a method for selecting a score by a user in a video processing method according to a first embodiment of the present invention;
fig. 3 is a flow chart of a video processing method according to a second embodiment of the present invention;
FIG. 4 is a schematic diagram of a video marker frame in a video processing method according to a second embodiment of the present invention;
Fig. 5 is a schematic diagram of an audio marker frame in a video processing method according to a second embodiment of the present invention;
FIG. 6 is a diagram illustrating the matching result between a video frame and an audio frame in a video processing method according to a second embodiment of the present invention;
fig. 7 is a flowchart of a video processing method according to a third embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a fourth embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth in the embodiments to give the reader a better understanding of the present application; the technical solution claimed in the present application can nonetheless be implemented without these technical details, and various changes and modifications may be made based on the following embodiments.
A first embodiment of the present invention relates to a video processing method applied to an electronic device such as a user's mobile phone or a tablet computer. In this embodiment, application to a user's mobile phone is taken as an example: after obtaining the video to be processed, the phone performs image analysis on it to obtain the characteristic actions in the video, acquires the music beat of each piece of music, matches the music beats against the characteristic-action beats of the video, determines the matching music according to the matching degree, and synthesizes the video with the music.
The following describes implementation details of a video processing method according to this embodiment in detail, and the following is only provided for easy understanding and is not necessary for implementing this embodiment.
A specific flowchart of a video processing method according to this embodiment is shown in fig. 1, and specifically includes the following steps:
step 101, acquiring a video to be processed; when the short video needs to be made, the mobile phone calls the application to shoot a section of short video, or directly reads a section of shot short video stored in the storage space of the mobile phone as the video to be processed.
Step 102, performing image analysis on the video to obtain the characteristic actions, and acquiring the music beats of each piece of music.
Specifically, after obtaining the video data to be processed, the mobile phone performs image analysis on the video and finds the characteristic actions deliberately performed in the video to hit a beat or to mark a change in the beat. The characteristic actions include actions of a person in the video such as clapping, stomping, pausing a dance move, snapping the fingers and jumping, or any combination of these. Meanwhile, the phone analyses the audio of all music in the music library to obtain the music beat of each piece, and stores the beat information of each piece for use when matching other videos.
Step 103, determining the score according to the matching degree between the music beats and the characteristic-action beats.
Specifically, after the music beats and the characteristic actions are obtained, the mobile phone matches the music beat of each piece of music against the beats of the characteristic actions in the video. The matching degree between a piece of music and the video is judged by the number of music beats that correspond to characteristic-action beats: the more corresponding beats, the higher the matching degree. The music to be synthesized with the video is determined according to this matching degree. In practice, when several pieces of music match the video well, the user may select a specific score; the selection method is shown schematically in fig. 2. Either the system ranks the pieces by how well their music beats match the beats of the video's characteristic actions and selects the best-matching piece as the score to be synthesized with the video, or the system outputs the candidates for trial listening in order of matching degree and the user decides which piece is finally adopted as the score.
Step 104, synthesizing the score and the video: the determined score and the video are synthesized directly on the time axis to produce the short video.
Thus, this embodiment provides an automatic beat-alignment video processing method: the video to be processed is acquired and analysed to obtain the characteristic actions in it; the music beat of each piece of music is acquired; the music beats are matched against the video's characteristic actions; the music that best matches the characteristic actions is chosen as the score, either by matching degree or by the user's selection; and finally the short video is synthesized.
A second embodiment of the present invention relates to a video processing method applied to an electronic device such as a user's mobile phone or a tablet computer. In this embodiment, the music and the video are analysed separately: the audio frames corresponding to the music beats and the video frames corresponding to the characteristic actions in the video are marked; the matching degree between music and video is determined from the similarity of the distances between adjacent marker frames; the score of the video is determined from the matching degree of each piece of music; and the video and the score are synthesized.
The following describes implementation details of a video processing method according to this embodiment in detail, and the following is only provided for easy understanding and is not necessary for implementing this embodiment.
A specific flowchart of a video processing method according to this embodiment is shown in fig. 3, and specifically includes the following steps:
step 301, acquiring a video to be processed; when the short video needs to be made, the mobile phone calls the application to shoot a section of short video, or directly reads a section of shot short video stored in the storage space of the mobile phone as the video to be processed.
Step 302, performing image analysis on the video, and marking the video frames containing the characteristic actions.
Specifically, the mobile phone performs image analysis on the video to be processed and marks the video frames corresponding to characteristic actions strongly related to beat points, obtaining the video marker frames. A characteristic action can be an action of a person in the video such as clapping, stomping, pausing a dance move, snapping the fingers or jumping, or a combination of any of these. A schematic diagram of the resulting video marker frames is shown in fig. 4.
Step 303, acquiring a music beat, and marking the position of the beat on the audio track to obtain an audio mark frame.
Specifically, while the video is undergoing image analysis, the mobile phone analyses the audio data of each piece of music and marks, on the audio track, the positions corresponding to the beat points in the music that meet the preset rule, obtaining the audio marker frames. The preset rule sets the strength value used to select beats. A specific method of marking beat points on the audio track is described here as an example.
First, the audio is analysed with a fast Fourier transform to obtain the spectrum of the music. The spectrum is then differenced: the spectrum at each instant is compared with the spectrum at the previous instant to measure the change between consecutive instants, giving a preliminary picture of the audio's fluctuation. Next, a moving-average method computes the mean fluctuation of the audio within a time window, and the differenced fluctuation values are compared with this mean to obtain a clearer difference signal; the points whose value is greater than 0 are the beat points. The system then marks the detected beats that meet its preset rule, so that they can be matched with the video later. For example, if the system sets the beat filtering value to 3, the positions on the audio track of beats whose value is 3 or more are marked. A schematic diagram of the resulting audio marker frames is shown in fig. 5. This embodiment gives one specific method of analysing and determining the music beat and of marking the beat points; in practice the method can be extended or reduced, or the system's preset values changed, and this embodiment is not limited in that respect.
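The beat-detection steps above (short-time FFT, differencing consecutive spectra, comparing with a moving average, filtering by strength) can be sketched as follows. This is a minimal pure-Python illustration under assumed frame, hop and window sizes, not the actual implementation; the rectified spectral difference it computes is commonly called spectral flux.

```python
import cmath

def fft(x):
    """Radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mark_beats(samples, frame=256, hop=128, window=9, strength=3.0):
    """Return sample offsets of beat points whose flux passes `strength`."""
    # Magnitude spectrum of each analysis frame (short-time FFT).
    spectra = []
    i = 0
    while i + frame <= len(samples):
        spectra.append([abs(c) for c in fft(samples[i:i + frame])])
        i += hop
    # Rectified spectral difference between consecutive frames (the "flux").
    flux = [
        sum(max(b - a, 0.0) for a, b in zip(prev, cur))
        for prev, cur in zip(spectra, spectra[1:])
    ]
    # Compare each flux value with the moving average over `window` frames;
    # values above the mean by at least `strength` are marked as beat points.
    half = window // 2
    beats = []
    for i, f in enumerate(flux):
        lo, hi = max(0, i - half), min(len(flux), i + half + 1)
        mean = sum(flux[lo:hi]) / (hi - lo)
        if f - mean >= strength:
            beats.append((i + 1) * hop)   # offset of the louder frame
    return beats
```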
Step 304, determining whether the number of video marker frames is greater than a preset number; if so, proceed to step 306; if not, proceed to step 305.
Specifically, before matching the audio marker frames with the video marker frames, the mobile phone checks the number of video marker frames and judges whether it is greater than a preset number, which can be changed as required. In one example the preset number is 10: when the number of video marker frames has not reached 10, the video marker frames must be supplemented to ensure a good matching effect when selecting music; when it exceeds 10, the method proceeds directly to the selection of the score. Those skilled in the art will understand that this is one example of marking video frames, and the parameters and calculation method can be changed as required in practical applications.
Step 305, supplementally marking key frames.
Specifically, when the number of video marker frames is lower than the preset number, the mobile phone calculates, based on image analysis, the similarity of each pair of adjacent key frames of the video with a cosine similarity algorithm. When two key frames differ strongly, that is, when their similarity does not satisfy the system's preset rule, they are regarded as a scene switch that can serve as a dotting position for supplementing the video marker frames. In one example, the preset rule requires the similarity to be close to 1. The image vector of a certain frame of the video is x = (1, 2, 4, 0, 6) and the image vector of the next frame is y = (3, 1, 5, 2, 1); these are substituted into the cosine formula
cos(θ) = Σ x_i·y_i / ( √(Σ x_i²) · √(Σ y_i²) )
where cos(θ) represents the degree of similarity and x_i, y_i are the components of the two vectors. The calculation gives a similarity of about 0.65, which is far from 1; that is, the similarity between the two corresponding video frames is below the preset rule, the pictures and content of the two frames differ considerably, and this is where a scene switch occurs. One of the two video frames is then selected and marked as a supplementary video marker frame. After the supplementary marking is finished, the mobile phone continues the image analysis on the remaining video segments and marks suitable video frames until the number of video marker frames exceeds the preset number, after which the selection of the score begins.
Step 306, determining the score according to the similarity between the distances between adjacent audio marks and the distances between adjacent video marks.
Specifically, the mobile phone calculates the similarity between the distances between adjacent audio marker frames in the music and the distances between adjacent video marker frames in the video, and takes the music whose calculated similarity is greater than a preset threshold as the music matching the beats of the characteristic actions in the video.
In one example, suppose the distances between adjacent marker frames in the audio track are X = (a1, b1, c1, …, n1) and the distances between adjacent marker frames in the video are Y = (a2, b2, c2, d2, …, m2). The similarity between the two is calculated with the Pearson correlation coefficient. The covariance is
Cov(X, Y) = E((X − E(X))(Y − E(Y)))
and the correlation coefficient is the quotient of the covariance and the product of the standard deviations:
ρ_{X,Y} = Cov(X, Y) / (σ_X · σ_Y)
where ρ_{X,Y} is the calculated similarity, X is the sequence of distances between marks in the audio track, Y is the sequence of distances between marks in the video, and E(X), E(Y) are the means of the two distance sequences. The preset threshold is applied to the ρ_{X,Y} computed by this formula. In one example the preset threshold is set to 30%: when the calculated similarity between the audio-mark distances and the video-mark distances is greater than 30%, the music is considered to match the video and can be used as its score. When several pieces of music have a similarity above 30%, or when none does, the mobile phone outputs the pieces in order of similarity for the user to choose from, and the user decides which piece finally serves as the score. Determining the score directly from the matching degree, or letting the user pick the score that is finally synthesized with the video, improves the user's satisfaction with the score.
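The Pearson check described above can be sketched as follows, applied to the sequences of adjacent-mark distances. Truncating the two sequences to a common length is an assumption, since the text does not say how sequences of unequal length are handled; the 30% threshold comes from the example above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sigma_X * sigma_Y)."""
    n = min(len(xs), len(ys))       # assumption: truncate to common length
    xs, ys = xs[:n], ys[:n]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    return cov / (sx * sy)

audio_gaps = [12, 8, 12, 8, 12]   # frames between adjacent audio marks
video_gaps = [11, 9, 12, 7, 12]   # frames between adjacent video marks
r = pearson(audio_gaps, video_gaps)
match = r > 0.30                  # 30% threshold from the example above
```

Here the gap patterns track each other closely, so the correlation comes out well above the threshold and the track would be accepted as a candidate score.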
Step 307, synthesizing the score and the video.
Specifically, the mobile phone places the audio track corresponding to the score and the video frames of the video on the same time axis; a schematic diagram is shown in fig. 6. Some video marker frames have a low degree of coincidence with the audio marker frames on the time axis; in that case the video speed is adjusted according to the relation between the video marker frames and the audio marker frames so that their coincidence becomes higher, and the video is then synthesized.
This embodiment therefore provides a video processing method in which the audio frames corresponding to music beats and the video frames corresponding to characteristic actions in a video are marked first, and music matching the video is determined according to the similarity of the distances between adjacent marks. When too few video frames are marked, the video mark frames are supplemented through similarity calculation on adjacent key frames, and before synthesis on the time axis the speed of the video is adjusted so that the beats align more precisely. With enough video mark frames, the music and the video can be matched well and a score with a high matching degree is selected; the adjustment of the video before synthesis improves the beat-sync quality of the finished video, further improving the user experience.
A third embodiment of the present invention relates to a video processing method applied to an electronic device such as a user's mobile phone or tablet computer; details given above are not repeated here. In this embodiment, application to a user's mobile phone is described as an example. The mobile phone automatically matches suitable music for the video and, before synthesis, checks whether the numbers of video mark frames and audio mark frames reach a preset number matching degree, supplementing video mark frames when they do not. It then matches the video and audio on a time axis and, for the video segments whose video mark frames match the audio mark frames at a degree lower than a preset threshold, adjusts the video according to its type so that the matching degree meets the threshold, and finally synthesizes the video. The method of determining the score is roughly the same as in the second embodiment; this embodiment mainly refines the adjustment before synthesis.
Implementation details of the video processing method according to this embodiment are described below; these details are provided only for ease of understanding and are not necessary for implementing this embodiment.
A specific flowchart of a video processing method according to this embodiment is shown in fig. 7.
Step 701, acquiring a video to be processed. When a short video needs to be made, the mobile phone calls the application to shoot a short video, or directly reads a previously shot short video stored in the mobile phone's storage space as the video to be processed.
Step 702, performing image analysis on the video, and marking the video frame containing the characteristic action.
Step 703, acquiring the music beat, and marking the position of the beat on the audio track to obtain an audio mark frame.
Step 704, determining whether the number of video mark frames is greater than a preset number; if so, proceeding to step 706, and if not, proceeding to step 705.
Step 705, supplementing video mark frames by additionally marking key frames.
Step 706, determining the score based on the similarity between the distances between adjacent audio marks and the distances between adjacent video marks.
Steps 701 to 706 in this embodiment are similar to steps 301 to 306 in the second embodiment, and are not described herein again, and specific differences will be described below.
Step 707, determining whether the number of video mark frames and the number of audio mark frames of the determined score reach a preset matching degree, if so, entering step 709, and if not, entering step 708.
Specifically, before synthesizing the music and the video, the mobile phone checks the number of video mark frames again and determines whether the number of video mark frames and the number of audio mark frames reach the preset number matching degree, that is, whether their ratio reaches the preset requirement. This is checked through the following formula:

M / N ≥ T_h

where N is the number of audio mark frames, M is the number of video mark frames containing characteristic actions, and T_h is the preset number matching degree.
In one example, the preset number matching degree is set to 70%; that is, it is judged whether the number of video mark frames reaches 70% of the number of audio mark frames. When it does, the video is adjusted and then synthesized directly; when it does not, mark frames are supplemented first, and the video is adjusted before synthesis once the number of video mark frames reaches 70% of the number of audio mark frames. The value of 70% is merely an example for convenience of description; the preset matching degree may be adjusted as needed, and this embodiment is not limited thereto.
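A minimal sketch of this count check, assuming the M / N ≥ T_h reading of the ratio test (the function name and zero-audio-marks behaviour are illustrative assumptions):

```python
def enough_video_marks(m_video, n_audio, threshold=0.70):
    """Return True when the count of video mark frames (M) reaches
    the preset fraction T_h of the audio mark frame count (N)."""
    if n_audio == 0:        # no beats to match against
        return True
    return m_video / n_audio >= threshold

enough_video_marks(8, 10)   # 80% >= 70%, no supplementing needed
enough_video_marks(6, 10)   # 60% < 70%, supplement mark frames first
```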
Step 708, supplementing marked key frames. The method for supplementing mark frames in step 708 is similar to that in step 705 of this embodiment and is not described again here. The video mark frames are supplemented so that their number meets the requirement.
Step 709, locating the video frames whose inter-frame matching degree is lower than a preset threshold, and adjusting them according to the video scene.
Specifically, after the number of video mark frames reaches 70% of the number of audio mark frames, the mobile phone places the video and the score on the same time axis for synthesis. Before synthesis, it checks whether the matching degree between the mark frames meets a preset threshold, locates the specific position of any target video mark frame whose matching degree with an audio mark frame is lower than the preset threshold and the video segment to which that frame belongs, judges the specific type of the video, and adjusts the video with a corresponding method according to that type so that the matching degree meets the preset threshold. In one example, the preset threshold is set such that the error between an audio mark frame and a video mark frame on the time axis is no more than 0.1 second. When the mobile phone judges that the error between a video mark frame and an audio mark frame on the time axis exceeds 0.1 second, it locates the position of the video mark frame and the video segment to which it belongs, judges the type of the video, and, according to that type, adjusts the speed of the video, inserts video frames into the segment to which the target video mark frame belongs, or deletes video frames from that segment. In practical applications, these methods can be combined according to the specific situation.
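The 0.1-second error check described above might be sketched as follows (nearest-neighbour pairing of video marks to audio marks is an assumption of this sketch):

```python
def mismatched_marks(video_marks, audio_marks, max_error=0.1):
    """Pair each video mark with its nearest audio mark on the time
    axis and return the video marks (in seconds) whose error exceeds
    max_error; these are the target frames needing adjustment."""
    bad = []
    for v in video_marks:
        nearest = min(audio_marks, key=lambda a: abs(a - v))
        if abs(nearest - v) > max_error:
            bad.append(v)
    return bad

# Only the mark at 2.4 s is more than 0.1 s from every beat.
mismatched_marks([1.05, 2.4, 3.0], [1.0, 2.0, 3.02])   # → [2.4]
```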
For example, after the video is first speed-adjusted, some video frames may still not meet the matching requirement. Considering the causes of the mismatch, key frames can be deleted at positions where a video mark frame runs ahead of its audio mark frame, and key frames can be inserted at positions where a video mark frame lags behind its audio mark frame; by combining key-frame insertion and deletion at different positions within a section of the video, the final matching degree between the video mark frames and the audio mark frames is made to meet the requirement.
In one example, after the video is analyzed: when the content is judged to be dance content, the position of the video mark frame is changed by slowing down or speeding up the highlight video frames, so that the video frame at the highlight moment corresponds exactly to the audio mark frame and the highlight picture combines better with the music beat; when the detected video content is of a landscape type, the position of the video mark frame is changed by slowing down the transition picture frames, lengthening the switching process so that the video mark frame of the newly switched picture matches the audio mark frame corresponding to the music beat; when the detected video content is of a pet type, the position of the video mark frame is changed by repeatedly playing the highlight video frames, looping the pet's highlight moments until the video mark frame corresponding to the scene switch matches the audio mark frame corresponding to the music beat; when the detected video content is of a drama (scenario) type, the position of the video mark frame is changed by inserting or deleting key frames according to the positional relationship between the video mark frame and the audio mark frame: some video frames can be deleted at ordinary scene switches, while video frames can be inserted and played repeatedly at important switches and highlight moments, so that the video mark frames and audio mark frames match. After such adjustment, the mark frames of the music and the video match better, and the beat-sync scoring effect is better.
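The type-dependent choice of adjustment can be sketched as a simple dispatch table; the type keys and description strings below are illustrative placeholders, not terms defined by the patent:

```python
# Hypothetical mapping from detected video type to the adjustment
# behaviour described in the text; a real system would map to
# retiming/insert/delete routines rather than strings.
ADJUSTMENTS = {
    "dance":     "retime highlight frames (slow down / speed up)",
    "landscape": "slow down transition frames",
    "pet":       "loop highlight frames until marks align",
    "drama":     "insert or delete key frames at switch points",
}

def choose_adjustment(video_type):
    """Pick the adjustment strategy for a detected video type,
    falling back to generic retiming for unknown types."""
    return ADJUSTMENTS.get(video_type, "generic retiming")
```

Keeping the mapping in one table makes it easy to add the new video types and adjustment methods that the embodiment says may be changed as required.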
Step 710, synthesizing the audio and the video. After the matching degree between the mark frames has been corrected, the video and the score are directly combined into a short video on the time axis.
In addition, when the video is adjusted, to improve the overall effect of the finally synthesized video, switching special effects can be added to the key frames while they are adjusted so that the switched pictures are more attractive, or beautifying means such as filters and stickers can be added to the video frames corresponding to highlight moments in the various types of video, so that the finished video looks better. Those skilled in the art will understand that the method of adjusting the video according to its type is described only as an example; the set video types, the preset threshold, and the corresponding adjustment methods can be changed as required in practical applications.
Obviously, the above embodiments can be used in cooperation with each other, and related technical details in the embodiments can be cited by one another; during score matching and video adjustment, each preset value can be adjusted as required. The steps of the above methods are divided for clarity of description; in implementation they can be combined into one step or split into several steps, and all such variations fall within the protection scope of this patent as long as the same logical relationship is preserved. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes without changing its core design, also falls within the protection scope of this patent.
A fourth embodiment of the invention relates to an electronic device, as shown in fig. 8, comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method embodiments of video processing described above.
Where the memory and processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting together one or more of the various circuits of the processor and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
A fifth embodiment of the invention relates to a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method embodiments.
That is, those skilled in the art can understand that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware; the program is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (10)

1. A video processing method, comprising:
analyzing images of a video to be processed to obtain characteristic actions in the video, and acquiring music beats of music;
matching the beats of the characteristic actions in the video with the music beats of the music respectively to obtain the music matched with the beats of the characteristic actions in the video;
and taking the matched music as the score of the video, and synthesizing the music with the video.
2. The video processing method according to claim 1, wherein the performing image analysis on the video to be processed to obtain each feature action in the video specifically comprises:
marking video frames containing the characteristic actions in the video to obtain video marked frames;
the acquiring of the music beat of each music comprises:
marking the corresponding position of the beat point in the music, which accords with the preset rule, on the audio track to obtain an audio marking frame;
the matching of the beat of each characteristic action in the video with the music beat of each music respectively comprises:
calculating the similarity between the distance between each adjacent audio mark frame in the music and the distance between each adjacent video mark frame in the video;
and taking the music with the calculated similarity larger than a preset threshold value as music matched with the beats of the characteristic actions in the video.
3. The video processing method according to claim 2, wherein before said matching the tempo of each characteristic action in the video with the music tempo of each music respectively, further comprising:
if the video marking frames are less than the preset number, calculating the picture similarity of two adjacent key frames in the video, and marking any key frame of the two key frames with the picture similarity less than the preset value to obtain a supplemented video marking frame.
4. The video processing method according to claim 2 or 3, wherein said synthesizing the matched music with the video as a soundtrack of the video comprises:
matching the video marking frame of the video with the audio marking frame of the matched music on a time axis, and positioning a target video marking frame with the matching degree of the video marking frame and the audio marking frame lower than a preset threshold;
adjusting the video according to the target video marking frame, wherein the matching degree of the adjusted target video marking frame and the audio marking frame reaches the preset threshold;
wherein the adjustment comprises one or any combination of the following:
and regulating the speed of partial video frames of the video, inserting the video frames into the video clip to which the target video mark frame belongs, and deleting the video frames from the video clip to which the target video mark frame belongs.
5. The video processing method of claim 4, wherein said adjusting the video clip according to the target video markup frame comprises:
acquiring the video type of the video;
determining an adjusting mode corresponding to the video type according to the video type;
and adjusting the video in the determined adjustment mode.
6. The video processing method according to claim 5, wherein the video types include a dance type, a drama type, a landscape type, and a pet type;
the adjustment mode corresponding to the dance type comprises the following steps: according to the position of the target video marking frame on a time axis, regulating the speed of a video frame corresponding to the highlight moment in the video;
the adjustment mode corresponding to the drama type comprises the following steps: according to the position of the target video marking frame on a time axis, selecting to insert or delete a video frame in a video clip to which the target video marking frame belongs;
the adjustment mode corresponding to the landscape type comprises the following steps: according to the position of the target video marking frame on a time axis, the speed of the video frame of the scene switching in the video is regulated;
the adjustment mode corresponding to the pet type comprises the following steps: and repeatedly playing the video frames corresponding to the highlight moments in the video according to the positions of the target video marking frames on the time axis.
7. The video processing method according to claim 5, wherein before synthesizing the matched music with the video as a soundtrack of the video, further comprising:
judging whether the number of the video marking frames and the number of the audio marking frames of the matched music reach a preset number matching degree or not;
if the number matching degree is met, executing the matching music as the score of the video, and synthesizing the score with the video; if the number matching degree is not satisfied, calculating the picture similarity of two adjacent key frames in the video, and marking any key frame of the two key frames with the picture similarity smaller than a preset value to obtain a supplemented video marking frame;
the videomark frames matched on the time axis comprise the supplementary videomark frames.
8. The video processing method according to claim 7, wherein whether the number of the video marking frames and the number of the audio marking frames of the matched music reach the preset number matching degree is determined by the following formula:

M / N ≥ T_h

wherein N is the number of audio marking frames, M is the number of video marking frames containing the characteristic action, and T_h is the preset number matching degree.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video processing method of any of claims 1 to 8.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a video processing method according to any one of claims 1 to 8.
CN201910733269.3A 2019-08-09 2019-08-09 Video processing method, electronic equipment and storage medium Pending CN110602550A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910733269.3A CN110602550A (en) 2019-08-09 2019-08-09 Video processing method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910733269.3A CN110602550A (en) 2019-08-09 2019-08-09 Video processing method, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110602550A true CN110602550A (en) 2019-12-20

Family

ID=68853799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910733269.3A Pending CN110602550A (en) 2019-08-09 2019-08-09 Video processing method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110602550A (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107124624A (en) * 2017-04-21 2017-09-01 腾讯科技(深圳)有限公司 The method and apparatus of video data generation
CN107360383A (en) * 2017-07-26 2017-11-17 北京百思科技有限公司 A kind of method and system for automatically generating video
CN107393569A (en) * 2017-08-16 2017-11-24 成都品果科技有限公司 Audio frequency and video clipping method and device
US20180018948A1 (en) * 2015-09-29 2018-01-18 Amper Music, Inc. System for embedding electronic messages and documents with automatically-composed music user-specified by emotion and style descriptors
CN107770596A (en) * 2017-09-25 2018-03-06 北京达佳互联信息技术有限公司 A kind of special efficacy synchronous method, device and mobile terminal
CN108419035A (en) * 2018-02-28 2018-08-17 北京小米移动软件有限公司 The synthetic method and device of picture video
CN109379643A (en) * 2018-11-21 2019-02-22 北京达佳互联信息技术有限公司 Image synthesizing method, device, terminal and storage medium


Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339865A (en) * 2020-02-17 2020-06-26 杭州慧川智能科技有限公司 Method for synthesizing video MV (music video) by music based on self-supervision learning
CN111782858B (en) * 2020-03-31 2024-04-05 北京沃东天骏信息技术有限公司 Music matching method and device
CN111782858A (en) * 2020-03-31 2020-10-16 北京沃东天骏信息技术有限公司 Music matching method and device
WO2021218311A1 (en) * 2020-04-28 2021-11-04 Oppo广东移动通信有限公司 Video playing progress control method and apparatus, and storage medium and electronic device
CN111597984A (en) * 2020-05-15 2020-08-28 北京百度网讯科技有限公司 Sticker testing method, device, electronic equipment and computer readable storage medium
CN111597984B (en) * 2020-05-15 2023-09-26 北京百度网讯科技有限公司 Label paper testing method, device, electronic equipment and computer readable storage medium
CN111800650A (en) * 2020-06-05 2020-10-20 腾讯科技(深圳)有限公司 Video dubbing method and device, electronic equipment and computer readable medium
CN111901626A (en) * 2020-08-05 2020-11-06 腾讯科技(深圳)有限公司 Background audio determining method, video editing method, device and computer equipment
CN111901626B (en) * 2020-08-05 2021-12-14 腾讯科技(深圳)有限公司 Background audio determining method, video editing method, device and computer equipment
CN112291612A (en) * 2020-10-12 2021-01-29 北京沃东天骏信息技术有限公司 Video and audio matching method and device, storage medium and electronic equipment
CN112468741A (en) * 2020-11-13 2021-03-09 咪咕文化科技有限公司 Video generation method, electronic device and storage medium
CN112839257A (en) * 2020-12-31 2021-05-25 四川金熊猫新媒体有限公司 Video content detection method, device, server and storage medium
CN112839257B (en) * 2020-12-31 2023-05-09 四川金熊猫新媒体有限公司 Video content detection method, device, server and storage medium
CN113099297B (en) * 2021-03-24 2022-09-30 北京达佳互联信息技术有限公司 Method and device for generating click video, electronic equipment and storage medium
WO2022198934A1 (en) * 2021-03-24 2022-09-29 北京达佳互联信息技术有限公司 Method and apparatus for generating video synchronized to beat of music
CN113099297A (en) * 2021-03-24 2021-07-09 北京达佳互联信息技术有限公司 Method and device for generating click video, electronic equipment and storage medium
CN113473177A (en) * 2021-05-27 2021-10-01 北京达佳互联信息技术有限公司 Music recommendation method and device, electronic equipment and computer readable storage medium
CN113473177B (en) * 2021-05-27 2023-10-31 北京达佳互联信息技术有限公司 Music recommendation method, device, electronic equipment and computer readable storage medium
CN113438547B (en) * 2021-05-28 2022-03-25 北京达佳互联信息技术有限公司 Music generation method and device, electronic equipment and storage medium
CN113438547A (en) * 2021-05-28 2021-09-24 北京达佳互联信息技术有限公司 Music generation method and device, electronic equipment and storage medium
WO2023001115A1 (en) * 2021-07-23 2023-01-26 花瓣云科技有限公司 Video generation method, electronic device and medium
CN115695899A (en) * 2021-07-23 2023-02-03 花瓣云科技有限公司 Video generation method, electronic device and medium thereof
CN113473201A (en) * 2021-07-29 2021-10-01 腾讯音乐娱乐科技(深圳)有限公司 Audio and video alignment method, device, equipment and storage medium
CN113365147B (en) * 2021-08-11 2021-11-19 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium based on music card point
CN113365147A (en) * 2021-08-11 2021-09-07 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium based on music card point

Similar Documents

Publication Publication Date Title
CN110602550A (en) Video processing method, electronic equipment and storage medium
US10452919B2 (en) Detecting segments of a video program through image comparisons
US8103107B2 (en) Video-attribute-information output apparatus, video digest forming apparatus, computer program product, and video-attribute-information output method
US8558952B2 (en) Image-sound segment corresponding apparatus, method and program
US8121462B2 (en) Video edition device and method
US9208227B2 (en) Electronic apparatus, reproduction control system, reproduction control method, and program therefor
US20020051081A1 (en) Special reproduction control information describing method, special reproduction control information creating apparatus and method therefor, and video reproduction apparatus and method therefor
US8503770B2 (en) Information processing apparatus and method, and program
EP1067800A1 (en) Signal processing method and video/voice processing device
US10178365B1 (en) System and method for combining audio tracks with video files
US20090259653A1 (en) Information processing apparatus, method, and program
JP6557592B2 (en) Video scene division apparatus and video scene division program
CN110866563B (en) Similar video detection and recommendation method, electronic device and storage medium
CN101807393A (en) KTV system, implement method thereof and TV set
CN112822546A (en) Content-aware-based double-speed playing method, system, storage medium and device
CN110933459B (en) Event video clipping method, device, server and readable storage medium
CN112468741A (en) Video generation method, electronic device and storage medium
US20130218570A1 (en) Apparatus and method for correcting speech, and non-transitory computer readable medium thereof
US20150078729A1 (en) Synchronizing videos with frame-based metadata using video content
CN112312142B (en) Video playing control method and device and computer readable storage medium
KR102037997B1 (en) Electronic apparatus and method for generating contents
US20240033626A1 (en) Game plot interaction method, apparatus, and system
CN111681678A (en) Method, system, device and storage medium for automatically generating sound effect and matching video
US20230069920A1 (en) Estimation device, estimation method, and estimation system
US20100079673A1 (en) Video processing apparatus and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220