CN110781835A - Data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN110781835A
Authority
CN
China
Prior art keywords
vector
key frame
feature vector
vectors
sorted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911029239.0A
Other languages
Chinese (zh)
Other versions
CN110781835B (en)
Inventor
靳聪
帖云
严文彩
李小兵
王南苏
吕欣
宋雷雨
李亚杰
Current Assignee
Zhengzhou University
Communication University of China
Original Assignee
Zhengzhou University
Communication University of China
Priority date
Filing date
Publication date
Application filed by Zhengzhou University and Communication University of China
Priority to CN201911029239.0A
Publication of CN110781835A
Application granted
Publication of CN110781835B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/485 - End-user interface for client configuration
    • H04N21/4852 - End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 - Monomedia components thereof
    • H04N21/8106 - Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/8113 - Monomedia components thereof involving special audio data comprising music, e.g. song in MP3 format

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The application provides a data processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring the attribute feature vectors of a key frame of a target video; obtaining the feature vector of the key frame from its attribute feature vectors; inputting the feature vector of the key frame, together with the vector to be sorted that represents the notes of the previous key frame, into a decoding model as input parameters to obtain the vector to be sorted representing the notes of the current key frame; and obtaining the background music of the target video from all the vectors to be sorted so obtained. With this method, background music for the target video can be generated without manual participation, which helps reduce the manual workload.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of technology, computer multimedia technology has become increasingly widespread, and video production has become a task that ordinary people can perform. People can shoot video material with tools such as digital video cameras, mobile phones and cameras, and then produce videos to record their study, work and daily life.
When producing a high-quality video, background music needs to be configured for the video after it is made, so that the finished video has a good scene-reproduction effect when played. To configure the background music, a user must find music that suits the video from a large library of music material, so the manual workload of configuring background music for a video is large.
Disclosure of Invention
In view of the above, embodiments of the present application provide a data processing method and apparatus, an electronic device, and a storage medium, so as to reduce the workload of configuring background music for a video.
In a first aspect, an embodiment of the present application provides a data processing method, including:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters to obtain the vector to be sorted representing the notes of the current key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sorted.
Optionally, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Optionally, obtaining the feature vector of the key frame according to the attribute feature vector of the key frame includes:
and performing vector splicing through a fully connected layer on the attribute feature vectors of the key frame to obtain the feature vector of the key frame.
Optionally, obtaining the background music of the target video according to all the obtained vectors to be sorted includes:
judging the vector to be sorted by using a preset target note vector set so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
Optionally, judging the vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement includes:
performing a mean-square-error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
Optionally, the method further comprises:
and training the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
the acquiring unit is used for acquiring the attribute feature vector of the key frame of the target video;
the first processing unit is used for obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
the second processing unit is used for inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into the decoding model as input parameters to obtain the vector to be sorted representing the notes of the current key frame;
and the third processing unit is used for obtaining the background music of the target video according to all the obtained vectors to be sorted.
Optionally, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Optionally, when the first processing unit is configured to obtain the feature vector of the key frame according to the attribute feature vector of the key frame, the configuration includes:
and performing vector splicing through a fully connected layer on the attribute feature vectors of the key frame to obtain the feature vector of the key frame.
Optionally, when the third processing unit is configured to obtain the background music of the target video according to all the obtained vectors to be sorted, the configuration includes:
judging the vector to be sorted by using a preset target note vector set so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
Optionally, when the third processing unit is configured to judge the vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement, the configuration includes:
performing a mean-square-error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
Optionally, the data processing apparatus further includes:
and the training unit is used for training the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the data processing method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in this application, when background music is configured for a target video, the attribute feature vectors of the key frames of the target video are obtained first. Each key frame has different attribute feature vectors, so the feature vector of a key frame can be derived from its attribute feature vectors. The feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame are then input into a decoding model as input parameters. Because the feature vector of a key frame represents the content of that frame (for example, the people, actions and scenes it depicts), the vector output by the decoding model can serve as the vector representing the notes of the current key frame, and the notes so obtained match the content of the key frame. The vector to be sorted for the previous key frame is also used as an input parameter so that the notes obtained for the current key frame match the notes of the previous key frame, meaning the notes of any two adjacent key frames fit together. Since music is composed of many notes, the background music assembled from all the vectors to be sorted therefore matches the target video. In this way, the background music of the target video can be obtained without manual participation, thus facilitating a reduction in manual effort.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting its scope. For those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another data processing method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application;
fig. 4 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Example one
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application, as shown in fig. 1, the data processing method includes the following steps:
step 101, obtaining attribute feature vectors of key frames of a target video.
Specifically, a video comprises a plurality of key frames, and each key frame contains a plurality of attributes, such as objects, people, actions, scenes, and person-object relationships. Together these attributes constitute all the elements of a key frame. After the attribute feature vectors of a key frame are obtained, quantized data describing the key frame are available, providing data support for subsequent processing.
It should be noted that, the specific attribute feature vector may be set according to actual needs, and is not specifically limited herein.
And 102, obtaining the feature vector of the key frame according to the attribute feature vector of the key frame.
Specifically, the attribute feature vectors of a key frame together represent a complete key frame, and configuring background music for the target video requires considering the video as a whole. The attribute feature vectors of the key frames are therefore used to obtain the feature vectors of the key frames. Because the feature vectors of the key frames can represent the content of the target video as a whole, they provide a reference basis, from a global perspective, when configuring background music for the target video.
It should be noted that, the specific implementation manner of obtaining the feature vector of the key frame according to the attribute feature vector may be set according to actual needs, and is not specifically limited herein.
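As one illustration of step 102, the attribute feature vectors can be spliced and projected through a fully connected layer. This is a minimal NumPy sketch; all dimensions, the random untrained weights, and the ReLU activation are assumptions, since the patent does not specify the network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-key-frame attribute feature vectors (dimensions are assumptions).
dynamic_vec = rng.standard_normal(128)    # motion features
static_vec = rng.standard_normal(256)     # appearance features
optical_flow_vec = rng.standard_normal(64)

def fuse_features(attribute_vectors, out_dim=512, seed=0):
    """Concatenate attribute vectors (vector splicing), then project them
    through one fully connected layer to a fixed-size key-frame feature vector."""
    w_rng = np.random.default_rng(seed)
    x = np.concatenate(attribute_vectors)               # vector splicing
    w = w_rng.standard_normal((out_dim, x.size)) * 0.01  # untrained weights
    b = np.zeros(out_dim)
    return np.maximum(w @ x + b, 0.0)                   # ReLU activation

frame_feature = fuse_features([dynamic_vec, static_vec, optical_flow_vec])
print(frame_feature.shape)  # (512,)
```

In practice the weights of the fully connected layer would be learned jointly with the decoding model rather than drawn at random.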
And step 103, inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters to obtain the vector to be sorted representing the notes of the current key frame.
Specifically, once the feature vector of a key frame is obtained, the target video can be analysed as a whole. To extract content such as people, motion, scenes and person-object relationships from the whole target video, the feature vector of the key frame is input into the decoding model, which outputs content vectors representing the people, actions, scenes, person-object relationships and so on in the target video. Because the notes in a piece of music are all related, the vector to be sorted representing the notes of the previous key frame is also input into the decoding model as an input parameter, so that the notes generated for the current key frame match the notes corresponding to the previous key frame. For the first key frame of the target video, the vector to be sorted is obtained by inputting a preset note vector and the feature vector of the first key frame into the decoding model as input parameters. The output vector is closely related to the content expressed by the target video, and the notes obtained from successive frames are related to one another; the output vector is therefore used as the vector to be sorted representing the notes. The notes corresponding to the vectors to be sorted are thus closely related to the content of the target video and to each other, so background music with a high degree of matching to the target video can be configured from them.
It should be noted that, which decoding model is specifically used may be set according to actual needs, and is not specifically limited herein.
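The per-frame decoding step described above can be sketched as a simple recurrence. The GRU-style cell below is an assumption for illustration (the patent leaves the decoding model open); it consumes the key-frame feature vector and the previous note vector and emits the next note vector:

```python
import numpy as np

class NoteDecoder:
    """Minimal GRU-style decoding step: given the key-frame feature vector
    and the previous note vector, emit the note vector for the current frame.
    Dimensions and the GRU choice are assumptions, not the patent's model."""
    def __init__(self, feat_dim, note_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = feat_dim + note_dim
        self.wz = rng.standard_normal((note_dim, in_dim)) * 0.1  # update gate
        self.wr = rng.standard_normal((note_dim, in_dim)) * 0.1  # reset gate
        self.wh = rng.standard_normal((note_dim, in_dim)) * 0.1  # candidate

    def step(self, frame_feature, prev_note):
        x = np.concatenate([frame_feature, prev_note])
        z = 1.0 / (1.0 + np.exp(-(self.wz @ x)))          # update gate
        r = 1.0 / (1.0 + np.exp(-(self.wr @ x)))          # reset gate
        xh = np.concatenate([frame_feature, r * prev_note])
        h = np.tanh(self.wh @ xh)                         # candidate note vector
        return (1.0 - z) * prev_note + z * h              # next note vector

decoder = NoteDecoder(feat_dim=512, note_dim=16)
note = np.zeros(16)    # preset note vector, used for the first key frame
for frame_feature in np.random.default_rng(1).standard_normal((4, 512)):
    note = decoder.step(frame_feature, note)   # one note vector per key frame
print(note.shape)  # (16,)
```

Feeding the previous note vector back in is what makes the notes of adjacent key frames depend on each other, as the description requires.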
And step 104, obtaining the background music of the target video according to all the obtained vectors to be sorted.
Specifically, since music is formed by combining a plurality of notes according to certain rules, once all the vectors to be sorted are obtained, the notes forming the background music are available, and the background music of the target video can therefore be obtained from the vectors to be sorted.
It should be noted that the specific method for obtaining the background music from all the vectors to be sorted can be set according to actual needs. For example, the corresponding notes may first be obtained from the vectors to be sorted and then combined according to a certain rule to form the background music; alternatively, the vectors to be sorted may be ordered first and the corresponding notes arranged in that order, the ordered notes then serving as the background music. The specific implementation is not limited here.
In this method, the obtained notes match their key frames, so the background music obtained from the vectors to be sorted also matches the target video.
In one possible embodiment, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
Specifically, the dynamic feature vector, the static feature vector and the optical flow feature vector of a key frame quantitatively describe the objects, people, actions, scenes and person-object relationships in the key frame; once these vectors are obtained, the feature vector of the key frame can be derived from them.
It should be noted that which one or more attribute feature vectors are specifically used may be set according to actual needs, and is not specifically limited herein.
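To make the three attribute feature vectors concrete, the sketch below computes crude stand-ins from raw pixels: an intensity histogram as a static feature, a frame-difference histogram as a dynamic feature, and mean gradients of the difference as an optical-flow proxy. These hand-crafted features are purely illustrative assumptions; a real system would more likely use pretrained CNN and optical-flow networks:

```python
import numpy as np

def crude_attribute_vectors(prev_frame, frame, bins=16):
    """Illustrative stand-ins for the dynamic, static and optical-flow
    attribute feature vectors of a key frame (grayscale, values 0..255)."""
    # Static feature: intensity histogram of the key frame itself.
    static = np.histogram(frame, bins=bins, range=(0, 255))[0].astype(float)
    static /= static.sum() or 1.0
    # Dynamic feature: histogram of absolute change versus the previous frame.
    diff = np.abs(frame.astype(float) - prev_frame.astype(float))
    dynamic = np.histogram(diff, bins=bins, range=(0, 255))[0].astype(float)
    dynamic /= dynamic.sum() or 1.0
    # Optical-flow proxy: mean horizontal/vertical gradients of the difference.
    flow = np.array([np.mean(np.diff(diff, axis=1)),
                     np.mean(np.diff(diff, axis=0))])
    return dynamic, static, flow

rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(48, 64))
frame = rng.integers(0, 256, size=(48, 64))
dynamic, static, flow = crude_attribute_vectors(prev_frame, frame)
print(dynamic.shape, static.shape, flow.shape)  # (16,) (16,) (2,)
```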
In a possible embodiment, in step 102, a feature vector of the key frame may be obtained by performing a vector stitching process through the full-link layer according to the attribute feature vector of the key frame.
It should be noted that, what kind of fully connected layer is specifically used to perform the splicing processing on the attribute feature vector may be set according to actual needs, and is not specifically limited herein.
In a possible implementation, fig. 2 is a schematic flow chart of another data processing method provided in the first embodiment of the present application, and as shown in fig. 2, when step 104 is executed, the following steps may be implemented:
step 201, judging the vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement.
And step 202, sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, and taking the sorting result as the background music of the target video.
Specifically, the preset target note vectors are notes that meet the requirements of the target video; a vector to be sorted that meets the preset requirement therefore satisfies the user's needs. After all the vectors to be sorted that meet the preset requirement are determined, they are ordered according to a preset note arrangement rule, so that the notes corresponding to these vectors are arranged according to a certain rule and form the background music of the target video.
It should be noted that, the specific preset requirement and the specific arrangement rule may be set according to actual needs, and are not specifically limited herein.
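Step 202 can be sketched as follows. Here the target note vectors are one-hot rows, the arrangement rule is "order by originating key-frame index", and each qualified vector is mapped to its nearest target note; all three choices are assumptions made for illustration, since the patent leaves the note set and the arrangement rule open:

```python
import numpy as np

# Hypothetical target note vectors: one-hot rows, one per pitch name.
TARGET_NOTES = {name: vec for name, vec in zip(["C4", "D4", "E4", "G4"],
                                               np.eye(4))}

def arrange(qualified, order_rule=None):
    """Sort the (frame_index, vector) pairs that passed the check by the
    preset arrangement rule, then map each vector to its nearest target note."""
    order_rule = order_rule or (lambda item: item[0])   # default: frame order
    melody = []
    for _, vec in sorted(qualified, key=order_rule):
        # Nearest target note under squared Euclidean distance.
        name = min(TARGET_NOTES,
                   key=lambda n: np.sum((TARGET_NOTES[n] - vec) ** 2))
        melody.append(name)
    return melody

qualified = [(2, np.array([0.0, 0.0, 1.0, 0.0])),   # note for key frame 2
             (0, np.array([1.0, 0.0, 0.0, 0.0])),   # note for key frame 0
             (1, np.array([0.0, 1.0, 0.0, 0.0]))]   # note for key frame 1
print(arrange(qualified))  # ['C4', 'D4', 'E4']
```

The resulting note sequence is the "sorting result" that serves as the background music.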
In a possible embodiment, in step 201, a mean-square-error operation may be performed on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors. When the loss function value is within a preset range, the vector to be sorted is determined to meet the preset requirement; when it is not, the vector to be sorted is determined not to meet the preset requirement.
It should be noted that the specific preset range may be set according to actual needs; it may be a numerical range or a single value. For example, if the loss function value is 0, the vector to be sorted may be determined to meet the preset requirement, and otherwise not. The specific preset range is not limited here.
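The mean-square-error check of step 201 can be sketched as follows. Taking the minimum MSE over the target note set, and the 0.05 threshold standing in for the "preset range", are assumptions for illustration:

```python
import numpy as np

TARGET_NOTE_SET = np.eye(4)   # hypothetical preset target note vectors

def meets_requirement(candidate, targets=TARGET_NOTE_SET, threshold=0.05):
    """Loss function value: mean squared error of the candidate vector
    against its closest target note vector. The vector meets the preset
    requirement when that loss falls within the preset range."""
    mse = np.min(np.mean((targets - candidate) ** 2, axis=1))
    return mse <= threshold, mse

ok, _ = meets_requirement(np.array([0.9, 0.05, 0.0, 0.05]))
print(ok)   # True: close to one of the target notes
ok2, _ = meets_requirement(np.array([0.5, 0.5, 0.5, 0.5]))
print(ok2)  # False: far from every target note
```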
In a possible embodiment, all vectors to be sorted that do not meet the preset requirement are used as training samples to train the decoding model.
Specifically, when a vector to be sorted does not meet the preset requirement, i.e. it falls outside the preset target note vector set, the accuracy of the decoding model's output needs to be improved. The vector to be sorted can therefore be used as a training sample to train the decoding model, improving the accuracy of its output.
It should be noted that the specific model training mode may be set according to actual needs, and is not specifically limited herein.
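A minimal sketch of training on a failed sample: one gradient-descent step on the MSE loss. The linear stand-in for the decoder, the learning rate, and the choice of target are all assumptions; the patent's idea is only that failed vectors to be sorted supply the training data:

```python
import numpy as np

def mse_grad_step(w, x, target, lr=0.1):
    """One gradient-descent update of a linear decoder w on a failed sample:
    pull the decoder's output for input x toward the nearest target note."""
    pred = w @ x
    err = pred - target
    loss = np.mean(err ** 2)
    w -= lr * (2.0 / err.size) * np.outer(err, x)   # gradient of the MSE
    return w, loss

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)) * 0.1
x = rng.standard_normal(8)             # key-frame feature of the failed sample
target = np.array([1.0, 0.0, 0.0, 0.0])  # nearest target note vector

losses = []
for _ in range(50):
    w, loss = mse_grad_step(w, x, target)
    losses.append(loss)
print(losses[-1] < losses[0])   # loss on the failed sample decreases
```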
Example two
Fig. 3 is a schematic structural diagram of a data processing apparatus according to a second embodiment of the present application, and as shown in fig. 3, the data processing apparatus includes:
an obtaining unit 31, configured to obtain an attribute feature vector of a key frame of a target video;
a first processing unit 32, configured to obtain a feature vector of the key frame according to the attribute feature vector of the key frame;
a second processing unit 33, configured to input the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into the decoding model as input parameters to obtain the vector to be sorted representing the notes of the current key frame;
and a third processing unit 34, configured to obtain the background music of the target video according to all the obtained vectors to be sorted.
In one possible embodiment, the attribute feature vector of the key frame includes:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
In a possible embodiment, when the first processing unit 32 is configured to obtain the feature vector of the key frame according to the attribute feature vector of the key frame, the configuration includes:
and performing vector splicing through a fully connected layer on the attribute feature vectors of the key frame to obtain the feature vector of the key frame.
In a possible embodiment, when the third processing unit 34 is configured to obtain the background music of the target video according to all the obtained vectors to be sorted, the configuration includes:
judging the vector to be sorted by using a preset target note vector set so as to determine whether the vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, so as to take the sorting result as the background music of the target video.
In a possible embodiment, when the third processing unit 34 is configured to judge the vector to be sorted by using a preset target note vector set to determine whether the vector to be sorted meets a preset requirement, the configuration includes:
performing a mean-square-error operation on the vector to be sorted and the preset target note vector set to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
In a possible implementation, fig. 4 is a schematic structural diagram of a data processing apparatus provided in example two of the present application, and as shown in fig. 4, the data processing apparatus further includes:
and the training unit 35 is configured to train the decoding model by taking all vectors to be sorted that do not meet the preset requirement as training samples.
For the principles of the second embodiment, reference may be made to the related descriptions of the first embodiment, which are not repeated herein.
EXAMPLE III
Fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present application, including: a processor 501, a storage medium 502 and a bus 503. The storage medium 502 stores machine-readable instructions executable by the processor 501. When the electronic device runs the data processing method, the processor 501 communicates with the storage medium 502 through the bus 503, and the processor 501 executes the machine-readable instructions to perform the following steps:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sorted representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sorted.
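For the last step, claims 4 and 5 describe filtering the vectors to be sorted with a mean square error check against a preset target note vector set before arranging them into music. A minimal sketch of that acceptance check, with a hypothetical two-dimensional note space and a hypothetical threshold (the patent only says the loss must fall "within a preset range"):

```python
import numpy as np

def mse(a, b):
    """Mean square error between two note vectors."""
    return float(np.mean((a - b) ** 2))

def passes_check(candidate, target_note_set, threshold=0.5):
    """Accept the candidate vector if its smallest MSE loss against
    any target note vector lies within the preset range [0, threshold)."""
    loss = min(mse(candidate, t) for t in target_note_set)
    return loss < threshold

# Hypothetical target note vector set.
targets = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]

print(passes_check(np.array([0.1, 0.9]), targets))  # prints True (close to a target)
print(passes_check(np.array([5.0, 5.0]), targets))  # prints False (far from all targets)
```

Vectors that pass would then be sorted by the preset note arrangement rule to form the background music; vectors that fail would be kept as training samples for the decoding model (claim 6).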
In this embodiment of the application, the processor 501 may further execute other machine-readable instructions stored in the storage medium 502 to perform the other methods described in the first embodiment; for the specific method steps and principles, refer to the description of the first embodiment, which is not repeated here.
EXAMPLE IV
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the following steps:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sorted representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sorted.
In the embodiment of the present application, when executed by a processor, the computer program may further perform the other methods described in the first embodiment; for the specific method steps and principles, refer to the description of the first embodiment, which is not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, or readily conceive of changes, or make equivalent substitutions of some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are all intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data processing method, comprising:
acquiring attribute feature vectors of key frames of a target video;
obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into a decoding model as input parameters, to obtain the vector to be sorted representing the notes of the key frame;
and obtaining the background music of the target video according to all the obtained vectors to be sorted.
2. The method of claim 1, wherein the attribute feature vector of the key frame comprises:
the dynamic feature vector of the key frame, the static feature vector of the key frame, and/or the optical flow feature vector of the key frame.
3. The method of claim 1, wherein the deriving the feature vector of the key frame according to the attribute feature vector of the key frame comprises:
and performing vector splicing processing through a fully connected layer according to the attribute feature vectors of the key frame, to obtain the feature vector of the key frame.
4. The method of claim 1, wherein the obtaining the background music of the target video according to all the obtained vectors to be sorted comprises:
judging the vectors to be sorted by using a preset target note vector set, to determine whether each vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, and taking the sorting result as the background music of the target video.
5. The method as claimed in claim 4, wherein said determining the vector to be sorted by using a predetermined set of target note vectors to determine whether the vector to be sorted meets a predetermined requirement comprises:
performing a mean square error operation on the vector to be sorted and the preset target note vector set, to obtain a loss function value of the vector to be sorted relative to the target note vectors;
when the loss function value is within a preset range, determining that the vector to be sorted meets the preset requirement;
and when the loss function value is not within the preset range, determining that the vector to be sorted does not meet the preset requirement.
6. The method of claim 4, wherein the method further comprises:
and training the decoding model by using all vectors to be sorted that do not meet the preset requirement as training samples.
7. A data processing apparatus, comprising:
the acquiring unit is used for acquiring the attribute feature vector of the key frame of the target video;
the first processing unit is used for obtaining the feature vector of the key frame according to the attribute feature vector of the key frame;
the second processing unit is used for inputting the feature vector of the key frame and the vector to be sorted representing the notes of the previous key frame into the decoding model as input parameters, to obtain the vector to be sorted representing the notes of the key frame;
and the third processing unit is used for obtaining the background music of the target video according to all the obtained vectors to be sorted.
8. The apparatus as claimed in claim 7, wherein the third processing unit is configured to, when obtaining the background music of the target video according to all the obtained vectors to be sorted, include:
judging the vectors to be sorted by using a preset target note vector set, to determine whether each vector to be sorted meets a preset requirement;
and sorting all vectors to be sorted that meet the preset requirement according to a preset note arrangement rule, and taking the sorting result as the background music of the target video.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the data processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the data processing method according to any one of claims 1 to 6.
CN201911029239.0A 2019-10-28 2019-10-28 Data processing method and device, electronic equipment and storage medium Active CN110781835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911029239.0A CN110781835B (en) 2019-10-28 2019-10-28 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911029239.0A CN110781835B (en) 2019-10-28 2019-10-28 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110781835A true CN110781835A (en) 2020-02-11
CN110781835B CN110781835B (en) 2022-08-23

Family

ID=69386876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911029239.0A Active CN110781835B (en) 2019-10-28 2019-10-28 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110781835B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112235517A (en) * 2020-09-29 2021-01-15 北京小米松果电子有限公司 Method and apparatus for adding voice-over, and storage medium
CN113923517A (en) * 2021-09-30 2022-01-11 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN115052147A (en) * 2022-04-26 2022-09-13 中国传媒大学 Human body video compression method and system based on generative model

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109086416A (en) * 2018-08-06 2018-12-25 中国传媒大学 A kind of generation method of dubbing in background music, device and storage medium based on GAN
CN109599079A (en) * 2017-09-30 2019-04-09 腾讯科技(深圳)有限公司 A kind of generation method and device of music
CN109862393A (en) * 2019-03-20 2019-06-07 深圳前海微众银行股份有限公司 Method of dubbing in background music, system, equipment and the storage medium of video file
KR20190116199A (en) * 2018-10-29 2019-10-14 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Video data processing method, device and readable storage medium

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN109599079A (en) * 2017-09-30 2019-04-09 腾讯科技(深圳)有限公司 A kind of generation method and device of music
CN109086416A (en) * 2018-08-06 2018-12-25 中国传媒大学 A kind of generation method of dubbing in background music, device and storage medium based on GAN
KR20190116199A (en) * 2018-10-29 2019-10-14 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Video data processing method, device and readable storage medium
CN109862393A (en) * 2019-03-20 2019-06-07 深圳前海微众银行股份有限公司 Method of dubbing in background music, system, equipment and the storage medium of video file

Non-Patent Citations (1)

Title
YIPIN ZHOU ET AL: "Visual to Sound: Generating Natural Sound for Videos in the Wild", 《ARXIV:1712.01393V2》 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN112235517A (en) * 2020-09-29 2021-01-15 北京小米松果电子有限公司 Method and apparatus for adding voice-over, and storage medium
CN112235517B (en) * 2020-09-29 2023-09-12 北京小米松果电子有限公司 Method for adding white-matter, device for adding white-matter, and storage medium
CN113923517A (en) * 2021-09-30 2022-01-11 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN113923517B (en) * 2021-09-30 2024-05-07 北京搜狗科技发展有限公司 Background music generation method and device and electronic equipment
CN115052147A (en) * 2022-04-26 2022-09-13 中国传媒大学 Human body video compression method and system based on generative model

Also Published As

Publication number Publication date
CN110781835B (en) 2022-08-23

Similar Documents

Publication Publication Date Title
KR102416558B1 (en) Video data processing method, device and readable storage medium
CN109688463B (en) Clip video generation method and device, terminal equipment and storage medium
CN110781835B (en) Data processing method and device, electronic equipment and storage medium
CN111259192B (en) Audio recommendation method and device
CN109408672B (en) Article generation method, article generation device, server and storage medium
JP7240505B2 (en) Voice packet recommendation method, device, electronic device and program
CN111460179A (en) Multimedia information display method and device, computer readable medium and terminal equipment
CN112584062B (en) Background audio construction method and device
CN109815448B (en) Slide generation method and device
CN112287168A (en) Method and apparatus for generating video
CN111191133A (en) Service search processing method, device and equipment
CN111435369B (en) Music recommendation method, device, terminal and storage medium
CN114065720A (en) Conference summary generation method and device, storage medium and electronic equipment
CN116261009B (en) Video detection method, device, equipment and medium for intelligently converting video audience
CN113411517B (en) Video template generation method and device, electronic equipment and storage medium
CN115115901A (en) Method and device for acquiring cross-domain learning model
CN114969427A (en) Singing list generation method and device, electronic equipment and storage medium
CN112449249A (en) Video stream processing method and device, electronic equipment and storage medium
CN114625922A (en) Word stock construction method and device, electronic equipment and storage medium
CN111050194A (en) Video sequence processing method, video sequence processing device, electronic equipment and computer readable storage medium
CN112764601B (en) Information display method and device and electronic equipment
Leszczuk et al. User-Generated Content (UGC)/In-The-Wild Video Content Recognition
CN113992866B (en) Video production method and device
CN114663963B (en) Image processing method, image processing device, storage medium and electronic equipment
CN117408979A (en) Image detection method, device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant