CN114339283A - Media resource data processing method, device, equipment and storage medium - Google Patents

Media resource data processing method, device, equipment and storage medium

Info

Publication number
CN114339283A
CN114339283A
Authority
CN
China
Prior art keywords
media resource
target
time period
data
sample
Prior art date
Legal status
Pending
Application number
CN202111616342.2A
Other languages
Chinese (zh)
Inventor
陈卓群
周鹏
柳思杨
卢颖杰
王晓瑞
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111616342.2A
Publication of CN114339283A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a media resource data processing method, apparatus, device, and storage medium, relating to the field of computer technology, which at least solve the problem in the related art that the wonderful degree of a live broadcast room over a future period of time cannot be determined. The method includes the following steps: an electronic device acquires a target media resource that is being played; performs feature extraction on the target media resource for the current time period to obtain feature data; and determines, according to the feature data, the wonderful degree of the target media resource in the next time period.

Description

Media resource data processing method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing media resource data.
Background
Currently, live-room recommendation is generally determined based on three types of data: live-room status (such as whether a red packet is present, whether the room is in a conflict state, and the like), statistical information (such as the real-time number of online viewers, interaction volume, exit rate, and the like), and real-time negative signals (such as low image resolution, long periods without activity, stuttering, and the like). None of these three types of data, however, reflects the live quality of the live room.
The live quality of a live room is mainly related to its wonderful degree over a future period of time: the higher the wonderful degree of a live room in the coming period, the higher its live quality, the larger its audience, the longer users stay, and the easier it is to achieve a higher live-consumption conversion rate. Therefore, in order to increase the live-consumption conversion rate, it is urgently necessary to be able to determine the wonderful degree of a live room over a future period of time.
Disclosure of Invention
The present disclosure provides a media resource data processing method, apparatus, device, and storage medium, to at least solve the problem in the related art that the wonderful degree of a live broadcast room in a future period of time cannot be determined. The technical scheme of the disclosure is as follows:
According to a first aspect of the present disclosure, there is provided a media resource data processing method, including: an electronic device acquires a target media resource that is being played; performs feature extraction on the target media resource for the current time period to obtain feature data; and determines, according to the feature data, the wonderful degree of the target media resource in the next time period.
Optionally, the characteristic data comprises at least one of: text feature data, image frame feature data, and audio feature data; determining the wonderful degree of the target media resource in the next time period according to the characteristic data, comprising the following steps: and determining the wonderful degree of the target media resource in the next time period according to at least one item of the text characteristic data, the image frame characteristic data and the audio characteristic data.
Optionally, the method further comprises: and recommending the target media resource to the target user when the wonderful degree of the target media resource in the next time interval meets a preset condition.
Optionally, the feature extraction is performed on the target media resource in the current time period to obtain feature data, and according to the feature data, the wonderful degree of the target media resource in the next time period is determined, including: inputting the target media resource of the current time period into a target prediction model to predict the wonderful degree, so as to obtain the wonderful degree of the target media resource in the next time period, wherein the target prediction model is used for extracting the features of the target media resource of the current time period to obtain feature data, and determining the wonderful degree of the target media resource in the next time period according to the feature data.
Optionally, the feature data obtained by the target prediction model is a feature vector; the feature vectors comprise at least two of text feature vectors, image frame feature vectors and audio feature vectors; and the wonderful degree of the target media resource in the next time period is obtained by classifying the spliced feature vectors.
Optionally, the method further comprises: the method comprises the steps of obtaining a plurality of media resource samples and each media resource sample label in the plurality of media resource samples, wherein the media resource sample label is used for representing a first wonderful degree of an associated media resource sample, a time interval corresponding to the associated media resource sample is a next time interval of the media resource sample, and the associated media resource sample and the media resource sample are from the same media resource set; inputting each media resource sample into an initial prediction model respectively to obtain a prediction label of each media resource sample; the prediction label is used for representing a second wonderful degree of the associated media resource sample of each media resource sample predicted by the initial prediction model; and training the initial prediction model according to the prediction label and the media resource sample label to obtain a target prediction model.
Optionally, the first wonderful degree is determined according to a long-play rate of the next media resource sample following the media resource sample; the long-play rate is the ratio of the number of users who click into the next media resource sample and whose watching duration exceeds a target threshold to the total number of users who click into the next media resource sample within a preset duration.
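The long-play rate just defined can be sketched as a simple ratio. The function below is an illustrative reconstruction, not part of the patent; the field layout and the choice of a per-user duration list are assumptions made for the example.

```python
def long_play_rate(view_durations, target_threshold):
    """Ratio of viewers whose watch duration exceeds the target threshold
    to all viewers who clicked into the next media resource sample within
    the preset window.

    view_durations: watch duration (seconds) for each user who clicked in.
    target_threshold: minimum watch duration (seconds) counted as a "long play".
    """
    if not view_durations:
        return 0.0
    long_plays = sum(1 for d in view_durations if d > target_threshold)
    return long_plays / len(view_durations)
```

For example, with durations `[10, 70, 120, 5]` and a 60-second threshold, two of four viewers are long plays, giving a long-play rate of 0.5.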
Optionally, training the initial prediction model according to the prediction tag and the media resource sample tag to obtain a target prediction model, including: determining a loss value based on the loss function, the prediction tag, and the media resource sample tag; and iteratively updating the parameters of the initial prediction model according to the loss value and the preset learning rate to obtain the target prediction model.
According to a second aspect of the present disclosure, there is provided a media asset data processing device comprising an acquisition unit and a processing unit. An acquisition unit configured to acquire a target media resource being played; and the processing unit is configured to extract the characteristics of the target media resource in the current time period to obtain characteristic data, and determine the wonderful degree of the target media resource in the next time period according to the characteristic data.
Optionally, the characteristic data comprises at least one of: text feature data, image frame feature data, and audio feature data; and the processing unit is also configured to determine the wonderful degree of the target media resource in the next time period according to at least one item of text characteristic data, image frame characteristic data and audio characteristic data.
Optionally, the processing unit is further configured to recommend the target media resource to the target user when the wonderful degree of the target media resource in the next time period meets a preset condition.
Optionally, the processing unit is further configured to input the target media resource in the current time period into a target prediction model to perform wonderful degree prediction, so as to obtain a wonderful degree of the target media resource in the next time period, where the target prediction model is used to perform feature extraction on the target media resource in the current time period to obtain feature data, and determine the wonderful degree of the target media resource in the next time period according to the feature data.
Optionally, the feature data obtained by the target prediction model is a feature vector; the feature vectors comprise at least two of text feature vectors, image frame feature vectors and audio feature vectors; and the wonderful degree of the target media resource in the next time period is obtained by classifying the spliced feature vectors.
Optionally, the obtaining unit is further configured to obtain a plurality of media resource samples and each media resource sample label in the plurality of media resource samples, where the media resource sample label is used to represent a first highlight degree of an associated media resource sample, a time period corresponding to the associated media resource sample is a next time period of the time period in which the media resource sample is located, and the associated media resource sample and the media resource sample are derived from the same media resource set; the processing unit is also configured to input each media resource sample into the initial prediction model respectively to obtain a prediction label of each media resource sample; the prediction label is used for representing a second wonderful degree of the associated media resource sample of each media resource sample predicted by the initial prediction model; and the processing unit is also configured to train the initial prediction model according to the prediction label and the media resource sample label to obtain a target prediction model.
Optionally, the first wonderful degree is determined according to a long-play rate of the next media resource sample following the media resource sample; the long-play rate is the ratio of the number of users who click into the next media resource sample and whose watching duration exceeds a target threshold to the total number of users who click into the next media resource sample within a preset duration.
Optionally, the processing unit is further configured to determine a loss value based on the loss function, the prediction tag, and the media resource sample tag; and the processing unit is also configured to update the parameters of the initial prediction model in an iterative manner according to the loss value and the preset learning rate to obtain the target prediction model.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement any one of the optional media asset data processing methods as described in the first aspect above.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon instructions which, when executed by a processor of an electronic device, enable the electronic device to perform any one of the optional media asset data processing methods as in the first aspect described above.
According to a fifth aspect of the present disclosure, there is provided a computer program product containing instructions which, when executed by a processor of an electronic device, implement the optional media asset data processing method as in any one of the first aspects above.
The technical scheme provided by the embodiment of the disclosure at least has the following beneficial effects:
in the above scheme, the electronic device obtains the target media resource being played and performs feature extraction on the target media resource for the current time period, thereby determining the wonderful degree of the target media resource in the next time period. Whereas the related art cannot predict the wonderful degree of the target media resource in the next time period, the present disclosure can do so, and whether the target media resource is recommended to the user can subsequently be determined according to the predicted wonderful degree, thereby improving user satisfaction.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is one of the flow diagrams of a media asset data processing method shown in accordance with an exemplary embodiment;
FIG. 2 is a second flowchart illustrating a method of media asset data processing according to an exemplary embodiment;
FIG. 3 is a third flowchart illustrating a method of media asset data processing according to an exemplary embodiment;
FIG. 4 is a diagram illustrating a predictive application of a target predictive model in accordance with an exemplary embodiment;
FIG. 5 is a fourth flowchart illustrating a media asset data processing method according to an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating the structure of an initial predictive model in accordance with an exemplary embodiment;
FIG. 7 is a fifth flowchart illustrating a method of media asset data processing, according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating the structure of a media asset data processing device, according to an exemplary embodiment;
fig. 9 is a schematic structural diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The data to which the present disclosure relates may be data that is authorized by a user or sufficiently authorized by parties. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components.
Based on the background technology, the embodiment of the present disclosure provides a media resource data processing method. The method comprises the steps of obtaining a target media resource which is being played, and extracting the characteristics of the target media resource in the current time interval, so that the wonderful degree of the target media resource in the next time interval is determined.
The following is an exemplary description of a media resource data processing method provided by the embodiments of the present disclosure:
the media resource data processing method provided by the disclosure can be applied to electronic equipment.
In some embodiments, the electronic device may be a server, a terminal, or other electronic devices for performing click through rate prediction, which is not limited in this disclosure.
The server may be a single server, or may be a server cluster including a plurality of servers. In some embodiments, the server cluster may also be a distributed cluster. The present disclosure is also not limited to a specific implementation of the server.
The terminal may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR) device, a virtual reality (VR) device, or another device on which a content community application (e.g., Kuaishou) can be installed and used; the present disclosure does not particularly limit the specific form of the electronic device. The terminal may interact with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device, and the like.
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by one of ordinary skill in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
As shown in fig. 1, when the media asset data processing method is applied to an electronic device, the method may include:
and step 11, the electronic equipment acquires the target media resource which is playing.
In one possible implementation, the electronic device obtains a target media asset being played. Illustratively, the target media asset being played may be a live room being played.
And step 12, the electronic equipment extracts the characteristics of the target media resource in the current time period to obtain characteristic data, and determines the wonderful degree of the target media resource in the next time period according to the characteristic data.
In a possible implementation, the electronic device performs feature extraction on the target media resource for the current time period to obtain feature data, and then analyzes the feature data to obtain the wonderful degree of the target media resource in the next time period.
The technical scheme provided by this embodiment brings at least the following beneficial effects: a method is provided for predicting, from the current time period of a target media resource that is being played, the wonderful degree of the resource in the next time period. Since the related art cannot predict the wonderful degree of the target media resource in the next time period, the present disclosure makes such prediction possible, and whether the target media resource is recommended to the user can subsequently be determined according to the predicted wonderful degree, thereby improving user satisfaction.
Optionally, the characteristic data comprises at least one of: text feature data, image frame feature data, and audio feature data.
In one possible implementation, when the feature data includes at least one of text feature data, image frame feature data, and audio feature data, performing feature extraction on the target media resource for the current time period to obtain the feature data includes at least one of the following: performing feature extraction on the text data of the target media resource for the current time period to obtain text feature data; performing feature extraction on the image frame data of the target media resource for the current time period to obtain image frame feature data; or performing feature extraction on the audio data of the target media resource for the current time period to obtain audio feature data.
Because the target media resource is a media resource of fixed duration, when feature extraction is performed on the image frame data of the target media resource for the current time period, features can be captured from a single frame of the fixed-duration target media resource, and the captured frame may be an intermediate frame of the target media resource. Illustratively, when the target media resource is a 20 s video segment, only a single frame near the middle of the segment (for example, the frame around the 10th, 11th, or 12th second) needs to be extracted for feature extraction.
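As a rough illustration of selecting an intermediate frame from a fixed-duration segment, the helper below computes the frame index at the temporal midpoint. This is a sketch under the assumption of a known, constant frame rate; it is not part of the patent.

```python
def middle_frame_index(duration_s, fps):
    """Index of the frame at the temporal midpoint of a fixed-duration
    segment, e.g. the frame around the 10th second of a 20 s clip."""
    total_frames = int(duration_s * fps)
    return total_frames // 2

# A 20 s clip at 30 fps has 600 frames; the midpoint is frame 300,
# i.e. the frame at the 10th second.
```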
Referring to fig. 1, as shown in fig. 2, step 12 includes:
and step 121, the electronic equipment determines the wonderful degree of the target media resource in the next time period according to at least one item of the text characteristic data, the image frame characteristic data and the audio characteristic data.
When the feature data includes at least one of text feature data, image frame feature data, and audio feature data, the wonderful degree of the target media resource in the next period can be determined according to the known feature data (text feature data, image frame feature data, or audio feature data).
The technical scheme provided by the embodiment can at least bring the following beneficial effects: there are a number of options for the feature data. Since the plurality of selectable items (text feature data, image frame feature data, and audio feature data) are derived from the content data of the target media asset, the next period of the target media asset's highlights determined from the content data will be more accurate.
Optionally, as shown in fig. 2 in conjunction with fig. 1, the media resource data processing method further includes:
and step 13, recommending the target media resource to the target user by the electronic equipment when the wonderful degree of the target media resource in the next time interval meets a preset condition.
In a possible implementation, after obtaining the wonderful degree of the target media resource in the next time period, the electronic device determines, according to that result, whether to recommend the resource to the target user. For example, the result may be "wonderful" or "not wonderful", and when the wonderful degree of the target media resource in the next time period is "wonderful", the target media resource is recommended to the target user. The wonderful degree may also be a specific numerical value; in that case, a threshold may be set according to the actual situation, and the resource is recommended to the user when the wonderful degree of the target media resource in the next time period meets the threshold.
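The recommendation decision in step 13 can be sketched as a threshold check. The helper below is a minimal illustration, not the patented implementation; the default threshold of 0.5 is an assumption for the example.

```python
def should_recommend(wonderful_degree, threshold=0.5):
    """Recommend the target media resource when its predicted wonderful
    degree for the next time period meets the preset condition.

    wonderful_degree: either the string label "wonderful" / "not wonderful"
    or a numeric score in [0, 1].
    """
    if isinstance(wonderful_degree, str):
        return wonderful_degree == "wonderful"
    return wonderful_degree >= threshold
```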
The technical scheme provided by the embodiment can at least bring the following beneficial effects: and after the wonderful degree of the target media resource in the next time period is obtained, whether the target media resource is recommended to the target user is determined according to the wonderful degree result, so that the watching experience of the user is improved.
Optionally, with reference to fig. 1, as shown in fig. 3, step 12 includes:
and step 122, the electronic equipment inputs the target media resource in the current time period into a target prediction model to perform wonderful degree prediction to obtain the wonderful degree of the target media resource in the next time period, the target prediction model is used for performing feature extraction on the target media resource in the current time period to obtain feature data, and the wonderful degree of the target media resource in the next time period is determined according to the feature data.
In one possible implementation, the target media resource being played is input into a target prediction model, which directly outputs the level of wonderness of the target media resource in the next period. The target prediction model is trained based on media resource samples similar to the target media resource, so that the target prediction model can output the wonderful degree of the target media resource in the next period of the playing process, and the output result is more accurate.
For example, refer to the prediction application diagram of FIG. 4, in which the live room is divided into time periods t1, t2, and t3 according to a preset duration, each time period corresponding to one target media resource. Based on the target media resource of time period t1, the electronic device calculates the wonderful degree of the target media resource in time period t2; based on the target media resource of time period t2, it calculates the wonderful degree for time period t3; and based on the target media resource of time period t3, it calculates the wonderful degree for time period t4. Whether to recommend the target media resource of the corresponding time period to the user is then determined according to the output result. Here, t2 is the next time period after t1, t3 is the next after t2, and t4 is the next after t3.
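The rolling prediction over consecutive time periods shown in FIG. 4 can be sketched as follows. This is illustrative only: `predict` is a hypothetical stand-in for the target prediction model, not an API from the patent.

```python
def rolling_highlight_scores(segments, predict):
    """For each time period t_i, predict the wonderful degree of the next
    period t_{i+1} from the media resource segment of t_i.

    segments: media resource segments for periods t1, t2, t3, ...
    predict: callable mapping one segment to a wonderful-degree score.
    Returns a dict {next_period_index: score}, e.g. {2: ..., 3: ..., 4: ...}.
    """
    return {i + 2: predict(seg) for i, seg in enumerate(segments)}
```

With three segments, the result covers periods t2 through t4, matching the t1→t2, t2→t3, t3→t4 flow described above.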
The technical scheme provided by the embodiment can at least bring the following beneficial effects: the electronic equipment predicts the wonderful degree of the target media resource in the next time period by using the target prediction model, the wonderful degree is determined by the target prediction model through feature extraction of the target media resource, and the wonderful degree predicted by the target prediction model is more accurate due to the fact that the extracted features are derived from the content of the target media resource.
Optionally, as shown in fig. 5, the media resource data processing method further includes:
step 51, the electronic device obtains a plurality of media resource samples and each media resource sample label in the plurality of media resource samples.
The media resource sample label is used for representing a first wonderful degree of the associated media resource sample, a time interval corresponding to the associated media resource sample is a next time interval of the media resource sample, and the associated media resource sample and the media resource sample are from the same media resource set.
In one possible implementation, the electronic device may download a public media resource sample dataset from a public database to obtain a plurality of media resource samples; it is also possible to directly read a locally stored media asset sample. Each media asset sample label in the plurality of media asset samples is used to characterize a first highlight of the associated media asset sample.
Illustratively, the media resource sample labels may be "wonderful" and "not wonderful". A label may also be a value between 0 and 1; if the value is greater than the threshold, the label of the media resource sample is determined to be "wonderful", and if the value is less than the threshold, the label is determined to be "not wonderful". For example: values greater than 0.5 correspond to "wonderful", and values less than 0.5 correspond to "not wonderful". The specific setting of the threshold depends on the actual application scenario, which the present disclosure does not limit.
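Label assignment from a 0–1 score can be sketched like this. The function is illustrative; the 0.5 default reflects the example threshold in the text, and the string labels are the translated terms used here.

```python
def sample_label(score, threshold=0.5):
    """Map a 0-1 wonderful-degree score of the associated media resource
    sample to a binary training label for the media resource sample."""
    return "wonderful" if score > threshold else "not wonderful"
```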
And step 52, the electronic equipment inputs each media resource sample into the initial prediction model respectively to obtain a prediction label of each media resource sample.
Wherein the initial prediction model is used for predicting the wonderful degree of the associated media resource sample of each media resource sample; the prediction tag is used to characterize a second highlight of associated media asset samples for each media asset sample predicted by the initial prediction model.
In one possible implementation, the electronic device inputs each of the plurality of media resource samples into an initial prediction model, and the initial prediction model outputs a prediction label for each media resource sample. Specifically, the prediction label output by the initial prediction model is related to the content data of each media resource sample, and the final prediction result is determined according to that content data. The content data may be any one of text data, image data, and audio data, a combination of any two of them, or all three. Illustratively, the prediction label output by the initial prediction model characterizes the wonderful degree of the associated media resource sample, whose corresponding time period is, for example, 20 s to 1 min after the current time period of the media resource sample. The second wonderful degree may also be presented in numerical form. The specific duration of the next time period is not limited by the present disclosure.
Optionally, as shown in fig. 6, the initial prediction model may include a text recognition module, an image frame recognition module, and/or an audio recognition module. When the content data is text data, the text recognition module performs feature extraction on the text data of each media resource sample to obtain a text feature vector. When the content data is image frame data, the image frame recognition module performs feature extraction on the image frame data of each media resource sample to obtain an image frame feature vector. When the content data is audio data, the audio recognition module performs feature extraction on the audio data of each media resource sample to obtain an audio feature vector. After the text feature vector, the image frame feature vector, and the audio feature vector are determined, the prediction label of each media resource sample can be obtained through concatenation and/or classification processing. Illustratively, the text recognition module is implemented by a BERT model, the image frame recognition module by a ResNet50 model, and the audio recognition module by a wav2vec model. These modules may also be other models, as long as the corresponding functions can be realized; the present disclosure does not limit this.
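The fusion step described above — per-modality feature vectors concatenated and then classified — can be sketched in pure Python. This is a minimal illustration: the plain lists stand in for outputs of the BERT / ResNet50 / wav2vec modules named in the text, and the linear-plus-sigmoid classifier and its weights are assumptions, not the patented architecture.

```python
import math

def fuse_and_classify(text_vec, image_vec, audio_vec, weights, bias=0.0):
    """Concatenate per-modality feature vectors and apply a linear
    classifier followed by a sigmoid, yielding a 0-1 wonderful-degree
    score for the next time period."""
    fused = text_vec + image_vec + audio_vec  # list concatenation = splicing
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

With zero weights and zero bias, the score is exactly 0.5, i.e. the classifier is undecided before training.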
In step 53, the electronic device trains the initial prediction model according to the prediction labels and the media resource sample labels to obtain a target prediction model.
In a possible implementation manner, after obtaining a prediction tag of a media resource sample, the electronic device trains an initial prediction model based on the prediction tag and the media resource sample tag to obtain a trained initial prediction model, and performs multiple rounds of training on the initial prediction model through all media resource samples to obtain a final target prediction model.
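As a concrete illustration of the multi-round training described above, the toy loop below runs several epochs over all samples, updating the model after each one. The one-parameter logistic model, learning rate, and data are hypothetical simplifications chosen only to make the loop runnable.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(samples, labels, epochs=50, lr=0.5):
    """Multiple rounds (epochs) over all media resource samples,
    updating the model parameter after each sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = sigmoid(w * x)        # the "prediction label" for this sample
            grad = (pred - y) * x        # cross-entropy gradient for logistic model
            w -= lr * grad               # gradient-descent parameter update
    return w
```

The real model has millions of parameters and the update is computed by backpropagation, but each epoch follows this same predict-compare-update pattern.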
The technical scheme provided by the embodiment can at least bring the following beneficial effects: the electronic equipment obtains a plurality of media resource samples and labels of the media resource samples, obtains prediction labels of the media resource samples by inputting each media resource sample into the initial prediction model respectively, and finally trains the initial prediction model according to the prediction labels and the labels of the media resource samples to obtain a target prediction model. The media resource sample label in the disclosure represents the wonderful degree of the associated media resource sample, namely, the real user interest statistical result. Meanwhile, the prediction of the media resource sample by the initial prediction model is determined based on the media resource sample, and the obtained prediction label can reflect the characteristics of the media resource sample to a certain extent. The target prediction model obtained by training the initial prediction model based on the media resource sample labels and the prediction labels can be used for predicting the wonderful degree of a certain media resource in a future period of time, and the prediction accuracy can be ensured.
Optionally, with reference to fig. 5, as shown in fig. 7, in step 53, the electronic device trains an initial prediction model according to the prediction tag and the media resource sample tag to obtain a target prediction model, where the method includes:
In step 531, the electronic device determines a loss value based on the loss function, the prediction label, and the media resource sample label.
In a possible implementation manner, the electronic device may calculate a difference (i.e., a loss value) between the prediction tag of each media resource sample and the tag of each media resource sample based on a loss function, and then train the initial prediction model according to the loss value to obtain a target prediction model meeting the requirement. Illustratively, the loss function is a cross-entropy loss function.
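For a binary highlight/non-highlight label, the cross-entropy loss mentioned above can be computed as follows; the clamping epsilon is an implementation convenience, not part of the patent's description.

```python
import math

def cross_entropy(pred, label, eps=1e-12):
    """Binary cross-entropy between the predicted highlight probability
    (prediction label) and the 0/1 media resource sample label."""
    pred = min(max(pred, eps), 1.0 - eps)    # clamp to avoid log(0)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))
```

The loss is small when the prediction agrees with the label (e.g. pred=0.9 for label 1) and grows without bound as they disagree, which is what drives the gradient updates in step 532.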
In step 532, the electronic device iteratively updates the parameters of the initial prediction model according to the loss value and a preset learning rate to obtain the target prediction model.
In a possible implementation manner, the iterative updating of the parameters of the initial prediction model is realized by gradient descent: given a learning rate, the parameters are updated with a corresponding step size. In the early stage of iterative updating, the learning rate is higher and the step size longer, so the loss descends quickly; in the later stage, the learning rate is gradually reduced and the step size shortened, which helps the loss value converge and brings the updated parameters closer to the optimal solution. Commonly used learning-rate decay methods include piecewise constant decay, polynomial decay, exponential decay, natural exponential decay, cosine decay, linear cosine decay, noisy linear cosine decay, and so on.
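Three of the decay schedules listed above can be sketched as simple functions of the training step; the boundary values and rates below are illustrative defaults, not values given by the patent.

```python
import math

def exponential_decay(lr0, step, decay_rate=0.96, decay_steps=100):
    """Learning rate shrinks by a constant factor every decay_steps steps."""
    return lr0 * decay_rate ** (step / decay_steps)

def piecewise_constant(step, boundaries=(100, 500), values=(1e-3, 1e-4, 1e-5)):
    """Learning rate drops to the next fixed value at each boundary."""
    for b, v in zip(boundaries, values):
        if step < b:
            return v
    return values[-1]

def cosine_decay(lr0, step, total_steps):
    """Learning rate follows a half-cosine from lr0 down to 0."""
    return lr0 * 0.5 * (1 + math.cos(math.pi * step / total_steps))
```

All three start with a large step size for fast descent and end with a small one to help the loss converge, matching the early-stage/late-stage behavior described above.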
Illustratively, the learning rate is 1e-5. The electronic device obtains a loss value from the difference between the prediction label of a media resource sample and the label of that media resource sample, and adjusts the parameters of the initial prediction model according to the loss value and the preset learning rate. The adjusted initial prediction model is then trained with the prediction label and label of the next media resource sample, and so on, until the loss value is smaller than a preset threshold, thereby obtaining the final target prediction model.
The technical scheme provided by this embodiment can at least bring the following beneficial effects: steps 531-532 provide a method for training the initial prediction model based on a loss function and a preset learning rate to obtain the target prediction model. With this method, the training of the target prediction model is more thorough, and a target prediction model with more accurate prediction results can be obtained.
Optionally, the feature data obtained by the target prediction model is a feature vector; the feature vectors comprise at least two of text feature vectors, image frame feature vectors and audio feature vectors; and the wonderful degree of the target media resource in the next time period is obtained by classifying the spliced feature vectors.
In a possible embodiment, the feature data is a feature vector, and since the feature data is obtained by feature extraction of content data (at least one of text data, image frame data, or audio data) of the target media resource, the feature vector corresponding to the feature data includes at least one of a text feature vector, an image frame feature vector, or an audio feature vector.
After at least one feature vector is obtained, the obtained feature vector is classified, and the wonderful degree of the target media resource in the next time period can be determined. For example, the classification of any one of the text feature vector, the image frame feature vector, or the audio feature vector may be performed by a softmax classifier, which performs binary classification on the feature vector and outputs the wonderful degree of the target media resource in the next time period.
The feature vectors to which the feature data corresponds may further include at least two of a text feature vector, an image frame feature vector, or an audio feature vector.
When the feature vectors are any two of the text feature vector, the image frame feature vector, and the audio feature vector, there are three cases: first, the text feature vector and the image frame feature vector; second, the image frame feature vector and the audio feature vector; third, the text feature vector and the audio feature vector. In each case, the obtained feature vectors are first spliced together and then classified to output the wonderful degree of the target media resource in the next time period.
When the feature vectors include all three of the text feature vector, the image frame feature vector, and the audio feature vector, the three vectors are merged by splicing, and classification is performed on the merged vector to output the wonderful degree of the target media resource in the next time period.
Illustratively, the text feature vector, the image frame feature vector, and the audio feature vector are feature vectors of 768 dimensions.
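With three 768-dimensional vectors as stated above, the splicing and classification steps reduce to the arithmetic below. The random vectors and the fixed linear-head weights are purely illustrative assumptions; in practice the head weights are learned.

```python
import math
import random

random.seed(0)
dim = 768
text_vec  = [random.random() for _ in range(dim)]
image_vec = [random.random() for _ in range(dim)]
audio_vec = [random.random() for _ in range(dim)]

# Splicing: simple concatenation of the per-modality feature vectors.
spliced = text_vec + image_vec + audio_vec        # 3 * 768 = 2304 dimensions

def softmax(logits):
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical two-class linear head over the spliced vector.
w_plain     = [-0.001] * len(spliced)
w_highlight = [0.001] * len(spliced)
logits = [sum(w * x for w, x in zip(w_plain, spliced)),
          sum(w * x for w, x in zip(w_highlight, spliced))]
probs = softmax(logits)   # [P(non-highlight), P(highlight)] for the next period
```

The highlight probability `probs[1]` is the numerical form of the wonderful degree output by the classifier.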
The technical scheme provided by the embodiment can at least bring the following beneficial effects: since the feature vector of the target media resource includes multiple types, the wonderful degree of the target media resource in the next period can be determined according to multiple different combinations of the feature vectors. By providing a corresponding processing mode for each combination, the applicability of the target prediction model can be improved.
Optionally, the first wonderful degree is determined according to the long-play rate of the next media resource sample of the media resource sample; the long-play rate is the ratio of the number of users who click into the next media resource sample within the preset duration and whose watching duration exceeds a target threshold, to the total number of users who click into the next media resource sample within the preset duration.
In one possible implementation, the media resource sample label represents the first wonderful degree of the next time period of the media resource sample. The first wonderful degree may be characterized in a number of ways. In the present disclosure, the first wonderful degree is determined according to the users' playing data, specifically the long-play rate of the next media resource sample of the media resource sample. The long-play rate is the ratio of the number of users who click into the next media resource sample within the first preset duration and watch it for longer than the first preset duration, to the number of users who click into the next media resource sample within the first preset duration.
Illustratively, the duration of a media resource (e.g., a live broadcast room) is sliced into 20-second segments to obtain a plurality of different video segments. For a 20 s video segment, the first wonderful degree is characterized by a long-play rate calculated as follows: the number of users a who entered the video segment within the current 20 seconds is the denominator; the number of users b who entered the live broadcast room within the current 20 seconds and watched for more than 20 seconds is the numerator; the ratio b/a is the long-play rate. It is specified that when the long-play rate is greater than 0.8, the video segment is regarded as a highlight sample, and when the long-play rate is not greater than 0.8, the video segment is regarded as a non-highlight sample. Thus, when the long-play rate is 0.9, i.e., the first wonderful degree is greater than 0.8, the video segment is a highlight sample; when the long-play rate is 0.3, i.e., the first wonderful degree is less than 0.8, the video segment is a non-highlight sample. The first wonderful degree calculated from a video segment serves as the label of the video segment preceding it.
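The long-play-rate labeling rule above amounts to a two-line computation; the function and variable names here are illustrative, not from the patent.

```python
def long_play_rate(watch_durations, segment_len=20):
    """watch_durations: watch time (seconds) of each user who clicked into
    the segment within the current 20 s window (a = len of the list);
    b = users among them who watched for more than segment_len seconds."""
    a = len(watch_durations)
    b = sum(1 for d in watch_durations if d > segment_len)
    return b / a if a else 0.0

def label_segment(rate, threshold=0.8):
    """Label is 'highlight' only when the long-play rate strictly exceeds
    the threshold; this label is attached to the *preceding* segment."""
    return "highlight" if rate > threshold else "non-highlight"
```

For example, five users entering with watch times of 25, 30, 5, 40, and 22 seconds give a long-play rate of 4/5 = 0.8, which does not strictly exceed the 0.8 threshold, so the preceding segment is labeled non-highlight.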
The technical scheme provided by this embodiment can at least bring the following beneficial effects: the label of a media resource sample is characterized by the first wonderful degree, and the first wonderful degree is determined according to the long-play rate of the next media resource sample corresponding to the media resource sample. Because the long-play rate is directly related to users' actual viewing behavior, the output labels of the media resource samples can represent the users' degree of interest, so the target prediction model trained with these labels also has higher wonderful-degree prediction accuracy.
The method provided by the embodiments of the present disclosure is described in detail above in conjunction with figs. 1-7. To implement the functions, the media resource data processing device includes hardware structures and/or software modules for executing the respective functions, and these hardware structures and/or software modules may constitute one media resource data processing device. Those of skill in the art will readily appreciate that the present disclosure can be implemented in hardware, or in a combination of hardware and computer software, for the exemplary algorithm steps described in connection with the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The embodiments of the present disclosure may divide the media resource data processing apparatus into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated module can be realized in the form of hardware, or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present disclosure is illustrative and is only one way of dividing logical functions; there may be other divisions in actual implementation.
Hereinafter, the media resource data processing apparatus provided by an embodiment of the present disclosure is described in detail with reference to fig. 8. It should be understood that the description of the apparatus embodiments corresponds to the description of the method embodiments; therefore, for brevity, details not described here may be found in the above method embodiments.
Fig. 8 is a block diagram illustrating a logical structure of a media asset data processing device according to an example embodiment. Referring to fig. 8, the media asset data processing device includes: an acquisition unit 810 and a processing unit 820. An obtaining unit 810 configured to obtain a target media resource being played; for example, in conjunction with fig. 1, the obtaining unit 810 may be configured to perform step 11. The processing unit 820 is configured to perform feature extraction on the target media resource in the current time period to obtain feature data, and determine the wonderful degree of the target media resource in the next time period according to the feature data. For example, in conjunction with fig. 1, processing unit 820 may be used to perform step 12.
Optionally, the characteristic data comprises at least one of: text feature data, image frame feature data, and audio feature data; the processing unit 820 is further configured to determine the wonderful degree of the target media resource in the next period according to at least one of the text feature data, the image frame feature data and the audio feature data. For example, in conjunction with fig. 2, processing unit 820 may be used to perform step 121.
Optionally, the processing unit 820 is further configured to recommend the target media resource to the target user when the wonderness of the target media resource in the next time period meets a preset condition. For example, in conjunction with fig. 2, processing unit 820 may be used to perform step 13.
Optionally, the processing unit 820 is further configured to input the target media resource in the current time interval into a target prediction model for wonderful degree prediction, so as to obtain wonderful degree of the target media resource in the next time interval, where the target prediction model is configured to perform feature extraction on the target media resource in the current time interval, so as to obtain feature data, and determine the wonderful degree of the target media resource in the next time interval according to the feature data. For example, in conjunction with fig. 3, processing unit 820 may be used to perform step 122.
Optionally, the feature data obtained by the target prediction model is a feature vector; the feature vectors comprise at least two of text feature vectors, image frame feature vectors and audio feature vectors; and the wonderful degree of the target media resource in the next time period is obtained by classifying the spliced feature vectors.
Optionally, the obtaining unit 810 is further configured to obtain a plurality of media resource samples and each media resource sample label in the plurality of media resource samples, where the media resource sample label is used to represent a first highlight degree of an associated media resource sample, a time period corresponding to the associated media resource sample is a next time period of the time period in which the media resource sample is located, and the associated media resource sample and the media resource sample are derived from the same media resource set; for example, in conjunction with fig. 5, the obtaining unit 810 may be configured to perform step 51.
The processing unit 820 is further configured to input each media resource sample into the initial prediction model respectively, and obtain a prediction tag of each media resource sample; the prediction label is used for representing a second wonderful degree of the associated media resource sample of each media resource sample predicted by the initial prediction model; for example, in connection with fig. 5, processing unit 820 may be used to perform step 52.
The processing unit 820 is further configured to train the initial prediction model according to the prediction tag and the media resource sample tag, so as to obtain a target prediction model. For example, in connection with fig. 5, processing unit 820 may be used to perform step 53.
Optionally, the first wonderful degree is determined according to a long-play rate of a next media resource sample of the media resource samples; the long-play rate is the ratio of the number of people clicking to enter the next media resource sample at the preset time length, and the watching time length exceeds the target threshold value compared with the number of people clicking to enter the next media resource sample at the preset time length.
Optionally, the processing unit 820 is further configured to determine a loss value based on the loss function, the prediction label, and the media resource sample label; for example, in connection with fig. 7, processing unit 820 may be used to perform step 531.
And the processing unit 820 is further configured to iteratively update parameters of the initial prediction model according to the loss value and a preset learning rate to obtain a target prediction model. For example, in connection with fig. 7, processing unit 820 may be used to perform step 532.
Of course, the media resource data processing apparatus provided by the embodiment of the present disclosure includes, but is not limited to, the above modules; for example, it may further include a storage unit 830. The storage unit 830 may be used for storing program codes of the media resource data processing apparatus, and may also be used for storing data generated by the apparatus during operation.
Fig. 9 shows a schematic diagram of a possible structure of the electronic device involved in the above embodiment. As shown in fig. 9, the electronic device 90 includes a processor 901 and a memory 902.
It is understood that the electronic device 90 shown in fig. 9 can implement all the functions of the above-described media asset data processing method. The functions of the various modules in the media resource data processing apparatus described above may be implemented in the processor 901 of the electronic device 90. The storage means of the media asset data processing device corresponds to the memory 902 of the electronic device 90.
Among other things, the processor 901 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 901 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
Memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 902 is used to store at least one instruction for execution by the processor 901 to implement the media asset data processing method provided by the disclosed method embodiments.
In some embodiments, the electronic device 90 may further optionally include: a peripheral interface 903 and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning assembly 908, and a power supply 909.
The peripheral interface 903 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 904 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 904 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other media resource data processing apparatus via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wi-Fi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (Near Field Communication) related circuits, which are not limited by this disclosure.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, it also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 901 as a control signal for processing. At this point, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the electronic device 90; the display screen 905 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 906 is used to capture images or video. Optionally, camera assembly 906 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the media resource data processing device, and the rear camera is disposed on the back of the media resource data processing device. Audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for realizing voice communication. For stereo sound collection or noise reduction purposes, the microphones may be multiple and disposed at different locations of the electronic device 90. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic location of the electronic device 90 to implement navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 909 is used to supply power to each component in the electronic device 90. The power source 909 may be alternating current, direct current, disposable or rechargeable. When power source 909 comprises a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the electronic device 90 also includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensors, gyroscope sensors, pressure sensors, fingerprint sensors, optical sensors, and proximity sensors.
The acceleration sensor may detect the magnitude of acceleration in three coordinate axes of a coordinate system established with the electronic device 90. The gyroscope sensor can detect the body direction and the rotation angle of the electronic equipment 90, and the gyroscope sensor and the acceleration sensor can cooperatively acquire the 3D action of the user on the electronic equipment 90. The pressure sensors may be disposed on the side bezel of the electronic device 90 and/or underneath the display screen 905. When the pressure sensor is disposed on the side frame of the electronic device 90, a holding signal of the electronic device 90 by the user can be detected. The fingerprint sensor is used for collecting fingerprints of users. The optical sensor is used for collecting the intensity of ambient light. Proximity sensors, also known as distance sensors, are typically provided on the front panel of the electronic device 90. The proximity sensor is used to capture the distance between the user and the front of the electronic device 90.
The present disclosure also provides a computer-readable storage medium having instructions stored thereon, which, when executed by a processor of a media resource data processing apparatus, enable the media resource data processing apparatus to execute the media resource data processing method provided by the present disclosure.
The embodiment of the present disclosure further provides a computer program product containing instructions, which when run on a media resource data processing apparatus, causes the media resource data processing apparatus to execute the media resource data processing method provided by the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for processing media resource data, comprising:
acquiring a target media resource which is being played;
and performing feature extraction on the target media resource in the current time period to obtain feature data, and determining the wonderful degree of the target media resource in the next time period according to the feature data.
2. The method of claim 1, wherein the feature data comprises at least one of: text feature data, image frame feature data, and audio feature data;
the determining the wonderful degree of the target media resource in the next time period according to the feature data comprises:
and determining the wonderful degree of the target media resource in the next time period according to at least one item of the text characteristic data, the image frame characteristic data and the audio characteristic data.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and recommending the target media resource to a target user when the wonderful degree of the target media resource in the next time interval meets a preset condition.
4. The method of claim 1, wherein the extracting features of the target media resource in a current time interval to obtain feature data, and determining a wonderful degree of the target media resource in a next time interval according to the feature data comprises:
inputting the target media resource of the current time period into a target prediction model to perform wonderful degree prediction to obtain the wonderful degree of the target media resource in the next time period, wherein the target prediction model is used for performing feature extraction on the target media resource of the current time period to obtain feature data, and determining the wonderful degree of the target media resource in the next time period according to the feature data.
5. The method of claim 4,
the feature data obtained by the target prediction model are feature vectors; the feature vector comprises at least two of a text feature vector, an image frame feature vector, and an audio feature vector;
and the wonderful degree of the target media resource in the next time period is obtained by classifying the spliced feature vectors.
6. The method of claim 4, further comprising:
obtaining a plurality of media resource samples and each media resource sample label in the plurality of media resource samples, wherein the media resource sample label is used for representing a first wonderful degree of an associated media resource sample, a time interval corresponding to the associated media resource sample is a next time interval of the time interval in which the media resource sample is positioned, and the associated media resource sample and the media resource sample are from the same media resource set;
inputting each media resource sample into an initial prediction model respectively to obtain a prediction label of each media resource sample; the prediction tag is used for characterizing a second wonderful degree of the associated media resource sample of each media resource sample predicted by the initial prediction model;
and training the initial prediction model according to the prediction label and the media resource sample label to obtain a target prediction model.
7. A media asset data processing device, comprising:
an acquisition unit configured to acquire a target media resource being played;
and the processing unit is configured to perform feature extraction on the target media resource in the current time period to obtain feature data, and determine the wonderful degree of the target media resource in the next time period according to the feature data.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the media resource data processing method of any one of claims 1-6.
9. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor of an electronic device, enable the electronic device to perform the media resource data processing method of any one of claims 1-6.
10. A computer program product comprising computer instructions, wherein the computer instructions, when executed by an electronic device, implement the media resource data processing method of any one of claims 1-6.
CN202111616342.2A 2021-12-27 2021-12-27 Media resource data processing method, device, equipment and storage medium Pending CN114339283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111616342.2A CN114339283A (en) 2021-12-27 2021-12-27 Media resource data processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114339283A true CN114339283A (en) 2022-04-12

Family

ID=81015212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111616342.2A Pending CN114339283A (en) 2021-12-27 2021-12-27 Media resource data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114339283A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804439A (en) * 2017-04-26 2018-11-13 合信息技术(北京)有限公司 The recommendation method and device of multimedia resource
CN109361932A (en) * 2018-11-23 2019-02-19 武汉斗鱼网络科技有限公司 The method that temperature prediction is broadcast live, device, equipment and medium
CN111708941A (en) * 2020-06-12 2020-09-25 腾讯科技(深圳)有限公司 Content recommendation method and device, computer equipment and storage medium
CN112149011A (en) * 2020-10-16 2020-12-29 腾讯科技(深圳)有限公司 Method and device for updating popularity ranking list, server and computer storage medium
CN112822504A (en) * 2020-12-30 2021-05-18 北京达佳互联信息技术有限公司 Live broadcast room cover updating method and device, electronic equipment and storage medium
CN113490053A (en) * 2021-06-30 2021-10-08 北京奇艺世纪科技有限公司 Play amount prediction method, play amount prediction device, play amount prediction model, electronic equipment and storage medium
CN113709561A (en) * 2021-04-14 2021-11-26 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium


Similar Documents

Publication Publication Date Title
CN111933112B (en) Awakening voice determination method, device, equipment and medium
CN111696532B (en) Speech recognition method, device, electronic equipment and storage medium
CN111432245B (en) Multimedia information playing control method, device, equipment and storage medium
CN110600040B (en) Voiceprint feature registration method and device, computer equipment and storage medium
CN108922531B (en) Slot position identification method and device, electronic equipment and storage medium
CN111445901A (en) Audio data acquisition method and device, electronic equipment and storage medium
CN104063865A (en) Classification model creation method, image segmentation method and related device
CN111370025A (en) Audio recognition method and device and computer storage medium
CN111708944A (en) Multimedia resource identification method, device, equipment and storage medium
CN111984803B (en) Multimedia resource processing method and device, computer equipment and storage medium
CN111416996B (en) Multimedia file detection method, multimedia file playing device, multimedia file equipment and storage medium
CN112667844A (en) Method, device, equipment and storage medium for retrieving audio
CN111341307A (en) Voice recognition method and device, electronic equipment and storage medium
CN112908288B (en) Beat detection method, beat detection device, electronic equipment and storage medium
CN113886609A (en) Multimedia resource recommendation method and device, electronic equipment and storage medium
CN112001442B (en) Feature detection method, device, computer equipment and storage medium
CN109829067B (en) Audio data processing method and device, electronic equipment and storage medium
CN109671425B (en) Audio classification method, device and storage medium
CN113744736B (en) Command word recognition method and device, electronic equipment and storage medium
CN112116908B (en) Wake-up audio determining method, device, equipment and storage medium
CN111062709B (en) Resource transfer mode recommendation method and device, electronic equipment and storage medium
CN114339283A (en) Media resource data processing method, device, equipment and storage medium
CN113838479A (en) Word pronunciation evaluation method, server and system
CN111143441A (en) Gender determination method, device, equipment and storage medium
CN114429768B (en) Training method, device, equipment and storage medium of speaker log model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220412