CN114117212A

CN114117212A - Media data processing method and device, electronic equipment and storage medium

Info

Publication number: CN114117212A
Application number: CN202111342502.9A
Authority: CN
Inventors: 李杨; 陈洪亮; 姜清华
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-11-12
Filing date: 2021-11-12
Publication date: 2022-03-01

Abstract

The disclosure provides a media data processing method and apparatus, an electronic device, and a storage medium, and belongs to the field of network technology. In the embodiments of the present disclosure, a vector corresponding to a content tag of first media data may be obtained as a first vector, and a vector corresponding to a content tag of second media data may be obtained as a second vector, where the first media data and the second media data are of different media types. Second media data related to the first media data is then determined, according to the first vector and the second vector of each piece of second media data, as candidate media data, and media data is recommended based on the candidate media data. Because related media data is obtained from media data of other media types for recommendation, the recommended content is richer, and the recommendation effect can be improved to a certain extent.

Description

Media data processing method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to network technologies, and in particular, to a media data processing method and apparatus, an electronic device, and a storage medium.

Background

With the continuous development of network technology, network platforms carry more and more information; for example, a user can view videos, articles, pictures, text, and other information on a network platform.

In the related art, to improve the user's viewing efficiency, information of the same type that is similar to the information the user has chosen to view is usually recommended, and the recommendation effect is poor.

Disclosure of Invention

To overcome the problems in the related art, the present disclosure provides a media data processing method, apparatus, electronic device, and storage medium.

According to a first aspect of the present disclosure, there is provided a media data processing method, the method comprising:

acquiring a vector corresponding to a content tag of the first media data as a first vector, and acquiring a vector corresponding to a content tag of the second media data as a second vector; the first media data is of a different media type than the second media data;

determining, according to the first vector and the second vector of each piece of second media data, second media data related to the first media data as candidate media data;

and recommending the media data based on the candidate media data.

Optionally, the recommending media data based on the candidate media data includes:

respectively acquiring text information included in the first media data and the candidate media data to obtain first text information and second text information;

for each candidate media data, calculating a repetition degree between the first text information and the second text information of that candidate media data;

determining candidate media data whose repetition degree is less than or equal to a preset repetition degree threshold as target media data;

and recommending the target media data.

Optionally, the recommending the target media data includes:

determining a recommendation value of each target media data according to the semantic relevance between that target media data and the first media data, the release time of the target media data, and the information richness of the target media data, where the recommendation value is positively correlated with the semantic relevance, the release time, and the information richness;

and recommending the determined target media data to the user based on the recommendation value of each target media data.

Optionally, the recommending the determined target media data to the user based on the recommendation value of each target media data includes:

displaying, in a display interface of the first media data, the media data with the maximum recommendation value among the target media data together with a recommendation display element, where the recommendation display element serves as an abbreviated indication of at least part of the target media data;

and displaying viewing options for at least part of the target media data when a trigger operation on the recommendation display element is detected.

Optionally, the determining, according to the first vector and a second vector of each of the second media data, second media data related to the first media data as candidate media data includes:

calculating semantic correlation between the first media data and each second media data according to the first vector and each second vector;

and determining the second media data whose semantic relevance is greater than a preset relevance threshold as the candidate media data.

Optionally, the method further includes:

acquiring the category of the first media data to serve as a first category;

determining third media data whose category matches the first category as the second media data, where the third media data is of the same media type as the second media data.

Optionally, in a case that the first media data is a video, the second media data is an article; and under the condition that the first media data is an article, the second media data is a video.

According to a second aspect of the present disclosure, there is provided a media data processing apparatus, the apparatus comprising:

a first obtaining module configured to obtain a vector corresponding to a content tag of the first media data as a first vector, and obtain a vector corresponding to a content tag of the second media data as a second vector; the first media data is of a different media type than the second media data;

a first determining module configured to determine second media data related to the first media data as candidate media data according to the first vector and a second vector of each of the second media data;

and a recommending module configured to recommend media data based on the candidate media data.

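Optionally, the recommending module is specifically configured to:

respectively acquire text information included in the first media data and the candidate media data to obtain first text information and second text information;

for each candidate media data, calculate a repetition degree between the first text information and the second text information of that candidate media data;

determine candidate media data whose repetition degree is less than or equal to a preset repetition degree threshold as target media data;

and recommend the target media data.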

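Optionally, the recommending module is further specifically configured to:

determine a recommendation value of each target media data according to the semantic relevance between that target media data and the first media data, the release time of the target media data, and the information richness of the target media data, where the recommendation value is positively correlated with the semantic relevance, the release time, and the information richness;

and recommend the determined target media data to the user based on the recommendation value of each target media data.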

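Optionally, the first determining module is specifically configured to:

calculate a semantic relevance between the first media data and each piece of second media data according to the first vector and each second vector;

and determine the second media data whose semantic relevance is greater than a preset relevance threshold as the candidate media data.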

Optionally, the apparatus further comprises:

a second obtaining module configured to obtain a category to which the first media data belongs as a first category;

a second determining module configured to determine third media data whose category matches the first category as the second media data, where the third media data is of the same media type as the second media data.

In accordance with a third aspect of the present disclosure, there is provided an electronic device comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the media data processing method according to any one of the first aspect.

According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, cause the electronic device to perform the media data processing method according to any one of the first aspect.

According to a fifth aspect of the present disclosure, a computer program product is provided, comprising readable program instructions, which, when executed by a processor of an electronic device, cause the electronic device to perform the media data processing method according to any one of the first aspect.

Compared with the related art, the method has the following advantages and positive effects:

The media data processing method provided by the embodiments of the present disclosure acquires a vector corresponding to a content tag of first media data as a first vector, and acquires a vector corresponding to a content tag of second media data as a second vector, where the first media data and the second media data are of different media types. Second media data related to the first media data is then determined, according to the first vector and the second vector of each piece of second media data, as candidate media data, and media data is recommended based on the candidate media data. Because related media data is obtained from media data of other media types for recommendation, the recommended content is richer, and the recommendation effect can be improved to a certain extent.

Meanwhile, in the embodiments of the present disclosure, media data is determined across media types, that is, related media data is matched and recommended across modalities. The utilization rate of media data of other media types can therefore be improved to a certain extent, which improves the overall resource utilization rate of the network platform.

The foregoing description is only an overview of the technical solutions of the present disclosure. Embodiments of the present disclosure are described below so that the technical means of the present disclosure can be understood more clearly, and so that the above and other objects, features, and advantages of the present disclosure become more apparent.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the disclosure. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart illustrating the steps of a media data processing method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an interface provided by an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart provided by an embodiment of the present disclosure;

FIG. 4 is a block diagram of a media data processing device provided by an embodiment of the present disclosure;

FIG. 5 is a block diagram illustrating an apparatus for media data processing according to an exemplary embodiment;

FIG. 6 is a block diagram illustrating an apparatus for media data processing according to an exemplary embodiment.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

FIG. 1 is a flowchart illustrating the steps of a media data processing method according to an embodiment of the present disclosure. As shown in FIG. 1, the method may include the following steps.

Step 101: obtaining a vector corresponding to a content tag of first media data as a first vector, and obtaining a vector corresponding to a content tag of second media data as a second vector, where the first media data is of a different media type than the second media data.

In the embodiments of the present disclosure, the first media data may be media data of a first media type, and the second media data may be media data of a second media type. The specific first and second media types may be set according to actual requirements. Each media type corresponds to a modality, and media data of different media types convey information in different forms. For example, the media types may include video, image, text, and so forth.

The vector corresponding to the content tag of media data may be obtained by vectorizing (embedding) the content tag. The content tag may be preset for the media data; the tagging system may be set according to actual requirements, and the content tag represents content related to the media data. Taking a video as an example, assuming the video is an entertainment news video about Zhang San winning first place in the AA singing program, its content tags may include "Zhang San", "AA singing program", "champion" and "entertainment". Assuming the video shows Li Si teaching traditional embroidery skills, its content tags may include "Li Si", "science popularization", "traditional culture" and "embroidery".

Further, the media data of each media type may be recognized in advance by a preset content understanding model for that media type, to obtain the content tags of each piece of media data of that type. Correspondingly, when the vector corresponding to a content tag is needed, the previously recognized content tag can be retrieved and then vectorized. Alternatively, the vectorization may also be performed in advance, so that the vector corresponding to the content tag can be read directly.
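The text does not say how several tag vectors are combined into one vector per media item. The sketch below assumes mean pooling followed by L2 normalization, and uses a small hypothetical embedding table in place of a trained model; `functools.lru_cache` stands in for the "vectorize in advance, then read directly" option.

```python
import math
from functools import lru_cache

# Hypothetical per-tag embedding table with illustrative values; a real system
# would obtain these from a trained embedding model.
TAG_EMBEDDINGS = {
    "Zhang San": (1.0, 0.0, 0.0),
    "AA singing program": (0.0, 1.0, 0.0),
    "champion": (0.0, 0.0, 1.0),
}


@lru_cache(maxsize=None)  # repeated requests read the cached (precomputed) vector
def content_tag_vector(tags: tuple) -> tuple:
    """Mean-pool the embeddings of known tags, then L2-normalize (assumed scheme)."""
    known = [TAG_EMBEDDINGS[t] for t in tags if t in TAG_EMBEDDINGS]
    if not known:
        raise ValueError("none of the content tags has an embedding")
    dim = len(known[0])
    mean = [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        return tuple(mean)  # degenerate case: tag vectors cancel out
    return tuple(x / norm for x in mean)
```

For example, `content_tag_vector(("Zhang San", "champion"))` returns a unit vector halfway between the two tag embeddings; tags missing from the table are simply skipped. Tags are passed as a tuple because `lru_cache` needs hashable arguments.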

Step 102: determining, according to the first vector and the second vector of each piece of second media data, second media data related to the first media data as candidate media data.

Because media data of different media types often differ in content form and in how content tags are assigned, recalling related media data of another media type directly by explicit content tags, or by vectorizing the whole media data and recalling on the resulting high-dimensional vectors, may yield recall results that are not accurate enough. In this step, related media data is determined across modalities based on the vectors corresponding to the content tags. Content tags are usually short, so they are more refined than the whole media data, while their vectors can still represent the content of the media data. Recalling candidate media data based on these vectors can therefore ensure the accuracy of the recalled candidates to a certain extent.

Step 103: recommending media data based on the candidate media data.

The embodiments of the present disclosure may recommend the candidate media data to the user directly, or may screen the candidate media data further before recommending them; this is not limited by the present disclosure.

Optionally, in an implementation scenario, the first media data may be a video and the second media data may be an article, i.e., image-text information. Alternatively, the first media data may be an article and the second media data may be a video. In the content application (APP) industry, video and image-text information are two different forms of content, and technical systems usually focus on relationships within each form: they address the relevance among video content and the relevance among image-text content separately and apply this to the actual recommendation feed. That is, videos are recommended based on a video, and image-text information is recommended based on image-text information. For example, after a user watches a video, the related art may recommend more related videos based on video relevance analysis; after the user browses a piece of image-text information, related image-text information is recommended based on text analysis. However, video and image-text information, as two different content forms, each have their own strengths and weaknesses in conveying information. For example, when watching a video, a user who wants to understand the subject of the video in more depth and learn about some media viewpoints needs quick access to related image-text information, a need that the video alone often cannot satisfy.

In the embodiments of the present disclosure, the video is used as the first media data and the image-text information as the second media data, or the image-text information is used as the first media data and the video as the second media data. In this way, related image-text information can be recommended to the user based on a video, or related videos can be recommended based on image-text information. When viewing a video, the user can read more widely through the related image-text information and explore more of their interests; when viewing image-text information, the user can understand the related information more intuitively through the related videos. This improves the user experience and the user's efficiency in acquiring information.

Optionally, the operation of determining, according to the first vector and the second vectors of the second media data, the second media data related to the first media data to serve as candidate media data may specifically include:

Step S21: calculating a semantic relevance between the first media data and each piece of second media data according to the first vector and each second vector.

In this step, for each piece of second media data, a vector distance between the first vector and the second vector of that second media data may be calculated, and the semantic relevance between the first media data and the second media data is then determined based on the vector distance. The vector distance may be calculated using a preset distance calculation method. The semantic relevance may be inversely related to the vector distance: the smaller the distance, the greater the semantic relevance; the larger the distance, the smaller the semantic relevance.

Furthermore, the content tags serve as explicit features and the vectors serve as implicit features. In the present disclosure, the semantic relevance is calculated based on the first and second vectors corresponding to the content tags, that is, by combining explicit and implicit features, which ensures the accuracy of the calculation to a certain extent.
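The "preset distance calculation method" and the mapping from distance to relevance are left open. As an assumption, the sketch below uses Euclidean distance (`math.dist`) and the mapping `1 / (1 + d)`, which is 1 for identical vectors and decreases monotonically as the distance grows, matching the inverse relationship described above.

```python
import math
from typing import Sequence


def vector_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, one possible 'preset distance calculation method'."""
    return math.dist(a, b)  # raises ValueError if the dimensions differ


def semantic_relevance(first_vec: Sequence[float], second_vec: Sequence[float]) -> float:
    """Map distance into (0, 1]: smaller distance -> larger relevance (assumed mapping)."""
    return 1.0 / (1.0 + vector_distance(first_vec, second_vec))
```

Cosine similarity would be an equally reasonable choice for normalized tag vectors; the only property the text requires is that relevance falls as distance rises.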

Step S22: determining the second media data whose semantic relevance is greater than a preset relevance threshold as the candidate media data.

In this step, the preset relevance threshold may be set according to actual requirements, which is not limited by the present disclosure. The greater the semantic relevance, the more relevant the second media data is to the first media data; conversely, the smaller the semantic relevance, the less relevant it is. Therefore, the second media data with a larger semantic relevance, i.e., the second media data whose semantic relevance is greater than the preset relevance threshold, can be used as the candidate media data, ensuring that the subsequently recommended media data is sufficiently relevant.

In the embodiments of the present disclosure, the semantic relevance between the first media data and each piece of second media data is calculated according to the first vector and each second vector, and the second media data whose semantic relevance is greater than the preset relevance threshold is used as the candidate media data. In this way, the finally determined candidate media data can, to some extent, be media data strongly correlated with the first media data, which ensures the recommendation effect. Meanwhile, because the semantic relevance is calculated from the vectors corresponding to the content tags, relevance matching across content of different modalities is achieved; more useful information that meets user needs can then be mined during subsequent recommendation, providing a convenient way to expand information and improving information acquisition efficiency.
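Selecting candidates in Step S22 is a strict "greater than" filter over the relevance scores. A minimal sketch, repeating the assumed Euclidean-based relevance so the block is self-contained; the media IDs and threshold values are illustrative.

```python
import math


def semantic_relevance(a, b):
    # Assumed distance-to-relevance mapping, standing in for the unspecified
    # preset distance calculation method.
    return 1.0 / (1.0 + math.dist(a, b))


def select_candidates(first_vec, second_vecs, relevance_threshold):
    """Return {media_id: relevance} for second media data strictly above the threshold."""
    scored = {mid: semantic_relevance(first_vec, vec) for mid, vec in second_vecs.items()}
    return {mid: rel for mid, rel in scored.items() if rel > relevance_threshold}
```

With `first_vec = (0, 0)` and second vectors `a = (0, 0)`, `b = (3, 4)`, `c = (1, 0)`, a threshold of 0.4 keeps `a` (relevance 1.0) and `c` (0.5) and drops `b` (about 0.17). Returning the scores alongside the IDs lets Step S51 reuse them.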

Optionally, in the embodiments of the present disclosure, the category of the second media data may be the same as that of the first media data. Accordingly, the second media data may be determined as follows.

Step S31: acquiring the category to which the first media data belongs as a first category.

In this step, the category of the first media data, determined in advance, may be read to obtain the first category. For example, in an offline stage, the first media data may be recognized by the content understanding model corresponding to the first media type to obtain the category to which it belongs.

Step S32: determining third media data whose category matches the first category as the second media data, where the third media data is of the same media type as the second media data.

In this step, the third media data may be all media data of the second media type on the network platform. The category of each piece of third media data, determined in advance, may be read; for example, it may be obtained in an offline stage by recognizing the third media data with the content understanding model corresponding to the second media type. The category of the third media data is then compared with the first category, and if they match, i.e., the categories are the same, the third media data is determined as second media data.

It should be noted that, in the embodiments of the present disclosure, tag matching may also be performed before step 101. For example, the difference in number and type between the content tags of the second media data and those of the first media data may be detected, and if the difference is greater than a preset threshold, the second media data is discarded, which further narrows the selection range and improves selection efficiency.

In the embodiments of the present disclosure, the category to which the first media data belongs is acquired as the first category, and third media data whose category matches the first category is determined as the second media data, where the third media data and the second media data are of the same media type. By performing category matching in advance, media data of the other media type whose category matches that of the first media data is selected as the second media data. This makes the second media data participating in selection more likely to be related to the first media data, i.e., it provides high-quality candidates for the selection operation, while also narrowing the selection range, which saves selection cost and improves selection efficiency.
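Category matching amounts to filtering the pool of the other media type on an exact category match. The sketch assumes a hypothetical `MediaItem` record whose `category` field was filled offline by a content understanding model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaItem:
    media_id: str
    media_type: str  # e.g. "video" or "article"
    category: str    # assumed precomputed offline by a content understanding model


def select_second_media(first: MediaItem, pool: list, second_type: str) -> list:
    """Keep third media data of the second media type whose category equals the first category."""
    return [m for m in pool if m.media_type == second_type and m.category == first.category]
```

For a video in the "entertainment" category, only articles also categorized as "entertainment" survive; videos in the pool are excluded because they are not of the second media type.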
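The text does not define how the "difference in number and type" of content tags is measured. One plausible measure, used here purely as an assumption, is the size of the symmetric difference of the two tag sets, which grows both when tag counts differ and when tag types disagree.

```python
def passes_tag_prefilter(first_tags, second_tags, max_difference: int) -> bool:
    """Return False (reject) when the tag sets differ by more than max_difference.

    Assumed difference measure: number of tags present in only one of the two
    sets (symmetric difference), reflecting both count and type mismatches.
    """
    difference = len(set(first_tags) ^ set(second_tags))
    return difference <= max_difference
```

For instance, `{"a", "b", "c"}` and `{"a", "b", "d"}` differ by two tags (`c` and `d`), so they pass with `max_difference=2` but are rejected with `max_difference=1`.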

Optionally, in the embodiment of the present disclosure, the operation of recommending media data based on the candidate media data may further include the following steps:

Step S41: respectively obtaining text information included in the first media data and the candidate media data to obtain first text information and second text information.

Specifically, when the media data is text, its content may be extracted directly as text information. When the media data is an article, the text portion of the article may be extracted, and characters in pictures within the article may be extracted using an Optical Character Recognition (OCR) algorithm, to obtain text information. When the media data is a picture, characters in the picture may be extracted directly using an OCR algorithm. When the media data is a video, characters in each video frame may be extracted using an OCR algorithm, and the audio stream of the video may be converted into text using an Automatic Speech Recognition (ASR) algorithm, to obtain text information. The text information extracted from the first media data is the first text information, and the text information extracted from the candidate media data is the second text information.
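The per-type extraction rules can be sketched as a simple dispatch. The `ocr` and `asr` parameters are hypothetical hooks where real OCR and ASR engines would be plugged in; the defaults return empty strings so the sketch runs on its own.

```python
from typing import Callable, Iterable, Optional

# Hypothetical hooks: plug real OCR and ASR engines in here.
Ocr = Callable[[object], str]
Asr = Callable[[object], str]


def extract_text(media_type: str, *, body: str = "", images: Iterable = (),
                 frames: Iterable = (), audio: Optional[object] = None,
                 ocr: Ocr = lambda image: "", asr: Asr = lambda audio: "") -> str:
    """Gather text information from media data according to its media type."""
    if media_type == "text":
        parts = [body]
    elif media_type == "article":
        # Body text plus OCR text from pictures embedded in the article.
        parts = [body] + [ocr(img) for img in images]
    elif media_type == "picture":
        parts = [ocr(img) for img in images]
    elif media_type == "video":
        # OCR on each frame, plus ASR on the audio stream if present.
        parts = [ocr(frame) for frame in frames]
        if audio is not None:
            parts.append(asr(audio))
    else:
        raise ValueError(f"unsupported media type: {media_type}")
    return "\n".join(p for p in parts if p)
```

Running OCR on every frame is expensive in practice; sampling frames would be a natural refinement, but the text does not specify one.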

Step S42: for each candidate media data, calculating a repetition degree between the first text information and the second text information of that candidate media data.

In this step, the characters of the first text information may be compared with those of the second text information to determine the number of characters in the second text information that also appear in the first text information. The ratio of this number to the total number of characters in the second text information is then calculated as the repetition degree. The repetition degree measures the proportion of repeated information in the second text information: the higher the repetition degree, the more repeated information; the lower the repetition degree, the less repeated information.

Step S43: determining candidate media data whose repetition degree is less than or equal to a preset repetition degree threshold as target media data.

In this step, the preset repetition degree threshold may be set according to actual requirements; for example, it may be 70%. If the repetition degree of a candidate media data is greater than the preset repetition degree threshold, the candidate contains a large amount of repeated information, i.e., it is similar information that highly repeats the first media data, and it may therefore be filtered out. The remaining candidate media data, whose repetition degree is less than or equal to the threshold, may be determined as target media data.
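The repetition degree is a character-level ratio. The sketch below assumes that "appears in the first text" means set membership, that whitespace is ignored, and that an empty second text has repetition degree 0; none of these details is specified in the text.

```python
def repetition_degree(first_text: str, second_text: str) -> float:
    """Share of (non-whitespace) characters in second_text that also occur in first_text.

    Assumptions: characters are compared by set membership, whitespace is
    ignored, and an empty second text has repetition degree 0.
    """
    first_chars = set(first_text)
    chars = [c for c in second_text if not c.isspace()]
    if not chars:
        return 0.0
    repeated = sum(1 for c in chars if c in first_chars)
    return repeated / len(chars)
```

Because the denominator is the length of the second text, the measure is asymmetric: a short candidate fully contained in a long first text scores 1.0, while a long candidate that merely quotes the first text scores low.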
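Step S43 keeps candidates at or below the threshold. A minimal sketch using the 70% example value from the text, taking precomputed repetition degrees keyed by hypothetical media IDs.

```python
PRESET_REPETITION_THRESHOLD = 0.7  # the 70% example value given in the text


def select_target_media(repetition_by_id: dict,
                        threshold: float = PRESET_REPETITION_THRESHOLD) -> list:
    """Keep candidates whose repetition degree is <= threshold; drop near-duplicates."""
    return [mid for mid, degree in repetition_by_id.items() if degree <= threshold]
```

Note the inclusive comparison: a candidate at exactly 70% is kept, consistent with "less than or equal to".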

In the embodiments of the present disclosure, the text information included in the first media data and the candidate media data is respectively obtained as the first text information and the second text information. For each candidate media data, the repetition degree between the first text information and the second text information of that candidate is calculated, and the candidate media data whose repetition degree is less than or equal to the preset repetition degree threshold is determined as target media data. In this way, candidate media data that highly repeats the first media data can be identified and filtered out before recommendation, ensuring that the finally recommended target media data is related to the first media data rather than overlapping, near-duplicate data, which improves the recommendation effect.

Optionally, the operation of recommending the target media data may specifically include:

Step S51: determining a recommendation value of each target media data according to the semantic relevance between that target media data and the first media data, the release time of the target media data, and the information richness of the target media data, where the recommendation value is positively correlated with the semantic relevance, the release time, and the information richness.

In the embodiments of the present disclosure, the higher the semantic relevance, the more relevant the target media data is to the first media data; the semantic relevance calculated above can be reused when calculating the recommendation value. The more recent the release time, the more timely the target media data. Specifically, the related information of the target media data can be obtained to look up its release time, which may include a year, month and day, for example 20200815 or 20210112. The higher the information richness, the stronger the ability of the target media data to expand related information. Specifically, the number of topics involved in the target media data may be counted and the information richness determined from it: the more topics involved, the higher the information richness. Accordingly, determining the recommendation value so that it is positively correlated with the semantic relevance, the release time, and the information richness allows target media data that is more relevant to the first media data, more timely, and better at expanding related information to be recommended preferentially, to a certain extent, which ensures the recommendation effect. Specifically, the recommendation value of the target media data may be calculated as a weighted combination of the semantic relevance, the release time, and the information richness.

It should be noted that, before this calculation, target media data released more than a preset time threshold before the current time may be removed, so that the calculation covers, as far as possible, only target media data within a certain time window of the current time. This reduces the amount of calculation and avoids recommending target media data with weak timeliness.
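The text gives no weights or normalizations for the weighted combination. In the sketch below the weights (0.6, 0.2, 0.2), the 365-day timeliness window, and the 10-topic richness cap are all hypothetical; they only serve to show three positively correlated terms combined into one value. Release times in the `20210112` format are parsed with `datetime.strptime`.

```python
from dataclasses import dataclass
from datetime import date, datetime


def parse_release(yyyymmdd: str) -> date:
    """Parse a release time written like 20210112."""
    return datetime.strptime(yyyymmdd, "%Y%m%d").date()


@dataclass
class TargetMedia:
    media_id: str
    relevance: float   # semantic relevance from step S21, assumed to lie in [0, 1]
    release: date
    topic_count: int   # number of topics involved, used for information richness


def recommendation_value(t: TargetMedia, today: date, weights=(0.6, 0.2, 0.2),
                         window_days: int = 365, max_topics: int = 10) -> float:
    """Weighted sum; all weights and normalizations are hypothetical choices."""
    w_rel, w_time, w_rich = weights
    age_days = (today - t.release).days
    # More recent release -> score closer to 1; clipped to [0, 1].
    timeliness = max(0.0, min(1.0, 1.0 - age_days / window_days))
    # More topics -> richer; capped at max_topics.
    richness = min(t.topic_count, max_topics) / max_topics
    return w_rel * t.relevance + w_time * timeliness + w_rich * richness
```

With relevance 0.8, five topics, and a release on the current day, the value is 0.6 × 0.8 + 0.2 × 1 + 0.2 × 0.5 = 0.78. The same item released more than a year earlier loses the timeliness term and scores 0.58.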
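The time-window pre-filter discards target media older than a preset age before any scoring. A minimal sketch; the 180-day window used in the example is a hypothetical value.

```python
from datetime import date, timedelta


def within_time_window(release_by_id: dict, today: date, max_age: timedelta) -> list:
    """Drop target media released more than max_age before today (max_age is a hypothetical preset)."""
    return [mid for mid, released in release_by_id.items() if today - released <= max_age]
```

For example, with today as 2021-11-12 and a 180-day window, an item released on 2021-10-01 is kept and one released on 2020-08-15 is removed before its recommendation value is computed.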

And step S52, recommending the determined target media data to the user based on the recommendation value of each target media data.

In one implementation, the top M target media data with the largest recommendation value may be recommended to the user. For example, the target media data may be sorted according to the recommended value in a big-to-small/small-to-big manner, so as to obtain a sorting result. And under the condition of sorting from big to small, obtaining the first M target media data in the sorting result so as to obtain the first M target media data with the maximum recommendation value. And under the condition of sorting from small to large, obtaining the last M target media data in the sorting result so as to obtain the first M target media data with the maximum recommendation value. Where M may be an integer set according to actual requirements, and for example, M may be 5. Further, in another optional implementation manner, target media data with a recommendation value greater than a preset recommendation value threshold may also be recommended to the user. The preset recommended value threshold may be set according to actual requirements, which is not limited by the present disclosure.
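Both selection rules from Step S52 fit in a few lines. `heapq.nlargest` returns items in descending order, which is equivalent to taking the first M items of a descending sort; M defaults to the example value 5.

```python
import heapq


def top_m(values: dict, m: int = 5) -> list:
    """IDs of the M target media with the largest recommendation values, highest first."""
    return [mid for mid, _ in heapq.nlargest(m, values.items(), key=lambda kv: kv[1])]


def above_threshold(values: dict, threshold: float) -> list:
    """Alternative rule: every target media whose value exceeds a preset threshold, highest first."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return [mid for mid, v in ranked if v > threshold]
```

Given values `{"a": 0.3, "b": 0.9, "c": 0.5, "d": 0.7}`, `top_m(values, 2)` and `above_threshold(values, 0.5)` both return `["b", "d"]`.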

In the embodiment of the disclosure, the recommendation value of each target media data is determined according to the semantic relevance between that target media data and the first media data, its release time and its information richness, and the recommendation value is positively correlated with all three. The determined target media data is then recommended to the user based on the recommendation value of each target media data. Thus, target media data that is more relevant to the first media data, more timely and richer in information can be preferentially recommended to the user. At the same time, this avoids increasing the user's selection difficulty by recommending too many target media data.

Optionally, recommending the determined target media data to the user based on the recommendation value of each target media data may specifically include:

Step S61: in a display interface of the first media data, display the media data with the largest recommendation value among the target media data together with a recommendation display element, where the recommendation display element is used to indicate that at least part of the target media data can be displayed.

In this embodiment of the disclosure, the display interface of the first media data may be an interface on which the first media data is displayed; for example, when the first media data is a video, its display interface may be its play interface. The recommendation display element may be a designated identifier, that is, a preset identifier that represents part of the target media data in abbreviated form, for example the remaining M-1 target media data. For example, the designated identifier may be a "more" option. Fig. 2 is a schematic diagram of an interface provided by an embodiment of the present disclosure. As shown in fig. 2, the interface displays the first media data a, the target media data b with the largest recommendation value among the M target media data, and the designated identifier c. In other words, in a specific application, the target media data with the largest recommendation value is displayed prominently, and the complete top-M list can be viewed by clicking the "more" entry. This product form improves data viewing efficiency and the flexibility of user operation.

It should be noted that, in the embodiment of the present disclosure, while the first media data is being displayed, for example while a video is playing, target media data may be automatically matched to the first media data and recommended, which can improve the user's viewing experience.

Step S62: when a trigger operation on the recommendation display element is detected, display viewing options for at least part of the target media data.

In the embodiment of the present disclosure, the trigger operation may be predefined; for example, it may be a click operation, a long-press operation or a voice trigger operation. Correspondingly, when the trigger operation on the recommendation display element is received, it can be determined that the user wants to view at least part of the target media data, and viewing options for that part of the target media data can then be displayed. The at least part of the target media data may include all of the target media data, or only the target media data other than the one with the largest recommendation value. A viewing option may include the title of the corresponding target media data, and the user may click the viewing option to have the terminal display that target media data.
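Steps S61 and S62 can be modelled as a small view-state object. The following Python sketch is hypothetical (it is not a real UI framework API): it assumes the target media titles are already sorted by recommendation value, shows the first one directly, and reveals the remaining titles as viewing options only after the "more" element is triggered.

```python
from dataclasses import dataclass


@dataclass
class RecommendationPanel:
    """Hypothetical view model for steps S61/S62."""
    ranked_titles: list       # target media titles, largest recommendation value first
    expanded: bool = False    # becomes True once the "more" element is triggered

    def featured(self):
        # S61: the item with the largest recommendation value is shown directly.
        return self.ranked_titles[0] if self.ranked_titles else None

    def show_more_element(self):
        # The "more" element is only useful if other items remain to reveal.
        return len(self.ranked_titles) > 1

    def on_trigger(self):
        # S62: a click, long press or voice command on the element expands the list.
        self.expanded = True

    def viewing_options(self, include_featured=False):
        # Either all target media, or only those other than the featured one.
        if not self.expanded:
            return []
        return list(self.ranked_titles if include_featured else self.ranked_titles[1:])


panel = RecommendationPanel(["Article B", "Article X", "Article Y"])
print(panel.featured())         # Article B
panel.on_trigger()
print(panel.viewing_options())  # ['Article X', 'Article Y']
```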

In the embodiment of the disclosure, the media data with the largest recommendation value among the target media data and a recommendation display element are displayed in the display interface of the first media data, where the recommendation display element indicates that at least part of the target media data can be displayed. When the trigger operation on the recommendation display element is detected, viewing options for at least part of the target media data are displayed. Displaying the target media data with the largest recommendation value directly in the display interface of the first media data ensures that the user can immediately view the most relevant target media data, enabling fast cross-modal content consumption and improving data viewing efficiency. Meanwhile, displaying the viewing options when the trigger operation on the recommendation display element is detected lets the user obtain further information on demand, which improves flexibility.

Further, consider a scenario in which the first media data is a video and the second media data is image-text information. Fig. 3 is a schematic flow chart provided in an embodiment of the disclosure. As shown in fig. 3, the category and video content tags of each video in the video library may be obtained based on a video content understanding model, and the category and image-text content tags of each piece of image-text information in the image-text library may be obtained based on an image-text content understanding model. When related image-text information needs to be recalled for a certain video, the category and video content tags of the video are obtained and category matching is performed to determine the image-text information of the same category. Next, tag vectorization and relevance calculation are performed: the video content tags of the video are vectorized to obtain a first vector, and the image-text content tags of the image-text information belonging to the same category are vectorized to obtain second vectors. The semantic relevance is calculated based on the first vector and the second vectors. A list of related image-text information can then be recalled, that is, the target media data is selected based on the semantic relevance. Further, highly repetitive content may be filtered out. The remaining items are then sorted with timeliness and diversity taken into account, and the top-5 image-text information by relevance is returned; that is, based on the semantic relevance, the release time and the information richness, the 5 pieces of image-text information with the largest recommendation values are returned. Finally, the top-1 image-text information (the one with the largest recommendation value) can be displayed prominently, and the full top-5 list can be viewed through the "more" entry.
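The recall part of this flow (category matching, vector relevance, recall list, repetition filtering) can be sketched in Python as follows. Several details are assumptions not stated in the disclosure: cosine similarity is used as the semantic relevance, the tag vectors are taken as already computed, "highly repetitive content" is approximated as items whose vectors are near-duplicates (cosine at least 0.95) of an item already kept, and the 0.5 relevance cutoff is illustrative. The final top-5 ordering by recommendation value would follow as a separate step.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recall_image_text(video, library, min_relevance=0.5, dup_threshold=0.95):
    """Recall same-category image-text items for a video, most relevant first.

    video:   {"category": str, "vector": list of float}           (first vector)
    library: [{"id": str, "category": str, "vector": [...]}, ...]  (second vectors)
    """
    # 1. Category matching: only compare against items of the same category.
    same_category = [it for it in library if it["category"] == video["category"]]
    # 2. Semantic relevance between the first vector and each second vector.
    scored = [(cosine(video["vector"], it["vector"]), it) for it in same_category]
    # 3. Recall list: keep sufficiently relevant items, most relevant first.
    recalled = sorted((p for p in scored if p[0] >= min_relevance),
                      key=lambda p: p[0], reverse=True)
    # 4. Filter repetitive content: drop near-duplicates of already kept items.
    kept = []
    for rel, it in recalled:
        if all(cosine(it["vector"], other["vector"]) < dup_threshold for _, other in kept):
            kept.append((rel, it))
    return [(it["id"], rel) for rel, it in kept]


video = {"category": "food", "vector": [1.0, 0.0, 0.0]}
library = [
    {"id": "t1", "category": "food", "vector": [1.0, 0.0, 0.0]},
    {"id": "t2", "category": "food", "vector": [0.99, 0.14, 0.0]},  # near-duplicate of t1
    {"id": "t3", "category": "food", "vector": [0.6, 0.8, 0.0]},
    {"id": "t4", "category": "sports", "vector": [1.0, 0.0, 0.0]},  # other category
    {"id": "t5", "category": "food", "vector": [0.0, 1.0, 0.0]},    # unrelated
]
print([media_id for media_id, _ in recall_image_text(video, library)])  # ['t1', 't3']
```

Filtering by category first keeps the number of vector comparisons proportional to the category size rather than the whole image-text library.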

In the embodiments of the present disclosure, cross-modal content matching is performed based on the vectors corresponding to content tags, so that related videos can be recalled for image-text information, and related image-text information can be recalled for videos, more quickly. Data of different modalities can thus be provided to users more fully, breaking the limitation of content form in the user's product experience and yielding a fused data viewing experience. Meanwhile, content resources can be fully utilized, and the exposure rate and consumption rate of long-tail image-text information and long-tail videos on the network platform are improved.

Fig. 4 is a block diagram of a media data processing device according to an embodiment of the disclosure, and as shown in fig. 4, the device 20 may include:

a first obtaining module 201, configured to obtain a vector corresponding to a content tag of first media data as a first vector, and obtain a vector corresponding to a content tag of second media data as a second vector, where the media types of the first media data and the second media data are different;

a first determining module 202, configured to determine, according to the first vector and the second vector of each of the second media data, second media data related to the first media data as candidate media data;

a recommendation module 203 configured to make a media data recommendation based on the candidate media data.
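The division of device 20 into three modules can be illustrated with a hypothetical Python class whose methods mirror modules 201, 202 and 203. The disclosure does not say how a content tag vector is formed, so this sketch assumes it is the mean of per-tag embeddings looked up in a hypothetical `embeddings` table, uses cosine similarity as the relevance, and ranks candidates by relevance alone for brevity (the recommendation value described earlier would replace that ranking in a fuller version).

```python
import math


def tag_vector(tags, embeddings):
    """Mean of the (hypothetical) embeddings of the known tags; None if none are known."""
    vecs = [embeddings[t] for t in tags if t in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class MediaDataProcessingDevice:
    """Hypothetical sketch of device 20 with modules 201, 202 and 203 as methods."""

    def __init__(self, embeddings, min_relevance=0.5, m=5):
        self.embeddings = embeddings        # hypothetical tag -> vector table
        self.min_relevance = min_relevance  # hypothetical relevance cutoff
        self.m = m                          # number of items to recommend

    def obtain_vectors(self, first_tags, second_items):
        # Module 201: first vector for the first media data, second vector for
        # each second media data item (of a different media type).
        first = tag_vector(first_tags, self.embeddings)
        second = {mid: tag_vector(tags, self.embeddings) for mid, tags in second_items.items()}
        return first, {mid: v for mid, v in second.items() if v is not None}

    def determine_candidates(self, first, second):
        # Module 202: second media data related to the first media data.
        scored = {mid: cosine(first, v) for mid, v in second.items()}
        return {mid: s for mid, s in scored.items() if s >= self.min_relevance}

    def recommend(self, candidates):
        # Module 203: recommend based on the candidate media data.
        return sorted(candidates, key=candidates.get, reverse=True)[: self.m]

    def process(self, first_tags, second_items):
        first, second = self.obtain_vectors(first_tags, second_items)
        if first is None:
            return []
        return self.recommend(self.determine_candidates(first, second))


emb = {"cooking": [1.0, 0.0], "recipe": [1.0, 0.0], "football": [0.0, 1.0]}
articles = {"art1": ["recipe"], "art2": ["football"], "art3": ["cooking", "football"]}
device = MediaDataProcessingDevice(emb)
print(device.process(["cooking", "recipe"], articles))  # ['art1', 'art3']
```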

The media data processing apparatus provided by the embodiment of the present disclosure obtains a vector corresponding to a content tag of first media data as a first vector, and obtains a vector corresponding to a content tag of second media data as a second vector, where the media types of the first media data and the second media data are different. It then determines, according to the first vector and the second vector of each second media data, second media data related to the first media data as candidate media data, and performs media data recommendation based on the candidate media data. In this way, related media data is obtained from media data of other media types for recommendation, so that the recommended content is richer and the recommendation effect can be improved to a certain extent.

Optionally, the recommending module 203 is specifically configured to:

and recommending the target media data.

Optionally, the recommending module 203 is further specifically configured to:

Optionally, the first determining module 202 is specifically configured to

Optionally, the apparatus 20 further includes:

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

According to an embodiment of the present disclosure, there is provided an electronic apparatus including a processor and a memory for storing processor-executable instructions, wherein the processor is configured to execute the instructions to perform the steps of the media data processing method as in any of the above embodiments.

There is also provided, according to an embodiment of the present disclosure, a computer-readable storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the media data processing method as in any one of the above embodiments.

There is also provided, according to an embodiment of the present disclosure, a computer program product including readable program instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the steps of the media data processing method as in any one of the above embodiments.

Fig. 5 is a block diagram illustrating an apparatus for media data processing according to an example embodiment. For example, the apparatus 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 5, apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.

The processing component 702 generally controls overall operation of the device 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or part of the steps of the media data processing method described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.

The memory 704 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power supply component 706 provides power to the various components of the device 700. The power supply component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.

The multimedia component 708 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 710 is configured to output and/or input audio signals. For example, audio component 710 includes a Microphone (MIC) configured to receive external audio signals when apparatus 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.

The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 714 includes one or more sensors for providing status assessments of various aspects of the apparatus 700. For example, the sensor assembly 714 may detect an open/closed state of the device 700 and the relative positioning of components, such as the display and keypad of the device 700. The sensor assembly 714 may also detect a change in position of the device 700 or a component of the device 700, the presence or absence of user contact with the device 700, the orientation or acceleration/deceleration of the device 700, and a change in temperature of the device 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described media data processing methods.

In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 704 comprising instructions, executable by the processor 720 of the device 700 to perform the media data processing method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 6 is a block diagram illustrating an apparatus for media data processing according to an example embodiment. For example, the apparatus 800 may be provided as a server. Referring to FIG. 6, the apparatus 800 includes a processing component 822, which further includes one or more processors, and memory resources, represented by memory 832, for storing instructions, such as applications, that are executable by the processing component 822. The application programs stored in memory 832 may include one or more modules that each correspond to a set of instructions. Further, the processing component 822 is configured to execute instructions to perform the above-described media data processing method.

The device 800 may also include a power component 826 configured to perform power management of the device 800, a wired or wireless network interface 850 configured to connect the device 800 to a network, and an input/output (I/O) interface 858. The apparatus 800 may operate based on an operating system stored in the memory 832, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of media data processing, the method comprising:

and recommending the media data based on the candidate media data.

2. The method of claim 1, wherein said making a media data recommendation based on said candidate media data comprises:

and recommending the target media data.

3. The method of claim 2, wherein the recommending the target media data comprises:

4. The method of claim 3, wherein recommending the determined target media data to the user based on the recommendation value of each of the target media data comprises:

5. The method according to any one of claims 1 to 4, wherein the determining second media data related to the first media data as candidate media data according to the first vector and a second vector of each of the second media data comprises:

6. The method of any of claims 1 to 4, further comprising:

acquiring the category of the first media data to serve as a first category;

7. An apparatus for media data processing, the apparatus comprising:

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the media data processing method of any of claims 1 to 6.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, cause the electronic device to perform the media data processing method of any of claims 1 to 6.

10. A computer program product, characterized in that it comprises readable program instructions which, when executed by a processor of an electronic device, cause the electronic device to carry out the media data processing method according to any one of claims 1 to 6.