CN106445922B - Method and device for determining title of multimedia resource - Google Patents

Info

Publication number
CN106445922B
Authority
CN
China
Prior art keywords
component
list
updated
component list
multimedia resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610881052.3A
Other languages
Chinese (zh)
Other versions
CN106445922A (en)
Inventor
刘荣
赵磊
单明辉
王建宇
顾思斌
潘柏宇
王冀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Youku Network Technology Beijing Co Ltd
Original Assignee
Youku Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Youku Network Technology Beijing Co Ltd filed Critical Youku Network Technology Beijing Co Ltd
Priority to CN201610881052.3A priority Critical patent/CN106445922B/en
Publication of CN106445922A publication Critical patent/CN106445922A/en
Priority to PCT/CN2017/104410 priority patent/WO2018064959A1/en
Application granted granted Critical
Publication of CN106445922B publication Critical patent/CN106445922B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method and a device for determining a title of a multimedia resource. The method comprises the following steps: acquiring user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data; analyzing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to a target user; analyzing an original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title; comparing each component in the second component list with each component in the first component list to obtain an updated second component list; and determining a new title of the multimedia resource to be recommended according to the updated second component list. According to the method and the device for determining the title of the multimedia resource, the personalized title can be determined for the target user, the user can be better attracted, and therefore the probability that the recommended multimedia resource is clicked can be improved.

Description

Method and device for determining title of multimedia resource
Technical Field
The present invention relates to the field of information technologies, and in particular, to a method and an apparatus for determining a title of a multimedia resource.
Background
In the internet era, and especially in the mobile internet era, how to provide users with timely and valuable information is a research focus for many internet companies. For example, when a user browses a video website, the video title is an important factor in attracting the user to watch a video, and video websites therefore often employ a large number of operators to edit video titles. A video uploader can also edit the video title in order to attract users to watch.
At present, editing the titles of multimedia resources such as videos depends on website operators and uploaders, which consumes a large amount of human resources. Moreover, titles edited by website operators and uploaders are aimed at the general public and cannot meet the personalized requirements of an individual user.
Disclosure of Invention
Technical problem
In view of the above, the technical problem to be solved by the present invention is that the existing method for determining the title of the multimedia resource consumes a lot of human resources and cannot meet the personalized requirements of the user.
Solution scheme
In order to solve the above technical problem, according to an embodiment of the present invention, there is provided a method of determining a title of a multimedia asset, including:
acquiring user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data;
analyzing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
analyzing an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
comparing each component in the second component list with each component in the first component list to obtain an updated second component list;
and determining a new title of the multimedia resource to be recommended according to the updated second component list.
In one possible implementation manner, the method for obtaining an updated second component list by comparing each component in the second component list with each component in the first component list includes:
calculating similarity of each component in the second component list and each component in the first component list;
replacing a component in the second component list with a component in the first component list if the similarity between the component in the second component list and the component in the first component list is greater than a first preset value;
an updated second component list is derived from all replaced components.
For the above method, in one possible implementation manner, the calculating a similarity between each component in the second component list and each component in the first component list includes:
determining a vector corresponding to each component in the second component list;
and respectively calculating the similarity of the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
In one possible implementation manner, the calculating a similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list includes:
calculating, by using Equation 1, the similarity between the vector corresponding to the i-th component in the second component list and the vector corresponding to the m-th component in the first component list.
[Equation 1 and its vector symbols appear only as images in the original publication and are not reproduced here.]
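For reference, a common choice for the vector similarity described above is the cosine similarity, which the detailed description also names as an example. The symbols $\vec{x}_i$ (vector of the i-th component in the second component list) and $\vec{y}_m$ (vector of the m-th component in the first component list) are assumed here for illustration only; the exact form of Equation 1 in the publication may differ.

```latex
\mathrm{sim}(\vec{x}_i, \vec{y}_m)
  = \frac{\vec{x}_i \cdot \vec{y}_m}{\lVert \vec{x}_i \rVert \, \lVert \vec{y}_m \rVert}
```

Under this reading the similarity lies in [-1, 1] and equals 1 when the two vectors point in the same direction.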
For the above method, in a possible implementation manner, determining a new title of the multimedia resource to be recommended according to the updated second component list includes:
calculating a score of the updated second component list;
and under the condition that the score of the updated second component list is greater than a second preset value, determining a new title of the multimedia resource to be recommended according to the updated second component list.
In one possible implementation manner, the calculating the score of the updated second component list includes:
and calculating the score of the updated second component list according to the probability of each component in the updated second component list appearing in a designated sample set.
In one possible implementation manner, the calculating a score of the updated second component list according to a probability that each component in the updated second component list appears in a designated sample set includes:
calculating a score s of the updated second component list using Equation 2;
[Equation 2 appears only as an image in the original publication and is not reproduced here.]
wherein n represents the number of components in the updated second component list, w_j represents the j-th component in the updated second component list, w_{j-i} represents the (j-i)-th component in the updated second component list, p(w_j w_{j-i}) represents the probability that the j-th component and the (j-i)-th component co-occur in the designated sample set, and p(w_{j-i}) represents the probability that the (j-i)-th component occurs in the designated sample set.
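As an illustration of how the quantities defined above can combine, the ratio of the co-occurrence probability to the single-component probability is the standard conditional probability of the j-th component given the (j-i)-th component. The aggregation into a single score s shown below (an average over adjacent pairs, that is, i = 1) is an assumption made for illustration and is not confirmed by the source; Equation 2 of the publication may aggregate differently.

```latex
p(w_j \mid w_{j-i}) = \frac{p(w_j\, w_{j-i})}{p(w_{j-i})},
\qquad
s = \frac{1}{n-1} \sum_{j=2}^{n} p(w_j \mid w_{j-1})
```

Under this assumed form, a higher s means that adjacent components of the updated list tend to appear together in the designated sample set.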
For the above method, in one possible implementation, after calculating the score of the updated second component list, the method further includes:
and under the condition that the score of the updated second component list is less than or equal to the second preset value, reserving the original title of the multimedia resource to be recommended.
For the above method, in a possible implementation manner, parsing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user includes:
analyzing the title of each multimedia resource in the first multimedia resource list to obtain a component related to the target user;
taking the components with the occurrence times larger than a third preset value in the components related to the target user as the components corresponding to the target user;
and generating a first component list corresponding to the target user according to the component corresponding to the target user.
For the above method, in a possible implementation manner, acquiring user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data includes:
collecting all user behavior data of the target user in a specified time period;
screening effective user behavior data from the collected user behavior data;
and sorting the effective user behavior data according to the time corresponding to the effective user behavior data to obtain the first multimedia resource list.
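The three sub-steps above (collect within a time period, screen effective data, sort by time) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BehaviorRecord` fields, the validity rule (a watch counts only if it lasted at least `min_watch_seconds`), the ascending sort order, and the de-duplication of resources are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehaviorRecord:
    """Hypothetical record of one user action on one multimedia resource."""
    resource_id: str
    action: str            # e.g. "watch", "comment", "subscribe", "dislike"
    timestamp: float       # seconds since some epoch
    watch_seconds: float = 0.0


def is_effective(record, min_watch_seconds=10.0):
    # Assumed rule: very short watches are treated as noise; other actions count.
    if record.action == "watch":
        return record.watch_seconds >= min_watch_seconds
    return True


def build_first_resource_list(records, start, end, min_watch_seconds=10.0):
    """Return resource ids ordered by time, one entry per resource."""
    # 1. Keep only behavior inside the specified time period.
    in_window = [r for r in records if start <= r.timestamp <= end]
    # 2. Screen out ineffective behavior.
    effective = [r for r in in_window if is_effective(r, min_watch_seconds)]
    # 3. Sort by time (ascending order is an assumption).
    effective.sort(key=lambda r: r.timestamp)
    # Keep the first occurrence of each resource.
    seen, ordered = set(), []
    for r in effective:
        if r.resource_id not in seen:
            seen.add(r.resource_id)
            ordered.append(r.resource_id)
    return ordered
```

A caller would pass the raw behavior log and the bounds of the time period, then look up the titles of the returned resources for parsing.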
In order to solve the above technical problem, according to another embodiment of the present invention, there is provided an apparatus for determining a title of a multimedia asset, including:
the acquisition module is used for acquiring user behavior data of a target user and generating a first multimedia resource list according to the user behavior data;
the first analysis module is used for analyzing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
the second analysis module is used for analyzing the original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title;
a comparison module, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list;
and the determining module is used for determining a new title of the multimedia resource to be recommended according to the updated second component list.
For the apparatus, in a possible implementation manner, the comparing module includes:
a similarity calculation submodule, configured to calculate the similarity between each component in the second component list and each component in the first component list;
a replacing submodule, configured to replace a component in the second component list with a component in the first component list when the similarity between the component in the second component list and the component in the first component list is greater than a first preset value;
and an updating submodule, configured to obtain an updated second component list according to all the replaced components.
For the above apparatus, in one possible implementation, the similarity calculation submodule includes:
a vector determination unit configured to determine a vector corresponding to each component in the second component list;
and a similarity calculation unit configured to calculate a similarity between a vector corresponding to each component in the second component list and a vector corresponding to each component in the first component list.
For the apparatus described above, in one possible implementation manner, the similarity calculation unit is configured to:
calculating, by using Equation 1, the similarity between the vector corresponding to the i-th component in the second component list and the vector corresponding to the m-th component in the first component list.
[Equation 1 and its vector symbols appear only as images in the original publication and are not reproduced here.]
For the apparatus, in one possible implementation manner, the determining module includes:
a score calculation sub-module for calculating a score of the updated second component list;
and the determining submodule is used for determining a new title of the multimedia resource to be recommended according to the updated second component list under the condition that the score of the updated second component list is greater than a second preset value.
For the above apparatus, in one possible implementation, the score calculating sub-module is configured to:
and calculating the score of the updated second component list according to the probability of each component in the updated second component list appearing in a designated sample set.
For the above apparatus, in one possible implementation, the score calculating sub-module is configured to:
calculating a score s of the updated second component list using Equation 2;
[Equation 2 appears only as an image in the original publication and is not reproduced here.]
wherein n represents the number of components in the updated second component list, w_j represents the j-th component in the updated second component list, w_{j-i} represents the (j-i)-th component in the updated second component list, p(w_j w_{j-i}) represents the probability that the j-th component and the (j-i)-th component co-occur in the designated sample set, and p(w_{j-i}) represents the probability that the (j-i)-th component occurs in the designated sample set.
For the above apparatus, in one possible implementation manner, the apparatus further includes:
and the reserving module is used for reserving the original title of the multimedia resource to be recommended under the condition that the score of the updated second component list is less than or equal to the second preset value.
For the apparatus, in a possible implementation manner, the first parsing module includes:
the analysis submodule is used for analyzing the title of each multimedia resource in the first multimedia resource list to obtain a component related to the target user;
the component determining submodule is used for taking the components of which the occurrence times are greater than a third preset value in the components related to the target user as the components corresponding to the target user;
and the first component list generating submodule is used for generating a first component list corresponding to the target user according to the components corresponding to the target user.
For the above apparatus, in a possible implementation manner, the acquisition module includes:
the acquisition submodule is used for acquiring all user behavior data of the target user within a specified time period;
the screening submodule is used for screening effective user behavior data from the collected user behavior data;
and the sorting submodule is used for sorting the effective user behavior data according to the time corresponding to the effective user behavior data to obtain the first multimedia resource list.
Advantageous effects
The method and the device for determining the title of the multimedia resource can determine the personalized title aiming at the target user, can attract the user better, and can improve the probability of clicking the recommended multimedia resource.
Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1 shows a flowchart of an implementation of a method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 2 shows a flowchart of an exemplary implementation of step S104 of the method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 3 shows a flowchart of an exemplary implementation of step S201 of the method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 4 shows a flowchart of an exemplary implementation of step S105 of the method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 5 shows a flowchart of an exemplary implementation of step S102 of the method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 6 shows a flowchart of an exemplary implementation of step S101 of the method of determining a title of a multimedia resource according to an embodiment of the present invention;
Fig. 7 is a block diagram illustrating a structure of an apparatus for determining a title of a multimedia resource according to another embodiment of the present invention;
Fig. 8 is a block diagram illustrating an exemplary structure of an apparatus for determining a title of a multimedia resource according to another embodiment of the present invention;
Fig. 9 is a block diagram showing a structure of an apparatus for determining a title of a multimedia resource according to another embodiment of the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present invention will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, methods, procedures, components, and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present invention.
Example 1
Fig. 1 shows a flowchart of an implementation of a method of determining a title of a multimedia asset according to an embodiment of the present invention. The execution subject of this embodiment may be a server, or may be other devices for determining the title of the multimedia resource, and is not limited herein. As shown in fig. 1, the method mainly includes:
in step S101, user behavior data of a target user is collected, and a first multimedia resource list is generated according to the user behavior data.
The multimedia may be a combination of several media forms, for example text, sound, and images. For example, the multimedia resource may be a video, which is not limited herein. The user behavior data of the target user may include, but is not limited to, at least one of: data on the target user watching multimedia resources, data on the target user commenting on multimedia resources, data on the target user subscribing to multimedia resources, and data on the target user disliking (down-voting) multimedia resources. In this embodiment, the first multimedia resource list may be generated from the multimedia resources corresponding to the user behavior data of the target user. For example, the first multimedia resource list corresponding to the target user may be denoted as LU = {v1, v2, …, vn}.
In step S102, the titles of the multimedia resources in the first multimedia resource list are analyzed to obtain a first component list corresponding to the target user.
As an example of this embodiment, an NER (Named Entity Recognition) technology may be adopted to parse the titles of the multimedia resources in the first multimedia resource list to obtain a first component list corresponding to the target user.
In step S103, the original title of the multimedia resource to be recommended is analyzed to obtain a second component list corresponding to the original title.
As an example of this embodiment, the original titles of the multimedia resources to be recommended in the list of multimedia resources to be recommended may be respectively analyzed, so as to obtain the second component lists corresponding to the original titles. For example, the NER technique may be used to parse an original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title.
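To make the notion of a "component list" concrete, the sketch below parses a title into tagged components. It is only a stand-in: the source uses NER, whereas this example uses a tiny hand-written lexicon lookup. The `LEXICON` contents, the tag names, and the `parse_title` function are hypothetical and introduced purely for illustration.

```python
import re

# Hypothetical lexicon standing in for a trained NER model.
LEXICON = {
    "tortoise": "entity",
    "kitten": "entity",
    "cat": "entity",
    "nice": "emotion",
    "!": "emotion_punct",
}


def parse_title(title, lexicon=LEXICON):
    """Split a title into (text, kind) components.

    Words and the marks '!' and '?' become tokens; tokens missing from the
    lexicon are tagged "other" so the title can still be reassembled.
    """
    tokens = re.findall(r"\w+|[!?]", title.lower())
    return [(tok, lexicon.get(tok, "other")) for tok in tokens]
```

The same function can be applied both to the titles in the first multimedia resource list and to the original title of the resource to be recommended, which yields component lists that are directly comparable.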
In step S104, each component in the second component list is compared with each component in the first component list to obtain an updated second component list.
As an example of this embodiment, each component in the second component list may be compared with each component in the first component list, respectively, to replace the component in the second component list with the component in the first component list.
In step S105, a new title of the multimedia resource to be recommended is determined according to the updated second component list.
For example, if the original title of the multimedia resource to be recommended is "tortoise gnaws the toe of a sleeping kitten", the new title may be "tortoise gnaws the toe of a sleeping meow star!".
In the embodiment, each component in the second component list corresponding to the original title of the multimedia resource to be recommended is compared with each component in the first component list corresponding to the target user to obtain an updated second component list, so that a new title of the multimedia resource to be recommended is determined, a personalized title can be determined for the target user, the user can be better attracted, and the probability that the recommended multimedia resource is clicked can be improved; the titles of the multimedia resources do not need to be modified manually, and the labor cost is greatly saved.
Fig. 2 shows a flowchart of an exemplary implementation of step S104 of the method for determining a title of a multimedia asset according to an embodiment of the present invention. As shown in fig. 2, comparing each component in the second component list with each component in the first component list to obtain an updated second component list, includes:
in step S201, the similarity between each component in the second component list and each component in the first component list is calculated.
For example, the similarity between components may be determined by calculating the similarity between vectors to which the components correspond. One skilled in the art will appreciate that the similarity between the components may also be measured by other parameters of the components, and is not limited herein.
In step S202, in the case where the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, the component in the second component list is replaced with the component in the first component list.
For example, the first preset value may be 0.9. For example, if one component in the second component list is "cat," one component in the first component list is "meow star," and the similarity between the "cat" and the "meow star" is 0.95, the "cat" in the second component list may be replaced with the "meow star" in the first component list.
In this example, when the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, the component in the second component list is replaced with a component in the first component list, so that semantic consistency can be ensured.
In step S203, an updated second component list is obtained from all the replaced components.
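Steps S201 to S203 can be sketched as a single pass over the second component list. This is a minimal sketch under stated assumptions: the `similarity` callable is supplied by the caller, and when several components of the first list exceed the threshold the most similar one is chosen, a tie-breaking rule the source does not specify.

```python
def update_component_list(second, first, similarity, threshold=0.9):
    """Replace components of `second` with sufficiently similar ones from `first`.

    `similarity(a, b)` returns a number; a replacement happens only when the
    similarity is strictly greater than `threshold` (the first preset value).
    """
    updated = []
    for comp in second:
        best, best_sim = None, threshold
        for cand in first:
            sim = similarity(comp, cand)
            if sim > best_sim:          # strictly greater, as in step S202
                best, best_sim = cand, sim
        # Keep the original component when nothing clears the threshold.
        updated.append(best if best is not None else comp)
    return updated
```

With a similarity of 0.95 between "cat" and "meow star" and a threshold of 0.9, the list ["tortoise", "gnaws", "cat"] becomes ["tortoise", "gnaws", "meow star"], matching the example in step S202.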
Fig. 3 shows a flowchart of an exemplary implementation of step S201 of the method for determining a title of a multimedia asset according to an embodiment of the present invention. As shown in fig. 3, calculating the similarity between each component in the second component list and each component in the first component list includes:
in step S301, a vector corresponding to each component in the second component list is determined.
As an example of this embodiment, word2vec may be used to determine a vector corresponding to each component in the second component list and a vector corresponding to each component in the first component list.
In step S302, the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list is calculated.
For example, the cosine distance between the vectors corresponding to two components may be determined as the similarity of the two components.
In one possible implementation manner, calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list includes: calculating, by using Equation 1, the similarity between the vector corresponding to the i-th component in the second component list and the vector corresponding to the m-th component in the first component list.
[Equation 1 and its vector symbols appear only as images in the original publication and are not reproduced here.]
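The vector comparison in step S302 can be illustrated with a plain cosine similarity, which the description names as one example. The sketch below assumes the component vectors are already available as equal-length lists of floats (for instance from a word2vec model trained elsewhere); the function names are hypothetical, and returning 0.0 for a zero vector is a convention chosen here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def similarity_matrix(second_vectors, first_vectors):
    """Row i, column m: similarity of second-list vector i and first-list vector m."""
    return [[cosine_similarity(a, b) for b in first_vectors] for a in second_vectors]
```

Each entry of the resulting matrix can then be compared with the first preset value to decide whether a replacement is made.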
Fig. 4 shows a flowchart of an exemplary implementation of step S105 of the method for determining a title of a multimedia asset according to an embodiment of the present invention. As shown in fig. 4, determining a new title of the multimedia resource to be recommended according to the updated second component list includes:
in step S401, the score of the updated second component list is calculated.
In step S402, in the case that the score of the updated second component list is greater than the second preset value, a new title of the multimedia resource to be recommended is determined according to the updated second component list.
In this example, a new title of the multimedia resource to be recommended is determined according to the updated second component list only when the score of the updated second component list is greater than the second preset value, which ensures linguistic coherence between adjacent components of the new title. The second preset value can be set according to the experience of a person skilled in the art, and is not limited herein.
In one possible implementation, after calculating the score of the updated second component list, the method further comprises: under the condition that the score of the updated second component list is less than or equal to the second preset value, retaining the original title of the multimedia resource to be recommended. In this implementation manner, the original title is retained whenever the updated second component list does not score above the second preset value, so that a title with poor coherence between adjacent components is never shown.
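The decision between the new title and the original title can be sketched as a simple threshold check. The function name `choose_title` and the way components are joined back into a string (a single space by default) are assumptions made for illustration; the source does not describe how the title string is reassembled.

```python
def choose_title(original_title, updated_components, score, second_preset, joiner=" "):
    """Return the new title only if its score clears the second preset value."""
    if score > second_preset:
        # Reassemble the title from the updated components (joining rule assumed).
        return joiner.join(updated_components)
    # Score is less than or equal to the threshold: keep the original title.
    return original_title
```

For a language written without spaces between words, such as Chinese, a caller would pass `joiner=""`.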
In one possible implementation, calculating a score of the updated second component list includes: and calculating the score of the updated second component list according to the probability of each component in the updated second component list appearing in the designated sample set.
For example, the specified sample set may be determined according to the titles of all multimedia resources in the multimedia resource list to be recommended, or may be determined according to the titles of all multimedia resources in other specified multimedia resource lists, which is not limited herein.
In one possible implementation, calculating a score of the updated second component list according to a probability that each component in the updated second component list appears in the designated sample set includes:
calculating a score s of the updated second component list using equation 2;
Figure BDA0001126992450000121
Figure BDA0001126992450000122
where $n$ represents the number of components in the updated second component list, $w_j$ represents the j-th component in the updated second component list, $w_{j-i}$ represents the (j-i)-th component in the updated second component list, $p(w_j w_{j-i})$ denotes the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and $p(w_{j-i})$ represents the probability that the (j-i)-th component occurs in the specified sample set.
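Equation 2 itself is available here only as an image, so the following Python sketch does not reproduce it. It illustrates one plausible way to score a component list from the quantities defined above, under several assumptions that are not stated in the text: only adjacent components are considered (i = 1), the ratio $p(w_j w_{j-1}) / p(w_{j-1})$ is multiplied over all positions, and each probability is estimated as the fraction of sample titles containing the component(s). The function names are hypothetical.

```python
def component_probability(sample_set, *components):
    """Fraction of sample titles that contain all of the given components."""
    if not sample_set:
        return 0.0
    hits = sum(1 for title in sample_set
               if all(c in title for c in components))
    return hits / len(sample_set)


def score_component_list(components, sample_set):
    """Product of p(w_j w_{j-1}) / p(w_{j-1}) over adjacent components.

    Assumption: this stands in for Equation 2, which is not reproduced here.
    """
    score = 1.0
    for j in range(1, len(components)):
        prev, cur = components[j - 1], components[j]
        p_prev = component_probability(sample_set, prev)
        if p_prev == 0.0:
            # The preceding component never occurs in the sample set.
            return 0.0
        score *= component_probability(sample_set, cur, prev) / p_prev
    return score


if __name__ == "__main__":
    samples = [["dog", "funny"], ["dog", "funny"], ["dog", "sad"], ["cat", "funny"]]
    print(score_component_list(["dog", "funny"], samples))
```

In this sketch each sample title is represented as a list of its components, for example the parsed titles of the multimedia resource list to be recommended.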
Fig. 5 shows a flowchart of an exemplary implementation of step S102 of the method for determining a title of a multimedia asset according to an embodiment of the present invention. As shown in fig. 5, parsing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user includes:
in step S501, the titles of the multimedia resources in the first multimedia resource list are analyzed to obtain components related to the target user.
As an example of this embodiment, named entity recognition (NER) may be used to analyze the title of each multimedia resource in the first multimedia resource list to obtain the components corresponding to each title. The components may include one or more of entity words (e.g., "dog", "Mars Intelligence Bureau"), emotional words (e.g., "nice", "laughter not tendered"), and emotional punctuation (e.g., "|"). The entity words may include one or more of names of people, places, and organizations, and proper nouns.
In step S502, a component whose occurrence number is greater than a third preset value among the components related to the target user is taken as a component corresponding to the target user.
For example, the third preset value may be 2. In this example, components whose number of occurrences is greater than the third preset value among the components related to the target user are taken as the components corresponding to the target user, and components whose number of occurrences is less than or equal to the third preset value are filtered out, which reduces the influence of noise on the components corresponding to the target user.
In step S503, a first component list corresponding to the target user is generated from the component corresponding to the target user.
For example, the first component list corresponding to the target user may be represented as { NE1, NE2, …, NEn }, where NE1, NE2, …, NEn represent the respective components corresponding to the target user.
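Steps S501 to S503 can be sketched in Python as follows. The sketch assumes the titles have already been parsed into components (the NER step is not implemented here), and the function name `build_first_component_list` is hypothetical. Ordering the result by descending count is a choice of this sketch, not something the text specifies.

```python
from collections import Counter


def build_first_component_list(title_components, third_preset_value=2):
    """Keep components that occur more than `third_preset_value` times.

    `title_components` is a list of component lists, one per viewed title.
    """
    counts = Counter()
    for components in title_components:
        counts.update(components)  # S501: gather components related to the user
    # S502/S503: filter by occurrence count and build the list {NE1, ..., NEn}
    return [c for c, n in counts.most_common() if n > third_preset_value]


if __name__ == "__main__":
    parsed = [["dog", "cute"], ["dog", "park"], ["dog", "cute"], ["cute"]]
    print(build_first_component_list(parsed))
```

With the example third preset value of 2, a component must appear at least three times across the viewed titles to be kept.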
Fig. 6 shows a flowchart of an exemplary implementation of step S101 of the method for determining a title of a multimedia asset according to an embodiment of the present invention. As shown in fig. 6, acquiring user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data includes:
in step S601, all user behavior data of the target user within a specified time period is collected.
For example, all user behavior data for a target user within 1 month, 3 months, or half a year may be collected.
In step S602, effective user behavior data is screened out from the collected user behavior data.
For example, user behavior data corresponding to repeated viewing of a multimedia resource may be determined to be invalid user behavior data, or user behavior data with a low viewing completion rate may be determined to be invalid user behavior data, which is not limited herein.
In step S603, the valid user behavior data is sorted according to the time corresponding to the valid user behavior data, so as to obtain a first multimedia resource list.
The time corresponding to the valid user behavior data may be the time at which the valid user behavior data occurred. Sorting the valid user behavior data according to that time may be: sorting the valid user behavior data in chronological order from most recent to least recent.
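Steps S601 to S603 can be sketched in Python as follows. The record type `ViewRecord`, its field names, the 30-day default period, and the 0.5 completion-rate threshold are all assumptions made for illustration; the text leaves the time period and the validity criteria open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ViewRecord:
    resource_id: str
    title: str
    viewed_at: datetime
    completion_rate: float  # fraction of the resource that was watched


def build_first_resource_list(records, now,
                              period=timedelta(days=30), min_completion=0.5):
    # S601: collect behavior data within the specified time period.
    recent = [r for r in records if now - period <= r.viewed_at <= now]
    # S603: order from most recent to least recent.
    recent.sort(key=lambda r: r.viewed_at, reverse=True)
    # S602: drop low-completion views and repeat views of the same resource.
    seen, valid = set(), []
    for r in recent:
        if r.completion_rate < min_completion or r.resource_id in seen:
            continue
        seen.add(r.resource_id)
        valid.append(r)
    return valid


if __name__ == "__main__":
    now = datetime(2016, 10, 9)
    data = [ViewRecord("a", "Title A", datetime(2016, 10, 8), 0.9),
            ViewRecord("a", "Title A", datetime(2016, 10, 1), 0.8)]
    print([r.resource_id for r in build_first_resource_list(data, now)])
```

Sorting before filtering means that, for a resource viewed several times, the most recent sufficiently complete view is the one that is kept.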
In a possible implementation manner, the list of multimedia resources to be recommended may be filtered according to one or more of the following, so that the multimedia resources to be recommended are diverse: uploader information of the multimedia resources to be recommended, channel information of the multimedia resources to be recommended, data on the multimedia resources viewed by the target user, and interest tags of the target user. For example, if the list of multimedia resources to be recommended includes more than four multimedia resources uploaded by the same uploader, the three multimedia resources with the highest click rates among those uploaded by that uploader may be retained as multimedia resources to be recommended. For another example, if the list includes more than four multimedia resources of the same secondary channel, the three multimedia resources with the highest click volumes among those of that secondary channel may be retained as multimedia resources to be recommended. For example, the hedonic channel is a primary channel, and the Hunan hedonic channel is a secondary channel under that primary channel. For another example, if the list includes more than four multimedia resources under the same tertiary interest tag, the three multimedia resources with the highest click rates among those under that tertiary interest tag may be retained as multimedia resources to be recommended. For example, entertainment is a primary interest tag, entertainment star is a secondary interest tag under that primary interest tag, and Beyond is a tertiary interest tag under that secondary interest tag. For another example, if the list includes multimedia resources recently viewed by the target user, those multimedia resources are not treated as multimedia resources to be recommended.
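The "more than four, keep the top three" rule in the examples above can be sketched in Python as follows. The dictionary keys (`"uploader"`, `"clicks"`), the function name `diversify`, and its parameters are hypothetical; the same function could be applied with a secondary-channel or tertiary-interest-tag key instead of the uploader.

```python
from collections import defaultdict


def diversify(candidates, key, max_group_size=4, keep=3):
    """Limit over-represented groups to their `keep` most-clicked items.

    Groups with at most `max_group_size` items are left untouched.
    The original order of the candidate list is preserved.
    """
    groups = defaultdict(list)
    for item in candidates:
        groups[item[key]].append(item)
    kept_ids = set()
    for items in groups.values():
        if len(items) > max_group_size:
            items = sorted(items, key=lambda it: it["clicks"], reverse=True)[:keep]
        kept_ids.update(id(it) for it in items)
    return [it for it in candidates if id(it) in kept_ids]


if __name__ == "__main__":
    pool = [{"uploader": "u1", "clicks": c} for c in (10, 50, 30, 20, 40)]
    pool += [{"uploader": "u2", "clicks": c} for c in (5, 7)]
    print([it["clicks"] for it in diversify(pool, "uploader")])
```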
In this way, each component in the second component list corresponding to the original title of the multimedia resource to be recommended is compared with each component in the first component list corresponding to the target user to obtain an updated second component list, so that a new title of the multimedia resource to be recommended is determined.
Example 2
Fig. 7 is a block diagram illustrating a structure of an apparatus for determining a title of a multimedia asset according to another embodiment of the present invention. The apparatus shown in fig. 7 may be used to perform the method of determining a title of a multimedia asset shown in fig. 1 to 6. For convenience of explanation, only a part related to the present embodiment is shown in fig. 7.
As shown in fig. 7, the apparatus includes: the acquisition module 71 is configured to acquire user behavior data of a target user, and generate a first multimedia resource list according to the user behavior data; a first parsing module 72, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user; the second parsing module 73 is configured to parse an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title; a comparison module 74, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; a determining module 75, configured to determine a new title of the multimedia resource to be recommended according to the updated second component list.
Fig. 8 is a block diagram illustrating an exemplary structure of an apparatus for determining a title of a multimedia asset according to another embodiment of the present invention. The apparatus shown in fig. 8 may be used to perform the method of determining a title of a multimedia asset shown in fig. 1 to 6. For convenience of explanation, only a part related to the present embodiment is shown in fig. 8. Components in fig. 8 that are numbered the same as those in fig. 7 have the same functions, and detailed descriptions of these components are omitted for the sake of brevity.
In one possible implementation, the comparing module 74 includes: a similarity operator module 741 configured to calculate similarities between the components in the second component list and the components in the first component list; a replacing submodule 742, configured to replace a component in the second component list with a component in the first component list if a similarity between the component in the second component list and the component in the first component list is greater than a first preset value; an update submodule 743 is used to obtain an updated second component list from all replaced components.
In one possible implementation, the similarity operator module 741 includes: a vector determination unit configured to determine a vector corresponding to each component in the second component list; and a similarity calculation unit configured to calculate a similarity between a vector corresponding to each component in the second component list and a vector corresponding to each component in the first component list.
In a possible implementation manner, the similarity calculation unit is configured to: calculate, by using equation 1, the similarity (Figure BDA0001126992450000163) between the vector corresponding to the i-th component in the second component list (Figure BDA0001126992450000161) and the vector corresponding to the m-th component in the first component list (Figure BDA0001126992450000162):
Figure BDA0001126992450000164
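Equation 1 is available here only as an image, so the following Python sketch does not reproduce it. It illustrates how the similarity submodule, the replacing submodule, and the update submodule could work together, assuming cosine similarity as the vector similarity measure and a precomputed dictionary `vectors` mapping each component to its vector. Both are assumptions, as are the function names, the 0.8 default for the first preset value, and the choice to replace a component with its most similar match when several exceed the threshold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def update_second_list(second, first, vectors, first_preset_value=0.8):
    """Replace each second-list component with its most similar first-list
    component when the similarity exceeds `first_preset_value`."""
    updated = []
    for comp in second:
        best, best_sim = comp, first_preset_value
        for cand in first:
            if comp in vectors and cand in vectors:
                sim = cosine_similarity(vectors[comp], vectors[cand])
                if sim > best_sim:
                    best, best_sim = cand, sim
        updated.append(best)  # unchanged if no candidate exceeded the threshold
    return updated


if __name__ == "__main__":
    vecs = {"puppy": [1.0, 0.1], "dog": [1.0, 0.0], "funny": [0.0, 1.0]}
    print(update_second_list(["puppy", "funny"], ["dog"], vecs))
```

Components without a known vector are left unchanged in this sketch.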
In one possible implementation, the determining module 75 includes: a score calculating sub-module 751 for calculating a score of the updated second component list; the determining submodule 752 is configured to determine, according to the updated second component list, a new title of the multimedia resource to be recommended when the score of the updated second component list is greater than a second preset value.
In one possible implementation, the score calculating sub-module 751 is configured to: calculate the score of the updated second component list according to the probability of each component in the updated second component list appearing in a specified sample set.
In one possible implementation, the score calculating sub-module 751 is configured to: calculating a score s of the updated second component list using equation 2;
Figure BDA0001126992450000171
Figure BDA0001126992450000172
wherein $n$ represents the number of components in the updated second component list, $w_j$ represents the j-th component in the updated second component list, $w_{j-i}$ represents the (j-i)-th component in the updated second component list, $p(w_j w_{j-i})$ represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and $p(w_{j-i})$ represents the probability that the (j-i)-th component occurs in the specified sample set.
In one possible implementation, the apparatus further includes: a reserving module 76, configured to reserve the original title of the multimedia resource to be recommended when the score of the updated second component list is less than or equal to the second preset value.
In one possible implementation manner, the first parsing module 72 includes: the parsing sub-module 721 is configured to parse the titles of the multimedia resources in the first multimedia resource list to obtain components related to the target user; the component determining submodule 722 is used for taking a component of which the occurrence frequency is greater than a third preset value in the components related to the target user as a component corresponding to the target user; the first component list generating sub-module 723 is configured to generate a first component list corresponding to the target user according to the component corresponding to the target user.
In one possible implementation, the acquisition module 71 includes: the acquisition submodule 711 is configured to acquire all user behavior data of the target user within a specified time period; a screening submodule 712, configured to screen effective user behavior data from the collected user behavior data; the sorting submodule 713 is configured to sort the effective user behavior data according to the time corresponding to the effective user behavior data, so as to obtain the first multimedia resource list.
It should be noted that, in this way, by comparing each component in the second component list corresponding to the original title of the multimedia resource to be recommended with each component in the first component list corresponding to the target user, an updated second component list is obtained, and thus a new title of the multimedia resource to be recommended is determined.
Example 3
Fig. 9 is a block diagram showing a structure of an apparatus for determining a title of a multimedia asset according to another embodiment of the present invention. The apparatus 1100 for determining the title of a multimedia asset may be a host server with computing capabilities, a personal computer PC, or a portable computer or terminal that can be carried, etc. The specific embodiments of the present invention do not limit the specific implementation of the compute node.
The apparatus 1100 for determining a title of a multimedia asset includes a processor 1110, a communication interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communication interface 1120, and the memory 1130 communicate with each other via the bus 1140.
The communication interface 1120 is used to communicate with network devices, including, for example, virtual machine management centers, shared storage, and the like.
Processor 1110 is configured to execute programs. Processor 1110 may be a central processing unit CPU, or an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The memory 1130 is used to store files. The memory 1130 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In one possible embodiment, the program may be program code including computer operation instructions. The program is specifically configured to perform the operations of the steps in Example 1.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may select different ways to implement the described functionality for specific applications, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
If the described functionality is implemented in the form of computer software and sold or used as a stand-alone product, all or part of the technical solution of the invention (for example, the part that contributes over the prior art) may be considered, to some extent, to be embodied in the form of a computer software product. The computer software product is generally stored in a non-volatile storage medium readable by a computer and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (18)

1. A method of determining a title of a multimedia asset, comprising:
acquiring user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data;
analyzing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
analyzing an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
comparing each component in the second component list with each component in the first component list to obtain an updated second component list; the comparing each component in the second component list with each component in the first component list to obtain an updated second component list, comprising: calculating similarity of each component in the second component list and each component in the first component list; replacing a component in the second component list with a component in the first component list if the similarity between the component in the second component list and the component in the first component list is greater than a first preset value; obtaining an updated second component list from all replaced components;
and determining a new title of the multimedia resource to be recommended according to the updated second component list.
2. The method of claim 1, wherein calculating a similarity of each component in the second list of components to each component in the first list of components comprises:
determining a vector corresponding to each component in the second component list;
and respectively calculating the similarity of the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
3. The method of claim 2, wherein separately calculating the similarity of the vector corresponding to each component in the second component list to the vector corresponding to each component in the first component list comprises:
calculating, by using equation 1, the similarity (Figure FDA0002248391480000022) between the vector corresponding to the i-th component in the second component list (Figure FDA0002248391480000011) and the vector corresponding to the m-th component in the first component list (Figure FDA0002248391480000021):
Figure FDA0002248391480000023
4. The method of claim 1, wherein determining a new title of the multimedia resource to be recommended according to the updated second component list comprises:
calculating a score of the updated second component list;
and under the condition that the score of the updated second component list is greater than a second preset value, determining a new title of the multimedia resource to be recommended according to the updated second component list.
5. The method of claim 4, wherein calculating a score for the updated second component list comprises:
and calculating the score of the updated second component list according to the probability of each component in the updated second component list appearing in a designated sample set.
6. The method of claim 5, wherein calculating a score for the updated second component list based on a probability of each component in the updated second component list occurring in a given sample set comprises:
calculating a score s of the updated second component list using equation 2;
Figure FDA0002248391480000024
Figure FDA0002248391480000025
wherein $n$ represents the number of components in the updated second component list, $w_j$ represents the j-th component in the updated second component list, $w_{j-i}$ represents the (j-i)-th component in the updated second component list, $p(w_j w_{j-i})$ represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and $p(w_{j-i})$ represents the probability that the (j-i)-th component occurs in the specified sample set.
7. The method according to any of claims 4 to 6, wherein after calculating the score of the updated second component list, the method further comprises:
and under the condition that the score of the updated second component list is less than or equal to the second preset value, reserving the original title of the multimedia resource to be recommended.
8. The method of claim 1, wherein parsing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user comprises:
analyzing the title of each multimedia resource in the first multimedia resource list to obtain a component related to the target user;
taking the components with the occurrence times larger than a third preset value in the components related to the target user as the components corresponding to the target user;
and generating a first component list corresponding to the target user according to the component corresponding to the target user.
9. The method of claim 1, wherein collecting user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data comprises:
collecting all user behavior data of the target user in a specified time period;
screening effective user behavior data from the collected user behavior data;
and sequencing the effective user behavior data according to the time corresponding to the effective user behavior data to obtain the first multimedia resource list.
10. An apparatus for determining a title of a multimedia asset, comprising:
the acquisition module is used for acquiring user behavior data of a target user and generating a first multimedia resource list according to the user behavior data;
the first analysis module is used for analyzing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
the second analysis module is used for analyzing the original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title;
a comparison module, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; the comparison module comprises: a similarity operator module for calculating the similarity between each component in the second component list and each component in the first component list; a replacing submodule, configured to replace a component in the second component list with a component in the first component list when a similarity between a component in the second component list and a component in the first component list is greater than a first preset value; the updating submodule is used for obtaining an updated second component list according to all replaced components;
and the determining module is used for determining a new title of the multimedia resource to be recommended according to the updated second component list.
11. The apparatus of claim 10, wherein the similarity operator module comprises:
a vector determination unit configured to determine a vector corresponding to each component in the second component list;
and a similarity calculation unit configured to calculate a similarity between a vector corresponding to each component in the second component list and a vector corresponding to each component in the first component list.
12. The apparatus of claim 11, wherein the similarity calculation unit is configured to:
calculating, by using equation 1, the similarity (Figure FDA0002248391480000043) between the vector corresponding to the i-th component in the second component list (Figure FDA0002248391480000041) and the vector corresponding to the m-th component in the first component list (Figure FDA0002248391480000042):
Figure FDA0002248391480000044
13. The apparatus of claim 10, wherein the determining module comprises:
a score calculation sub-module for calculating a score of the updated second component list;
and the determining submodule is used for determining a new title of the multimedia resource to be recommended according to the updated second component list under the condition that the score of the updated second component list is greater than a second preset value.
14. The apparatus of claim 13, wherein the score computation sub-module is configured to:
and calculating the score of the updated second component list according to the probability of each component in the updated second component list appearing in a designated sample set.
15. The apparatus of claim 14, wherein the score computation sub-module is configured to:
calculating a score s of the updated second component list using equation 2;
Figure FDA0002248391480000052
wherein $n$ represents the number of components in the updated second component list, $w_j$ represents the j-th component in the updated second component list, $w_{j-i}$ represents the (j-i)-th component in the updated second component list, $p(w_j w_{j-i})$ represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and $p(w_{j-i})$ represents the probability that the (j-i)-th component occurs in the specified sample set.
16. The apparatus of any one of claims 13 to 15, further comprising:
and the reserving module is used for reserving the original title of the multimedia resource to be recommended under the condition that the score of the updated second component list is less than or equal to the second preset value.
17. The apparatus of claim 10, wherein the first parsing module comprises:
the analysis submodule is used for analyzing the title of each multimedia resource in the first multimedia resource list to obtain a component related to the target user;
the component determining submodule is used for taking the components of which the occurrence times are greater than a third preset value in the components related to the target user as the components corresponding to the target user;
and the first component list generating submodule is used for generating a first component list corresponding to the target user according to the components corresponding to the target user.
18. The apparatus of claim 10, wherein the acquisition module comprises:
the acquisition submodule is used for acquiring all user behavior data of the target user within a specified time period;
the screening submodule is used for screening effective user behavior data from the collected user behavior data;
and the sequencing submodule is used for sequencing the effective user behavior data according to the time corresponding to the effective user behavior data to obtain the first multimedia resource list.
CN201610881052.3A 2016-10-09 2016-10-09 Method and device for determining title of multimedia resource Active CN106445922B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610881052.3A CN106445922B (en) 2016-10-09 2016-10-09 Method and device for determining title of multimedia resource
PCT/CN2017/104410 WO2018064959A1 (en) 2016-10-09 2017-09-29 Method and device for determining title of multimedia resource

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610881052.3A CN106445922B (en) 2016-10-09 2016-10-09 Method and device for determining title of multimedia resource

Publications (2)

Publication Number Publication Date
CN106445922A CN106445922A (en) 2017-02-22
CN106445922B true CN106445922B (en) 2020-02-18

Family

ID=58173116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610881052.3A Active CN106445922B (en) 2016-10-09 2016-10-09 Method and device for determining title of multimedia resource

Country Status (2)

Country Link
CN (1) CN106445922B (en)
WO (1) WO2018064959A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445922B (en) * 2016-10-09 2020-02-18 合一网络技术(北京)有限公司 Method and device for determining title of multimedia resource
CN111401046B (en) * 2020-04-13 2023-09-29 贝壳技术有限公司 House source title generation method and device, storage medium and electronic equipment
CN113742567B (en) * 2020-05-29 2023-08-22 北京达佳互联信息技术有限公司 Recommendation method and device for multimedia resources, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103594A (en) * 2009-12-22 2011-06-22 北京大学 Character data recognition and processing method and device
CN103324729A (en) * 2013-06-27 2013-09-25 北京小米科技有限责任公司 Method and device for recommending multimedia resources
CN103544264A (en) * 2013-10-17 2014-01-29 常熟市华安电子工程有限公司 Commodity title optimizing tool
CN105930532A (en) * 2016-06-16 2016-09-07 上海聚力传媒技术有限公司 Method and device of recommending multimedia resources to user

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6978277B2 (en) * 1989-10-26 2005-12-20 Encyclopaedia Britannica, Inc. Multimedia search system
US7788084B2 (en) * 2006-09-19 2010-08-31 Xerox Corporation Labeling of work of art titles in text for natural language processing
CN101604310A (en) * 2008-06-10 2009-12-16 宏碁股份有限公司 According to the user to the preference for relative titles managing articles
US8140567B2 (en) * 2010-04-13 2012-03-20 Microsoft Corporation Measuring entity extraction complexity
CN106445922B (en) * 2016-10-09 2020-02-18 合一网络技术(北京)有限公司 Method and device for determining title of multimedia resource

Also Published As

Publication number Publication date
WO2018064959A1 (en) 2018-04-12
CN106445922A (en) 2017-02-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee after: Youku Network Technology (Beijing) Co., Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: 1VERGE INTERNET TECHNOLOGY (BEIJING) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20200618

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080 Beijing Haidian District city Haidian street A Sinosteel International Plaza No. 8 block 5 layer A, C

Patentee before: Youku Network Technology (Beijing) Co., Ltd.