WO2018064959A1 - Method and device for determining title of multimedia resource - Google Patents

Method and device for determining title of multimedia resource

Info

Publication number
WO2018064959A1
Authority
WO
WIPO (PCT)
Prior art keywords
component
list
updated
multimedia resource
component list
Prior art date
Application number
PCT/CN2017/104410
Other languages
French (fr)
Chinese (zh)
Inventor
刘荣
赵磊
单明辉
王建宇
顾思斌
潘柏宇
王冀
Original Assignee
优酷网络技术(北京)有限公司 (Youku Network Technology (Beijing) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 优酷网络技术(北京)有限公司
Publication of WO2018064959A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • The present disclosure relates to the field of information technology, and in particular, to a method and apparatus for determining a title of a multimedia resource.
  • The editing of the titles of multimedia resources such as videos depends on website operators and uploaders, which consumes a large amount of human resources. Moreover, titles edited by website operators and uploaders are aimed at the general public and cannot satisfy the personalized needs of an individual user.
  • The technical problem to be solved by the present disclosure is that existing methods of determining the title of a multimedia resource consume a large amount of human resources and cannot satisfy the personalized needs of users.
  • A method for determining a title of a multimedia resource is provided, including:
  • Comparing each component in the second component list with each component in the first component list to obtain an updated second component list includes:
  • An updated second component list is obtained based on all of the replaced components.
  • Calculating the similarity between each component in the second component list and each component in the first component list includes:
  • The similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list is calculated separately.
  • Separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list includes:
  • Determining, according to the updated second component list, the new title of the multimedia resource to be recommended includes:
  • The new title of the multimedia resource to be recommended is determined according to the updated second component list.
  • Calculating the score of the updated second component list includes:
  • The score of the updated second component list is calculated based on the probability that each component in the updated second component list appears in the specified sample set.
  • Calculating the score of the updated second component list according to the probability that each component in the updated second component list appears in the specified sample set includes:
  • n represents the number of components in the updated second component list;
  • w_j represents the j-th component in the updated second component list;
  • w_{j-1} represents the (j-1)-th component in the updated second component list;
  • p(w_j w_{j-1}) represents the probability that the j-th component and the (j-1)-th component co-occur in the specified sample set;
  • p(w_{j-1}) represents the probability that the (j-1)-th component appears in the specified sample set.
  • The method further includes:
  • Parsing the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user includes:
  • A component whose number of occurrences, among the components related to the target user, is greater than a third preset value is taken as a component corresponding to the target user;
  • Collecting the user behavior data of the target user and generating the first multimedia resource list according to the user behavior data includes:
  • An apparatus for determining a title of a multimedia resource is provided, including:
  • an acquisition module configured to collect user behavior data of the target user and generate a first multimedia resource list according to the user behavior data;
  • a first parsing module configured to parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
  • a second parsing module configured to parse the original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
  • a comparison module configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; and
  • a determining module configured to determine, according to the updated second component list, a new title of the multimedia resource to be recommended.
  • The comparison module includes:
  • a similarity calculation sub-module configured to calculate the similarity between each component in the second component list and each component in the first component list;
  • a replacement sub-module configured to replace a component in the second component list with a component in the first component list if the similarity between the two components is greater than a first preset value; and
  • an update sub-module configured to obtain an updated second component list based on all of the replaced components.
  • The similarity calculation sub-module includes:
  • a vector determining unit configured to determine a vector corresponding to each component in the second component list; and
  • a similarity calculation unit configured to separately calculate the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  • The similarity calculation unit is configured to:
  • The determining module includes:
  • a score calculation sub-module configured to calculate a score of the updated second component list; and
  • a determining sub-module configured to determine, according to the updated second component list, a new title of the multimedia resource to be recommended if the score of the updated second component list is greater than a second preset value.
  • The score calculation sub-module is configured to:
  • calculate the score of the updated second component list based on the probability that each component in the updated second component list appears in the specified sample set.
  • The score calculation sub-module is configured to:
  • n represents the number of components in the updated second component list;
  • w_j represents the j-th component in the updated second component list;
  • w_{j-1} represents the (j-1)-th component in the updated second component list;
  • p(w_j w_{j-1}) represents the probability that the j-th component and the (j-1)-th component co-occur in the specified sample set;
  • p(w_{j-1}) represents the probability that the (j-1)-th component appears in the specified sample set.
  • The device further includes:
  • a retaining module configured to retain the original title of the multimedia resource to be recommended if the score of the updated second component list is less than or equal to the second preset value.
  • The first parsing module includes:
  • a parsing sub-module configured to parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user;
  • a component determining sub-module configured to take, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value; and
  • a first component list generating sub-module configured to generate a first component list corresponding to the target user according to the components corresponding to the target user.
  • The acquisition module includes:
  • a collection sub-module configured to collect all user behavior data of the target user within a specified time period;
  • a screening sub-module configured to filter out valid user behavior data from the collected user behavior data; and
  • a sorting sub-module configured to sort the valid user behavior data according to the time corresponding to the valid user behavior data to obtain the first multimedia resource list.
  • By comparing each component in the second component list corresponding to the original title of the multimedia resource to be recommended with each component in the first component list corresponding to the target user, an updated second component list is obtained, from which the new title of the multimedia resource to be recommended is determined. The method and apparatus for determining the title of a multimedia resource according to the embodiments of the present disclosure can therefore determine a personalized title for the target user that better attracts the user, thereby increasing the probability that the recommended multimedia resource is clicked.
  • FIG. 1 illustrates an implementation flowchart of a method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary implementation flowchart of step S104 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 3 illustrates an exemplary implementation flowchart of step S201 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 4 illustrates an exemplary implementation flowchart of step S105 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 5 illustrates an exemplary implementation flowchart of step S102 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 6 illustrates an exemplary implementation flowchart of step S101 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • FIG. 7 illustrates a structural block diagram of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • FIG. 8 illustrates an exemplary structural block diagram of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • FIG. 9 is a block diagram showing the structure of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • FIG. 1 illustrates an implementation flow diagram of a method of determining a title of a multimedia resource, in accordance with an embodiment of the present disclosure.
  • The method of this embodiment may be executed by a server or by another device for determining the title of a multimedia resource, which is not limited herein.
  • The method mainly includes:
  • Step S101: collect user behavior data of the target user, and generate a first multimedia resource list according to the user behavior data.
  • Here, "multimedia" refers to a combination of multiple media and may include various media forms such as text, sound, and images.
  • The multimedia resource may be, for example, a video, which is not limited herein.
  • The user behavior data of the target user may include, but is not limited to, at least one of the following: data of the target user viewing a multimedia resource, data of the target user commenting on a multimedia resource, data of the target user subscribing to a multimedia resource, and data of the target user pressing a multimedia resource.
  • The first multimedia resource list may be generated according to the multimedia resources corresponding to the user behavior data of the target user.
  • Step S102: parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user.
  • The title of each multimedia resource in the first multimedia resource list may be parsed by using NER (Named Entity Recognition) technology to obtain the first component list corresponding to the target user.
  • Step S103: parse the original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title.
  • The original title of each multimedia resource in the list of multimedia resources to be recommended may be parsed separately to obtain a second component list corresponding to each original title.
  • NER technology can be used to parse the original title of the multimedia resource to be recommended to obtain the second component list corresponding to the original title.
  • Step S104: compare each component in the second component list with each component in the first component list to obtain an updated second component list.
  • Each component in the second component list may be compared with each component in the first component list so that components in the second component list can be replaced with components in the first component list.
  • Step S105: determine a new title of the multimedia resource to be recommended according to the updated second component list.
  • For example, if the original title of the multimedia resource to be recommended is "The tortoise licks the toe of a sleeping kitten", the new title may be "The tortoise licks the toe of a sleeping comet star!".
  • In this embodiment, an updated second component list is obtained by comparing the components of the two lists, and the new title of the multimedia resource to be recommended is determined from it. A personalized title can thus be determined for the target user, which better attracts the user and increases the probability that the recommended multimedia resource is clicked. Because the title of the multimedia resource does not need to be modified manually, labor costs are greatly reduced.
  • FIG. 2 illustrates an exemplary implementation flowchart of step S104 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • As shown in FIG. 2, comparing each component in the second component list with each component in the first component list to obtain an updated second component list includes:
  • Step S201: calculate the similarity between each component in the second component list and each component in the first component list.
  • The similarity between two components can be determined by calculating the similarity between the vectors corresponding to the components. Those skilled in the art should understand that the similarity between components can also be measured by other parameters of the components, which is not limited herein.
  • Step S202: if the similarity between a component in the second component list and a component in the first component list is greater than the first preset value, replace that component in the second component list with the component in the first component list.
  • For example, the first preset value may be 0.9.
  • Suppose one component in the second component list is "kitten", one component in the first component list is "comet star", and the similarity between "kitten" and "comet star" is 0.95. Since 0.95 exceeds 0.9, "kitten" in the second component list can be replaced with "comet star" from the first component list.
  • Step S203: obtain the updated second component list based on all the replaced components.
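Steps S201 to S203 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the similarity measure is passed in as a parameter (the vector similarity of step S302 is one candidate), and when several first-list components exceed the first preset value we assume the most similar one is used, which the text does not specify. The name `update_second_list` is hypothetical.

```python
from typing import Callable, List, Sequence


def update_second_list(
    second: Sequence[str],
    first: Sequence[str],
    similarity: Callable[[str, str], float],
    first_preset: float = 0.9,  # example value of the first preset value
) -> List[str]:
    """Steps S201-S203: replace title components by sufficiently similar user components."""
    updated: List[str] = []
    for comp in second:
        best, best_sim = comp, first_preset
        for user_comp in first:
            sim = similarity(comp, user_comp)  # S201
            # S202: only similarities strictly greater than the preset value qualify;
            # assumption: the most similar qualifying user component wins.
            if sim > best_sim:
                best, best_sim = user_comp, sim
        updated.append(best)
    return updated  # S203: the updated second component list


# Usage with a toy similarity table (hypothetical values mirroring the example above).
toy = {("kitten", "comet star"): 0.95}
print(update_second_list(["tortoise", "toe", "kitten"], ["comet star", "dog"],
                         lambda a, b: toy.get((a, b), 0.0)))
# prints ['tortoise', 'toe', 'comet star']
```

Starting `best_sim` at the preset value means a replacement happens only when a similarity strictly exceeds it, matching the "greater than" wording of step S202.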
  • FIG. 3 illustrates an exemplary implementation flowchart of step S201 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • As shown in FIG. 3, calculating the similarity between each component in the second component list and each component in the first component list includes:
  • Step S301: determine a vector corresponding to each component in the second component list.
  • For example, word2vec may be used to determine the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  • Step S302: separately calculate the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  • The cosine similarity (the cosine of the angle) between the vectors corresponding to two components can be taken as the similarity of the two components.
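  • Separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list may include: calculating, by using Equation 1, the similarity between the vector $\vec{v}_l$ corresponding to the l-th component in the second component list and the vector $\vec{u}_m$ corresponding to the m-th component in the first component list:
    $$\mathrm{sim}(\vec{v}_l, \vec{u}_m) = \frac{\vec{v}_l \cdot \vec{u}_m}{\lVert \vec{v}_l \rVert \, \lVert \vec{u}_m \rVert} \qquad \text{(Equation 1)}$$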
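The cosine similarity of step S302 can be sketched in plain Python as below. The vectors are toy values standing in for word2vec embeddings; the function names are hypothetical, and returning 0.0 for a zero vector is an assumption the text does not cover.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two component vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(second_vecs: Sequence[Sequence[float]],
                      first_vecs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [l][m] compares the l-th second-list vector with the m-th first-list vector."""
    return [[cosine_similarity(v, u) for u in first_vecs] for v in second_vecs]


# Toy 3-dimensional vectors (hypothetical, not real word2vec output).
print(similarity_matrix([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
                        [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]))
```

Each entry of the matrix can then be compared against the first preset value in step S202.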
  • FIG. 4 illustrates an exemplary implementation flowchart of step S105 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • Step S401: calculate the score of the updated second component list.
  • Step S402: if the score of the updated second component list is greater than the second preset value, determine the new title of the multimedia resource to be recommended according to the updated second component list.
  • Determining the new title only when the score exceeds the second preset value ensures language relevance between adjacent components of the new title.
  • The second preset value may be set according to the experience of a person skilled in the art, which is not limited herein.
  • The method may further include: retaining the original title of the multimedia resource to be recommended if the score of the updated second component list is less than or equal to the second preset value. In this implementation, retaining the original title in that case ensures language relevance between adjacent components of the title.
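Steps S401 and S402, together with the retaining branch, amount to a single threshold test. A minimal sketch, assuming the score function is supplied by the caller and assuming the new title is formed by joining the updated components; the text does not say how components are reassembled into a title, so `join` and the name `choose_title` are hypothetical.

```python
from typing import Callable, List


def choose_title(
    original_title: str,
    updated_components: List[str],
    score: Callable[[List[str]], float],
    second_preset: float,
    join: Callable[[List[str]], str] = " ".join,  # hypothetical reassembly rule
) -> str:
    """Return the new title if the score passes the second preset value, else the original."""
    if score(updated_components) > second_preset:  # S401 + S402
        return join(updated_components)
    return original_title  # score <= second preset value: retain the original title


print(choose_title("old title", ["new", "title"], lambda comps: 0.4, 0.3))  # prints new title
```

For titles written without spaces between words, `join=''.join` would be the natural choice.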
  • Calculating the score of the updated second component list may include: calculating the score of the updated second component list according to the probability that each component in the updated second component list appears in a specified sample set.
  • The specified sample set may be determined from the titles of all the multimedia resources in the list of multimedia resources to be recommended, or from the titles of all the multimedia resources in another specified multimedia resource list, which is not limited herein.
  • Calculating the score of the updated second component list according to the probability that each component in the updated second component list appears in the specified sample set includes the following, where:
  • n is the number of components in the updated second component list;
  • w_j is the j-th component in the updated second component list;
  • w_{j-1} is the (j-1)-th component in the updated second component list;
  • p(w_j w_{j-1}) represents the probability that the j-th component and the (j-1)-th component co-occur in the specified sample set; and
  • p(w_{j-1}) represents the probability that the (j-1)-th component appears in the specified sample set.
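The score formula itself is not reproduced in this text. One form consistent with the definitions above, assumed here, is a bigram-style product s = ∏_{j=2}^{n} p(w_j w_{j-1}) / p(w_{j-1}), in which each factor is the conditional probability of w_j given w_{j-1}. The sketch further assumes that each probability is estimated as the share of sample titles (already parsed into components) that contain the component or the pair. Both assumptions and all function names are illustrative, not taken from the disclosure.

```python
from typing import FrozenSet, List, Sequence


def build_sample_set(parsed_titles: Sequence[Sequence[str]]) -> List[FrozenSet[str]]:
    """Represent each sample title as the set of components it contains."""
    return [frozenset(t) for t in parsed_titles]


def p_one(w: str, samples: Sequence[FrozenSet[str]]) -> float:
    """Estimated p(w): share of sample titles containing w."""
    return sum(1 for s in samples if w in s) / len(samples)


def p_two(a: str, b: str, samples: Sequence[FrozenSet[str]]) -> float:
    """Estimated p(a b): share of sample titles containing both a and b."""
    return sum(1 for s in samples if a in s and b in s) / len(samples)


def list_score(components: Sequence[str], samples: Sequence[FrozenSet[str]]) -> float:
    """Assumed score: product over j = 2..n of p(w_j w_{j-1}) / p(w_{j-1})."""
    if not samples:
        raise ValueError("the specified sample set is empty")
    s = 1.0
    for j in range(1, len(components)):  # 0-based index j covers the 2nd..n-th component
        prev, cur = components[j - 1], components[j]
        denom = p_one(prev, samples)
        if denom == 0.0:
            return 0.0  # assumption: an unseen component makes the list score zero
        s *= p_two(cur, prev, samples) / denom
    return s


samples = build_sample_set([["tortoise", "kitten"], ["kitten", "toe"],
                            ["tortoise", "toe"], ["tortoise", "kitten", "toe"]])
print(list_score(["tortoise", "kitten", "toe"], samples))  # approximately 0.444, i.e. (2/3) * (2/3)
```

Under this form, a single-component list scores 1.0 (an empty product), so a real system would likely need an extra rule for very short titles.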
  • FIG. 5 illustrates an exemplary implementation flowchart of step S102 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • As shown in FIG. 5, parsing the title of each multimedia resource in the first multimedia resource list to obtain the first component list corresponding to the target user includes:
  • Step S501: parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user.
  • NER technology may be used to parse the titles of the multimedia resources in the first multimedia resource list separately, to obtain the components corresponding to the title of each multimedia resource.
  • The components may include one or more of entity words (such as "dog" and "Mars Intelligence Agency"), emotional words (such as "good-looking" and "laughing to death"), and emotional punctuation (such as "!").
  • Entity words may include one or more of person names, place names, organization names, and proper nouns.
  • Step S502: take, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than the third preset value.
  • For example, the third preset value may be 2.
  • Keeping only the components whose number of occurrences is greater than the third preset value filters out the components related to the target user that occur no more than the third preset value times, which reduces the influence of noise on the components corresponding to the target user.
  • Step S503: generate the first component list corresponding to the target user according to the components corresponding to the target user.
  • For example, the first component list corresponding to the target user may be represented as {NE1, NE2, ..., NEn}, where NE1, NE2, ..., NEn represent the respective components corresponding to the target user.
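Steps S501 to S503 can be sketched as counting components across the parsed titles and keeping those that occur more than the third preset value times. This minimal sketch assumes the NER step has already turned each title into a list of components (the NER model is not shown), counts every occurrence including repeats within one title (the text does not say), and keeps components in order of first appearance; `first_component_list` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def first_component_list(parsed_titles: Iterable[Sequence[str]],
                         third_preset: int = 2) -> List[str]:
    """S502-S503: keep user-related components occurring more than `third_preset` times."""
    counts: Counter = Counter()
    order: List[str] = []  # first-appearance order, so the output is deterministic
    for components in parsed_titles:  # output of S501, one list per title
        for comp in components:
            if comp not in counts:
                order.append(comp)
            counts[comp] += 1
    # Components seen third_preset times or fewer are treated as noise and dropped.
    return [comp for comp in order if counts[comp] > third_preset]


parsed = [["dog", "!"], ["dog", "good-looking"], ["dog", "Mars Intelligence Agency"],
          ["good-looking", "!"], ["!"]]
print(first_component_list(parsed))  # prints ['dog', '!']
```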
  • FIG. 6 illustrates an exemplary implementation flowchart of step S101 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure.
  • As shown in FIG. 6, collecting the user behavior data of the target user and generating the first multimedia resource list according to the user behavior data includes:
  • Step S601: collect all user behavior data of the target user within a specified time period.
  • For example, all user behavior data of the target user within one month, three months, or six months can be collected.
  • Step S602: filter out valid user behavior data from the collected user behavior data.
  • For example, user behavior data of repeatedly viewing a multimedia resource may be determined as invalid, and user behavior data of viewing a multimedia resource with a low completion ratio may also be determined as invalid, which is not limited herein.
  • Step S603: sort the valid user behavior data according to the time corresponding to the valid user behavior data to obtain the first multimedia resource list.
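Steps S601 to S603 can be sketched as a filter-then-sort over viewing events. The `ViewEvent` fields, the 30-day window, the 0.2 minimum completion ratio, keeping only the first view of a resource, and sorting most recent first are all assumptions for illustration; the text leaves these choices open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence, Set


@dataclass(frozen=True)
class ViewEvent:
    resource_id: str
    time: datetime
    completion: float  # fraction of the resource that was watched, 0.0-1.0


def first_resource_list(events: Sequence[ViewEvent], now: datetime,
                        window: timedelta = timedelta(days=30),
                        min_completion: float = 0.2) -> List[str]:
    # S601: keep the behavior data inside the specified time period.
    recent = [e for e in events if now - window <= e.time <= now]
    # S602: drop low-completion views and repeated views of an already counted resource.
    seen: Set[str] = set()
    valid: List[ViewEvent] = []
    for e in sorted(recent, key=lambda ev: ev.time):
        if e.completion < min_completion or e.resource_id in seen:
            continue
        seen.add(e.resource_id)
        valid.append(e)
    # S603: sort the valid data by time (most recent first is an assumed direction).
    valid.sort(key=lambda ev: ev.time, reverse=True)
    return [e.resource_id for e in valid]


now = datetime(2017, 10, 1)
events = [ViewEvent("A", datetime(2017, 9, 25), 0.9), ViewEvent("B", datetime(2017, 9, 28), 0.1),
          ViewEvent("A", datetime(2017, 9, 29), 0.8), ViewEvent("C", datetime(2017, 9, 30), 0.5),
          ViewEvent("D", datetime(2017, 7, 1), 0.9)]
print(first_resource_list(events, now))  # prints ['C', 'A']
```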
  • The list of multimedia resources to be recommended may be filtered according to the uploader information of the multimedia resources to be recommended, the channel information to which they belong, the data of the target user viewing multimedia resources, and the interest tags of the target user, so that the multimedia resources to be recommended are diverse.
  • For example, if the list of multimedia resources to be recommended includes more than four multimedia resources uploaded by the same uploader, the top three of those multimedia resources may be retained as multimedia resources to be recommended. As another example, if the list includes more than four multimedia resources of the same second-level channel, the top three multimedia resources of that second-level channel may be retained as multimedia resources to be recommended.
  • For example, the variety channel is a first-level channel, and the Hunan variety channel is a second-level channel under that first-level channel.
  • As another example, if the list of multimedia resources to be recommended includes more than four multimedia resources under the same third-level interest tag, the top three multimedia resources under that tag may be retained as multimedia resources to be recommended.
  • For example, the first-level interest tag is entertainment, entertainment stars is a second-level interest tag under that first-level tag, and Beyond is a third-level interest tag under that second-level tag.
  • If the list of multimedia resources to be recommended includes a multimedia resource that the target user has recently viewed, that multimedia resource is not used as a multimedia resource to be recommended.
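The diversity filtering above can be sketched as a per-key cap: whenever more than four candidates share an uploader, a second-level channel, or a third-level interest tag, only the first three of them (in the existing ranking order) are kept, after which recently viewed resources are removed. The `Candidate` fields, the order in which the rules are applied, and the function names are assumptions; the text does not fix them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Candidate:
    resource_id: str
    uploader: str
    channel_l2: str  # second-level channel, e.g. "Hunan variety"
    tag_l3: str      # third-level interest tag, e.g. "Beyond"


def cap_group(cands: List[Candidate], key: Callable[[Candidate], str],
              limit: int = 4, keep: int = 3) -> List[Candidate]:
    """If more than `limit` candidates share a key, keep only the first `keep` of them."""
    totals = Counter(key(c) for c in cands)
    kept: Counter = Counter()
    out: List[Candidate] = []
    for c in cands:  # candidates are assumed to be in ranking order already
        k = key(c)
        if totals[k] > limit:
            if kept[k] >= keep:
                continue
            kept[k] += 1
        out.append(c)
    return out


def diversify(cands: List[Candidate], recently_viewed: Set[str]) -> List[Candidate]:
    out = cands
    for key in (lambda c: c.uploader, lambda c: c.channel_l2, lambda c: c.tag_l3):
        out = cap_group(out, key)
    # Resources the target user has recently viewed are not recommended.
    return [c for c in out if c.resource_id not in recently_viewed]


cands = [Candidate(f"r{i}", "u1", f"ch{i}", f"t{i}") for i in range(1, 6)]
cands.append(Candidate("r6", "u2", "ch6", "t6"))
print([c.resource_id for c in diversify(cands, {"r6"})])  # prints ['r1', 'r2', 'r3']
```

Exactly four candidates sharing a key are all kept, since the text only triggers the cap at "more than four".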
  • The method for determining the title of a multimedia resource according to this embodiment can determine a personalized title for the target user that better attracts the user, thereby increasing the probability that the recommended multimedia resource is clicked.
  • FIG. 7 illustrates a structural block diagram of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • The apparatus shown in FIG. 7 can be used to run the method of determining the title of a multimedia resource shown in FIGS. 1 through 6. For the convenience of explanation, only the portion related to the present embodiment is shown in FIG. 7.
  • As shown in FIG. 7, the device includes: an acquisition module 71 configured to collect user behavior data of a target user and generate a first multimedia resource list according to the user behavior data; a first parsing module 72 configured to parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user; a second parsing module 73 configured to parse the original title of the multimedia resource to be recommended to obtain a second component list corresponding to the original title; a comparison module 74 configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; and a determining module 75 configured to determine, according to the updated second component list, a new title of the multimedia resource to be recommended.
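The module structure of FIG. 7 can be sketched as a small composition in which each module is a pluggable function and the device chains them in the order of steps S101 to S105. This is an architectural sketch only: the `TitleDevice` class, its field names, and the toy lambdas in the usage example are hypothetical, and the determining function is assumed to fall back to the original title itself when no new title is chosen.

```python
from dataclasses import dataclass
from typing import Callable, List

Components = List[str]


@dataclass
class TitleDevice:
    acquire: Callable[[str], List[str]]                      # module 71: user -> titles of first resource list
    parse_user_titles: Callable[[List[str]], Components]     # module 72: -> first component list
    parse_title: Callable[[str], Components]                 # module 73: original title -> second component list
    compare: Callable[[Components, Components], Components]  # module 74: -> updated second component list
    determine: Callable[[str, Components], str]              # module 75: -> new (or original) title

    def new_title(self, user_id: str, original_title: str) -> str:
        first = self.parse_user_titles(self.acquire(user_id))
        second = self.parse_title(original_title)
        updated = self.compare(second, first)
        return self.determine(original_title, updated)


# Toy wiring (hypothetical stand-ins for the real modules).
device = TitleDevice(
    acquire=lambda user: ["comet star naps", "comet star plays"],
    parse_user_titles=lambda titles: ["comet star"],
    parse_title=str.split,
    compare=lambda second, first: [first[0] if w == "kitten" else w for w in second],
    determine=lambda original, updated: " ".join(updated),
)
print(device.new_title("user-1", "tortoise licks kitten"))  # prints tortoise licks comet star
```

Injecting each module as a function keeps the chaining logic independent of how NER, word2vec, or scoring are actually implemented.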
  • FIG. 8 illustrates an exemplary structural block diagram of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • The apparatus shown in FIG. 8 can be used to run the method of determining the title of a multimedia resource shown in FIGS. 1 through 6.
  • For the convenience of explanation, only the portion related to the present embodiment is shown in FIG. 8.
  • The components in FIG. 8 that are the same as those in FIG. 7 have the same functions, and a detailed description of these components is omitted for brevity.
  • As shown in FIG. 8, the comparison module 74 includes: a similarity calculation sub-module 741 configured to calculate the similarity between each component in the second component list and each component in the first component list; a replacement sub-module 742 configured to replace a component in the second component list with a component in the first component list if the similarity between the two components is greater than a first preset value; and an update sub-module 743 configured to obtain an updated second component list based on all the replaced components.
  • The similarity calculation sub-module 741 includes: a vector determining unit configured to determine a vector corresponding to each component in the second component list; and a similarity calculation unit configured to separately calculate the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  • The similarity calculation unit is configured to calculate, by using Equation 1, the similarity $\mathrm{sim}(\vec{v}_l, \vec{u}_m)$ between the vector $\vec{v}_l$ corresponding to the l-th component in the second component list and the vector $\vec{u}_m$ corresponding to the m-th component in the first component list.
  • the determining module 75 includes: a score calculation sub-module 751, configured to calculate a score of the updated second component list; and a determination sub-module 752 for the updated In a case where the score of the second component list is greater than the second preset value, the new title of the multimedia resource to be recommended is determined according to the updated second component list.
  • the score calculation sub-module 751 is configured to: according to the The score of the updated second component list is calculated by the probability that each component in the new second component list appears in the specified sample set.
  • the score calculation sub-module 751 is configured to calculate the score s of the updated second component list by using Equation 2;
  • n represents the number of components in the updated second component list
  • w_j represents the j-th component in the updated second component list
  • w_{j-i} represents the (j-i)-th component in the updated second component list
  • p(w_j w_{j-i}) represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set
  • p(w_{j-i}) represents the probability that the (j-i)-th component appears in the specified sample set.
  • the apparatus further includes: a retaining module 76, configured to retain the original title of the multimedia resource to be recommended when the score of the updated second component list is less than or equal to the second preset value.
  • the first parsing module 72 includes: a parsing sub-module 721, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user; a component determining sub-module 722, configured to take, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value; and a first component list generation sub-module 723, configured to generate the first component list corresponding to the target user according to the components corresponding to the target user.
  • the collecting module 71 includes: a collecting sub-module 711, configured to collect all user behavior data of the target user within a specified time period; a filtering sub-module 712, configured to filter valid user behavior data out of the collected user behavior data; and a sorting sub-module 713, configured to sort the valid user behavior data according to the time corresponding to the valid user behavior data to obtain the first multimedia resource list.
  • the device for determining the title of the multimedia resource can determine a personalized title for the target user and better attract the user, thereby increasing the probability that the recommended multimedia resource is clicked.
  • FIG. 9 is a block diagram showing the structure of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
  • the device 1100 that determines the title of the multimedia resource may be a host server with computing capability, a personal computer (PC), or a portable computer or terminal.
  • the specific embodiments of the present disclosure do not limit the specific implementation of the computing node.
  • the device 1100 that determines the title of the multimedia resource includes a processor 1110, a communications interface 1120, a memory 1130, and a bus 1140.
  • the processor 1110, the communication interface 1120, and the memory 1130 communicate with one another via the bus 1140.
  • Communication interface 1120 is for communicating with network devices, including, for example, a virtual machine management center, shared storage, and the like.
  • the processor 1110 is configured to execute a program.
  • the processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • the memory 1130 is used to store files.
  • the memory 1130 may include a high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
  • Memory 1130 can also be a memory array.
  • the memory 1130 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
  • the above program may be program code including computer operating instructions.
  • specifically, the program may be used to perform the steps of Embodiment 1.
  • if the functions are implemented in the form of computer software and sold or used as a stand-alone product, all or part of the technical solution of the present disclosure (for example, the part contributing to the prior art) may, to some extent, be considered to be embodied in the form of a computer software product.
  • the computer software product is typically stored in a computer-readable non-volatile storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present disclosure.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Disclosed are a method and device for determining a title of a multimedia resource. The method comprises: acquiring user behavior data of a target user and generating a first multimedia resource list according to the user behavior data (S101); parsing a title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user (S102); parsing an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title (S103); comparing each component of the second component list with each component of the first component list to obtain an updated second component list (S104); and determining a new title for the multimedia resource to be recommended according to the updated second component list (S105). By determining a personalized title for the target user, the method and device can better attract users, thereby increasing the probability that the recommended multimedia resource is clicked.

Description

Method and device for determining a title of a multimedia resource
Cross-reference
The present application claims priority to Chinese Patent Application No. 201610881052.3, filed on October 9, 2016, the entire disclosure of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of information technology, and in particular, to a method and device for determining a title of a multimedia resource.
Background
In the Internet era, and especially in the mobile Internet era, how to provide users with timely and valuable information is a research focus for many Internet companies. For example, when a user browses a video website, the video title is an important factor in attracting the user to watch the video. Video websites therefore often employ a large number of operators to edit video titles, and video uploaders may also edit their video titles to attract viewers.
At present, editing the titles of multimedia resources such as videos depends on website operators and uploaders, which consumes substantial human resources. Moreover, the titles they edit are aimed at general public taste and cannot satisfy the personalized needs of individual users.
Summary
Technical problem
In view of this, the technical problem to be solved by the present disclosure is that the existing way of determining the title of a multimedia resource consumes substantial human resources and cannot satisfy the personalized needs of users.
Solution
In order to solve the above technical problem, according to an embodiment of the present disclosure, a method for determining a title of a multimedia resource is provided, including:
Collecting user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data;
Parsing a title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
Parsing an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
Comparing each component in the second component list with each component in the first component list to obtain an updated second component list;
Determining a new title of the multimedia resource to be recommended according to the updated second component list.
For the above method, in a possible implementation, comparing each component in the second component list with each component in the first component list to obtain the updated second component list includes:
Calculating a similarity between each component in the second component list and each component in the first component list;
When the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, replacing the component in the second component list with the component in the first component list;
Obtaining the updated second component list based on all of the replaced components.
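The replacement rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the name `update_second_list`, the pluggable `similarity` callable, and the choice of the *most* similar first-list component when several exceed the threshold are all assumptions (the text only says a component is replaced when its similarity exceeds the first preset value). The default threshold of 0.9 follows the example value given in Embodiment 1.

```python
from typing import Callable, List, Sequence


def update_second_list(
    second: Sequence[str],
    first: Sequence[str],
    similarity: Callable[[str, str], float],
    first_preset: float = 0.9,
) -> List[str]:
    """Return an updated copy of `second` (hypothetical helper).

    Each component of `second` is replaced by the component of `first` with
    the highest similarity, but only if that similarity is strictly greater
    than `first_preset`; otherwise the original component is kept.
    """
    updated: List[str] = []
    for component in second:
        best, best_sim = component, first_preset
        for candidate in first:
            sim = similarity(component, candidate)
            if sim > best_sim:  # must beat the threshold and any earlier match
                best, best_sim = candidate, sim
        updated.append(best)
    return updated


if __name__ == "__main__":
    # Toy similarity table standing in for a real similarity measure.
    table = {("kitty", "meow_star"): 0.95}

    def toy_similarity(a: str, b: str) -> float:
        return table.get((a, b), 0.0)

    print(update_second_list(["turtle", "kitty", "toes"], ["meow_star"], toy_similarity))
    # ['turtle', 'meow_star', 'toes']
```

Because the threshold is strict, a similarity exactly equal to the first preset value leaves the component unchanged, matching the "greater than" wording.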
For the above method, in a possible implementation, calculating the similarity between each component in the second component list and each component in the first component list includes:
Determining a vector corresponding to each component in the second component list;
Separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
For the above method, in a possible implementation, separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list includes:
Calculating, by using Equation 1, the similarity between the vector corresponding to the l-th component in the second component list and the vector corresponding to the m-th component in the first component list:
[Equation 1 and its vector symbols appear only as images in the source (Figure PCTCN2017104410-appb-000001 to appb-000004) and are not reproduced here.]
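Because Equation 1 survives only as images, its exact form cannot be recovered here. Cosine similarity is a common way to compare component (word) vectors, so the sketch below uses it as an assumption, not as the patent's confirmed formula. How the vectors are obtained (for example, from a pretrained word-embedding model) is also not specified in the text; `cosine_similarity` and `similarity_matrix` are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (assumed stand-in for Equation 1)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector carries no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def similarity_matrix(
    second_vectors: Sequence[Sequence[float]],
    first_vectors: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Row l, column m: similarity of the l-th second-list component
    to the m-th first-list component."""
    return [[cosine_similarity(a, b) for b in first_vectors] for a in second_vectors]
```

The matrix layout mirrors the indices in the text (l over the second component list, m over the first), so each row can be scanned against the first preset value.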
For the above method, in a possible implementation, determining the new title of the multimedia resource to be recommended according to the updated second component list includes:
Calculating a score of the updated second component list;
When the score of the updated second component list is greater than a second preset value, determining the new title of the multimedia resource to be recommended according to the updated second component list.
For the above method, in a possible implementation, calculating the score of the updated second component list includes:
Calculating the score of the updated second component list according to the probability that each component in the updated second component list appears in a specified sample set.
For the above method, in a possible implementation, calculating the score of the updated second component list according to the probability that each component in the updated second component list appears in the specified sample set includes:
Calculating the score s of the updated second component list by using Equation 2;
[Equation 2 appears only as images in the source (Figure PCTCN2017104410-appb-000005 and appb-000006) and is not reproduced here.]
where n represents the number of components in the updated second component list, w_j represents the j-th component in the updated second component list, w_{j-i} represents the (j-i)-th component in the updated second component list, p(w_j w_{j-i}) represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and p(w_{j-i}) represents the probability that the (j-i)-th component appears in the specified sample set.
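Equation 2 is likewise only available as images. The definitions above (a co-occurrence probability p(w_j w_{j-i}) divided by p(w_{j-i})) suggest an estimate of the conditional probability of w_j given w_{j-i}, as in an n-gram language model. The sketch below assumes that the score is the product of these ratios along the list with a fixed offset i = 1, and that each probability is estimated as the fraction of sample titles (each already split into components) that contain the component(s). All of these choices, and the names `occurrence_prob` and `list_score`, are assumptions made for illustration.

```python
from typing import Sequence


def occurrence_prob(samples: Sequence[Sequence[str]], *components: str) -> float:
    """Fraction of sample titles that contain every given component."""
    if not samples:
        return 0.0
    hits = sum(1 for title in samples if all(c in title for c in components))
    return hits / len(samples)


def list_score(
    components: Sequence[str], samples: Sequence[Sequence[str]], i: int = 1
) -> float:
    """Hypothetical reading of Equation 2: the product over j of
    p(w_j w_{j-i}) / p(w_{j-i}), an estimate of p(w_j | w_{j-i})."""
    score = 1.0
    for j in range(i, len(components)):
        previous, current = components[j - i], components[j]
        p_prev = occurrence_prob(samples, previous)
        if p_prev == 0.0:
            return 0.0  # unseen component: treat the list as implausible
        score *= occurrence_prob(samples, current, previous) / p_prev
    return score


if __name__ == "__main__":
    samples = [["a", "b"], ["a", "b"], ["a", "c"], ["d"]]
    print(list_score(["a", "b"], samples))  # about 0.667 (= 0.5 / 0.75)
```

A raw product favors short lists; dividing by n or taking a geometric mean would be an alternative normalization, but the source does not say which (if any) is used.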
For the above method, in a possible implementation, after the score of the updated second component list is calculated, the method further includes:
When the score of the updated second component list is less than or equal to the second preset value, retaining the original title of the multimedia resource to be recommended.
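The two branches of step S105 (a new title when the score is greater than the second preset value, the original title otherwise) reduce to a single comparison. In this sketch `choose_title` is a hypothetical name, and rebuilding the title by concatenating the updated components in order is an assumption: the text does not say how components are reassembled, and the example in Embodiment 1 also appends an exclamation mark, which plain concatenation would not produce.

```python
from typing import Sequence


def choose_title(
    original_title: str,
    updated_components: Sequence[str],
    score: float,
    second_preset: float,
    sep: str = "",
) -> str:
    """Return a new title if the score clears the second preset value,
    otherwise keep the original title."""
    if score > second_preset:
        # Chinese titles are not space-delimited, so components are
        # concatenated directly by default (an assumption).
        return sep.join(updated_components)
    return original_title
```

For example, `choose_title("old", ["a", "b"], 0.5, 0.3)` returns `"ab"`, while a score equal to the threshold keeps `"old"`.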
For the above method, in a possible implementation, parsing the title of each multimedia resource in the first multimedia resource list to obtain the first component list corresponding to the target user includes:
Parsing the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user;
Taking, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value;
Generating the first component list corresponding to the target user according to the components corresponding to the target user.
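The occurrence filter in these steps can be sketched with `collections.Counter`. Each title is assumed to have already been parsed into a list of components (for example, by the NER step mentioned in Embodiment 1); the function name `first_component_list` and the most-frequent-first ordering are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def first_component_list(
    parsed_titles: Iterable[Sequence[str]], third_preset: int
) -> List[str]:
    """Keep components that occur more than `third_preset` times across all
    parsed titles, ordered from most to least frequent."""
    counts = Counter(c for title in parsed_titles for c in title)
    return [c for c, n in counts.most_common() if n > third_preset]


if __name__ == "__main__":
    titles = [["cat", "turtle"], ["cat", "dog"], ["cat", "turtle"]]
    print(first_component_list(titles, 1))  # ['cat', 'turtle']
```

Every occurrence is counted, so a component repeated within one title counts more than once; counting each title at most once would be an equally plausible reading.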
For the above method, in a possible implementation, collecting the user behavior data of the target user and generating the first multimedia resource list according to the user behavior data includes:
Collecting all user behavior data of the target user within a specified time period;
Filtering valid user behavior data out of the collected user behavior data;
Sorting the valid user behavior data according to the time corresponding to the valid user behavior data to obtain the first multimedia resource list.
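The collect, filter, and sort steps can be sketched as below. The `BehaviorRecord` fields, the caller-supplied `is_valid` predicate (the text does not define what makes a record valid), and the newest-first default ordering (the text only says the data are sorted by time) are all assumptions. Repeated interactions with the same resource are kept as separate entries.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class BehaviorRecord:
    """Hypothetical user behavior record."""
    user_id: str
    resource_id: str
    action: str     # e.g. "watch", "comment", "subscribe", "upvote", "downvote"
    timestamp: int  # Unix time in seconds


def first_resource_list(
    records: Iterable[BehaviorRecord],
    user_id: str,
    start: int,
    end: int,
    is_valid: Callable[[BehaviorRecord], bool],
    newest_first: bool = True,
) -> List[str]:
    # 1. Collect all of the target user's records in the window [start, end].
    collected = [
        r for r in records if r.user_id == user_id and start <= r.timestamp <= end
    ]
    # 2. Keep only the records the caller considers valid.
    valid = [r for r in collected if is_valid(r)]
    # 3. Sort by time to build the first multimedia resource list.
    valid.sort(key=lambda r: r.timestamp, reverse=newest_first)
    return [r.resource_id for r in valid]
```

Passing the validity rule in as a function keeps the sketch neutral about criteria such as minimum watch time, which the source leaves open.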
In order to solve the above technical problem, according to another embodiment of the present disclosure, a device for determining a title of a multimedia resource is provided, including:
an acquisition module, configured to collect user behavior data of a target user and generate a first multimedia resource list according to the user behavior data;
a first parsing module, configured to parse a title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
a second parsing module, configured to parse an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
a comparison module, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list;
a determining module, configured to determine a new title of the multimedia resource to be recommended according to the updated second component list.
For the above device, in a possible implementation, the comparison module includes:
a similarity calculation sub-module, configured to calculate a similarity between each component in the second component list and each component in the first component list;
a replacement sub-module, configured to replace a component in the second component list with a component in the first component list when the similarity between the two is greater than a first preset value;
an update sub-module, configured to obtain the updated second component list based on all of the replaced components.
For the above device, in a possible implementation, the similarity calculation sub-module includes:
a vector determining unit, configured to determine a vector corresponding to each component in the second component list;
a similarity calculation unit, configured to separately calculate the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
For the above device, in a possible implementation, the similarity calculation unit is configured to:
calculate, by using Equation 1, the similarity between the vector corresponding to the l-th component in the second component list and the vector corresponding to the m-th component in the first component list:
[Equation 1 and its vector symbols appear only as images in the source (Figure PCTCN2017104410-appb-000007 to appb-000010) and are not reproduced here.]
For the above device, in a possible implementation, the determining module includes:
a score calculation sub-module, configured to calculate a score of the updated second component list;
a determination sub-module, configured to determine the new title of the multimedia resource to be recommended according to the updated second component list when the score of the updated second component list is greater than a second preset value.
For the above device, in a possible implementation, the score calculation sub-module is configured to:
calculate the score of the updated second component list according to the probability that each component in the updated second component list appears in a specified sample set.
For the above device, in a possible implementation, the score calculation sub-module is configured to:
calculate the score s of the updated second component list by using Equation 2;
[Equation 2 appears only as images in the source (Figure PCTCN2017104410-appb-000011 and appb-000012) and is not reproduced here.]
where n represents the number of components in the updated second component list, w_j represents the j-th component in the updated second component list, w_{j-i} represents the (j-i)-th component in the updated second component list, p(w_j w_{j-i}) represents the probability that the j-th component and the (j-i)-th component co-occur in the specified sample set, and p(w_{j-i}) represents the probability that the (j-i)-th component appears in the specified sample set.
For the above device, in a possible implementation, the device further includes:
a retaining module, configured to retain the original title of the multimedia resource to be recommended when the score of the updated second component list is less than or equal to the second preset value.
For the above device, in a possible implementation, the first parsing module includes:
a parsing sub-module, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user;
a component determining sub-module, configured to take, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value;
a first component list generation sub-module, configured to generate the first component list corresponding to the target user according to the components corresponding to the target user.
For the above device, in a possible implementation, the acquisition module includes:
a collection sub-module, configured to collect all user behavior data of the target user within a specified time period;
a filtering sub-module, configured to filter valid user behavior data out of the collected user behavior data;
a sorting sub-module, configured to sort the valid user behavior data according to the time corresponding to the valid user behavior data to obtain the first multimedia resource list.
Beneficial effects
By comparing each component in the second component list corresponding to the original title of the multimedia resource to be recommended with each component in the first component list corresponding to the target user, an updated second component list is obtained, from which the new title of the multimedia resource to be recommended is determined. The method and device for determining a title of a multimedia resource according to the embodiments of the present disclosure can thus determine a personalized title for the target user and better attract the user, thereby increasing the probability that the recommended multimedia resource is clicked.
Further features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are included in and constitute a part of the specification, together with the specification illustrate exemplary embodiments, features, and aspects of the present disclosure, and serve to explain the principles of the present disclosure.
FIG. 1 shows an implementation flowchart of a method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 2 shows an exemplary implementation flowchart of step S104 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 3 shows an exemplary implementation flowchart of step S301 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 4 shows an exemplary implementation flowchart of step S105 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 5 shows an exemplary implementation flowchart of step S102 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 6 shows an exemplary implementation flowchart of step S101 of the method for determining a title of a multimedia resource according to an embodiment of the present disclosure;
FIG. 7 shows a structural block diagram of a device for determining a title of a multimedia resource according to another embodiment of the present disclosure;
FIG. 8 shows an exemplary structural block diagram of a device for determining a title of a multimedia resource according to another embodiment of the present disclosure;
FIG. 9 shows a structural block diagram of an apparatus for determining a title of a multimedia resource according to another embodiment of the present disclosure.
Detailed description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are illustrated in the drawings, the drawings are not necessarily drawn to scale unless otherwise indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or superior to other embodiments.
In addition, numerous specific details are given in the detailed description below in order to better illustrate the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without certain of these details. In some instances, methods, means, components, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
Embodiment 1
FIG. 1 shows an implementation flowchart of a method for determining a title of a multimedia resource according to an embodiment of the present disclosure. The executing entity of this embodiment may be a server or another device for determining the title of a multimedia resource, which is not limited herein. As shown in FIG. 1, the method mainly includes:
In step S101, user behavior data of the target user is collected, and a first multimedia resource list is generated according to the user behavior data.
Here, multimedia may be a combination of multiple media and may include, for example, text, sound, and images. For example, the multimedia resource may be a video, which is not limited herein. The user behavior data of the target user may include, but is not limited to, at least one of the following: data on the target user watching multimedia resources, data on the target user commenting on multimedia resources, data on the target user subscribing to multimedia resources, and data on the target user upvoting or downvoting multimedia resources. In this embodiment, the first multimedia resource list may be generated from the multimedia resources corresponding to the user behavior data of the target user. For example, the first multimedia resource list corresponding to the target user may be expressed as L_U = {v_1, v_2, ..., v_n}.
In step S102, the title of each multimedia resource in the first multimedia resource list is parsed to obtain a first component list corresponding to the target user.
As an example of this embodiment, Named Entity Recognition (NER) may be used to parse the title of each multimedia resource in the first multimedia resource list to obtain the first component list corresponding to the target user.
In step S103, the original title of the multimedia resource to be recommended is parsed to obtain a second component list corresponding to the original title.
As an example of this embodiment, the original title of each multimedia resource in a list of multimedia resources to be recommended may be parsed separately to obtain a second component list corresponding to each original title. For example, NER may be used to parse the original title of the multimedia resource to be recommended to obtain the second component list corresponding to the original title.
In step S104, each component in the second component list is compared with each component in the first component list to obtain an updated second component list.
As an example of this embodiment, each component in the second component list may be compared with each component in the first component list, so that components in the second component list are replaced with components from the first component list.
In step S105, a new title of the multimedia resource to be recommended is determined according to the updated second component list.
For example, the original title of the multimedia resource to be recommended is "乌龟啃一只睡觉的小猫咪的脚趾头" ("A turtle nibbles the toes of a sleeping little kitty"), and the new title is "乌龟啃一只睡觉的喵星人的脚趾头!" ("A turtle nibbles the toes of a sleeping meow-star person!"), where 喵星人 ("meow-star person") is Chinese internet slang for a cat.
In this embodiment, each component in the second component list corresponding to the original title of the multimedia resource to be recommended is compared with each component in the first component list corresponding to the target user to obtain an updated second component list, from which the new title of the multimedia resource to be recommended is determined. A personalized title can thus be determined for the target user, which better attracts the user and increases the probability that the recommended multimedia resource is clicked. Moreover, titles no longer need to be edited manually, which greatly reduces labor costs.
FIG. 2 is a flowchart of an exemplary implementation of step S104 of the method for determining the title of a multimedia resource according to an embodiment of the present disclosure. As shown in FIG. 2, comparing each component in the second component list with each component in the first component list to obtain the updated second component list includes:
In step S201, the similarity between each component in the second component list and each component in the first component list is calculated.
For example, the similarity between two components may be determined by calculating the similarity between their corresponding vectors. Those skilled in the art will understand that the similarity between components may also be measured using other parameters of the components, which is not limited herein.
In step S202, if the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, the component in the second component list is replaced with that component from the first component list.
For example, the first preset value may be 0.9. Suppose a component in the second component list is "小猫咪" (kitty), a component in the first component list is "喵星人" (cat, in internet slang), and their similarity is 0.95. Then "小猫咪" in the second component list may be replaced with "喵星人" from the first component list.
In this example, a component in the second component list is replaced only when its similarity to a component in the first component list is greater than the first preset value, which ensures semantic consistency.
In step S203, the updated second component list is obtained from the result of all the replacements.
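Steps S201 to S203 can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the similarity function is passed in (a hypothetical lookup table stands in for word-vector similarity), the title is assumed to be already segmented into components, and when several user components exceed the first preset value the sketch assumes the most similar one is chosen, which the text does not specify.

```python
from typing import Callable, Dict, List, Tuple

def update_components(
    second: List[str],
    first: List[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.9,  # the "first preset value"
) -> List[str]:
    """Replace each title component with the most similar user component
    whose similarity strictly exceeds `threshold` (steps S201-S203)."""
    updated = []
    for comp in second:
        best, best_sim = comp, threshold
        for cand in first:
            s = similarity(comp, cand)
            if s > best_sim:  # must beat the threshold and any earlier match
                best, best_sim = cand, s
        updated.append(best)  # unchanged when no candidate passes the threshold
    return updated

# Hypothetical pairwise similarities; unlisted pairs score 0.
TOY_SIM: Dict[Tuple[str, str], float] = {("小猫咪", "喵星人"): 0.95}

def toy_similarity(a: str, b: str) -> float:
    return TOY_SIM.get((a, b), 0.0)

title = ["乌龟", "啃", "一只", "睡觉的", "小猫咪", "的", "脚趾头"]  # hypothetical segmentation
user = ["喵星人", "!"]
print("".join(update_components(title, user, toy_similarity)))  # 乌龟啃一只睡觉的喵星人的脚趾头
```

The sketch only substitutes components, so it reproduces the replacement in the example above but not the trailing "!" of the example's new title; how that punctuation is added is not described in this section.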
FIG. 3 is a flowchart of an exemplary implementation of step S201 of the method for determining the title of a multimedia resource according to an embodiment of the present disclosure. As shown in FIG. 3, calculating the similarity between each component in the second component list and each component in the first component list includes:
In step S301, the vector corresponding to each component in the second component list is determined.
As an example of this embodiment, word2vec may be used to determine the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
In step S302, the similarity between the vector of each component in the second component list and the vector of each component in the first component list is calculated.
For example, the cosine similarity between the vectors of two components may be taken as the similarity of the two components.
In a possible implementation, calculating the similarity between the vector of each component in the second component list and the vector of each component in the first component list includes: calculating, using Equation 1, the similarity $\mathrm{sim}(\vec{v}_l,\vec{u}_m)$ between the vector $\vec{v}_l$ of the $l$-th component in the second component list and the vector $\vec{u}_m$ of the $m$-th component in the first component list:

$$\mathrm{sim}(\vec{v}_l,\vec{u}_m)=\frac{\vec{v}_l\cdot\vec{u}_m}{\lVert\vec{v}_l\rVert\,\lVert\vec{u}_m\rVert}\qquad\text{(Equation 1)}$$
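Assuming Equation 1 is the cosine similarity described above, it can be computed as follows. The vectors here are toy values; in practice they would come from a trained word2vec model (for example gensim's `Word2Vec`, whose `wv` attribute maps a word to its vector). Returning 0.0 for a zero vector is a convention chosen for this sketch, not something the text specifies.

```python
import math
from typing import List, Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # sketch convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)

def similarity_matrix(
    second_vecs: List[Sequence[float]], first_vecs: List[Sequence[float]]
) -> List[List[float]]:
    """Entry [l][m] compares the l-th title component with the m-th user component."""
    return [[cosine_similarity(v, u) for u in first_vecs] for v in second_vecs]

print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
print(similarity_matrix([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0]]))  # [[1.0], [0.0]]
```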
FIG. 4 is a flowchart of an exemplary implementation of step S105 of the method for determining the title of a multimedia resource according to an embodiment of the present disclosure. As shown in FIG. 4, determining the new title of the multimedia resource to be recommended according to the updated second component list includes:
In step S401, the score of the updated second component list is calculated.
In step S402, if the score of the updated second component list is greater than a second preset value, the new title of the multimedia resource to be recommended is determined according to the updated second component list.
In this example, the new title is determined from the updated second component list only when its score is greater than the second preset value, which ensures linguistic coherence between adjacent components of the new title. The second preset value may be set empirically by those skilled in the art and is not limited herein.
In a possible implementation, after the score of the updated second component list is calculated, the method further includes: if the score is less than or equal to the second preset value, keeping the original title of the multimedia resource to be recommended. In this implementation, keeping the original title in that case preserves the linguistic coherence between adjacent components of the title.
In a possible implementation, calculating the score of the updated second component list includes: calculating the score according to the probabilities with which the components of the updated second component list appear in a specified sample set.
For example, the specified sample set may be built from the titles of all multimedia resources in the list of multimedia resources to be recommended, or from the titles of all multimedia resources in another specified multimedia resource list, which is not limited herein.
In a possible implementation, calculating the score of the updated second component list according to the probabilities with which its components appear in the specified sample set includes:
calculating the score $s$ of the updated second component list using Equation 2:

[Equation 2: formula image not reproduced in the source text]

where $n$ is the number of components in the updated second component list, $w_j$ is the $j$-th component in the updated second component list, $w_{j-i}$ is the $(j-i)$-th component in the updated second component list, $p(w_j w_{j-i})$ is the probability that the $j$-th and $(j-i)$-th components co-occur in the specified sample set, and $p(w_{j-i})$ is the probability that the $(j-i)$-th component appears in the specified sample set.
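Because the exact form of Equation 2 is not recoverable from this text, the sketch below makes two labeled assumptions: it takes i = 1 (adjacent components only) and multiplies the ratios p(w_j w_{j-1}) / p(w_{j-1}), i.e. the conditional probability of each component given its predecessor; and it estimates each probability as the fraction of sample titles containing the component (or both components). It also shows the threshold test of steps S401 and S402 with the fallback to the original title. The names `title_score` and `choose_title` are hypothetical.

```python
from typing import List, Sequence

def title_score(components: Sequence[str], samples: List[Sequence[str]]) -> float:
    """Assumed reading of Equation 2 with i = 1:
    s = product over j = 2..n of p(w_{j-1} w_j) / p(w_{j-1}),
    with probabilities estimated as title-level frequencies in `samples`."""
    sets = [set(t) for t in samples]
    if not sets:
        return 0.0
    total = len(sets)
    s = 1.0
    for prev, cur in zip(components, components[1:]):
        p_prev = sum(prev in t for t in sets) / total
        if p_prev == 0.0:
            return 0.0  # unseen component: treat the sequence as incoherent
        p_both = sum(prev in t and cur in t for t in sets) / total
        s *= p_both / p_prev
    return s

def choose_title(original: str, updated: Sequence[str], samples: List[Sequence[str]],
                 threshold: float, sep: str = "") -> str:
    """Steps S401/S402: use the updated components only if the score exceeds
    the second preset value (`threshold`); otherwise keep the original title."""
    if title_score(updated, samples) > threshold:
        return sep.join(updated)
    return original

samples = [["cat", "video"], ["cat", "funny"], ["dog", "video"]]
print(title_score(["cat", "video"], samples))  # 0.5
print(choose_title("kitty video", ["cat", "video"], samples, threshold=0.3, sep=" "))  # cat video
print(choose_title("puppy fun", ["dog", "funny"], samples, threshold=0.3, sep=" "))  # puppy fun
```

The default `sep=""` suits Chinese titles, whose components are written without spaces; the English toy data uses `sep=" "`.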
FIG. 5 is a flowchart of an exemplary implementation of step S102 of the method for determining the title of a multimedia resource according to an embodiment of the present disclosure. As shown in FIG. 5, parsing the title of each multimedia resource in the first multimedia resource list to obtain the first component list corresponding to the target user includes:
In step S501, the title of each multimedia resource in the first multimedia resource list is parsed to obtain components related to the target user.
As an example of this embodiment, NER may be used to parse the title of each multimedia resource in the first multimedia resource list separately to obtain the components of each title. A component may be one or more of: an entity word (e.g., "dog", "Mars Intelligence Agency"), an emotion word (e.g., "good-looking", "so funny it's killing me"), and emotional punctuation (e.g., "!"). Entity words may include one or more of person names, place names, organization names, and proper nouns.
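The text relies on NER to find components but does not name a model. As a stand-in, the sketch below uses greedy longest matching against a small hypothetical lexicon that labels each entry as an entity word, an emotion word, or emotional punctuation; a real system would substitute a trained NER or word-segmentation model.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon standing in for a trained NER / sentiment model.
LEXICON: Dict[str, str] = {
    "狗": "entity",            # dog
    "火星情报局": "entity",     # Mars Intelligence Agency
    "好看": "emotion",          # good-looking
    "笑死人不偿命": "emotion",  # so funny it's killing me
    "!": "punct",
}

def extract_components(title: str, lexicon: Dict[str, str] = LEXICON) -> List[Tuple[str, str]]:
    """Greedy longest-match scan returning (component, type) pairs."""
    max_len = max(map(len, lexicon))
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(title):
        for size in range(min(max_len, len(title) - i), 0, -1):
            piece = title[i:i + size]
            if piece in lexicon:
                found.append((piece, lexicon[piece]))
                i += size
                break
        else:
            i += 1  # no lexicon entry starts here; skip one character
    return found

print(extract_components("火星情报局笑死人不偿命!"))
print(extract_components("这只狗好看"))
```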
In step S502, among the components related to the target user, those occurring more times than a third preset value are taken as the components corresponding to the target user.
For example, the third preset value may be 2. In this example, components related to the target user that occur more times than the third preset value are kept as the target user's components, and components that occur no more than that number of times are filtered out, which reduces the influence of noise on the target user's components.
In step S503, the first component list corresponding to the target user is generated from the components corresponding to the target user.
For example, the first component list corresponding to the target user may be represented as {NE1, NE2, ..., NEn}, where NE1, NE2, ..., NEn denote the components corresponding to the target user.
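Steps S502 and S503 amount to a frequency filter over the components extracted from the watched titles. A minimal sketch, assuming each title's components have already been extracted; the name `build_first_component_list` is hypothetical, and ordering the result from most to least frequent is an assumption (the text does not fix an order).

```python
from collections import Counter
from typing import Iterable, List

def build_first_component_list(
    per_title_components: Iterable[List[str]], third_preset: int = 2
) -> List[str]:
    """Keep components seen more than `third_preset` times (S502) and
    return them as the user's first component list (S503)."""
    counts = Counter(c for comps in per_title_components for c in comps)
    return [c for c, n in counts.most_common() if n > third_preset]

watched = [["喵星人", "!"], ["喵星人", "狗"], ["喵星人", "!"], ["!", "好看"]]
print(build_first_component_list(watched))  # ['喵星人', '!']
```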
FIG. 6 is a flowchart of an exemplary implementation of step S101 of the method for determining the title of a multimedia resource according to an embodiment of the present disclosure. As shown in FIG. 6, collecting the user behavior data of the target user and generating the first multimedia resource list according to the user behavior data includes:
In step S601, all user behavior data of the target user within a specified time period is collected.
For example, all user behavior data of the target user within the past one month, three months, or six months may be collected.
In step S602, valid user behavior data is filtered from the collected user behavior data.
For example, user behavior data for repeated viewings of a multimedia resource may be treated as invalid, as may user behavior data in which only a very small proportion of a multimedia resource was watched; this is not limited herein.
In step S603, the valid user behavior data is sorted by its corresponding time to obtain the first multimedia resource list.
The time corresponding to a piece of valid user behavior data may be the time at which it occurred. Sorting the valid user behavior data by this time may mean sorting it from most recent to least recent.
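Steps S601 to S603 can be sketched as a window, a filter, and a sort over viewing events. The record layout, the 30-day window, the 10% completion cutoff, and the choice to keep only the most recent view of a repeatedly watched resource are all assumptions for illustration; the text leaves these rules open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class ViewEvent:
    """Hypothetical record of one viewing action."""
    resource_id: str
    title: str
    time: datetime
    completion: float  # fraction of the resource that was watched, 0.0 to 1.0

def build_first_resource_list(
    events: List[ViewEvent],
    now: datetime,
    window: timedelta = timedelta(days=30),
    min_completion: float = 0.1,
) -> List[ViewEvent]:
    # S601: keep only events inside the specified time period.
    recent = [e for e in events if now - window <= e.time <= now]
    # S603 ordering applied first (most recent to least recent),
    # so the de-duplication below keeps the latest view.
    recent.sort(key=lambda e: e.time, reverse=True)
    # S602: drop low-completion views and repeat views of the same resource.
    seen = set()
    valid = []
    for e in recent:
        if e.completion < min_completion or e.resource_id in seen:
            continue
        seen.add(e.resource_id)
        valid.append(e)
    return valid

now = datetime(2017, 9, 30)
events = [
    ViewEvent("a", "A", now - timedelta(days=2), 0.8),
    ViewEvent("a", "A", now - timedelta(days=1), 0.9),   # repeat view of "a"
    ViewEvent("b", "B", now - timedelta(days=3), 0.05),  # barely watched
    ViewEvent("c", "C", now - timedelta(days=40), 0.9),  # outside the window
    ViewEvent("d", "D", now - timedelta(days=5), 0.5),
]
print([e.resource_id for e in build_first_resource_list(events, now)])  # ['a', 'd']
```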
In a possible implementation, the list of multimedia resources to be recommended may be filtered so that the recommended multimedia resources are diverse, based on: uploader information of the multimedia resources to be recommended, information about the channels to which they belong, data on the multimedia resources watched by the target user, and the target user's interest tags. For example, if the list includes four or more multimedia resources uploaded by the same uploader, the three with the most clicks among them may be kept as multimedia resources to be recommended. As another example, if the list includes four or more multimedia resources from the same second-level channel, the three with the most clicks among them may be kept. For instance, the Variety channel is a first-level channel, and the Hunan Variety channel is a second-level channel under it. As a further example, if the list includes four or more multimedia resources under the same third-level interest tag, the three with the most clicks among them may be kept. For instance, Entertainment is a first-level interest tag, Entertainment Stars is a second-level interest tag under it, and Beyond is a third-level interest tag under that. As yet another example, if the list includes a multimedia resource that the target user has recently watched, that resource is not used as a multimedia resource to be recommended.
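The diversity rules above can be sketched as successive per-group caps. The field names, the order in which the uploader, second-level channel, and third-level tag caps are applied, and the assumption that resource IDs are unique are illustrative choices not fixed by the text. Keeping the top three of every group also leaves groups of three or fewer untouched, which matches the "four or more" condition.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Candidate:
    """Hypothetical record of a multimedia resource to be recommended."""
    resource_id: str
    uploader: str
    channel_l2: str  # second-level channel
    tag_l3: str      # third-level interest tag
    clicks: int

def diversify(cands: List[Candidate], recently_watched: Set[str], cap: int = 3) -> List[Candidate]:
    # Drop anything the target user watched recently.
    pool = [c for c in cands if c.resource_id not in recently_watched]
    # For each grouping key, keep only the top-`cap` items by clicks within each group.
    for key in ("uploader", "channel_l2", "tag_l3"):
        groups: Dict[str, List[Candidate]] = defaultdict(list)
        for c in pool:
            groups[getattr(c, key)].append(c)
        keep = set()
        for members in groups.values():
            members.sort(key=lambda c: c.clicks, reverse=True)
            keep.update(c.resource_id for c in members[:cap])
        pool = [c for c in pool if c.resource_id in keep]
    return pool

cands = [Candidate(f"v{i}", "u1", f"ch{i}", f"t{i}", 10 * i) for i in range(1, 6)]
cands.append(Candidate("r", "u2", "chx", "tx", 100))
print([c.resource_id for c in diversify(cands, recently_watched={"r"})])  # ['v3', 'v4', 'v5']
```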
In this way, by comparing each component in the second component list corresponding to the original title of the multimedia resource to be recommended with each component in the first component list corresponding to the target user, an updated second component list is obtained and the new title of the multimedia resource to be recommended is determined. The method for determining the title of a multimedia resource according to the embodiments of the present disclosure can therefore determine a personalized title for the target user, better attract the user, and increase the probability that the recommended multimedia resource is clicked.
Embodiment 2
FIG. 7 is a structural block diagram of an apparatus for determining the title of a multimedia resource according to another embodiment of the present disclosure. The apparatus shown in FIG. 7 may be used to perform the method for determining the title of a multimedia resource shown in FIGS. 1 to 6. For ease of description, only the parts related to this embodiment are shown in FIG. 7.
As shown in FIG. 7, the apparatus includes: an acquisition module 71, configured to collect user behavior data of a target user and generate a first multimedia resource list according to the user behavior data; a first parsing module 72, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user; a second parsing module 73, configured to parse the original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title; a comparison module 74, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; and a determining module 75, configured to determine a new title for the multimedia resource to be recommended according to the updated second component list.
FIG. 8 is a structural block diagram of an exemplary implementation of the apparatus for determining the title of a multimedia resource according to another embodiment of the present disclosure. The apparatus shown in FIG. 8 may be used to perform the method for determining the title of a multimedia resource shown in FIGS. 1 to 6. For ease of description, only the parts related to this embodiment are shown in FIG. 8. Components in FIG. 8 with the same reference numerals as in FIG. 7 have the same functions, and their detailed description is omitted for brevity.
In a possible implementation, the comparison module 74 includes: a similarity calculation submodule 741, configured to calculate the similarity between each component in the second component list and each component in the first component list; a replacement submodule 742, configured to replace a component in the second component list with a component in the first component list if the similarity between the two is greater than the first preset value; and an update submodule 743, configured to obtain the updated second component list from the result of all the replacements.
In a possible implementation, the similarity calculation submodule 741 includes: a vector determining unit, configured to determine the vector corresponding to each component in the second component list; and a similarity calculation unit, configured to calculate the similarity between the vector of each component in the second component list and the vector of each component in the first component list.
In a possible implementation, the similarity calculation unit is configured to calculate, using Equation 1, the similarity $\mathrm{sim}(\vec{v}_l,\vec{u}_m)$ between the vector $\vec{v}_l$ of the $l$-th component in the second component list and the vector $\vec{u}_m$ of the $m$-th component in the first component list:

$$\mathrm{sim}(\vec{v}_l,\vec{u}_m)=\frac{\vec{v}_l\cdot\vec{u}_m}{\lVert\vec{v}_l\rVert\,\lVert\vec{u}_m\rVert}\qquad\text{(Equation 1)}$$
In a possible implementation, the determining module 75 includes: a score calculation submodule 751, configured to calculate the score of the updated second component list; and a determining submodule 752, configured to determine the new title of the multimedia resource to be recommended according to the updated second component list if its score is greater than the second preset value.
In a possible implementation, the score calculation submodule 751 is configured to calculate the score of the updated second component list according to the probabilities with which its components appear in the specified sample set.
In a possible implementation, the score calculation submodule 751 is configured to calculate the score $s$ of the updated second component list using Equation 2:

[Equation 2: formula image not reproduced in the source text]

where $n$ is the number of components in the updated second component list, $w_j$ is the $j$-th component in the updated second component list, $w_{j-i}$ is the $(j-i)$-th component in the updated second component list, $p(w_j w_{j-i})$ is the probability that the $j$-th and $(j-i)$-th components co-occur in the specified sample set, and $p(w_{j-i})$ is the probability that the $(j-i)$-th component appears in the specified sample set.
In a possible implementation, the apparatus further includes: a retaining module 76, configured to keep the original title of the multimedia resource to be recommended if the score of the updated second component list is less than or equal to the second preset value.
In a possible implementation, the first parsing module 72 includes: a parsing submodule 721, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user; a component determining submodule 722, configured to take, among the components related to the target user, those occurring more times than the third preset value as the components corresponding to the target user; and a first component list generating submodule 723, configured to generate the first component list corresponding to the target user from the components corresponding to the target user.
In a possible implementation, the acquisition module 71 includes: an acquisition submodule 711, configured to collect all user behavior data of the target user within a specified time period; a filtering submodule 712, configured to filter valid user behavior data from the collected user behavior data; and a sorting submodule 713, configured to sort the valid user behavior data by its corresponding time to obtain the first multimedia resource list.
It should be noted that, by comparing each component in the second component list corresponding to the original title of the multimedia resource to be recommended with each component in the first component list corresponding to the target user, an updated second component list is obtained and the new title of the multimedia resource to be recommended is determined. The apparatus for determining the title of a multimedia resource according to the embodiments of the present disclosure can therefore determine a personalized title for the target user, better attract the user, and increase the probability that the recommended multimedia resource is clicked.
Embodiment 3
FIG. 9 is a structural block diagram of a device for determining the title of a multimedia resource according to another embodiment of the present disclosure. The device 1100 for determining the title of a multimedia resource may be a host server with computing capability, a personal computer (PC), a portable computer, a terminal, or the like. The specific embodiments of the present disclosure do not limit the specific implementation of the computing node.
The device 1100 for determining the title of a multimedia resource includes a processor 1110, a communications interface 1120, a memory 1130, and a bus 1140. The processor 1110, the communications interface 1120, and the memory 1130 communicate with one another via the bus 1140.
The communications interface 1120 is used to communicate with network devices, which include, for example, a virtual machine management center and shared storage.
The processor 1110 is used to execute a program. The processor 1110 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
The memory 1130 is used to store files. The memory 1130 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory. The memory 1130 may also be a memory array. The memory 1130 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
In a possible implementation, the program may be program code including computer operating instructions. The program may specifically be used to perform the operations of the steps in Embodiment 1.
Those of ordinary skill in the art will appreciate that the exemplary units and algorithm steps of the embodiments described herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present disclosure.
If the functions are implemented in the form of computer software and sold or used as an independent product, all or part of the technical solution of the present disclosure (for example, the part contributing to the prior art) may, to some extent, be regarded as embodied in the form of a computer software product. The computer software product is typically stored in a computer-readable non-volatile storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present disclosure. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific embodiments of the present disclosure, but the scope of protection of the present disclosure is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (20)

  1. A method for determining a title of a multimedia resource, comprising:
    collecting user behavior data of a target user, and generating a first multimedia resource list according to the user behavior data;
    parsing a title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
    parsing an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
    comparing each component in the second component list with each component in the first component list to obtain an updated second component list; and
    determining a new title of the multimedia resource to be recommended according to the updated second component list.
  2. The method according to claim 1, wherein comparing each component in the second component list with each component in the first component list to obtain the updated second component list comprises:
    calculating a similarity between each component in the second component list and each component in the first component list;
    if the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, replacing the component in the second component list with the component in the first component list; and
    obtaining the updated second component list from the result of all the replacements.
  3. The method according to claim 2, wherein calculating the similarity between each component in the second component list and each component in the first component list comprises:
    determining a vector corresponding to each component in the second component list; and
    separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  4. The method according to claim 3, wherein separately calculating the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list comprises:
    calculating, by using Equation 1, the similarity between the vector corresponding to the l-th component in the second component list and the vector corresponding to the m-th component in the first component list.
    [Equation 1 and its accompanying definitions appear only as images (PCTCN2017104410-appb-100001 to -100004) in the source and are not recoverable from the text.]
  5. The method according to claim 1, wherein determining the new title of the multimedia resource to be recommended according to the updated second component list comprises:
    calculating a score of the updated second component list; and
    when the score of the updated second component list is greater than a second preset value, determining the new title of the multimedia resource to be recommended according to the updated second component list.
  6. The method according to claim 5, wherein calculating the score of the updated second component list comprises:
    calculating the score of the updated second component list according to the probability that each component in the updated second component list occurs in a specified sample set.
  7. The method according to claim 6, wherein calculating the score of the updated second component list according to the probability that each component in the updated second component list occurs in the specified sample set comprises:
    calculating the score s of the updated second component list by using Equation 2.
    [Equation 2 appears only as images (PCTCN2017104410-appb-100005 and -100006) in the source and is not recoverable from the text.]
    Here $n$ denotes the number of components in the updated second component list, $w_j$ denotes the $j$-th component in the updated second component list, $w_{j-i}$ denotes the $(j-i)$-th component in the updated second component list, $p(w_j w_{j-i})$ denotes the probability that the $j$-th component and the $(j-i)$-th component co-occur in the specified sample set, and $p(w_{j-i})$ denotes the probability that the $(j-i)$-th component occurs in the specified sample set.
  8. The method according to any one of claims 5 to 7, wherein after calculating the score of the updated second component list, the method further comprises:
    when the score of the updated second component list is less than or equal to the second preset value, retaining the original title of the multimedia resource to be recommended.
  9. The method according to claim 1, wherein parsing the title of each multimedia resource in the first multimedia resource list to obtain the first component list corresponding to the target user comprises:
    parsing the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user;
    taking, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value; and
    generating the first component list corresponding to the target user according to the components corresponding to the target user.
  10. The method according to claim 1, wherein collecting the user behavior data of the target user and generating the first multimedia resource list according to the user behavior data comprises:
    collecting all user behavior data of the target user within a specified time period;
    selecting valid user behavior data from the collected user behavior data; and
    sorting the valid user behavior data according to the times corresponding to the valid user behavior data to obtain the first multimedia resource list.
  11. An apparatus for determining a title of a multimedia resource, comprising:
    an acquisition module, configured to collect user behavior data of a target user and generate a first multimedia resource list according to the user behavior data;
    a first parsing module, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain a first component list corresponding to the target user;
    a second parsing module, configured to parse an original title of a multimedia resource to be recommended to obtain a second component list corresponding to the original title;
    a comparison module, configured to compare each component in the second component list with each component in the first component list to obtain an updated second component list; and
    a determining module, configured to determine a new title of the multimedia resource to be recommended according to the updated second component list.
  12. The apparatus according to claim 11, wherein the comparison module comprises:
    a similarity calculation submodule, configured to calculate a similarity between each component in the second component list and each component in the first component list;
    a replacement submodule, configured to, when the similarity between a component in the second component list and a component in the first component list is greater than a first preset value, replace the component in the second component list with the component in the first component list; and
    an update submodule, configured to obtain the updated second component list according to all of the replaced components.
  13. The apparatus according to claim 12, wherein the similarity calculation submodule comprises:
    a vector determining unit, configured to determine a vector corresponding to each component in the second component list; and
    a similarity calculation unit, configured to separately calculate the similarity between the vector corresponding to each component in the second component list and the vector corresponding to each component in the first component list.
  14. The apparatus according to claim 13, wherein the similarity calculation unit is configured to:
    calculate, by using Equation 1, the similarity between the vector corresponding to the l-th component in the second component list and the vector corresponding to the m-th component in the first component list.
    [Equation 1 and its accompanying definitions appear only as images (PCTCN2017104410-appb-100007 to -100010) in the source and are not recoverable from the text.]
  15. The apparatus according to claim 11, wherein the determining module comprises:
    a score calculation submodule, configured to calculate a score of the updated second component list; and
    a determining submodule, configured to, when the score of the updated second component list is greater than a second preset value, determine the new title of the multimedia resource to be recommended according to the updated second component list.
  16. The apparatus according to claim 15, wherein the score calculation submodule is configured to:
    calculate the score of the updated second component list according to the probability that each component in the updated second component list occurs in a specified sample set.
  17. The apparatus according to claim 16, wherein the score calculation submodule is configured to:
    calculate the score s of the updated second component list by using Equation 2.
    [Equation 2 appears only as images (PCTCN2017104410-appb-100011 and -100012) in the source and is not recoverable from the text.]
    Here $n$ denotes the number of components in the updated second component list, $w_j$ denotes the $j$-th component in the updated second component list, $w_{j-i}$ denotes the $(j-i)$-th component in the updated second component list, $p(w_j w_{j-i})$ denotes the probability that the $j$-th component and the $(j-i)$-th component co-occur in the specified sample set, and $p(w_{j-i})$ denotes the probability that the $(j-i)$-th component occurs in the specified sample set.
  18. The apparatus according to any one of claims 15 to 17, further comprising:
    a retaining module, configured to retain the original title of the multimedia resource to be recommended when the score of the updated second component list is less than or equal to the second preset value.
  19. The apparatus according to claim 11, wherein the first parsing module comprises:
    a parsing submodule, configured to parse the title of each multimedia resource in the first multimedia resource list to obtain components related to the target user;
    a component determining submodule, configured to take, as components corresponding to the target user, those components related to the target user whose number of occurrences is greater than a third preset value; and
    a first component list generating submodule, configured to generate the first component list corresponding to the target user according to the components corresponding to the target user.
  20. The apparatus according to claim 11, wherein the acquisition module comprises:
    an acquisition submodule, configured to collect all user behavior data of the target user within a specified time period;
    a screening submodule, configured to select valid user behavior data from the collected user behavior data; and
    a sorting submodule, configured to sort the valid user behavior data according to the times corresponding to the valid user behavior data to obtain the first multimedia resource list.
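The method of claims 1 to 10 can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the applicant's implementation. Equations 1 and 2 are not legible in the extracted text, so cosine similarity stands in for Equation 1, and a geometric mean of adjacent-pair conditional probabilities $p(w_j w_{j-1}) / p(w_{j-1})$ (only $i = 1$) stands in for Equation 2. Titles are split on whitespace rather than by a real word segmenter. `BehaviorRecord`, its `watch_seconds` field, the component vectors, and every function and parameter name (`t1`, `t2`, `min_count`, `sep`) are hypothetical. Where the claims leave a choice open, the sketch picks one: records are sorted newest first, and a component is replaced by the single most similar user component above the threshold.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Parser = Callable[[str], List[str]]


@dataclass
class BehaviorRecord:
    """Hypothetical shape of one user-behavior event (e.g. one playback)."""
    title: str
    timestamp: datetime
    watch_seconds: float


def first_resource_list(records: Sequence[BehaviorRecord], start: datetime,
                        end: datetime,
                        is_valid: Callable[[BehaviorRecord], bool]) -> List[str]:
    """Claim 10: keep records in [start, end) that pass `is_valid`, sorted by time."""
    valid = [r for r in records if start <= r.timestamp < end and is_valid(r)]
    valid.sort(key=lambda r: r.timestamp, reverse=True)  # newest first (assumed)
    return [r.title for r in valid]


def first_component_list(titles: Sequence[str], parse: Parser,
                         min_count: int) -> List[str]:
    """Claim 9: keep components that occur more than `min_count` times."""
    counts = Counter(c for t in titles for c in parse(t))
    return [c for c, k in counts.most_common() if k > min_count]


def cosine(a: Vector, b: Vector) -> float:
    """Assumed stand-in for Equation 1: cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def update_components(second: List[str], first: List[str],
                      vectors: Dict[str, Vector], t1: float) -> List[str]:
    """Claims 2-4: replace each component by the most similar user component
    whose similarity exceeds t1 (choosing the maximum is an assumption)."""
    updated = []
    for comp in second:
        best, best_sim = comp, t1
        for cand in first:
            if comp in vectors and cand in vectors:
                sim = cosine(vectors[comp], vectors[cand])
                if sim > best_sim:
                    best, best_sim = cand, sim
        updated.append(best)
    return updated


def score(components: List[str], samples: Sequence[Sequence[str]],
          eps: float = 1e-9) -> float:
    """Assumed stand-in for Equation 2 (claims 6-7): geometric mean of
    p(w_j w_{j-1}) / p(w_{j-1}) over adjacent pairs, i.e. only i = 1."""
    sets = [set(s) for s in samples]

    def prob(*words: str) -> float:
        # Fraction of samples that contain every word in `words`.
        return sum(all(w in s for w in words) for s in sets) / len(sets)

    logs = [math.log(max(prob(cur, prev) / max(prob(prev), eps), eps))
            for prev, cur in zip(components, components[1:])]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def determine_title(original: str, parse: Parser, user_components: List[str],
                    vectors: Dict[str, Vector], samples: Sequence[Sequence[str]],
                    t1: float, t2: float, sep: str = " ") -> str:
    """Claims 1, 5 and 8: use the updated title only if its score exceeds t2."""
    updated = update_components(parse(original), user_components, vectors, t1)
    if score(updated, samples) > t2:
        return sep.join(updated)
    return original
```

Suppose a user's valid history contains "funny dog video" and "funny cat video". Then `first_component_list(titles, str.split, 1)` keeps `["funny", "video"]`. If the hypothetical vectors place "hilarious" near "funny" and "clip" near "video", `determine_title("hilarious clip", ...)` rewrites the title to "funny video" when the sample-set score clears `t2`. Otherwise it keeps "hilarious clip", as in claim 8. For Chinese titles, a word segmenter would take the place of `str.split`, and `sep` would be the empty string.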
PCT/CN2017/104410 2016-10-09 2017-09-29 Method and device for determining title of multimedia resource WO2018064959A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610881052.3A CN106445922B (en) 2016-10-09 2016-10-09 Method and device for determining title of multimedia resource
CN201610881052.3 2016-10-09

Publications (1)

Publication Number Publication Date
WO2018064959A1 true WO2018064959A1 (en) 2018-04-12

Family

ID=58173116

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104410 WO2018064959A1 (en) 2016-10-09 2017-09-29 Method and device for determining title of multimedia resource

Country Status (2)

Country Link
CN (1) CN106445922B (en)
WO (1) WO2018064959A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106445922B (en) * 2016-10-09 2020-02-18 合一网络技术(北京)有限公司 Method and device for determining title of multimedia resource
CN111401046B (en) * 2020-04-13 2023-09-29 贝壳技术有限公司 House source title generation method and device, storage medium and electronic equipment
CN113742567B (en) * 2020-05-29 2023-08-22 北京达佳互联信息技术有限公司 Recommendation method and device for multimedia resources, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20080071519A1 (en) * 2006-09-19 2008-03-20 Xerox Corporation Labeling of work of art titles in text for natural language processing
CN101604310A (en) * 2008-06-10 2009-12-16 宏碁股份有限公司 According to the user to the preference for relative titles managing articles
CN102103594A (en) * 2009-12-22 2011-06-22 北京大学 Character data recognition and processing method and device
US20110252034A1 (en) * 2010-04-13 2011-10-13 Microsoft Corporation Measuring entity extraction complexity
CN103544264A (en) * 2013-10-17 2014-01-29 常熟市华安电子工程有限公司 Commodity title optimizing tool
CN106445922A (en) * 2016-10-09 2017-02-22 合网络技术(北京)有限公司 Method and device for determining title of multimedia resource

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6978277B2 (en) * 1989-10-26 2005-12-20 Encyclopaedia Britannica, Inc. Multimedia search system
CN103324729B (en) * 2013-06-27 2017-03-08 小米科技有限责任公司 A kind of method and apparatus for recommending multimedia resource
CN105930532B (en) * 2016-06-16 2019-08-02 上海聚力传媒技术有限公司 A kind of method and apparatus from multimedia resource to user that recommending

Also Published As

Publication number Publication date
CN106445922A (en) 2017-02-22
CN106445922B (en) 2020-02-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17857840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.07.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17857840

Country of ref document: EP

Kind code of ref document: A1