CN113934872A - Search result sorting method, device, equipment and storage medium

Publication number
CN113934872A
Authority
CN
China
Prior art keywords
sample
multimedia resource
resource
multimedia
search word
Prior art date
Legal status
Pending
Application number
CN202111277526.0A
Other languages
Chinese (zh)
Inventor
张志伟
王希爱
吴丽军
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111277526.0A
Publication of CN113934872A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/438 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/432 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/483 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a method, an apparatus, a device, and a storage medium for ranking search results, in the technical field of computers. Embodiments of the disclosure at least solve the problem of inaccurate search result ranking in the related art. The method comprises: acquiring a current search word and a plurality of multimedia resources related to the current search word; predicting the click rate of each multimedia resource according to the search word features of the current search word, the resource features of each multimedia resource, and a pre-trained prediction model; and sorting the plurality of multimedia resources according to the click rate of each multimedia resource. The search word features of the current search word are used to identify the current search word. The prediction model is trained on a sample search word, a plurality of sample multimedia resources related to the sample search word, and a sample operation record of each sample multimedia resource. The sample operation records represent the click rate of each sample multimedia resource when users searched for the sample search word in a historical time period.

Description

Search result sorting method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for ranking search results.
Background
In a video search scenario, after a user inputs a search term, the device retrieves a plurality of videos matching the search term from a video library and sorts them by click rate before returning the search results to the user. The click rate used for sorting is determined by the device from the posterior behavior data of users.
Specifically, the posterior behavior data includes a search term input by a user in a historical time period, the identifiers of the videos matched with that search term, and, for each of those videos, behavior data such as whether the user clicked, the viewing duration, and whether the user liked or followed. The device may determine the attention of each video according to whether the user clicked, and determine the satisfaction of each video according to the viewing duration and whether the user liked or followed. The device then calculates the click rate of each video from its attention and satisfaction.
However, a newly uploaded video in the video library has no posterior behavior data (because the video is new, no user has clicked it yet). The device therefore sets its click rate to 0 and ranks it correspondingly low, which may make the ranking of the search results inaccurate.
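The cold-start effect described above can be illustrated with a small sketch. The video identifiers and counts are invented for illustration, and the click rate here is simply clicks divided by impressions, which is a simplifying assumption rather than the related art's exact formula.

```python
def posterior_click_rate(clicks, impressions):
    """Click rate from historical behavior; 0.0 when no history exists."""
    return clicks / impressions if impressions > 0 else 0.0

# Hypothetical history: (video_id, clicks, impressions).
# "new_c" was just uploaded, so it has no behavior data at all.
history = [("new_c", 0, 0), ("old_a", 30, 100), ("old_b", 5, 100)]

# Sort by posterior click rate, highest first.
ranked = sorted(
    history,
    key=lambda v: posterior_click_rate(v[1], v[2]),
    reverse=True,
)
ranked_ids = [v[0] for v in ranked]
print(ranked_ids)
```

The new video ends up last regardless of how well it matches the search term, which is the inaccuracy the disclosure sets out to address.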
Disclosure of Invention
The present disclosure provides a method, an apparatus, a device and a storage medium for ranking search results, so as to at least solve the problem of inaccurate ranking of search results in the related art. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a method for ranking search results, including: acquiring a current search word and a plurality of multimedia resources related to the current search word; predicting the click rate of each multimedia resource according to the search word features of the current search word, the resource features of each multimedia resource, and a pre-trained prediction model; and sorting the plurality of multimedia resources according to the click rate of each multimedia resource. The search word features of the current search word are used to identify the current search word, and the prediction model is trained on a sample search word, a plurality of sample multimedia resources related to the sample search word, and a sample operation record of each sample multimedia resource. The sample operation records represent the click rate of each sample multimedia resource when users searched for the sample search word in a historical time period.
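A minimal sketch of the first-aspect flow follows: score every candidate resource with a model, then sort by the predicted click rate. The function names and the toy model (a sigmoid of a dot product) are assumptions made for illustration; the disclosure does not prescribe this model.

```python
import math

def rank_resources(query_feature, resource_features, predict_click_rate):
    """Return candidate indices sorted by predicted click rate, plus the scores."""
    scores = [predict_click_rate(query_feature, r) for r in resource_features]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

def toy_model(query_feature, resource_feature):
    """Stand-in for the pre-trained prediction model (hypothetical)."""
    z = sum(q * r for q, r in zip(query_feature, resource_feature))
    return 1.0 / (1.0 + math.exp(-z))

query = [1.0, 0.0]
candidates = [[0.1, 0.9], [2.0, 0.0], [-1.0, 0.5]]
order, scores = rank_resources(query, candidates, toy_model)
print(order)
```

Because the score depends only on features, a resource with no click history still receives a usable score.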
Optionally, the predicting the click rate of each multimedia resource according to the search term feature of the current search term, the resource feature of each multimedia resource, and the pre-trained prediction model includes: predicting the attention and satisfaction of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a prediction model; the attention degree is used for reflecting the sequence of click operation executed by the user on different multimedia resources, and the satisfaction degree is used for reflecting the feedback operation of the user on the satisfaction information of each multimedia resource; and determining the click rate of each multimedia resource according to the attention and the satisfaction of each multimedia resource.
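The step of deriving a click rate from attention and satisfaction might look like the sketch below. The passage does not state how the two values are combined, so the product used here is an assumption, and the video identifiers and scores are invented.

```python
def click_rate(attention, satisfaction):
    # Assumption: combine the two predictions by multiplication.
    # The text only says the click rate is determined from both values.
    return attention * satisfaction

# Hypothetical model outputs: video_id -> (attention, satisfaction)
predictions = {
    "video_a": (0.9, 0.2),
    "video_b": (0.6, 0.7),
    "video_c": (0.3, 0.9),
}

rates = {vid: click_rate(a, s) for vid, (a, s) in predictions.items()}
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

With a product, a resource that is clicked often but rarely satisfies users (video_a) ranks below one that is moderately strong on both signals.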
Optionally, the predicting the attention and satisfaction of each multimedia resource according to the search term feature of the current search term, the resource feature of each multimedia resource, and the prediction model includes: determining the fusion characteristics of each multimedia resource according to the search word characteristics of the current search word and the resource characteristics of each multimedia resource; the fusion characteristics comprise characteristics obtained by splicing the search word characteristics of the current search word and the resource characteristics of each multimedia resource; and inputting the fusion characteristics of each multimedia resource into a prediction model to obtain the attention and satisfaction of each multimedia resource.
Optionally, the determining the fusion characteristics of each multimedia resource according to the search term characteristics of the current search term and the resource characteristics of each multimedia resource includes: carrying out cross processing on the search word characteristics of the current search word and the resource characteristics of each multimedia resource to obtain the cross characteristics of each multimedia resource and the current search word; and splicing the search word characteristics of the current search word, the resource characteristics of each multimedia resource and the cross characteristics corresponding to each multimedia resource to obtain the fusion characteristics of each multimedia resource.
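The fusion step can be sketched as follows. The passage does not define the cross processing, so an element-wise product of equal-length vectors is assumed here; the feature values are invented.

```python
def fuse(query_feature, resource_feature):
    """Splice query feature, resource feature, and their cross feature."""
    if len(query_feature) != len(resource_feature):
        raise ValueError("the assumed element-wise cross needs equal lengths")
    # Assumption: cross processing = element-wise product.
    cross = [q * r for q, r in zip(query_feature, resource_feature)]
    # Splicing = concatenation of the three parts.
    return list(query_feature) + list(resource_feature) + cross

query_feature = [1.0, 2.0, 3.0]
resource_feature = [0.5, 0.0, -1.0]
fused = fuse(query_feature, resource_feature)
print(fused)
```

The fused vector is three times the length of each input and is what would be fed to the prediction model.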
Optionally, the search term feature of the current search term is obtained by splicing the text feature of the current search term and the embedded feature of the identifier of the current search term.
Optionally, the resource feature of each multimedia resource is obtained by concatenating at least two of the text feature of the description of each multimedia resource, the image feature of each multimedia resource, and the embedded feature of the identifier of each multimedia resource.
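The two splicing rules above can be sketched with plain lists. All feature values, the identifier "q42", and the helper names are hypothetical; in practice the text and image features would come from encoders that this passage does not specify.

```python
def concat(*parts):
    """Concatenate several feature vectors into one."""
    out = []
    for part in parts:
        out.extend(part)
    return out

# Search word feature: text feature spliced with the identifier embedding.
query_text_feature = [0.2, 0.7]
query_id_embeddings = {"q42": [0.1, -0.3]}  # hypothetical embedding table
search_word_feature = concat(query_text_feature, query_id_embeddings["q42"])

def resource_feature(text_feature=None, image_feature=None, id_embedding=None):
    """Splice at least two of the three resource feature types."""
    parts = [p for p in (text_feature, image_feature, id_embedding) if p is not None]
    if len(parts) < 2:
        raise ValueError("at least two feature types are required")
    return concat(*parts)

video_feature = resource_feature(text_feature=[0.5], image_feature=[0.9, 0.1])
print(search_word_feature, video_feature)
```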
Optionally, the method further includes: acquiring a plurality of groups of training samples; each group of training samples comprises search word characteristics of a sample search word, resource characteristics of sample multimedia resources matched with the sample search word, sample attention and sample satisfaction of the sample multimedia resources; and performing iterative training on the initial prediction model according to the obtained training samples to obtain the prediction model.
Optionally, the iteratively training the initial prediction model according to the obtained training samples to obtain the prediction model includes: predicting the estimated attention and the estimated satisfaction of the sample multimedia resources according to the search word features of the sample search words, the resource features of the sample multimedia resources, and the initial prediction model; determining the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resources and the sample attention of the sample multimedia resources; determining the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resources and the sample satisfaction of the sample multimedia resources; and optimizing the initial prediction model according to the attention loss of the initial prediction model and the satisfaction loss of the initial prediction model to obtain the prediction model.
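The iterative training described above can be sketched with a deliberately tiny model. Everything concrete here is an assumption: two linear heads with sigmoid outputs stand in for the prediction model, the two losses are cross-entropy losses that are simply added, plain gradient descent is the optimizer, and the samples are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def cross_entropy(p, y, eps=1e-12):
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))

def total_loss(w_att, w_sat, samples):
    """Mean of attention loss plus satisfaction loss (summing is an assumption)."""
    loss = 0.0
    for x, y_att, y_sat in samples:
        loss += cross_entropy(sigmoid(dot(w_att, x)), y_att)
        loss += cross_entropy(sigmoid(dot(w_sat, x)), y_sat)
    return loss / len(samples)

def train_step(w_att, w_sat, samples, lr=0.5):
    """One gradient-descent step; for sigmoid + cross-entropy, dL/dz = p - y."""
    n = len(samples)
    g_att = [0.0] * len(w_att)
    g_sat = [0.0] * len(w_sat)
    for x, y_att, y_sat in samples:
        err_att = sigmoid(dot(w_att, x)) - y_att
        err_sat = sigmoid(dot(w_sat, x)) - y_sat
        for j, xj in enumerate(x):
            g_att[j] += err_att * xj / n
            g_sat[j] += err_sat * xj / n
    new_att = [w - lr * g for w, g in zip(w_att, g_att)]
    new_sat = [w - lr * g for w, g in zip(w_sat, g_sat)]
    return new_att, new_sat

# Hypothetical samples: (fused feature, sample attention, sample satisfaction)
samples = [([1.0, 0.0], 0.9, 0.8), ([0.0, 1.0], 0.2, 0.1), ([1.0, 1.0], 0.6, 0.4)]
w_att, w_sat = [0.0, 0.0], [0.0, 0.0]
loss_before = total_loss(w_att, w_sat, samples)
for _ in range(200):
    w_att, w_sat = train_step(w_att, w_sat, samples)
loss_after = total_loss(w_att, w_sat, samples)
print(loss_before, loss_after)
```

A real implementation would more likely use a neural network trained by backpropagation, as the classification codes suggest; the sketch only shows how the two losses drive one update loop.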
Optionally, the determining the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resource and the sample attention of the sample multimedia resource includes: determining the estimated neglect degree of the sample multimedia resource according to the estimated attention of the sample multimedia resource; the estimated neglect degree of the sample multimedia resource is inversely related to the estimated attention of the sample multimedia resource; determining the sample neglect degree of the sample multimedia resource according to the sample attention of the sample multimedia resource; the sample neglect degree of the sample multimedia resource is inversely related to the sample attention of the sample multimedia resource; and determining the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resource, the estimated neglect degree of the sample multimedia resource, the sample attention of the sample multimedia resource, and the sample neglect degree of the sample multimedia resource.
Optionally, the determining the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resource, the estimated neglect degree of the sample multimedia resource, the sample attention of the sample multimedia resource, and the sample neglect degree of the sample multimedia resource includes: determining a logarithm of the estimated attention of each sample multimedia resource and a logarithm of the estimated neglect degree of each sample multimedia resource, and calculating a first product and a second product respectively; the first product is the product of the logarithm of the estimated attention of each sample multimedia resource and the sample attention of that sample multimedia resource, and the second product is the product of the logarithm of the estimated neglect degree of each sample multimedia resource and the sample neglect degree of that sample multimedia resource; and calculating the attention loss of the initial prediction model according to the calculated first product and second product.
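The first and second products described above match the two terms of a binary cross-entropy loss. The sketch below makes that reading concrete under stated assumptions: the neglect degree is taken to be one minus the attention (the text only says the two are inversely related), and the loss is the negated mean of the summed products (the text only says the loss is calculated from them).

```python
import math

def attention_loss(estimated_attention, sample_attention, eps=1e-12):
    """Negated mean of (first product + second product) over all samples."""
    total = 0.0
    for p, y in zip(estimated_attention, sample_attention):
        estimated_neglect = 1.0 - p   # assumption: neglect = 1 - attention
        sample_neglect = 1.0 - y      # assumption: neglect = 1 - attention
        first_product = y * math.log(p + eps)
        second_product = sample_neglect * math.log(estimated_neglect + eps)
        total += first_product + second_product
    return -total / len(estimated_attention)

good = attention_loss([0.9, 0.1], [1.0, 0.0])
poor = attention_loss([0.5, 0.5], [1.0, 0.0])
print(good, poor)
```

Predictions close to the sample values give a smaller loss than uninformative ones, which is the behavior a training signal needs.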
Optionally, the determining the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resource and the sample satisfaction of the sample multimedia resource includes: determining the estimated rejection degree of the sample multimedia resource according to the estimated satisfaction of the sample multimedia resource; the estimated rejection degree of the sample multimedia resource is inversely related to the estimated satisfaction of the sample multimedia resource; determining the sample rejection degree of the sample multimedia resource according to the sample satisfaction of the sample multimedia resource; the sample rejection degree of the sample multimedia resource is inversely related to the sample satisfaction of the sample multimedia resource; and determining the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resource, the estimated rejection degree of the sample multimedia resource, the sample satisfaction of the sample multimedia resource, and the sample rejection degree of the sample multimedia resource.
Optionally, the determining the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resource, the estimated rejection degree of the sample multimedia resource, the sample satisfaction of the sample multimedia resource, and the sample rejection degree of the sample multimedia resource includes: determining a logarithm of the estimated satisfaction of each sample multimedia resource and a logarithm of the estimated rejection degree of each sample multimedia resource, and calculating a third product and a fourth product respectively; the third product is the product of the logarithm of the estimated satisfaction of each sample multimedia resource and the sample satisfaction of that sample multimedia resource, and the fourth product is the product of the logarithm of the estimated rejection degree of each sample multimedia resource and the sample rejection degree of that sample multimedia resource; and calculating the satisfaction loss of the initial prediction model according to the calculated third product and fourth product.
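One way to write the satisfaction loss described above is as a cross-entropy. The symbols are introduced here for illustration: $s_i$ and $\hat{s}_i$ denote the sample and estimated satisfaction of the $i$-th sample multimedia resource, $r_i$ and $\hat{r}_i$ the corresponding rejection degrees, and $N$ the number of samples. Taking the rejection degree as one minus the satisfaction, and the loss as the negated mean of the two products, are assumptions; the text only states an inverse relation and that the loss is calculated from the products.

```latex
\begin{align*}
\hat{r}_i &= 1 - \hat{s}_i, \qquad r_i = 1 - s_i \\
\mathcal{L}_{\mathrm{sat}} &= -\frac{1}{N} \sum_{i=1}^{N}
  \Big[ \underbrace{s_i \log \hat{s}_i}_{\text{third product}}
      + \underbrace{r_i \log \hat{r}_i}_{\text{fourth product}} \Big]
\end{align*}
```

Under these assumptions the loss is zero only when every estimated satisfaction equals its sample value of 0 or 1, and it grows as the estimates move away from the samples.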
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for ranking search results, including an obtaining unit, a predicting unit, and a ranking unit; the obtaining unit is used for acquiring a current search word and a plurality of multimedia resources related to the current search word; the predicting unit is used for predicting the click rate of each multimedia resource according to the search word features of the current search word, the resource features of each multimedia resource, and a pre-trained prediction model; the search word features of the current search word are used for identifying the current search word, and the prediction model is trained on a sample search word, a plurality of sample multimedia resources related to the sample search word, and a sample operation record of each sample multimedia resource; the sample operation records represent the click rate of each sample multimedia resource when users searched for the sample search word in a historical time period; and the ranking unit is used for sorting the plurality of multimedia resources according to the click rate of each multimedia resource.
Optionally, the prediction unit is specifically configured to: predicting the attention and satisfaction of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a prediction model; the attention degree is used for reflecting the sequence of click operation executed by the user on different multimedia resources, and the satisfaction degree is used for reflecting the feedback operation of the user on the satisfaction information of each multimedia resource; and determining the click rate of each multimedia resource according to the attention and the satisfaction of each multimedia resource.
Optionally, the prediction unit is specifically configured to: determining the fusion characteristics of each multimedia resource according to the search word characteristics of the current search word and the resource characteristics of each multimedia resource; the fusion characteristics comprise characteristics obtained by splicing the search word characteristics of the current search word and the resource characteristics of each multimedia resource; and inputting the fusion characteristics of each multimedia resource into a prediction model to obtain the attention and satisfaction of each multimedia resource.
Optionally, the prediction unit is specifically configured to: carrying out cross processing on the search word characteristics of the current search word and the resource characteristics of each multimedia resource to obtain the cross characteristics of each multimedia resource and the current search word; and splicing the search word characteristics of the current search word, the resource characteristics of each multimedia resource and the cross characteristics corresponding to each multimedia resource to obtain the fusion characteristics of each multimedia resource.
Optionally, the search term feature of the current search term is obtained by splicing the text feature of the current search term and the embedded feature of the identifier of the current search term.
Optionally, the resource feature of each multimedia resource is obtained by concatenating at least two of the text feature of the description of each multimedia resource, the image feature of each multimedia resource, and the embedded feature of the identifier of each multimedia resource.
Optionally, the sorting apparatus further includes a training unit; the acquisition unit is also used for acquiring a plurality of groups of training samples; each group of training samples comprises search word characteristics of a sample search word, resource characteristics of sample multimedia resources matched with the sample search word, sample attention and sample satisfaction of the sample multimedia resources; and the training unit is used for carrying out iterative training on the initial prediction model according to the obtained training samples so as to obtain the prediction model.
Optionally, the training unit is specifically configured to: predict the estimated attention and the estimated satisfaction of the sample multimedia resources according to the search word features of the sample search words, the resource features of the sample multimedia resources, and the initial prediction model; determine the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resources and the sample attention of the sample multimedia resources; determine the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resources and the sample satisfaction of the sample multimedia resources; and optimize the initial prediction model according to the attention loss of the initial prediction model and the satisfaction loss of the initial prediction model to obtain the prediction model.
Optionally, the training unit is specifically configured to: determine the estimated neglect degree of the sample multimedia resource according to the estimated attention of the sample multimedia resource; the estimated neglect degree of the sample multimedia resource is inversely related to the estimated attention of the sample multimedia resource; determine the sample neglect degree of the sample multimedia resource according to the sample attention of the sample multimedia resource; the sample neglect degree of the sample multimedia resource is inversely related to the sample attention of the sample multimedia resource; and determine the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resource, the estimated neglect degree of the sample multimedia resource, the sample attention of the sample multimedia resource, and the sample neglect degree of the sample multimedia resource.
Optionally, the training unit is specifically configured to: determine a logarithm of the estimated attention of each sample multimedia resource and a logarithm of the estimated neglect degree of each sample multimedia resource, and calculate a first product and a second product respectively; the first product is the product of the logarithm of the estimated attention of each sample multimedia resource and the sample attention of that sample multimedia resource, and the second product is the product of the logarithm of the estimated neglect degree of each sample multimedia resource and the sample neglect degree of that sample multimedia resource; and calculate the attention loss of the initial prediction model according to the calculated first product and second product.
Optionally, the training unit is specifically configured to: determine the estimated rejection degree of the sample multimedia resource according to the estimated satisfaction of the sample multimedia resource; the estimated rejection degree of the sample multimedia resource is inversely related to the estimated satisfaction of the sample multimedia resource; determine the sample rejection degree of the sample multimedia resource according to the sample satisfaction of the sample multimedia resource; the sample rejection degree of the sample multimedia resource is inversely related to the sample satisfaction of the sample multimedia resource; and determine the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resource, the estimated rejection degree of the sample multimedia resource, the sample satisfaction of the sample multimedia resource, and the sample rejection degree of the sample multimedia resource.
Optionally, the training unit is specifically configured to: determine a logarithm of the estimated satisfaction of each sample multimedia resource and a logarithm of the estimated rejection degree of each sample multimedia resource, and calculate a third product and a fourth product respectively; the third product is the product of the logarithm of the estimated satisfaction of each sample multimedia resource and the sample satisfaction of that sample multimedia resource, and the fourth product is the product of the logarithm of the estimated rejection degree of each sample multimedia resource and the sample rejection degree of that sample multimedia resource; and calculate the satisfaction loss of the initial prediction model according to the calculated third product and fourth product.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including: a processor, a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the method of ranking search results as provided by the first aspect and any of its possible designs.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of a server, enable the server to perform the method of ranking search results as provided by the first aspect and any one of its possible designs.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when run on a server, cause the server to perform a method of ranking search results as provided by the first aspect and any of its possible designs.
The technical solution provided by the disclosure brings at least the following beneficial effects: the click rate of each multimedia resource can be determined according to the search word features of the current search word, the resource features of each multimedia resource, and a pre-trained prediction model. Because the prediction model is trained on sample search words, the sample multimedia resources related to those search words, and the sample operation record of each sample multimedia resource, and because the sample operation records represent the click rate of each sample multimedia resource when users searched for the sample search words in a historical time period, the click rate of a multimedia resource can be predicted even when no posterior behavior data exists for it. Such a resource can therefore take part in the ranking, and the search results returned to the user can be more accurate.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a block diagram illustrating a search system in accordance with an exemplary embodiment;
FIG. 2 is a first flowchart illustrating a method of ranking search results according to an exemplary embodiment;
FIG. 3 is a second flowchart illustrating a method for ranking search results according to an exemplary embodiment;
FIG. 4 is a third flowchart illustrating a method of ranking search results according to an exemplary embodiment;
FIG. 5 is a fourth flowchart illustrating a method of ranking search results according to an exemplary embodiment;
FIG. 6 is a fifth flowchart illustrating a method of ranking search results according to an exemplary embodiment;
FIG. 7 is a sixth flowchart illustrating a method of ranking search results according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating an apparatus for ranking search results in accordance with an exemplary embodiment;
FIG. 9 is a schematic diagram illustrating a configuration of a server according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In addition, in the description of the embodiments of the present disclosure, unless otherwise specified, "/" means "or"; for example, A/B may indicate A or B. "And/or" herein merely describes an association between associated objects and covers three relationships; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, in the description of the embodiments of the present disclosure, "a plurality" means two or more.
The search result ranking method provided by the embodiments of the present disclosure can be applied to a search system. FIG. 1 shows a schematic structural diagram of the search system. As shown in FIG. 1, the search system 10 is used to solve the problem of inaccurate ranking of search results in the related art. The search system 10 includes an apparatus 11 for ranking search results (hereinafter referred to simply as the sorting apparatus 11 for convenience of description) and a server 12. The sorting apparatus 11 is connected to the server 12, either in a wired manner or in a wireless manner, which is not limited in the embodiments of the present disclosure.
The sorting apparatus 11 may be used for data interaction with the server 12, for example, the sorting apparatus 11 may obtain a search word and a search result corresponding to the search word from the server 12.
The sorting device 11 may also execute a sorting method of search results in the embodiment of the present disclosure, for example, determine the attention and the satisfaction of each multimedia resource in the search results for the obtained search terms and the search results corresponding to the search terms, calculate the click rate of each multimedia resource according to the attention and the satisfaction of each multimedia resource, and sort the multimedia resources in the search results according to the determined click rate.
It should be noted that the multimedia resources involved in the embodiments of the present disclosure may include resources such as video, audio, graphics, and text, which is not specifically limited here. The following description takes video as an example of a multimedia resource; other resources such as audio, graphics, and text may be handled with reference to the same description.
The sorting apparatus 11 may also send the ranking result of the multimedia resources in the search results to the server 12.
The server 12 is configured to receive a search request sent by a user equipment of a user, determine a multimedia resource related to a search term according to the search term in the search request, and send the search term and an identifier of the multimedia resource in a search result to the sorting apparatus 11.
Meanwhile, the server 12 is further configured to receive the ranking result sent by the ranking device 11, and return the ranked search result to the user.
It should be noted that the sorting apparatus 11 and the server 12 may be independent devices or may be integrated into the same device, and the present invention is not limited to this.
When the sorting apparatus 11 and the server 12 are integrated in the same device, the sorting apparatus 11 and the server 12 communicate as internal modules of that device. In this case, the communication flow between the two is the same as the communication flow between the sorting apparatus 11 and the server 12 when they are independent of each other.
In the following embodiments provided by the present invention, the present invention is described by taking an example in which the sorting device 11 and the server 12 are set independently of each other.
In practical applications, the method for sorting search results provided by the embodiment of the present invention may be applied to a sorting apparatus, and may also be applied to a server.
As shown in fig. 2, the method for ranking search results provided by the embodiment of the present disclosure includes the following steps S201 to S204.
S201, the sequencing device obtains a current search word and a plurality of multimedia resources related to the current search word.
As a possible implementation manner, the sorting apparatus obtains the current search term and a plurality of multimedia resources related to the current search term from the server.
It should be noted that the number of the plurality of multimedia resources is two or more. The current search word is a search word in a search request sent to a server by a user through user equipment.
Correspondingly, after receiving the search request, the server acquires the current search word from the search request, and determines a plurality of multimedia resources related to the current search word as the search result of the current search word according to the current search word.
The server may determine the plurality of multimedia resources related to the current search term based on the similarity between the search term and the multimedia resources, which is not described herein again.
As another possible implementation manner, the sorting apparatus obtains the current search term from the server, and determines a plurality of multimedia resources related to the current search term according to the current search term.
For the specific implementation of this step, reference may be made to the above steps performed by the server, which are not described herein again; the only difference is that the step is executed by the sorting apparatus rather than by the server.
S202, the sequencing device respectively determines the search term characteristics of the current search term and the resource characteristics of each multimedia resource.
The search term feature of a search term is used to uniquely identify that search term, and the resource feature of a multimedia resource is used to uniquely identify that multimedia resource.
Accordingly, the search term feature of the current search term identifies the current search term, and the resource feature of each multimedia resource identifies that multimedia resource.
For a specific implementation of this step, reference may be made to the subsequent description of the embodiment of the present disclosure, and details are not repeated here.
S203, the sequencing device predicts the click rate of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a pre-trained prediction model.
The prediction model is obtained by training according to the sample search terms, a plurality of sample multimedia resources related to the sample search terms and the sample operation records of each sample multimedia resource. The sample operation records are used for representing the click rate of each sample multimedia resource when a user searches for the sample search terms in a historical time period. The click-through rate of a multimedia asset is used to reflect the probability that the multimedia asset is clicked on by the user.
As a possible implementation mode, the sequencing device inputs the search term characteristics of the current search term and the resource characteristics of each multimedia resource into a pre-trained prediction model to obtain the click rate of each multimedia resource.
The specific implementation manner of this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
S204, the sequencing device sequences the multimedia resources according to the click rate of each multimedia resource.
As a possible implementation manner, the sorting device sorts the plurality of multimedia resources according to the click rate of each of the plurality of multimedia resources.
The technical solution provided by the present disclosure brings at least the following beneficial effects: the click rate of each multimedia resource can be determined according to the search term feature of the current search term, the resource feature of each multimedia resource, and a pre-trained prediction model. Because the prediction model is trained on the sample search terms, the plurality of sample multimedia resources related to the sample search terms, and the sample operation record of each sample multimedia resource, and the sample operation records represent the click rate of each sample multimedia resource when users searched for the sample search terms within the historical time period, the click rate of a multimedia resource can be predicted even when no posterior behavior data exists for it. The predicted click rate then participates in the ranking of the multimedia resources, so that the search results returned to the user can be more accurate.
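The flow of steps S203 and S204 can be illustrated with a minimal sketch. The function names `rank_resources` and `toy_model`, the resource identifiers, and the choice of descending order are assumptions made for illustration only; `toy_model` is a hypothetical stand-in for the pre-trained prediction model and is not the model described in this disclosure.

```python
from typing import Callable, Dict, List

Vector = List[float]


def rank_resources(
    query_feature: Vector,
    resource_features: Dict[str, Vector],
    predict_click_rate: Callable[[Vector, Vector], float],
) -> List[str]:
    """Return resource ids ordered by predicted click rate, highest first."""
    # S203: predict a click rate for each (search term, resource) pair.
    click_rates = {
        resource_id: predict_click_rate(query_feature, feature)
        for resource_id, feature in resource_features.items()
    }
    # S204: sort the resources by predicted click rate (descending is assumed).
    return sorted(click_rates, key=lambda rid: click_rates[rid], reverse=True)


def toy_model(query: Vector, resource: Vector) -> float:
    """Hypothetical stand-in for the trained model: a dot product clipped to [0, 1]."""
    score = sum(q * r for q, r in zip(query, resource))
    return max(0.0, min(1.0, score))


ranked = rank_resources(
    [0.5, 0.5],
    {"video_a": [0.2, 0.2], "video_b": [0.9, 0.7], "video_c": [0.5, 0.1]},
    toy_model,
)
print(ranked)
```

In this sketch the model is passed in as a callable, so the same ranking routine works whether the sorting apparatus or the server hosts the prediction model.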
In one design, in order to determine the click rate of each multimedia resource predicted by the prediction model, as shown in fig. 3, S203 provided in this embodiment of the disclosure specifically includes following S2031 to S2032.
S2031, the sequencing device predicts the attention and satisfaction of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a pre-trained prediction model.
The attention degree is used for reflecting the sequence of click operation executed by the user on different multimedia resources, and the satisfaction degree is used for reflecting the feedback operation of the user on the satisfaction information of each multimedia resource.
It can be understood that the attention of a multimedia resource also represents how attractive that multimedia resource is to users under different search terms. The attention of a multimedia resource is positively correlated with how early the user clicks that multimedia resource among the search results; that is, the higher the attention of a multimedia resource, the more readily it is clicked by the user.
The satisfaction information may specifically include information such as the playing duration, whether the user likes the resource, and whether the user follows it. The satisfaction of a multimedia resource is related to the user's feedback operations on that multimedia resource; that is, the higher the satisfaction of a multimedia resource, the more likely it is to be liked, followed, or played for a long time by the user.
Both the attention and the satisfaction take values greater than or equal to 0 and less than or equal to 1.
As a possible implementation manner, the sorting device determines the fusion feature of each multimedia resource according to the search term feature of the current search term and the resource feature of each multimedia resource, and inputs the fusion feature of each multimedia resource into the prediction model to obtain the attention and satisfaction of each multimedia resource.
The fusion feature of a multimedia resource is obtained by splicing the search term feature of the current search term with the resource feature of that multimedia resource.
The specific implementation manner in this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
S2032, the sequencing device determines the click rate of each multimedia resource according to the attention and satisfaction of each multimedia resource.
As a possible implementation manner, the sorting device determines the product of the attention and the satisfaction of each multimedia resource as the click rate of that multimedia resource.
The technical scheme provided by the disclosure at least brings the following beneficial effects: considering the relevance between the search term characteristics of the search terms and the resource characteristics of the multimedia resources and the attention and satisfaction respectively, the attention and satisfaction of each multimedia resource can be predicted according to the search term characteristics of the current search term, the resource characteristics of each multimedia resource and a pre-trained prediction model, and then the click rate of each multimedia resource can be determined. Therefore, the click rate obtained through prediction of the prediction model can be ensured to be more accurate.
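Step S2032 can be illustrated with a short sketch: the click rate is the product of the predicted attention and the predicted satisfaction, both of which lie in [0, 1]. The function name `click_rate` and the numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def click_rate(attention: float, satisfaction: float) -> float:
    """Combine the two predicted scores as their product (step S2032)."""
    for name, value in (("attention", attention), ("satisfaction", satisfaction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return attention * satisfaction


# Hypothetical (attention, satisfaction) predictions for three videos.
predictions: Dict[str, Tuple[float, float]] = {
    "video_a": (0.9, 0.2),
    "video_b": (0.6, 0.8),
    "video_c": (0.5, 0.5),
}
rates = {rid: click_rate(a, s) for rid, (a, s) in predictions.items()}
best = max(rates, key=lambda rid: rates[rid])
print(best)
```

With these made-up values, the resource with the highest attention (video_a) does not obtain the highest click rate, because its low satisfaction pulls the product down.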
In order to be able to determine the search term characteristics of the current search term, the embodiments of the present disclosure provide at least one of the following methods of determining the search term characteristics of the current search term.
In one design, the search term features of the current search term provided by embodiments of the present disclosure may be text features of the current search term.
Specifically, the sorting device inputs the current search word into a preset text model, and the text model performs semantic analysis on the current search word to obtain the text characteristics of the current search word.
For example, the preset text model may be a Chinese word vector model (Chinese word2vec), and the text feature of the current search word may be a text vector.
In one design, the search term features of the current search term provided by the embodiments of the present disclosure may also be embedded features of the identity of the current search term.
Specifically, after the sorting device obtains the current search word, the sorting device determines the identifier of the current search word in the prediction model, inputs the identifier of the current search word into a preset word vector model, and performs embedding processing on the identifier of the current search word through the word vector model to obtain the embedding characteristics of the identifier of the current search word output by the word vector model.
It can be understood that the word vector model can obtain the word vector of the identifier of the current search word in a word embedding mode, so that the identifier of the search word can be converted into a vector form which can be processed by a computer, and the processability and expression capability of the search word are improved.
For example, the preset word vector model may be a word embedding model (word embedding), and the embedded feature of the identifier of the current search word may be an embedding vector.
In one design, in order to enable the search term features of the current search term to comprehensively reflect the features of the current search term, the search term features of the current search term provided in the embodiments of the present disclosure may also be obtained by concatenating the text features of the current search term and the embedded features of the identifier of the current search term.
Specifically, after determining the text feature of the current search word and the embedded feature of the identifier of the current search word, the sorting device splices and merges the text feature of the current search word and the embedded feature of the identifier of the current search word to obtain the search word feature of the current search word.
It should be noted that, in the process of splicing and merging the text features and the embedded features, the embodiment of the present disclosure does not limit the sequence of the text features and the embedded features in the search term features of the current search term.
For example, in the case that the text feature is a text vector and the embedded feature is an embedded vector, the text feature of the current search word may be [a, b, c] and the embedded feature of the identifier of the current search word may be [d, e, f, g]; the search word feature of the current search word obtained by splicing and merging the two may then be [a, b, c, d, e, f, g].
It should be noted that the above manner for determining the search term features of the current search term may also be applicable to the step of determining the search term features of the sample search terms in the subsequent training process of the prediction model, and details are not repeated in the following of the embodiment of the present disclosure.
The technical scheme provided by the disclosure at least brings the following beneficial effects: the text features of the current search terms are spliced with the embedded features of the identifications of the current search terms to obtain the search term features of the current search terms, so that the determined search term features can uniquely and comprehensively identify the current search terms.
In order to be able to determine a resource characteristic of a multimedia resource, the embodiments of the present disclosure provide at least one of the following methods of determining a resource characteristic of a multimedia resource.
In one design, the resource features of the multimedia resource provided by the embodiments of the present disclosure may be textual features of a description of the multimedia resource.
Specifically, the sequencing device inputs the description of the multimedia resource into a preset text model, and the text model performs semantic analysis on the description of the multimedia resource to obtain the text features of the description of the multimedia resource.
It should be noted that the description of the multimedia resource may be a description in the content of the multimedia resource, or a description on the cover of the multimedia resource.
For example, the preset text model may be a Chinese word vector model (Chinese word2vec), and the text feature of the description of the multimedia resource may be a text vector.
In one design, the resource feature of the multimedia resource provided by the embodiment of the disclosure may also be an image feature of the multimedia resource.
Specifically, the sequencing device inputs the multimedia resource into a preset convolutional neural network, and the convolutional neural network processes the multimedia resource to obtain the image characteristics of the multimedia resource.
It is understood that the image feature of the multimedia asset is an image feature of an image frame included in the multimedia asset.
In one design, the resource features of the multimedia resource provided by the embodiments of the present disclosure may also be embedded features of the identity of the multimedia resource.
Specifically, after the sequencing device obtains the multimedia resource, the identifier of the multimedia resource in the prediction model is determined, and the identifier of the multimedia resource is input into a preset word vector model, so that the identifier of the multimedia resource is embedded through the word vector model, and the embedding characteristics of the identifier of the multimedia resource output by the word vector model are obtained.
It can be understood that the word vector model can obtain the word vectors of the identifiers of the multimedia resources in a word embedding manner, so that the identifiers of the multimedia resources can be converted into a vector form that can be processed by a computer, which improves the processability and expressive capability of the multimedia resources.
For example, the preset word vector model may be a word embedding model (word embedding), and the embedded feature of the identifier of the multimedia resource may be an embedding vector.
In one design, in order to enable the resource features of the multimedia resources to comprehensively reflect the features of the multimedia resources, the resource features of the multimedia resources provided by the embodiments of the present disclosure may also be obtained by concatenating at least two of the text features of the description of each multimedia resource, the image features of each multimedia resource, and the embedded features of the identifier of each multimedia resource.
Specifically, after determining the text feature of the description of the multimedia resource, the image feature of the multimedia resource, and the embedded feature of the identifier of the multimedia resource, the sorting device splices and merges at least two of the text feature of the description of the multimedia resource, the image feature of the multimedia resource, and the embedded feature of the identifier of the multimedia resource to obtain the resource feature of the multimedia resource.
It should be noted that, in the process of splicing and merging the text features, and/or the image features, and/or the embedded features, the embodiment of the present disclosure does not limit the order of the text features, the image features, and the embedded features in the resource features of the multimedia resources.
For example, in the case that the text feature is a text vector and the image feature and the embedded feature are embedded vectors, the text feature of the description of the multimedia resource may be [1, 2, 3, 4], the image feature of the multimedia resource may be [3, 1, 4, 2], and the embedded feature of the identifier of the multimedia resource may be [3, 5, 3, 2]. The resource feature of the multimedia resource obtained by splicing and merging these three features may then be [1, 2, 3, 4, 3, 1, 4, 2, 3, 5, 3, 2].
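The splicing described above can be sketched as follows. The function name `build_resource_feature` and its parameter names are hypothetical; the sketch only assumes that each available feature is a flat list of numbers, that at least two of the three features are supplied, and that they are spliced in the order text, image, identifier embedding (the disclosure does not limit the order).

```python
from typing import List, Optional

Vector = List[float]


def build_resource_feature(
    text_feature: Optional[Vector] = None,
    image_feature: Optional[Vector] = None,
    id_embedding: Optional[Vector] = None,
) -> Vector:
    """Splice at least two of the three per-resource features into one vector."""
    parts = [p for p in (text_feature, image_feature, id_embedding) if p is not None]
    if len(parts) < 2:
        raise ValueError("at least two features are required for splicing")
    # Concatenate the available features end to end.
    return [value for part in parts for value in part]


resource_feature = build_resource_feature(
    text_feature=[1, 2, 3, 4],
    image_feature=[3, 1, 4, 2],
    id_embedding=[3, 5, 3, 2],
)
print(resource_feature)
```

The same routine covers the two-feature cases, for example a text feature spliced with an identifier embedding only.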
The technical scheme provided by the disclosure at least brings the following beneficial effects: the resource characteristics of the multimedia resources are obtained by splicing and combining at least two of the text characteristics of the description of the multimedia resources, the image characteristics of the multimedia resources and the embedded characteristics of the identification of the multimedia resources, so that the determined resource characteristics can uniquely and comprehensively identify the multimedia resources.
In one design, in order to predict the attention and satisfaction of each multimedia resource, as shown in fig. 4, S2031 provided in the embodiment of the present disclosure may specifically include the following S301 to S302.
S301, the sequencing device determines the fusion characteristics of each multimedia resource according to the search term characteristics of the current search term and the resource characteristics of each multimedia resource.
The fusion feature of a multimedia resource is obtained by splicing the search term feature of the current search term with the resource feature of that multimedia resource.
As a possible implementation manner, the sorting device splices and merges the search term features of the current search term and the resource features of each multimedia resource to obtain the fusion features of each multimedia resource.
The specific implementation manner of the splicing and merging of the search term features and the resource features in this step may refer to the above splicing and merging of the feature vectors in the embodiment of the present disclosure, which is not described herein again, but the difference lies in that the spliced objects are different.
In some embodiments, as shown in fig. 5, the embodiment of the present disclosure further shows another implementation manner of this step S301, specifically as following S3011-S3012.
S3011, the sequencing device carries out cross processing on the search word features of the current search word and the resource features of each multimedia resource to obtain cross features of each multimedia resource and the current search word.
Wherein the cross-feature is used to reflect the relevance between the current search term and each multimedia resource.
For example, the cross feature may be obtained by combining (crossing) the search term feature of the current search term with the resource feature of each multimedia resource.
The specific implementation manner of this step may refer to an implementation manner of combination of feature vectors in the prior art, and details are not described here.
S3012, the sequencing device splices the search word features of the current search word, the resource features of each multimedia resource and the corresponding cross features of each multimedia resource to obtain the fusion features of each multimedia resource.
As a possible implementation manner, the implementation manner of this step may specifically refer to the above-mentioned splicing manner for the feature vectors in the embodiment of the present disclosure, and the difference is that the spliced objects are different, and details are not described here again.
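Steps S3011 and S3012 can be sketched as follows. The disclosure does not specify the crossing operation, so the flattened outer product used here (every search term component multiplied by every resource component) is purely an assumption; the names `cross_features` and `fuse` are likewise hypothetical.

```python
from typing import List

Vector = List[float]


def cross_features(query_feature: Vector, resource_feature: Vector) -> Vector:
    """Assumed crossing operation: flattened outer product of the two vectors."""
    return [q * r for q in query_feature for r in resource_feature]


def fuse(query_feature: Vector, resource_feature: Vector) -> Vector:
    """S3012: splice search term feature, resource feature, and cross feature."""
    crossed = cross_features(query_feature, resource_feature)
    return list(query_feature) + list(resource_feature) + crossed


fusion_feature = fuse([1.0, 2.0], [3.0, 4.0, 5.0])
print(len(fusion_feature))
```

With a search term feature of length m and a resource feature of length n, this particular choice yields a fusion feature of length m + n + m·n, so a lower-dimensional crossing operation may be preferable for long feature vectors.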
The technical scheme provided by the disclosure at least brings the following beneficial effects: the search word characteristics of the current search word and the resource characteristics of the multimedia resources are subjected to cross processing, the search word characteristics, the resource characteristics and the cross characteristics obtained through the cross processing are spliced into the fusion characteristics of the multimedia resources, and the cross characteristics obtained through the cross processing reflect the correlation between the search word and the multimedia resources, so that the obtained fusion characteristics also include the correlation between the current search word and the multimedia resources, and the determined fusion characteristics can be ensured to be more comprehensive.
S302, the sequencing device inputs the fusion characteristics of each multimedia resource into a prediction model to obtain the attention and satisfaction of each multimedia resource.
As a possible implementation manner, the sorting device respectively inputs the fusion characteristics of each multimedia resource into the prediction model to obtain the attention and satisfaction of each multimedia resource output by the prediction model.
The technical solution provided by the present disclosure brings at least the following beneficial effects: the fusion feature is obtained by splicing the search term feature of the current search term with the resource feature of each multimedia resource, so it contains both the search term feature and the resource feature. The influence on attention and satisfaction can therefore be considered from the dimensions of both the search term and the multimedia resource, and the predicted attention and satisfaction can be more accurate.
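Step S302 can be illustrated with a toy model that maps one fusion feature to two outputs. The disclosure does not describe the architecture of the prediction model at this point, so the two independent linear heads followed by a sigmoid are an assumption made only to show a model whose outputs both fall between 0 and 1; all names and weights are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def _sigmoid(z: float) -> float:
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def _dot(u: Vector, v: Vector) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def predict_attention_and_satisfaction(
    fusion_feature: Vector,
    attention_weights: Vector,
    satisfaction_weights: Vector,
    attention_bias: float = 0.0,
    satisfaction_bias: float = 0.0,
) -> Tuple[float, float]:
    """Assumed two-head model: one sigmoid output per predicted quantity."""
    attention = _sigmoid(_dot(fusion_feature, attention_weights) + attention_bias)
    satisfaction = _sigmoid(
        _dot(fusion_feature, satisfaction_weights) + satisfaction_bias
    )
    return attention, satisfaction


attention, satisfaction = predict_attention_and_satisfaction(
    [1.0, 2.0, 0.5], [0.0, 0.0, 0.0], [0.4, 0.1, -0.2]
)
print(attention, satisfaction)
```

The sigmoid is used here only because it keeps both outputs within the stated range of attention and satisfaction.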
In one design, since the prediction model is trained in advance, in order to train the prediction model, as shown in fig. 6, the method for ranking the search results according to the embodiment of the present disclosure further includes following steps S401 to S402.
S401, the sequencing device obtains a plurality of groups of training samples.
Each group of training samples comprises the search term characteristics of a sample search term, the resource characteristics of sample multimedia resources matched with the sample search term, and the sample attention and the sample satisfaction of the sample multimedia resources.
As a possible implementation manner, the sorting device obtains, from the server, the sample search terms requested by different users within the historical duration, the search term characteristics of each sample search term, the resource characteristics of the sample multimedia resources matched with each sample search term, and the sample attention and the sample satisfaction of the sample multimedia resources.
It should be noted that the sample attention of each sample multimedia resource obtained in this step may be determined according to the click operations performed by users, within the historical time period, on the sample multimedia resource in the search results corresponding to the sample search term. The sample satisfaction of each sample multimedia resource may be determined according to how long users played the sample multimedia resource within the historical time period, whether users liked it, whether users followed it, and the like.
In this step, for the manner in which the sorting apparatus obtains the search term features of the sample search terms, reference may be made to the manner in which the sorting apparatus obtains the search term features of the current search term in the above embodiments of the present disclosure, which is not described herein again; the only difference is that the features are obtained for the sample search terms rather than for the current search term.
Likewise, for the manner in which the sorting apparatus obtains the resource features of the sample multimedia resources, reference may be made to the manner in which the sorting apparatus obtains the resource features of the plurality of multimedia resources in the above embodiments of the present disclosure, which is not described herein again; the only difference is that the features are obtained for the sample multimedia resources.
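The content of one group of training samples obtained in S401 can be sketched as a small record type. The class name `TrainingSample` and its field names are hypothetical; the range check only reflects the statement above that attention and satisfaction take values between 0 and 1, and how the two labels are computed from the operation records is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingSample:
    """One group of training data: two input features and two labels."""

    search_term_feature: List[float]
    resource_feature: List[float]
    sample_attention: float
    sample_satisfaction: float

    def __post_init__(self) -> None:
        # Both labels are expected to lie in [0, 1].
        for name in ("sample_attention", "sample_satisfaction"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


sample = TrainingSample(
    search_term_feature=[0.1, 0.2],
    resource_feature=[0.3, 0.4, 0.5],
    sample_attention=0.8,
    sample_satisfaction=0.6,
)
print(sample.sample_attention, sample.sample_satisfaction)
```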
S402, the sequencing device conducts iterative training on the initial prediction model according to the obtained training samples to obtain the prediction model.
As a possible implementation manner, the sequencing device performs iterative training on the initial prediction model according to the obtained training samples until the initial prediction model converges, so as to obtain the trained prediction model.
The specific implementation manner of this step may refer to the following description of the embodiment of the present disclosure, and is not described herein again.
The technical solution provided by the present disclosure brings at least the following beneficial effects: the initial prediction model can be iteratively trained on the sample data to obtain a trained prediction model. In the sample data, the inputs are the search term features of the sample search terms and the resource features of the sample multimedia resources, and the supervision labels are the sample attention and the sample satisfaction of the sample multimedia resources. Therefore, the trained prediction model captures the fitted relationship between search terms and multimedia resources on the one hand and attention and satisfaction on the other, so that the predicted attention and satisfaction can be more accurate.
In one design, in order to train the prediction model until it converges, as shown in fig. 7, S402 provided in this embodiment of the disclosure specifically includes the following S4021 to S4024.
S4021, the sequencing device predicts the estimated attention and the estimated satisfaction of the sample multimedia resources according to the search word characteristics of the sample search words, the resource characteristics of the sample multimedia resources and the initial prediction model.
The estimated attention is the predicted attention, and the estimated satisfaction is the predicted satisfaction.
As a possible implementation manner, the sorting device may splice the search term characteristics of the sample search term and the resource characteristics of the sample multimedia resources to obtain sample fusion characteristics, and input the sample fusion characteristics into the initial prediction model to predict the estimated attention and the estimated satisfaction.
For the specific implementation of this step, reference may be made to the above description in the embodiments of the present disclosure of obtaining the fusion feature by splicing, which is not described here again.
As another possible implementation manner, the sorting device may perform cross processing on the search term features of the sample search terms and the resource features of the sample multimedia resources to obtain sample cross features, splice and combine the search term features of the sample search terms, the resource features of the sample multimedia resources, and the sample cross features to obtain sample fusion features, and input the sample fusion features into the initial prediction model to predict and obtain the predicted attention and the predicted satisfaction.
The specific implementation manner of this step may refer to the above splicing of the embodiment of the present disclosure to obtain the description of the fusion feature, and is not described here again.
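The two fusion strategies described for S4021 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function names are hypothetical, and the element-wise product used for cross processing is an assumption, since this passage does not fix a particular cross operation.

```python
from typing import List

Vector = List[float]


def splice(*features: Vector) -> Vector:
    """Concatenate feature vectors end to end (the "splicing" step)."""
    fused: Vector = []
    for feature in features:
        fused.extend(feature)
    return fused


def cross(query_feature: Vector, resource_feature: Vector) -> Vector:
    """Hypothetical cross processing: element-wise product of equal-length vectors."""
    if len(query_feature) != len(resource_feature):
        raise ValueError("this sketch assumes both features have the same length")
    return [q * r for q, r in zip(query_feature, resource_feature)]


def fuse_simple(query_feature: Vector, resource_feature: Vector) -> Vector:
    """First implementation: splice search term features and resource features."""
    return splice(query_feature, resource_feature)


def fuse_with_cross(query_feature: Vector, resource_feature: Vector) -> Vector:
    """Second implementation: splice both features together with their cross features."""
    cross_feature = cross(query_feature, resource_feature)
    return splice(query_feature, resource_feature, cross_feature)
```

In either case the resulting sample fusion feature is what would be fed to the initial prediction model.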
S4022, the ranking apparatus determines the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resources and the sample attention of the sample multimedia resources.
As a possible implementation manner, the ranking apparatus may determine the estimated ignorance of the sample multimedia resource and the sample ignorance of the sample multimedia resource, and then determine the attention loss of the initial prediction model according to the estimated attention, the estimated ignorance, the sample attention, and the sample ignorance of the sample multimedia resource.
Wherein the estimated ignorance of the sample multimedia resource is inversely related to the estimated attention of the sample multimedia resource. The sample ignorance of the sample multimedia asset is inversely related to the sample attention of the sample multimedia asset.
The specific implementation manner of this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
S4023, the ranking apparatus determines the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resources and the sample satisfaction of the sample multimedia resources.
As a possible implementation manner, the ranking apparatus may determine the estimated rejection of the sample multimedia resource and the sample rejection of the sample multimedia resource, and then determine the satisfaction loss of the initial prediction model according to the estimated satisfaction, the estimated rejection, the sample satisfaction, and the sample rejection of the sample multimedia resource.
Wherein the estimated rejection of the sample multimedia resource is inversely related to the estimated satisfaction of the sample multimedia resource. The sample rejection of the sample multimedia resource is inversely related to the sample satisfaction of the sample multimedia resource.
The specific implementation manner of this step may refer to the subsequent description of the embodiment of the present disclosure, and is not described herein again.
It should be noted that in a specific implementation process of the embodiment of the present disclosure, S4022 may be executed first and then S4023 may be executed, S4023 may be executed first and then S4022 may be executed, and S4022 and S4023 may be executed simultaneously, which is not limited in the embodiment of the present disclosure.
S4024, the ranking apparatus optimizes the initial prediction model according to the attention loss of the initial prediction model and the satisfaction loss of the initial prediction model to obtain the prediction model.
As a possible implementation manner, the ranking apparatus takes the sum of the attention loss and the satisfaction loss of the initial prediction model as the total loss of the initial prediction model, and optimizes the initial prediction model by back propagation according to the total loss to obtain the prediction model.
It should be noted that, for optimizing the initial prediction model according to the total loss in this step, reference may be made to existing descriptions of the stochastic gradient descent (SGD) method used with back propagation; details are not repeated here.
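The optimization in S4024 can be illustrated with one SGD step on the summed loss. This is a sketch under stated assumptions, not the disclosed model: the prediction model is replaced by two hypothetical logistic heads (`w_att`, `w_sat`) over the same fusion feature `x`, both losses are assumed to be binary cross-entropy, and the learning rate is arbitrary.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bce(label: float, prob: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one sample; eps keeps log() finite."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def sgd_step(
    w_att: List[float],
    w_sat: List[float],
    x: List[float],
    y_att: float,
    y_sat: float,
    lr: float = 0.1,
) -> Tuple[List[float], List[float], float]:
    """One SGD update on total loss = attention loss + satisfaction loss."""
    p = sigmoid(sum(w * xi for w, xi in zip(w_att, x)))  # estimated attention
    q = sigmoid(sum(w * xi for w, xi in zip(w_sat, x)))  # estimated satisfaction
    total_loss = bce(y_att, p) + bce(y_sat, q)
    # For a logistic head with cross-entropy, d(loss)/d(w) = (prediction - label) * x.
    # The heads share no weights here, so each head only receives its own loss gradient.
    new_w_att = [w - lr * (p - y_att) * xi for w, xi in zip(w_att, x)]
    new_w_sat = [w - lr * (q - y_sat) * xi for w, xi in zip(w_sat, x)]
    return new_w_att, new_w_sat, total_loss
```

Repeating `sgd_step` over the training samples corresponds to the iterative training described above; in a model with shared layers, the gradients of both losses would accumulate on the shared parameters.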
The technical scheme provided by the disclosure at least brings the following beneficial effects: the initial prediction model can be subjected to back propagation optimization through corresponding attention loss and satisfaction loss, and the accuracy of the initial prediction model in the fitting process can be ensured.
In some designs, in order to determine the attention loss of the initial prediction model, the above S4022 provided by the embodiments of the present disclosure specifically includes the following S501 to S503.
S501, the ordering device determines the estimated neglect degree of the sample multimedia resources according to the estimated attention degree of the sample multimedia resources.
Wherein the estimated ignorance of the sample multimedia resource is inversely related to the estimated attention of the sample multimedia resource. The estimated ignorance is used for representing the predicted probability that the user does not click on the sample multimedia resource.
As a possible implementation manner, the sorting apparatus may use the difference between 1 and the estimated attention as the estimated ignorance of the sample multimedia resource.
S502, the sequencing device determines the sample neglect degree of the sample multimedia resource according to the sample attention degree of the sample multimedia resource.
Wherein the sample ignorance of the sample multimedia asset is inversely related to the sample attention of the sample multimedia asset. The sample ignorance of the sample multimedia asset is used to represent a probability that the user is not paying attention to the sample multimedia asset.
As a possible implementation manner, the sorting apparatus may determine a difference value between 1 and the sample attention of the sample multimedia resource as a sample neglect degree of the sample multimedia resource.
S503, the sequencing device determines the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resources, the estimated neglect of the sample multimedia resources, the sample attention of the sample multimedia resources and the sample neglect of the sample multimedia resources.
As a possible implementation, the sorting apparatus determines the logarithm of the estimated attention and the logarithm of the estimated ignorance of each sample multimedia resource, and calculates a first product and a second product, respectively.
The first product is the product of the logarithm value of the estimated attention of each sample multimedia resource and the sample attention, and the second product is the product of the logarithm value of the estimated ignorance of each sample multimedia resource and the sample ignorance.
Further, the ranking device calculates the attention loss of the initial prediction model according to the calculated first product and the second product.
In some embodiments, the attention loss of the initial prediction model satisfies the following Formula 1:

$$loss_{attractive} = -\sum_{i}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right]$$

where $loss_{attractive}$ is the attention loss of the initial prediction model, $y_i$ is the sample attention of the $i$-th sample multimedia resource among the sample multimedia resources, $p_i$ is the estimated attention of the $i$-th sample multimedia resource, $y_i \log p_i$ is the first product, $1 - y_i$ is the sample ignorance of the $i$-th sample multimedia resource, $1 - p_i$ is the estimated ignorance of the $i$-th sample multimedia resource, and $(1 - y_i)\log(1 - p_i)$ is the second product.
The technical scheme provided by the disclosure at least brings the following beneficial effects: through the formula, a specific implementation mode for determining the attention loss of the initial prediction model can be provided, and the attention loss can be calculated.
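The computation in S501 to S503 can be sketched in a few lines. The sketch assumes natural logarithms, a negated sum over samples (no averaging), and a small clipping constant `eps` to keep the logarithm finite; none of these details is stated in the text, and the function name is hypothetical.

```python
import math
from typing import Sequence


def attention_loss(
    sample_attention: Sequence[float],
    estimated_attention: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Cross-entropy style attention loss built from the first and second products."""
    total = 0.0
    for y, p in zip(sample_attention, estimated_attention):
        p = min(max(p, eps), 1.0 - eps)      # assumption: clip to avoid log(0)
        estimated_ignorance = 1.0 - p        # S501: 1 minus the estimated attention
        sample_ignorance = 1.0 - y           # S502: 1 minus the sample attention
        first_product = y * math.log(p)
        second_product = sample_ignorance * math.log(estimated_ignorance)
        total += first_product + second_product
    return -total                            # assumption: negated sum, so lower is better
```

For example, `attention_loss([1.0, 0.0], [0.9, 0.2])` evaluates to `-(ln 0.9 + ln 0.8)`, so predictions close to the sample labels give a loss close to zero.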
In some designs, in order to determine the satisfaction loss of the initial prediction model, the above S4023 provided by the embodiments of the present disclosure specifically includes the following S601 to S603.
S601, the ordering device determines the estimated rejection of the sample multimedia resources according to the estimated satisfaction of the sample multimedia resources.
Wherein the estimated rejection of the sample multimedia resource is inversely related to the estimated satisfaction of the sample multimedia resource. The estimated rejections are used to represent the probability that the predicted user is not satisfied with the sample multimedia asset.
As a possible implementation manner, the sorting apparatus may use the difference between 1 and the estimated satisfaction as the estimated rejection of the sample multimedia resource.
S602, the sequencing device determines the sample rejection degree of the sample multimedia resources according to the sample satisfaction degree of the sample multimedia resources.
Wherein the sample rejection of the sample multimedia asset is inversely related to the sample satisfaction of the sample multimedia asset. The sample rejections of the sample multimedia assets are used to represent the probability that the user is not satisfied with the sample multimedia assets.
As a possible implementation manner, the sorting apparatus may determine a difference between 1 and the sample satisfaction of the sample multimedia asset as the sample rejection of the sample multimedia asset.
S603, the sequencing device determines the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resources, the estimated rejection of the sample multimedia resources, the sample satisfaction of the sample multimedia resources and the sample rejection of the sample multimedia resources.
As a possible implementation, the sorting apparatus determines the logarithm of the estimated satisfaction and the logarithm of the estimated rejection of each sample multimedia resource, and calculates a third product and a fourth product, respectively.
The third product is the product of the logarithm of the estimated satisfaction of each sample multimedia resource and the sample satisfaction, and the fourth product is the product of the logarithm of the estimated rejection of each sample multimedia resource and the sample rejection.
Further, the ranking device calculates the satisfaction loss of the initial prediction model according to the calculated third product and the fourth product.
In some embodiments, the satisfaction loss of the initial prediction model satisfies the following Formula 2:

$$loss_{satisfy} = -\sum_{i}\left[\, z_i \log q_i + (1 - z_i)\log(1 - q_i) \,\right]$$

where $loss_{satisfy}$ is the satisfaction loss of the initial prediction model, $z_i$ is the sample satisfaction of the $i$-th sample multimedia resource among the sample multimedia resources, $q_i$ is the estimated satisfaction of the $i$-th sample multimedia resource, $z_i \log q_i$ is the third product, $1 - z_i$ is the sample rejection of the $i$-th sample multimedia resource, $1 - q_i$ is the estimated rejection of the $i$-th sample multimedia resource, and $(1 - z_i)\log(1 - q_i)$ is the fourth product.
The technical scheme provided by the disclosure at least brings the following beneficial effects: through the formula, a specific implementation mode for determining the satisfaction loss of the initial prediction model can be provided, and the satisfaction loss can be calculated.
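As an illustrative check of the satisfaction loss, consider two sample multimedia resources. The numbers are made up, and the calculation assumes natural logarithms and a loss equal to the negated sum of the third and fourth products over the samples; $z_i$ and $q_i$ denote the sample satisfaction and the estimated satisfaction of the $i$-th sample multimedia resource.

```latex
\begin{aligned}
&z_1 = 1,\ q_1 = 0.8: && z_1 \ln q_1 = \ln 0.8 \approx -0.2231, && (1 - z_1)\ln(1 - q_1) = 0 \\
&z_2 = 0,\ q_2 = 0.3: && z_2 \ln q_2 = 0, && (1 - z_2)\ln(1 - q_2) = \ln 0.7 \approx -0.3567 \\
&loss_{satisfy} \approx -\left(-0.2231 + 0 + 0 - 0.3567\right) = 0.5798
\end{aligned}
```

For each sample only one of the two products is non-zero when the sample satisfaction is exactly 0 or 1, and the contribution shrinks as the estimated satisfaction approaches the sample value.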
Fig. 8 is a schematic structural diagram illustrating an apparatus for ranking search results according to an exemplary embodiment. Referring to fig. 8, the ranking apparatus 70 for search results provided by the embodiment of the present disclosure includes an obtaining unit 701, a predicting unit 702, and a ranking unit 703.
An obtaining unit 701 is configured to obtain a current search term and a plurality of multimedia resources related to the current search term.
The predicting unit 702 is configured to predict a click rate of each multimedia resource according to a search term feature of the current search term, a resource feature of each multimedia resource, and a pre-trained prediction model. The search term features of the current search terms are used for identifying the current search terms, and the prediction model is obtained by training according to the sample search terms, a plurality of sample multimedia resources related to the sample search terms and the sample operation records of each sample multimedia resource. The sample operation records are used for representing the click rate of each sample multimedia resource when a user searches for the sample search terms in a historical time period.
The sorting unit 703 is configured to sort the multiple multimedia resources according to the click rate of each multimedia resource.
Optionally, as shown in fig. 8, the prediction unit 702 provided in the embodiment of the present disclosure is specifically configured to:
predict the attention and satisfaction of each multimedia resource according to the search term features of the current search term, the resource features of each multimedia resource, and the prediction model, where the attention reflects the order in which the user performs click operations on different multimedia resources, and the satisfaction reflects the user's feedback operations indicating satisfaction with each multimedia resource; and
determine the click rate of each multimedia resource according to the attention and the satisfaction of each multimedia resource.
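The prediction-then-ranking flow can be sketched as below. The way attention and satisfaction are combined into a click rate is not specified in this passage, so the product used here is purely an assumption, and the function names and resource identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def click_rate(attention: float, satisfaction: float) -> float:
    # Assumption: combine the two predictions by multiplication.
    return attention * satisfaction


def rank_resources(predictions: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return resource ids sorted by predicted click rate, highest first.

    `predictions` maps a resource id to its (attention, satisfaction) pair.
    """
    scored = {rid: click_rate(a, s) for rid, (a, s) in predictions.items()}
    return sorted(scored, key=lambda rid: scored[rid], reverse=True)
```

With this combination rule, a resource with high attention but low satisfaction can rank below one that is moderate on both.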
Optionally, as shown in fig. 8, the prediction unit 702 provided in the embodiment of the present disclosure is specifically configured to:
determine the fusion features of each multimedia resource according to the search term features of the current search term and the resource features of each multimedia resource, where the fusion features include features obtained by splicing the search term features of the current search term and the resource features of each multimedia resource; and
input the fusion features of each multimedia resource into the prediction model to obtain the attention and satisfaction of each multimedia resource.
Optionally, as shown in fig. 8, the prediction unit 702 provided in the embodiment of the present disclosure is specifically configured to:
perform cross processing on the search term features of the current search term and the resource features of each multimedia resource to obtain the cross features of each multimedia resource and the current search term; and
splice the search term features of the current search term, the resource features of each multimedia resource, and the cross features corresponding to each multimedia resource to obtain the fusion features of each multimedia resource.
Optionally, as shown in fig. 8, the search term feature of the current search term provided in the embodiment of the present disclosure is obtained by concatenating the text feature of the current search term and the embedded feature of the identifier of the current search term.
Optionally, as shown in fig. 8, the resource feature of each multimedia resource provided in the embodiment of the present disclosure is obtained by concatenating at least two of the text feature of the description of each multimedia resource, the image feature of each multimedia resource, and the embedded feature of the identifier of each multimedia resource.
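A minimal sketch of how the search term feature and the resource feature could be assembled by concatenation is given below. How the text features, image features, and identifier embeddings are produced is outside this passage, so they are taken as ready-made vectors; the function names and the fixed concatenation order are assumptions.

```python
from typing import List, Optional

Vector = List[float]


def build_query_feature(text_feature: Vector, id_embedding: Vector) -> Vector:
    """Search term feature: text feature concatenated with the identifier embedding."""
    return list(text_feature) + list(id_embedding)


def build_resource_feature(
    description_text: Optional[Vector] = None,
    image: Optional[Vector] = None,
    id_embedding: Optional[Vector] = None,
) -> Vector:
    """Resource feature: concatenation of at least two of the three component features."""
    parts = [p for p in (description_text, image, id_embedding) if p is not None]
    if len(parts) < 2:
        raise ValueError("at least two component features are required")
    fused: Vector = []
    for part in parts:
        fused.extend(part)
    return fused
```

Passing only one component raises an error, which mirrors the "at least two of" condition in the text.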
Optionally, as shown in fig. 8, the ranking apparatus 70 provided in the embodiment of the present disclosure further includes a training unit 704.
The obtaining unit 701 is further configured to obtain a plurality of sets of training samples. Each group of training samples comprises the search term characteristics of a sample search term, the resource characteristics of sample multimedia resources matched with the sample search term, and the sample attention and the sample satisfaction of the sample multimedia resources.
A training unit 704, configured to perform iterative training on the initial prediction model according to the obtained training samples to obtain a prediction model.
Optionally, as shown in fig. 8, the training unit 704 provided in the embodiment of the present disclosure is specifically configured to:
predict the estimated attention and the estimated satisfaction of the sample multimedia resources according to the search term features of the sample search terms, the resource features of the sample multimedia resources, and the initial prediction model;
determine the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resources and the sample attention of the sample multimedia resources;
determine the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resources and the sample satisfaction of the sample multimedia resources; and
optimize the initial prediction model according to the attention loss of the initial prediction model and the satisfaction loss of the initial prediction model to obtain the prediction model.
Optionally, as shown in fig. 8, the training unit 704 provided in the embodiment of the present disclosure is specifically configured to:
determine the estimated ignorance of the sample multimedia resource according to the estimated attention of the sample multimedia resource, where the estimated ignorance of the sample multimedia resource is inversely related to the estimated attention of the sample multimedia resource;
determine the sample ignorance of the sample multimedia resource according to the sample attention of the sample multimedia resource, where the sample ignorance of the sample multimedia resource is inversely related to the sample attention of the sample multimedia resource; and
determine the attention loss of the initial prediction model according to the estimated attention of the sample multimedia resource, the estimated ignorance of the sample multimedia resource, the sample attention of the sample multimedia resource, and the sample ignorance of the sample multimedia resource.
Optionally, as shown in fig. 8, the training unit 704 provided in the embodiment of the present disclosure is specifically configured to:
determine the logarithm of the estimated attention of each sample multimedia resource and the logarithm of the estimated ignorance of each sample multimedia resource, and calculate a first product and a second product respectively, where the first product is the product of the logarithm of the estimated attention of each sample multimedia resource and the sample attention of that resource, and the second product is the product of the logarithm of the estimated ignorance of each sample multimedia resource and the sample ignorance of that resource; and
calculate the attention loss of the initial prediction model according to the calculated first product and second product.
Optionally, as shown in fig. 8, the training unit 704 provided in the embodiment of the present disclosure is specifically configured to:
determine the estimated rejection of the sample multimedia resource according to the estimated satisfaction of the sample multimedia resource, where the estimated rejection of the sample multimedia resource is inversely related to the estimated satisfaction of the sample multimedia resource;
determine the sample rejection of the sample multimedia resource according to the sample satisfaction of the sample multimedia resource, where the sample rejection of the sample multimedia resource is inversely related to the sample satisfaction of the sample multimedia resource; and
determine the satisfaction loss of the initial prediction model according to the estimated satisfaction of the sample multimedia resource, the estimated rejection of the sample multimedia resource, the sample satisfaction of the sample multimedia resource, and the sample rejection of the sample multimedia resource.
Optionally, as shown in fig. 8, the training unit 704 provided in the embodiment of the present disclosure is specifically configured to:
determine the logarithm of the estimated satisfaction of each sample multimedia resource and the logarithm of the estimated rejection of each sample multimedia resource, and calculate a third product and a fourth product respectively, where the third product is the product of the logarithm of the estimated satisfaction of each sample multimedia resource and the sample satisfaction of that resource, and the fourth product is the product of the logarithm of the estimated rejection of each sample multimedia resource and the sample rejection of that resource; and
calculate the satisfaction loss of the initial prediction model according to the calculated third product and fourth product.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a schematic structural diagram of a server provided by the present disclosure. As in fig. 9, the server 80 may include at least one processor 801 and a memory 803 for storing processor-executable instructions. Wherein the processor 801 is configured to execute instructions in the memory 803 to implement the ranking method of the search results in the above-described embodiments.
Additionally, server 80 may include a communication bus 802 and at least one communication interface 804.
The processor 801 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.
The communication bus 802 may include a path that conveys information between the aforementioned components.
The communication interface 804 may be any device, such as a transceiver, for communicating with other devices or communication networks, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The memory 803 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and connected to the processing unit by a bus. The memory may also be integrated with the processing unit.
The memory 803 is used for storing instructions for performing the disclosed aspects, and execution of these instructions is controlled by the processor 801. The processor 801 is configured to execute the instructions stored in the memory 803 to implement the functions of the disclosed method.
As an example, in conjunction with fig. 8, the functions implemented by the obtaining unit 701, the predicting unit 702, the ranking unit 703, and the training unit 704 in the ranking apparatus 70 for search results are the same as those of the processor 801 in fig. 9.
In a specific implementation, as an example, the processor 801 may include one or more CPUs, such as CPU0 and CPU1 in fig. 9.
In a specific implementation, as an embodiment, the server 80 may include multiple processors, such as the processor 801 and the processor 807 in fig. 9. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
In particular implementations, server 80 may also include an output device 805 and an input device 806, as one embodiment. The output device 805 is in communication with the processor 801 and may display information in a variety of ways. For example, the output device 805 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector (projector), or the like. The input device 806 is in communication with the processor 801 and can accept user input in a variety of ways. For example, the input device 806 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.
Those skilled in the art will appreciate that the architecture shown in FIG. 9 does not constitute a limitation on server 80, and may include more or fewer components than shown, or combine certain components, or employ a different arrangement of components.
In addition, the present disclosure also provides a computer-readable storage medium, in which instructions, when executed by a processor of a server, enable the server to perform the ranking method of search results provided in the above embodiments.
In addition, the present disclosure also provides a computer program product comprising computer instructions which, when run on a server, cause the server to perform the method of ranking search results as provided in the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for ranking search results, comprising:
acquiring a current search word and a plurality of multimedia resources related to the current search word;
predicting the click rate of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a pre-trained prediction model, and sequencing the plurality of multimedia resources according to the click rate of each multimedia resource; the search word characteristics of the current search word are used for identifying the current search word, and the prediction model is obtained by training according to a sample search word, a plurality of sample multimedia resources related to the sample search word and a sample operation record of each sample multimedia resource; the sample operation records are used for representing the click rate of each sample multimedia resource when a user searches the sample search words in a historical time period.
2. The method for sorting search results according to claim 1, wherein predicting the click through rate of each multimedia resource according to the search term characteristics of the current search term, the resource characteristics of each multimedia resource and a pre-trained prediction model comprises:
predicting the attention and satisfaction of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and the prediction model; the attention degree is used for reflecting the sequence of click operations executed by the user on different multimedia resources, and the satisfaction degree is used for reflecting the feedback operation of the user on the satisfaction information of each multimedia resource;
and determining the click rate of each multimedia resource according to the attention and satisfaction of each multimedia resource.
3. The method for sorting search results according to claim 2, wherein predicting the attention and the satisfaction of each multimedia resource according to the search term characteristics of the current search term, the resource characteristics of each multimedia resource and the prediction model comprises:
determining the fusion characteristics of each multimedia resource according to the search word characteristics of the current search word and the resource characteristics of each multimedia resource; the fusion characteristics comprise characteristics obtained by splicing the search word characteristics of the current search word and the resource characteristics of each multimedia resource;
inputting the fusion characteristics of each multimedia resource into the prediction model to obtain the attention and the satisfaction of each multimedia resource.
4. The method according to claim 3, wherein the determining the fusion feature of each multimedia resource according to the search term feature of the current search term and the resource feature of each multimedia resource comprises:
performing cross processing on the search word characteristics of the current search word and the resource characteristics of each multimedia resource to obtain the cross characteristics of each multimedia resource and the current search word;
and splicing the search word characteristics of the current search word, the resource characteristics of each multimedia resource and the cross characteristics corresponding to each multimedia resource to obtain the fusion characteristics of each multimedia resource.
5. The method for sorting the search results according to any one of claims 1 to 4, wherein the search term features of the current search term are obtained by concatenating the text features of the current search term with the embedded features of the identifier of the current search term.
6. The method for ranking search results according to any of claims 1-4, wherein the resource feature of each multimedia resource is obtained by concatenating at least two of the text feature of the description of each multimedia resource, the image feature of each multimedia resource, and the embedded feature of the identification of each multimedia resource.
7. An apparatus for ranking search results, comprising an acquisition unit, a prediction unit and a sorting unit;
the acquisition unit is used for acquiring a current search word and a plurality of multimedia resources related to the current search word;
the prediction unit is used for predicting the click rate of each multimedia resource according to the search word characteristics of the current search word, the resource characteristics of each multimedia resource and a pre-trained prediction model; the search word characteristics of the current search word are used for identifying the current search word, and the prediction model is obtained by training according to a sample search word, a plurality of sample multimedia resources related to the sample search word and a sample operation record of each sample multimedia resource; the sample operation records are used for representing the click rate of each sample multimedia resource when a user searches the sample search terms in a historical time period;
the sorting unit is used for sorting the plurality of multimedia resources according to the click rate of each multimedia resource.
8. A server, comprising: a processor, a memory for storing instructions executable by the processor; wherein the processor is configured to execute instructions to implement the method of ranking search results of any of claims 1-6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform the method for sorting search results according to any one of claims 1 to 6.
10. A computer program product, characterized in that it comprises computer instructions which, when run on a server, cause the server to perform the method for sorting search results according to any one of claims 1 to 6.
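
By way of illustration only, the feature concatenation recited in claims 5 and 6 and the acquire, predict and sort flow of the device of claim 7 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the function names, the vector sizes and `toy_predict_ctr` (a hypothetical stand-in for the pre-trained prediction model) are all assumptions, as is the choice to sort in descending order of predicted click rate.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def concat_features(*parts: Sequence[float]) -> Vector:
    """Join several feature vectors end to end into one vector."""
    out: Vector = []
    for part in parts:
        out.extend(part)
    return out


def search_word_feature(text_feat: Sequence[float],
                        id_embedding: Sequence[float]) -> Vector:
    # Claim 5: text feature of the search word concatenated with the
    # embedding feature of the search word's identifier.
    return concat_features(text_feat, id_embedding)


def resource_feature(*available: Sequence[float]) -> Vector:
    # Claim 6: concatenate at least two of the description text feature,
    # the image feature and the identifier embedding feature.
    if len(available) < 2:
        raise ValueError("at least two feature vectors are required")
    return concat_features(*available)


def toy_predict_ctr(word_feat: Vector, res_feat: Vector) -> float:
    # Hypothetical placeholder for the pre-trained prediction model:
    # a sigmoid over the dot product of the two vectors (zip stops at
    # the shorter vector). A trained model would replace this.
    score = sum(a * b for a, b in zip(word_feat, res_feat))
    return 1.0 / (1.0 + math.exp(-score))


def rank_resources(
    word_feat: Vector,
    resources: Dict[str, Vector],
    predict_ctr: Callable[[Vector, Vector], float],
) -> List[Tuple[str, float]]:
    # Claim 7: predict a click rate for each resource, then sort.
    # Descending order is an assumption; the claim only requires
    # sorting according to the click rate.
    scored = [(rid, predict_ctr(word_feat, feat))
              for rid, feat in resources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    word = search_word_feature([0.5, 0.1], [1.0])
    candidates = {
        "video_a": resource_feature([0.2, 0.3], [0.1]),
        "video_b": resource_feature([0.9, 0.8], [0.7]),
    }
    for rid, ctr in rank_resources(word, candidates, toy_predict_ctr):
        print(rid, round(ctr, 3))
```

Because the predictor is passed in as a callable, the placeholder scoring function can be replaced by any model that maps a search word feature and a resource feature to a click rate without changing the concatenation or sorting steps.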
CN202111277526.0A 2021-10-29 2021-10-29 Search result sorting method, device, equipment and storage medium Pending CN113934872A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111277526.0A CN113934872A (en) 2021-10-29 2021-10-29 Search result sorting method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111277526.0A CN113934872A (en) 2021-10-29 2021-10-29 Search result sorting method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113934872A (en) 2022-01-14

Family

ID=79285083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111277526.0A Pending CN113934872A (en) 2021-10-29 2021-10-29 Search result sorting method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113934872A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140351247A1 (en) * 2012-05-07 2014-11-27 Tencent Technology (Shenzhen) Company Limited Method and server for searching information
CN105354235A (en) * 2015-10-08 2016-02-24 天脉聚源(北京)传媒科技有限公司 Search result processing method and apparatus
CN109508394A (en) * 2018-10-18 2019-03-22 青岛聚看云科技有限公司 A kind of training method and device of multi-medium file search order models
CN112000822A (en) * 2020-08-21 2020-11-27 北京达佳互联信息技术有限公司 Multimedia resource sequencing method and device, electronic equipment and storage medium
CN112434134A (en) * 2020-12-04 2021-03-02 中国科学院深圳先进技术研究院 Search model training method and device, terminal equipment and storage medium
CN112749333A (en) * 2020-07-24 2021-05-04 腾讯科技(深圳)有限公司 Resource searching method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GEORGES E. DUPRET ET AL.: "A user browsing model to predict search engine click data from past observations", 《PROCEEDINGS OF THE 31ST ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL》, 31 July 2008 (2008-07-31), pages 331 - 338, XP058244119, DOI: 10.1145/1390334.1390392 *
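茅锦丹: "Research on a fusion recommendation algorithm based on embedding vectors in implicit-feedback scenarios", 《China Master's Theses Full-text Database, Information Science and Technology》, 15 March 2021 (2021-03-15), pages 138 - 757 *
陈强: "Research on semantic understanding and advertisement ranking methods in e-commerce search advertising", 《China Master's Theses Full-text Database, Philosophy and Humanities》, 15 April 2020 (2020-04-15), pages 084 - 7 *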

Similar Documents

Publication Publication Date Title
CN110149540B (en) Recommendation processing method and device for multimedia resources, terminal and readable medium
CN110781321B (en) Multimedia content recommendation method and device
Hettiachchi et al. A survey on task assignment in crowdsourcing
CN109376267B (en) Method and apparatus for generating a model
CN107784010B (en) Method and equipment for determining popularity information of news theme
CN109981785B (en) Method and device for pushing information
US20210385510A1 (en) Live stream playback video generation method, device and apparatus
CN109165302A (en) Multimedia file recommendation method and device
CN104782138A (en) Identifying a thumbnail image to represent a video
CN111783810B (en) Method and device for determining attribute information of user
CN111159563B (en) Method, device, equipment and storage medium for determining user interest point information
KR101725510B1 (en) Method and apparatus for recommendation of social event based on users preference
RU2714594C1 (en) Method and system for determining parameter relevance for content items
KR102244697B1 (en) Project curation method considering worker’s tendency of crowdsourcing based projects for artificial intelligence training data generation
CN110737824B (en) Content query method and device
CN109255036B (en) Method and apparatus for outputting information
CN112328889A (en) Method and device for determining recommended search terms, readable medium and electronic equipment
CN113971243A (en) Data processing method, system, equipment and storage medium applied to questionnaire survey
CN110992127B (en) Article recommendation method and device
CN112182281B (en) Audio recommendation method, device and storage medium
CN109636530B (en) Product determination method, product determination device, electronic equipment and computer-readable storage medium
KR102368043B1 (en) Apparatus and method for recommending news of user interest using user-defined topic modeling
US20230316106A1 (en) Method and apparatus for training content recommendation model, device, and storage medium
CN112995248B (en) Information pushing method, device and equipment
CN113836388A (en) Information recommendation method and device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination