CN116578725A - Search result ordering method and device, computer equipment and storage medium

Info

Publication number
CN116578725A
Authority
CN
China
Prior art keywords
result
document
search
click rate
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310539712.XA
Other languages
Chinese (zh)
Inventor
彭宗徽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd
Priority to CN202310539712.XA
Publication of CN116578725A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides a search result ordering method, apparatus, computer device, and storage medium, wherein the method includes: obtaining search results matched with a search word, wherein the search results comprise first results of a first genre and at least one second result of a second genre other than the first genre, and the second result comprises at least one result document; for any result document in the second result, determining the document features corresponding to the result document according to the text attribute information of the result document, the dimension information of the result document in a plurality of preset feature dimensions, the search word, and the source information of the search word; determining the document association features corresponding to the second result according to the document features of the result documents in the second result; and sorting the first results and the second results according to the document association features of each second result and the result click rate of each first result.

Description

Search result ordering method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a search result ranking method, apparatus, computer device, and storage medium.
Background
When searching in search software, search results of multiple genres can often be found; these may include natural search results of the core genre of the search software as well as other search results of other genres. For example, for video software, the core genre is the video genre, and other genres include a text genre, a card genre, and so on; the natural search results tend to be videos, and the other search results tend to be cards of various card genres.
When search results of various genres are displayed, the ranking of the search results often affects the utilization rate of the search results and the search experience of the user. Most conventional sorting methods sort the search results by their historical click rate, but such methods produce rankings that are poorly justified and far from ideal, which lowers the utilization rate of the search results.
Disclosure of Invention
The embodiment of the disclosure at least provides a search result ordering method, a search result ordering device, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a search result ranking method, including:
obtaining search results matched with a search word, wherein the search results comprise first results of a first genre and at least one second result of a second genre other than the first genre; the second result comprises at least one result document;
for any result document in the second result, determining document features corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search word, and source information of the search word;
determining the document association features corresponding to the second result according to the document features of each result document in the second result;
and sorting the first results and the second results according to the document association features of each second result and the result click rate of each first result.
In one possible implementation manner, the determining the document association feature corresponding to the second result according to the document feature of each result document in the second result includes:
determining a context feature of each result document according to the document features of the result documents in the second result, and determining the document association feature corresponding to the second result according to the average of the context features; the context feature of a result document is used to characterize association information between that result document and each of the result documents in the second result.
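The step above can be sketched in a few lines. The disclosure does not state how the context features are computed, so this sketch assumes a single-head dot-product self-attention over the document features; only the final averaging comes from the text. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_features(doc_features: List[Vector]) -> List[Vector]:
    """ASSUMPTION: self-attention. Each document attends to every document
    in the same second result, so its context feature reflects the others."""
    dim = len(doc_features[0])
    scale = math.sqrt(dim)
    contexts = []
    for query in doc_features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in doc_features]
        weights = softmax(scores)
        contexts.append([
            sum(w * doc[i] for w, doc in zip(weights, doc_features))
            for i in range(dim)
        ])
    return contexts


def document_association_feature(doc_features: List[Vector]) -> Vector:
    """Average the context features, as the text describes."""
    contexts = context_features(doc_features)
    count = len(contexts)
    return [sum(c[i] for c in contexts) / count for i in range(len(contexts[0]))]


# Toy document features for a card with three result documents.
card = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
association = document_association_feature(card)
```

Unlike taking a maximum over per-document scores, the averaged context features change whenever any document in the card changes, which is the property the disclosure is after.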
In one possible implementation manner, the sorting the first results and the second results according to the document associated features of the respective second results and the result click rate of the respective first results includes:
determining, according to the document association features of the second result, a first click rate of the second result on the first genre and a second click rate of the second result on the second genre to which the second result belongs;
determining a target click rate of the second result according to the first click rate and the second click rate;
and sorting the first results and the second results according to the target click rate of each second result and the result click rate of each first result.
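A minimal sketch of this ranking step follows. The text does not say how the first and second click rates are combined into the target click rate; the weighted sum and its `alpha` weight are assumptions, as are the class and function names and the toy numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RankedItem:
    name: str
    genre: str
    # Result click rate for first results, target click rate for second results.
    click_rate: float


def target_click_rate(first_ctr: float, second_ctr: float, alpha: float = 0.5) -> float:
    """ASSUMPTION: combine the two predicted click rates with a weighted sum."""
    return alpha * first_ctr + (1.0 - alpha) * second_ctr


def rank(first_results: List[RankedItem], second_results: List[RankedItem]) -> List[RankedItem]:
    """Merge both groups and sort by click rate, highest first."""
    return sorted(first_results + second_results,
                  key=lambda item: item.click_rate, reverse=True)


videos = [RankedItem("video_a", "video", 0.30), RankedItem("video_b", "video", 0.10)]
user_card = RankedItem("user_card", "user_card", target_click_rate(0.25, 0.15))
ordering = [item.name for item in rank(videos, [user_card])]
```

With these toy values the user card lands between the two videos, because its target click rate (0.20) lies between their result click rates.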
In one possible implementation manner, the dimension information in any one of the preset feature dimensions includes a plurality of pieces of dimension information;
the determining, for any result document in the second result, the document features corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search word, and source information of the search word includes:
determining a plurality of target features in any preset feature dimension according to the plurality of pieces of dimension information of the result document in the preset feature dimension;
splicing the target features in the preset feature dimension to obtain a first spliced feature in the preset feature dimension;
splicing the source feature corresponding to the source information, the text attribute feature corresponding to the text attribute information, the target features in the plurality of preset feature dimensions, and the search word feature corresponding to the search word to obtain a second spliced feature;
and determining the document features corresponding to the result document according to the first spliced feature and the second spliced feature.
In a possible implementation manner, the search word features include target word features corresponding to the search words and word segmentation features corresponding to each word segmentation in the search words;
the determining the document features corresponding to the result document according to the first splicing features and the second splicing features includes:
determining a first matching feature according to the matching degree between the target word feature and the text attribute feature;
determining a second matching feature according to the matching degree between the word segmentation feature and the text attribute feature;
and performing full connection processing on the first splicing characteristic, the second splicing characteristic, the first matching characteristic and the second matching characteristic to obtain the document characteristic corresponding to the result document.
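The splicing, matching, and full connection steps can be illustrated as below. This is only a sketch under stated assumptions: splicing is read as concatenation, the matching degree is taken to be cosine similarity, the per-segment matches are mean-pooled, and the dense layer uses placeholder weights rather than trained ones. Every name and number is hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def splice(vectors: List[Vector]) -> Vector:
    """Splicing is read here as plain concatenation."""
    return [value for vector in vectors for value in vector]


def cosine(a: Vector, b: Vector) -> float:
    """ASSUMPTION: matching degree = cosine similarity (0.0 for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def fully_connected(inputs: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One dense layer with ReLU; weights holds one row per output unit."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]


def fc_inputs(dimension_targets: Dict[str, List[Vector]], source: Vector,
              text_attr: Vector, target_word: Vector,
              segments: List[Vector]) -> Vector:
    # First spliced feature: target features spliced within each preset dimension.
    first_spliced = splice([splice(t) for t in dimension_targets.values()])
    # Second spliced feature: source, text attribute, target and search word features.
    all_targets = [t for targets in dimension_targets.values() for t in targets]
    search_word = splice([target_word] + segments)
    second_spliced = splice([source, text_attr] + all_targets + [search_word])
    # Matching features against the text attribute feature.
    first_match = cosine(target_word, text_attr)
    second_match = sum(cosine(s, text_attr) for s in segments) / len(segments)
    return first_spliced + second_spliced + [first_match, second_match]


dimension_targets = {"user_info": [[0.2, 0.4], [0.1, 0.3]],
                     "music_content": [[0.0, 0.0]]}  # null dimension as zeros
inputs = fc_inputs(dimension_targets, source=[1.0, 0.0], text_attr=[0.5, 0.5],
                   target_word=[0.5, 0.5], segments=[[1.0, 0.0], [0.0, 1.0]])
weights = [[0.1] * len(inputs) for _ in range(4)]  # placeholder, not trained
feature = fully_connected(inputs, weights, bias=[0.0] * 4)
```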
In one possible implementation manner, there are a plurality of second genres, and the method further includes:
determining, according to the target word feature and the word segmentation features, an intent score of the search word for each of the second genres;
and determining a target intent corresponding to the search word according to the intent scores corresponding to the respective second genres, wherein the target intent is used for indicating the second genre with the highest matching degree with the search word.
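Selecting the target intent amounts to taking the highest-scoring second genre. A short sketch, in which the genre names and scores are hypothetical stand-ins for model outputs:

```python
from typing import Dict


def target_intent(intent_scores: Dict[str, float]) -> str:
    """Return the second genre whose intent score is highest."""
    return max(intent_scores, key=lambda genre: intent_scores[genre])


# Hypothetical intent scores for one search word.
scores = {"user_card": 0.72, "music_card": 0.18, "topic_tag_card": 0.10}
best_genre = target_intent(scores)
```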
In one possible implementation manner, the first click rate and the second click rate are output by using a pre-trained click rate prediction model, wherein the click rate prediction model is obtained by training according to the following steps:
acquiring a sample search word and a sample result matched with the sample search word; the sample result having the second genre and comprising at least one sample result document;
inputting each sample result document, sample source information of the sample search word and the sample search word in the sample result into a click rate prediction model to be trained to obtain a first predicted click rate of the sample result on a first genre and a second predicted click rate of the sample result on a second genre;
determining a first loss according to the first predicted click rate and a first standard click rate corresponding to the sample result, and determining a second loss according to the second predicted click rate and a second standard click rate corresponding to the sample result; the first standard click rate is determined according to whether a target result matched with the sample result is clicked, wherein the target result has the first genre and comprises key information in the sample result; the second standard click rate is determined according to whether the sample result is clicked;
and carrying out iterative training on the click rate prediction model to be trained by utilizing the first loss and the second loss until a preset training cut-off condition is met, so as to obtain a trained click rate prediction model.
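The two losses can be sketched as follows. The disclosure names neither the loss function nor how the losses are combined; this sketch assumes binary cross-entropy for each and an unweighted sum, with 0/1 labels derived from whether the relevant result was clicked. Function names are hypothetical.

```python
import math


def binary_cross_entropy(predicted: float, label: int, eps: float = 1e-7) -> float:
    """ASSUMPTION: each loss is binary cross-entropy on a clicked/not-clicked label."""
    p = min(max(predicted, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(first_pred: float, second_pred: float,
               target_result_clicked: bool, sample_result_clicked: bool) -> float:
    """First loss: first-genre target result; second loss: the sample result itself."""
    first_loss = binary_cross_entropy(first_pred, int(target_result_clicked))
    second_loss = binary_cross_entropy(second_pred, int(sample_result_clicked))
    return first_loss + second_loss  # ASSUMPTION: unweighted sum


# The matching first-genre result was clicked; the sample card itself was not.
loss = joint_loss(0.8, 0.3, target_result_clicked=True, sample_result_clicked=False)
```

In training, a value like `loss` would be minimized iteratively until the preset cut-off condition is met.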
In one possible embodiment, the intent score is output using a pre-trained intent prediction model that is trained according to the following steps:
acquiring a sample search word and a sample result matched with the sample search word; the sample result having the second genre;
inputting the sample search word into an intention prediction model to be trained to obtain a predicted intent score of the sample search word for the second genre corresponding to the sample result;
determining a third loss according to the label score corresponding to the sample result and the predicted intent score; the label score is determined according to whether the sample result is clicked;
and carrying out iterative training on the intention prediction model to be trained by utilizing the third loss until a preset training cut-off condition is met, so as to obtain a trained intention prediction model.
In a second aspect, an embodiment of the present disclosure further provides a search result sorting apparatus, including:
an acquisition module, configured to acquire search results matched with a search term, where the search results include first results of a first genre and at least one second result of a second genre other than the first genre; the second result comprises at least one result document;
the first determining module is used for determining document characteristics corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search words and source information of the search words for any one of the result documents in the second result;
the second determining module is used for determining the document association characteristics corresponding to the second result according to the document characteristics of each result document in the second result;
and the sorting module is used for sorting the first results and the second results according to the document association features of each second result and the result click rate of each first result.
In a third aspect, an optional implementation manner of the present disclosure further provides a computer device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, perform the steps in the first aspect or in any possible implementation manner of the first aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which when executed performs the steps of the first aspect, or any of the possible implementation manners of the first aspect.
For a description of the effects of the search result sorting apparatus, the computer device, and the computer-readable storage medium, refer to the description of the search result sorting method; it is not repeated here.
According to the search result ordering method, the search result ordering device, the computer equipment and the storage medium, the text attributes of the result document and its dimension information in a plurality of preset feature dimensions are used in combination with the search word and the corresponding source information, so that the document features are determined from information at multiple angles; this enriches the information contained in the determined document features and improves how accurately the document features characterize the result document. Because a second result may contain a plurality of result documents, and the associations among them may influence the ordering of the second result, the document features of the result documents are used to determine the document association features corresponding to the second result; in this way the influence of the associations between the result documents on the ordering of the second result is fully considered, and reasonable and accurate document association features are obtained. Finally, the second results are ordered by using the document association features, which optimizes the ordering of the second results, improves the rationality of the ordering, and improves the utilization rate of the search results.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. These drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to illustrate the technical solutions of the present disclosure. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a search result ranking method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a searched user card according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a searched music card according to an embodiment of the present disclosure;
FIG. 4 illustrates a model predictive schematic provided by an embodiment of the present disclosure;
FIG. 5 illustrates a flowchart of a method of training a click rate prediction model provided by an embodiment of the present disclosure;
FIG. 6 illustrates a flowchart of a method of training an intent prediction model provided by embodiments of the present disclosure;
FIG. 7 illustrates a schematic diagram of a search result ranking apparatus provided by an embodiment of the present disclosure;
FIG. 8 illustrates a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the disclosed embodiments generally described and illustrated herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Furthermore, the terms first, second and the like in the description and in the claims of embodiments of the disclosure and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein.
Reference herein to "a plurality of" or "a number of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Research has found that when search results of other genres are sorted, for example search result cards that each include a plurality of result documents, the maximum of the historical click rates of the result documents is often used as the historical click rate of the search result card, and the search result cards are sorted by that historical click rate. However, this way of determining the click rate treats each result document in the search result card as if it existed independently and ignores the associations between the result documents, which affects the accuracy of the determined click rate and reduces the rationality and accuracy of the ranking of the search result cards.
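The limitation of the conventional approach can be seen in a tiny sketch. The function name and the click rates below are hypothetical and serve only to illustrate the point made in the paragraph above.

```python
from typing import List


def card_click_rate_baseline(document_click_rates: List[float]) -> float:
    """Conventional approach: the card inherits its best document's historical click rate."""
    return max(document_click_rates)


card_a = [0.40, 0.01, 0.01]  # one strong document, two weak ones
card_b = [0.40, 0.35, 0.30]  # three consistently relevant documents
same_score = card_click_rate_baseline(card_a) == card_click_rate_baseline(card_b)
```

Both cards receive the same score even though their contents differ considerably, because the maximum discards everything except the single best document.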
Based on the above research, the present disclosure provides a search result ordering method, a device, a computer device and a storage medium. The text attributes of the result document and its dimension information in a plurality of preset feature dimensions are used in combination with the search word and the corresponding source information, so that the document features are determined from information at multiple angles; this enriches the information contained in the determined document features and improves how accurately the document features characterize the result document. Because a second result may contain a plurality of result documents, and the associations among them may influence the ordering of the second result, the document features of the result documents are used to determine the document association features corresponding to the second result; in this way the influence of the associations between the result documents on the ordering of the second result is fully considered, and reasonable and accurate document association features are obtained. Finally, the second results are ordered by using the document association features, which optimizes the ordering of the second results, improves the rationality of the ordering, and improves the utilization rate of the search results.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
It will be appreciated that prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed and authorized of the type, usage range, usage scenario, etc. of the personal information related to the present disclosure in an appropriate manner according to the relevant legal regulations.
For the sake of understanding the present embodiment, first, a detailed description will be given of a search result sorting method disclosed in the present embodiment, where an execution main body of the search result sorting method provided in the present embodiment is generally a terminal device or other processing device with a certain computing capability, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a personal digital assistant device (Personal Digital Assistant, PDA), a handheld device, a computer device, etc.; in some possible implementations, the search result ordering method may be implemented by way of a processor invoking computer readable instructions stored in a memory.
The search result ranking method provided by the embodiment of the present disclosure is described below, taking a computer device as the execution body as an example.
As shown in fig. 1, a flowchart of a search result sorting method provided by an embodiment of the disclosure may include the following steps:
S101: obtaining search results matched with the search word, wherein the search results comprise first results of a first genre and at least one second result of a second genre other than the first genre; the second result includes at least one result document.
Here, the search word may be search information input by the user, and the search results are the results matching the search word that are retrieved by a search engine. Illustratively, a user inputs search query 1 (query1) in the search software, and search results matching query1 are obtained in response to receiving query1 input by the user.
The search results may include one or more first results of the first genre and at least one second result of a second genre other than the first genre. The first genre may be a genre of the resource with the largest data amount that can be provided by the search software, for example, the first genre corresponding to the video search software may be a video genre, and the first genre corresponding to the news search software may be an article genre.
The second genre may be any result genre other than the first genre. Illustratively, in the case where the first genre is a video genre, the second genre may be a user card genre, a music card genre, a topic tag card genre, an article genre, or the like. In the case where the first genre is an article genre, the second genre may be a video genre, a user card genre, a music card genre, or the like.
For each search, the second results included in the obtained search results generally have different second genres, i.e., no two second results in the search results share the same second genre. For example, where the first genre is a video genre, the search results may include a plurality of video results, one user card of the user card genre, and one music card of the music card genre.
Each second result may include at least one result document. For example, in the case where the second result is a user card, one result document in the user card may be the document information of one user; the document information is information whose acquisition has been authorized, and may include, for example, a user avatar, a user name, a user identification, a user attention number, a user work number, and the like. FIG. 2 is a schematic diagram of a user card obtained by searching according to an embodiment of the present disclosure; the card includes the document information of 3 users (i.e., document (doc) 1 corresponding to user 1, doc2 corresponding to user 2, and doc3 corresponding to user 3 in FIG. 2). In the case where the second result is a music card, one result document in the music card may be the music information of one piece of music under one of its singers, and the music information may be, for example, version information of the music, style information of the music, music cover information, music duration information, user number information, or the like. FIG. 3 is a schematic diagram of a music card obtained by searching according to an embodiment of the present disclosure; the card includes the music information of music A under three singers, namely music information 1 corresponding to singer 1, music information 2 corresponding to singer 2, and music information 3 corresponding to singer 3 in FIG. 3. In the case where the second result is a topic tag card, one of the result documents in the topic tag card may be one topic tag card.
For example, search results matching the search query may be obtained in response to receiving the search query, and the search results may include a plurality of videos, a user card, and a music card.
S102: for any result document in the second result, determining the document features corresponding to the result document according to the text attribute information of the result document, the dimension information of the result document in a plurality of preset feature dimensions, the search word, and the source information of the search word.
Here, the text attribute information is used to characterize basic attributes of the document, and may include, for example, text information, title information, text language information, text length information, and the like of the document. The number of preset feature dimensions may be set empirically, and embodiments of the present disclosure do not specifically limit it. Illustratively, the plurality of preset feature dimensions may include a user information dimension, a topic tag information dimension, a music content dimension, an authority dimension, and so forth. The dimension information is the information corresponding to the result document in a preset feature dimension; it is understood that the dimension information of the result document in a certain preset feature dimension may be null. For example, when the result document is a doc in a user card, the dimension information in both the topic tag information dimension and the music content dimension may be null; when the result document is music information in a music card, the dimension information in both the user information dimension and the topic tag information dimension may be null.
The source information of the search term is used to characterize the source of the search term, and may include, for example, search mode information (e.g., whether the search term is an actively searched search term, a recommended search term, a historical search term, a search term under a comprehensive search channel, or a search term under a vertical search channel), search time, search user, etc.
The document features are used for representing document information of the result document at multiple angles, and the complete result document can be restored based on the document features of the result document.
In a specific implementation, for any result document in the second result, the text attribute feature corresponding to the result document can be generated according to the text attribute information in the document information of the result document. The dimension information corresponding to the result document in the plurality of preset feature dimensions is determined according to the document information of the result document, and the dimension feature corresponding to the result document in each preset feature dimension is then determined according to the dimension information in that preset feature dimension. The search word feature corresponding to the search word is generated according to the text information of the search word, and the source feature corresponding to the search word is determined according to the source information of the search word. Then, the document features corresponding to the result document can be determined according to the text attribute feature, the dimension features corresponding to the result document in the respective preset feature dimensions, the search word feature, and the source feature. For example, feature fusion processing may be performed on the text attribute feature, each dimension feature, the search word feature, and the source feature, and convolution processing, full connection processing, and the like may then be performed on the fused result to obtain the document features of the result document.
Alternatively, the document features of the result document may be determined using a pre-trained click rate prediction model. Specifically, after text attribute information, dimension information under a plurality of preset feature dimensions and source information of search words of a result document are determined, the text attribute information, the dimension information under the plurality of preset feature dimensions, the search words and the source information can be input into a pre-trained click rate prediction model, and the input information is identified and processed by utilizing the pre-trained click rate prediction model, so that document features of the result document are obtained. The pre-trained click rate prediction model is used for predicting the click rate corresponding to each search result.
In one embodiment, the dimension information in any preset feature dimension may include a plurality of sub-dimensions, that is, a plurality of sub-dimensions may be set in advance in each preset feature dimension, and then, for any preset feature dimension, the dimension information corresponding to each sub-dimension of the result document in the preset feature dimension may be obtained according to the document information of the result document, where the plurality of dimension information in any preset feature dimension is the dimension information corresponding to each sub-dimension in the preset feature dimension.
For example, in the case where the preset feature dimension is a user information dimension, the sub-dimensions in the user information dimension may include a user name dimension, a user attention number dimension, a user gender dimension, a user age dimension, a user work number dimension, a user latest online time dimension, and the like, and the plurality of dimension information in the user information dimension may include a user name, a user attention number, a user gender, a user age, a user work number, a user latest online time, and the like. It is understood that for any sub-dimension, the dimension information in that sub-dimension may also be null. In the case where the preset feature dimension is a topic label information dimension, the sub-dimensions in the topic label information dimension may include a label text dimension, a label subject dimension, a label length dimension, a label usage amount dimension, a label release time length dimension, and the like, and the plurality of dimension information in the topic label information dimension may include a label text, a label subject, a label length, a label usage amount, a label release time length, and the like. In the case where the preset feature dimension is a music content dimension, the sub-dimensions in the music content dimension may include a collection dimension, a play amount dimension, a music name dimension, a music singer dimension, and the like, and the plurality of dimension information in the music content dimension may include a music collection amount, a music play amount, a music name, a music singer, and the like.
For the above S102, the following steps may be further performed:
S102-1: For any preset feature dimension, determine a plurality of target features in that preset feature dimension according to the plurality of dimension information of the result document in that preset feature dimension.
Here, a target feature is the document feature corresponding to one sub-dimension of the result document in a preset feature dimension, and is determined from the dimension information in that sub-dimension. The dimension features in a preset feature dimension may include the plurality of target features in that preset feature dimension.
In specific implementation, for any preset feature dimension, the click rate prediction model can be utilized to determine dimension information corresponding to a plurality of sub-dimensions of the result document under the preset feature dimension according to the document information of the result document. Then, for each sub-dimension, a target feature corresponding to the sub-dimension can be generated according to the dimension information in the sub-dimension. Thus, a plurality of target features in the preset feature dimension can be obtained.
S102-2: Splice the plurality of target features in the preset feature dimension to obtain a first stitching feature in the preset feature dimension.
Here, the first stitching feature expresses the result document at a finer granularity than the individual target features.
In the implementation, for a plurality of target features in each preset feature dimension, a click rate prediction model can be utilized to perform feature interaction on each two target features, so that the features after interaction are obtained. And then, performing feature stitching processing on the obtained features after interaction to obtain a first stitching feature under the preset feature dimension.
In this way, through the interaction of the target features, the first-order continuous or discrete target features can be converted into the second-order interacted features, and then through the feature stitching processing of the interacted features, the first stitching features with stronger expression capability and finer expression granularity can be obtained.
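The pairwise interaction followed by stitching can be illustrated with a minimal sketch. The source does not specify the interaction operator, so the element-wise product used here is an assumption, and the names `interact`, `first_stitching_feature` and `user_targets` are hypothetical.

```python
from itertools import combinations


def interact(a, b):
    """Second-order interaction of two equal-length target features.

    The element-wise product is an assumed operator, not one named by the source.
    """
    return [x * y for x, y in zip(a, b)]


def first_stitching_feature(target_features):
    """Interact every pair of target features, then concatenate the results."""
    stitched = []
    for a, b in combinations(target_features, 2):
        stitched.extend(interact(a, b))
    return stitched


# Three hypothetical 2-d target features in the user information dimension.
user_targets = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]
user_ffm = first_stitching_feature(user_targets)
```

With three target features there are three pairs, so the stitched feature here has three times the length of a single target feature.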
S102-3: Splice the source features corresponding to the source information, the text attribute features corresponding to the text attribute information, the target features under the plurality of preset feature dimensions and the search word features corresponding to the search words to obtain a second stitching feature.
Here, the target feature in the plurality of preset feature dimensions may include a plurality of target features in each preset feature dimension. The second splicing feature is a feature obtained by splicing the source feature, the text attribute feature, the target feature and the search word feature, and the feature contains rich information.
In specific implementation, the click rate prediction model can be utilized to splice the source feature corresponding to the source information, the text attribute feature corresponding to the text attribute information, the multiple target features under each preset feature dimension and the search word feature corresponding to the search word, so as to obtain a second spliced feature.
S102-4: Determine the document features corresponding to the result document according to the first stitching feature and the second stitching feature.
For example, the first stitching feature and the second stitching feature may be stitched to obtain a third stitching feature, and then one or more of full connection processing, convolution processing, and normalization processing is performed on the third stitching feature, so as to obtain a document feature corresponding to the result document.
In one embodiment, different search terms often have different search intents, and even the same search term may carry different search intents when it is searched at different times or by different users; the degree of matching between the search intent and the result document often affects the display position of the second result corresponding to the result document. Therefore, in order to further improve the accuracy and rationality of the ranking, the document features of the result document can also be determined based on the degree of matching between the search word features of the search word and the result document. Specifically, the search word features may include a target word feature corresponding to the search word and word segmentation features corresponding to each word segment in the search word.
Here, the target word feature may be a feature of the search word itself, and may be determined according to text information and semantic information of the search word. The word segmentation features are features of each word segmentation, and can be determined according to text information and semantic information of the word segmentation.
Further, for the above S102-4, the following steps may be performed:
S102-4-1: Determine a first matching feature according to the matching degree between the target word feature and the text attribute feature.
Here, the first matching feature is used to characterize the degree of matching between the search term and the result document.
In the implementation, after the target word characteristics corresponding to the search word and the text attribute characteristics of the result document are obtained, a click rate prediction model can be utilized to determine the matching degree between the target word characteristics and the text attribute characteristics according to the target word characteristics and the text attribute characteristics, and then a first matching characteristic corresponding to the result document is generated according to the matching degree.
S102-4-2: Determine a second matching feature according to the matching degree between the word segmentation feature and the text attribute feature.
Here, the second matching feature is used to characterize a degree of matching between the segmented word corresponding to the search term and the result document. The number of the second matching features is consistent with the number of the segmented words corresponding to the search word.
In the implementation, after the word segmentation features of each word segment corresponding to the search word and the text attribute features of the result document are obtained, the click rate prediction model can be used, for each word segment, to determine the matching degree between the word segmentation feature of that word segment and the text attribute features of the result document. Then, a second matching feature between the result document and that word segment is generated according to the matching degree.
S102-4-3: Perform full connection processing on the first stitching feature, the second stitching feature, the first matching feature and the second matching feature to obtain the document features corresponding to the result document.
By way of example, the click rate prediction model may be used to stitch the first stitching feature, the second stitching feature, the first matching feature, and the second matching feature into a stitched feature, and full connection processing may then be performed on the stitched feature; the feature obtained after the full connection processing serves as the document feature corresponding to the result document.
Optionally, the click rate prediction model may be used to perform full connection processing on the first splicing feature, the second splicing feature, the first matching feature and the second matching feature, so as to obtain full connection features corresponding to the features. Then, each full-connection feature can be spliced to obtain the document feature corresponding to the result document.
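As a rough illustration of S102-4-1 to S102-4-3, the sketch below computes the two kinds of matching features and passes the stitched result through a single dense layer. Cosine similarity as the "matching degree", the single-layer form, and all names and weights (`cosine`, `fully_connected`, `document_feature`, `weights`, `bias`) are assumptions for illustration; the source does not state how the matching degree is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed measure of matching degree."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fully_connected(x, weights, bias):
    """One dense layer: y[j] = sum_i x[i] * weights[j][i] + bias[j]."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, bias)]


def document_feature(first_stitch, second_stitch, query_vec, token_vecs,
                     text_attr, weights, bias):
    qd_match1 = [cosine(query_vec, text_attr)]               # first matching feature
    qd_match2 = [cosine(t, text_attr) for t in token_vecs]   # one per word segment
    stitched = first_stitch + second_stitch + qd_match1 + qd_match2
    return fully_connected(stitched, weights, bias)


# Hypothetical toy inputs: a 2-d text attribute feature and two word segments.
doc_feature = document_feature(
    first_stitch=[0.5],
    second_stitch=[0.25],
    query_vec=[1.0, 0.0],
    token_vecs=[[0.0, 1.0], [1.0, 0.0]],
    text_attr=[1.0, 0.0],
    weights=[[1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.0, 0.0]],
    bias=[0.0, 0.5],
)
```

Note that the number of second matching features equals the number of word segments, so the input width of the dense layer depends on how the search word is segmented; a real model would need padding or pooling to fix that width, which this sketch does not address.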
In another embodiment, there may be a plurality of second genres; for example, where the first genre is video, the second genres may include a user card genre, a music card genre, and a hashtag card genre.
For the search term, in addition to determining the document characteristics of the result document by using the search term, the search intention corresponding to the search term can be predicted. Specifically, the search intention can be predicted according to the following steps:
Step one: according to the target word feature and the word segmentation features, determine the intention scores of the search word under each second genre.
Here, the intention score under one second genre is used to characterize the probability that the search word is intended to retrieve search results of that second genre; the higher the intention score, the stronger the intention to retrieve search results of the corresponding second genre.
In the specific implementation, the search word can be input into a pre-trained intention prediction model, and the intention prediction model is used to identify the search word so as to obtain the target word feature corresponding to the search word; meanwhile, the intention prediction model can be used to perform word segmentation processing on the search word, and the obtained word segments are then subjected to recognition processing to obtain the word segmentation feature corresponding to each word segment. Then, the intention prediction model can be used to perform feature recognition processing on the word segmentation features and the target word feature, so as to output the intention score of the search word under each second genre.
Step two: determine the target intention corresponding to the search word according to the intention scores corresponding to the second genres, wherein the target intention is used for indicating the second genre with the highest matching degree with the search word.
Here, the target intention is the true search intention corresponding to the search word, the second genre corresponding to the target intention is the second genre with the highest matching degree with the search word, and the search result under the second genre corresponding to the target intention may be the second result with the highest matching degree with the search word. For example, after the intention scores corresponding to the respective second genres are obtained, the second genre with the highest intention score may be regarded as the second genre matching the target intention. That is, the target intention characterizes the intention of the search word to retrieve search results of the second genre with the highest intention score.
It should be noted that the intent prediction model and the click rate prediction model may be two decoupled models.
After the target intention is obtained, the search amount corresponding to various target intentions can be determined according to the target intention corresponding to each search word in a certain time period, and then resource recommendation can be performed according to the search amount corresponding to various target intentions.
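Selecting the target intention and aggregating search volume per intention can be sketched as follows. The genre keys and the helper names `target_intent` and `search_volume_by_intent` are hypothetical; the intention scores themselves would come from the intention prediction model, which is not reproduced here.

```python
from collections import Counter


def target_intent(intent_scores):
    """Return the second genre with the highest intention score."""
    return max(intent_scores, key=intent_scores.get)


def search_volume_by_intent(scores_per_search):
    """Count, over a time period, how many searches fall under each target intention."""
    return Counter(target_intent(scores) for scores in scores_per_search)


# Hypothetical intention scores for three searches in one time period.
period_scores = [
    {"user_card": 0.15, "music_card": 0.70, "hashtag_card": 0.15},
    {"user_card": 0.60, "music_card": 0.30, "hashtag_card": 0.10},
    {"user_card": 0.20, "music_card": 0.50, "hashtag_card": 0.30},
]
volume = search_volume_by_intent(period_scores)
```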
S103: Determine the document association features corresponding to the second result according to the document features of each result document in the second result.
Here, the document association feature can not only characterize complete document information corresponding to each result document in the second results, but also characterize association between every two second results.
In the implementation, after the document features of each result document in the second result are obtained, the associated features between the document features of each result document and other result documents can be determined, then the document features of each result document and the associated features corresponding to the result document can be fused to obtain the fused features corresponding to each result document, and then the feature fusion processing can be performed on the fused features corresponding to each result document to obtain the document associated features corresponding to the second result.
Alternatively, after the document features of each result document in the second result are obtained by using the pre-trained click rate prediction model, a Transformer model structure in the click rate prediction model can be used to perform feature recognition processing on the document features of each result document, so as to obtain the associated features between the document features of each result document and those of the other result documents; the document features of each result document are then fused with the corresponding associated features, and the fused features corresponding to each result document are output. Here, the Transformer model is a neural network that learns context, and thus meaning, by tracking relationships in sequence data. Then, feature fusion is performed on the fused features corresponding to the result documents by using the click rate prediction model to obtain the document association features corresponding to the second result.
In one embodiment, for S103, it may be implemented as follows:
Determine the context features of each result document according to the document features of each result document in the second result, and determine the document association features corresponding to the second result according to the mean feature of the context features; the context features are used to characterize the association information among the result documents in the second result.
Here, a result document may correspond to a contextual feature that characterizes association information between the result document and other result documents as well as complete document information for the result document. The average feature is the feature obtained by taking the average of the contextual features of each result document.
In the implementation, after the document features of each result document are obtained, the document features of the result documents can be combined into a feature sequence, and the feature sequence is input into the Transformer model module in the click rate prediction model; feature recognition processing is then carried out on each document feature in the feature sequence by using the Transformer model structure, and the context features of each result document are output. Then, the context features of the result documents can be input together into a mean processing module in the click rate prediction model, and the mean processing module averages the context features of the result documents to obtain the document association features corresponding to the second result.
Optionally, after obtaining the context features of each result document, performing full connection processing on each context feature to obtain the context feature after full connection; and inputting the fully-connected context features to a mean processing module together to obtain document associated features corresponding to the second result.
In this way, because the Transformer model can learn the context information in the features well, recognizing the document features of each result document with the Transformer model yields accurate context features, and these accurate context features in turn yield accurate document association features corresponding to the second result.
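The following sketch shows the shape of this computation only: a context feature per result document, followed by mean pooling. It replaces the Transformer module with a single unparameterized self-attention step (no learned projections, no multiple heads, no feed-forward layers), which is a deliberate simplification and an assumption; the function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def context_features(doc_features):
    """Simplified self-attention: each output mixes all document features,

    weighted by scaled dot-product similarity to the current document.
    """
    dim = len(doc_features[0])
    out = []
    for q in doc_features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in doc_features]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, doc_features))
                    for i in range(dim)])
    return out


def mean_feature(features):
    """Average the context features position by position."""
    n = len(features)
    return [sum(f[i] for f in features) / n for i in range(len(features[0]))]


# Hypothetical document features doc-1, doc-2, doc-3 of one second result.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = context_features(docs)
doc_association_feature = mean_feature(ctx)
```

Because the mean is taken over the documents, the document association feature has a fixed width regardless of how many result documents the second result contains.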
S104: Sort the first results and the second results according to the document association features of the second results and the result click rates of the first results.
Here, the result click rate of the first result may be a click rate of the first result within a preset history period.
In the implementation, the target click rate of each second result can be determined according to the document associated features of the second result. For example, a click rate prediction model may be utilized to output a target click rate for the second result based on the document-associated features of the second result. Then, the first results and the second results may be ranked according to the result click rate of each first result and the target click rate of each second result in order of the click rate from high to low.
In one embodiment, for S104, the following steps may be implemented:
S104-1: Determine, according to the document association features of the second result, the first click rate of the second result on the first genre and the second click rate of the second result on the second genre corresponding to the second result.
Here, the second genre corresponding to the second result is the second genre that the second result has. The first click rate characterizes the click rate of those search results of the first genre that have the same subject object as the second result. For example, when the second result is a user card, the subject object may be the user in the user card, and the search result having the same subject object as the second result may be a video uploaded by that user. When the second result is a music card, the subject object may be the music in the music card, and the search result having the same subject object as the second result may be a video whose background music is that music. For another example, when the second result is a topic tag card, the subject object may be the topic tag in the topic tag card, and the search result having the same subject object as the second result may be a video including that topic tag.
For example, after the document association features of the second result are obtained, an MMoE module in the click rate prediction model may be used to perform feature recognition on the document association features to obtain the first click rate of the second result on the first genre and the second click rate of the second result on the second genre. The MMoE (multi-gate mixture-of-experts) structure combines multiple expert modules with gating structures, so that the tasks can share information while each retaining a degree of independence during learning.
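A forward pass through a small multi-gate mixture-of-experts head can be sketched as below, purely to show how shared experts and per-task gates produce two click rates from one document association feature. The linear experts, the single-vector towers, the sigmoid output and every weight value are assumptions for illustration, not the structure disclosed in the source.

```python
import math


def linear(x, w):
    """Apply a weight matrix w (a list of rows) to vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """Shared experts, one gate and one tower per task; returns one rate per task."""
    expert_outs = [linear(x, w) for w in expert_ws]
    width = len(expert_outs[0])
    rates = []
    for gate_w, tower_w in zip(gate_ws, tower_ws):
        gate = softmax(linear(x, gate_w))  # one gate weight per expert
        mixed = [sum(g * e[i] for g, e in zip(gate, expert_outs))
                 for i in range(width)]
        rates.append(sigmoid(sum(m * t for m, t in zip(mixed, tower_w))))
    return rates


# Hypothetical 2-d document association feature, two experts, two tasks.
x = [1.0, 2.0]
expert_ws = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
gate_ws = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
tower_ws = [[0.5, -0.5], [1.0, 1.0]]
video_nobias, card_nobias = mmoe_forward(x, expert_ws, gate_ws, tower_ws)
```

Each task mixes the same expert outputs with its own gate, which is what allows the first and second click rate branches to share information without being forced to use it identically.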
S104-2: Determine the target click rate of the second result according to the first click rate and the second click rate.
For example, the average of the first click rate and the second click rate may be used as the target click rate for the second result.
Alternatively, the maximum value of the first click rate and the second click rate may be set as the target click rate of the second result.
Or, different preset weights can be set for the first click rate and the second click rate, and then the first click rate is weighted by the preset weight corresponding to the first click rate to obtain a first weighted value; meanwhile, the second click rate can be weighted by using a preset weight corresponding to the second click rate, so as to obtain a second weighted value. Then, the sum of the first weighted value and the second weighted value can be used as a target click rate; or the maximum value of the first weighted value and the second weighted value may be taken as the target click rate.
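The four alternatives described above (mean, maximum, weighted sum, weighted maximum) can be written as one small function. The function name, the `mode` strings and the default weights are hypothetical.

```python
def target_click_rate(first, second, mode="mean", w_first=0.5, w_second=0.5):
    """Combine the first and second click rates into a target click rate."""
    if mode == "mean":
        return (first + second) / 2
    if mode == "max":
        return max(first, second)
    weighted_first = w_first * first      # first weighted value
    weighted_second = w_second * second   # second weighted value
    if mode == "weighted_sum":
        return weighted_first + weighted_second
    if mode == "weighted_max":
        return max(weighted_first, weighted_second)
    raise ValueError(f"unknown mode: {mode}")


mean_rate = target_click_rate(0.25, 0.75)
max_rate = target_click_rate(0.25, 0.75, mode="max")
sum_rate = target_click_rate(0.25, 0.75, mode="weighted_sum",
                             w_first=0.25, w_second=0.75)
```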
As shown in fig. 4, which is a model prediction schematic diagram provided by an embodiment of the present disclosure, session represents the source feature corresponding to the source information of the search word, query represents the target word feature corresponding to the search word, query-vecs represents the word segmentation features corresponding to the search word, and doc represents a result document. user represents the dimension feature in the user information dimension, which may include a plurality of target features in the user information dimension; hashtag represents the dimension feature in the topic label information dimension, which may likewise include a plurality of target features in the topic label information dimension; music represents the dimension feature in the music content dimension, which may also include a plurality of target features in the music content dimension. user-ffm represents the first stitching feature in the user information dimension, hashtag-ffm represents the first stitching feature in the topic label information dimension, music-ffm represents the first stitching feature in the music content dimension, wide-features represents the second stitching feature, qd-match1 represents the first matching feature, and qd-match2 represents the second matching feature. In the case where the second result includes 3 result documents, 3 document features are obtained; in fig. 4, doc-1 represents the document feature of the first result document in the second result, doc-2 represents the document feature of the second result document in the second result, and doc-3 represents the document feature of the third result document in the second result.
transformer represents the Transformer model module in the click rate prediction model, mean represents the mean processing module, video-nobias represents the first click rate, and card-nobias represents the second click rate. card-bias represents a third click rate, which is used only in the training process, to eliminate the influence of the position offset information and the result size information of the second result on the accuracy of the predicted second click rate. bias-features represents the bias features used in the training process, which may include position bias features corresponding to the position offset information and/or result size features corresponding to the result size information. The third click rate is a biased output because it uses the position bias features and the result size features, whereas the second click rate is an unbiased click rate because it avoids the influence of the position bias features and the result size features. intent represents the intention scores, output by the intention prediction model, of the search word under each second genre. label sharing indicates that the intention prediction model and the click rate prediction model can be trained using the same sample search words and the same sample search results (having the second genre), so that the labels used are the same, while the trained intention prediction model and click rate prediction model can still be decoupled from each other.
S104-3: Sort the first results and the second results according to the target click rate of each second result and the result click rate of each first result.
For example, the first results and the second results may be ranked in descending order of click rate, using the target click rate of each second result and the result click rate of each first result.
Here, since the trained model has reliable prediction accuracy, the first click rate and the second click rate determined by using the trained click rate prediction model have reliable accuracy, and the accurate target click rate can be obtained by determining the target click rate by using the accurate first click rate and the accurate second click rate. And furthermore, the second results are ranked by using the accurate target click rate, so that the rationality and accuracy of the determined ranking results can be improved.
Therefore, by using the text attribute information of the result document and the dimension information under a plurality of preset feature dimensions, combined with the search word and its source information, the document features can be determined from information at multiple angles, which improves the richness of the information included in the determined document features and the accuracy with which the document features represent the result document. Because a second result contains a plurality of result documents, and the association between these result documents may influence the ranking of the second result, the document features of the result documents are used to determine the document association features corresponding to the second result, so that the influence of the association between the result documents on the ranking of the second result is fully considered and reasonable and accurate document association features are obtained. Finally, ranking the second results by using the document association features optimizes the ranking of the second results, improves the rationality of the ranking results, and improves the utilization rate of the search results.
For example, where the search results include a plurality of videos, one music card, and one user card, the click rate prediction model may be used to output, for the music card as a second result, a first click rate of the music card on the video genre and a second click rate on the music card genre, and the target click rate of the music card is determined according to the average value of the first click rate and the second click rate. Likewise, for the user card, the click rate prediction model may be used to output a first click rate of the user card on the video genre and a second click rate on the user card genre, and the target click rate of the user card is determined according to the average value of the first click rate and the second click rate. Then, the videos, the music card and the user card can be ranked according to the result click rate of each video, the target click rate of the music card and the target click rate of the user card.
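The example above can be sketched as a merge-and-sort over click rates. The result identifiers and click rate values are made up for illustration, and `rank_results` is a hypothetical helper; the target click rates are taken as the mean of the first and second click rates, as in the example.

```python
def rank_results(first_results, second_results):
    """Rank first and second results together in descending order of click rate.

    first_results:  list of (result id, result click rate)
    second_results: list of (result id, target click rate)
    """
    merged = list(first_results) + list(second_results)
    ranked = sorted(merged, key=lambda item: item[1], reverse=True)
    return [result_id for result_id, _ in ranked]


# Hypothetical historical result click rates of two videos.
videos = [("video_a", 0.30), ("video_b", 0.10)]

# Target click rate of each card: mean of (first click rate, second click rate).
cards = [
    ("music_card", (0.25 + 0.15) / 2),
    ("user_card", (0.05 + 0.05) / 2),
]

ranking = rank_results(videos, cards)
```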
In an embodiment, as can be seen from the foregoing embodiment, the first click rate and the second click rate may be output by using a pre-trained click rate prediction model, so that the embodiment of the disclosure further provides a model training method to obtain a trained click rate prediction model. As shown in fig. 5, a flowchart of a method for training a click rate prediction model according to an embodiment of the disclosure may include the following steps:
S501: Acquire sample search words and sample results matched with the sample search words; the sample result has a second genre and includes at least one sample result document.
The sample search word is used for training the click rate prediction model to be trained, and the sample result is obtained after searching by using the sample search word and has the second genre. The sample result document may be a result document included in a sample result, one sample result may include at least one sample result document, and the number of sample result documents included in different sample results may be different.
In practice, a large number of sample search terms may be obtained, as well as individual sample results that match each sample search term.
S502: Input each sample result document in the sample result, the sample source information of the sample search word, and the sample search word into a click rate prediction model to be trained to obtain a first predicted click rate of the sample result on a first genre and a second predicted click rate of the sample result on a second genre.
The first predicted click rate is the click rate of the sample result on the first genre, as predicted by the click rate prediction model to be trained; the second predicted click rate is the click rate of the sample result on the second genre of the sample result, as predicted by the click rate prediction model to be trained.
In specific implementation, each sample result document, sample source information of a sample search word and the sample search word in the sample result can be input into a click rate prediction model to be trained, and the input information is processed by using the click rate prediction model to be trained to obtain a first predicted click rate and a second predicted click rate corresponding to the sample result.
S503: Determine a first loss according to the first predicted click rate and a first standard click rate corresponding to the sample result, and determine a second loss according to the second predicted click rate and a second standard click rate corresponding to the sample result; the first standard click rate is determined according to whether a target result matched with the sample result is clicked, wherein the target result has the first genre and comprises key information in the sample result; the second standard click rate is determined according to whether the sample result is clicked.
Here, the target result matched with the sample result is the search result of the first genre obtained when the sample search word is used for searching, and the search result includes the key information in the sample result. The key information may be a subject object in the sample result, for example, when the sample result is a user card, the subject object may be a user in the user card, and the target result having the same subject object as the sample result may be a video uploaded by the user in the user card.
For example, if the target result is clicked, the first standard click rate is 1, that is, the label corresponding to the first predicted click rate is 1; if the target result is not clicked, the first standard click rate is 0, that is, the label corresponding to the first predicted click rate is 0.
If the sample result is clicked, the second standard click rate is 1, namely, the label corresponding to the second predicted click rate is 1; if the sample result is not clicked, the second standard click rate is 0, that is, the label corresponding to the second predicted click rate is 0.
In specific implementation, in the case where the click rate prediction model does not include a branch for predicting the third click rate (i.e., there is no branch for predicting card-bias), after the first predicted click rate and the second predicted click rate are obtained, the first loss of the branch for predicting video-nobias may be determined according to the first predicted click rate and the first standard click rate corresponding to the sample result. Meanwhile, the second loss of the branch for predicting card-nobias may be determined according to the second predicted click rate and the second standard click rate corresponding to the sample result.
S504: Iteratively train the click rate prediction model to be trained using the first loss and the second loss until a preset training cut-off condition is met, to obtain a trained click rate prediction model.
Here, the preset training cut-off condition may be that the number of rounds of iterative training reaches a preset number of rounds, and/or that the prediction accuracy of the trained model reaches a preset accuracy.
For example, the first loss may be used to iteratively train the branch for predicting video-nobias, and the second loss may be used to iteratively train the branch for predicting card-nobias, until the preset training cut-off condition is met, to obtain the trained click rate prediction model.
Alternatively, a total loss can be determined according to the first loss and the second loss, and the click rate prediction model to be trained is then iteratively trained using the total loss until the preset training cut-off condition is met, to obtain the trained click rate prediction model.
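The loss computation in S503 and S504 can be sketched as follows. This is only an illustrative sketch: the text does not name a loss function, so binary cross-entropy is assumed here, and the function names (`bce`, `two_branch_loss`) and the weights `w1` and `w2` used to form the total loss are hypothetical.

```python
import math


def bce(pred: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted click rate and a 0/1 label."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def two_branch_loss(first_pred, second_pred, target_clicked, sample_clicked,
                    w1=1.0, w2=1.0):
    """Return (first_loss, second_loss, total_loss) for one sample result."""
    # First standard click rate: 1 if the matched first-genre target result was clicked.
    first_label = 1 if target_clicked else 0
    # Second standard click rate: 1 if the sample result itself was clicked.
    second_label = 1 if sample_clicked else 0
    first_loss = bce(first_pred, first_label)     # video-nobias branch
    second_loss = bce(second_pred, second_label)  # card-nobias branch
    # Assumption: the total loss is a weighted sum of the two branch losses.
    total_loss = w1 * first_loss + w2 * second_loss
    return first_loss, second_loss, total_loss
```

Either the two losses can be applied to their own branches, or the total loss can drive a single update of the whole model, matching the two alternatives described above.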
Optionally, to further improve the prediction accuracy of the branch for predicting card-nobias, a branch for predicting the third click rate (i.e., a branch for predicting card-bias) may be used during training. Specifically, the position offset information corresponding to the sample result may be input into the click rate prediction model to be trained together with the sample result documents, the sample source information and the sample search word. The position offset information corresponding to the sample result represents information that can affect the second click rate predicted by the model, and may include the display position of the sample result in the search result page and the size occupied by the sample result (for example, when the sample result is a music card, the occupied size may be the size of the music card). The click rate prediction model to be trained can then determine, according to the position offset information, the position bias feature and the result size feature corresponding to the sample result, and splice the two features into a bias feature (namely, Bias-feature).
Then, after the click rate prediction model to be trained obtains the predicted document associated feature corresponding to the sample result, the branch for predicting card-bias can be used to output, according to the bias feature and the predicted document associated feature, a third predicted click rate of the sample result on the second genre. Meanwhile, the branch for predicting video-nobias and the branch for predicting card-nobias can be used to output the first predicted click rate and the second predicted click rate, respectively, according to the predicted document associated feature. Next, a first loss of the branch for predicting video-nobias is determined according to the first predicted click rate and the first standard click rate corresponding to the sample result; a second loss of the branch for predicting card-nobias is determined according to the second predicted click rate and the second standard click rate corresponding to the sample result; and a third loss of the branch for predicting card-bias is determined according to the third predicted click rate and the second standard click rate corresponding to the sample result. Finally, a total loss can be determined using the first loss, the second loss and the third loss, and the click rate prediction model to be trained is iteratively trained using the total loss until the preset training cut-off condition is met, to obtain the trained click rate prediction model.
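As a rough illustration of the three-branch variant, the sketch below uses a single logistic unit per branch. The branch structure, the weight dictionary keys, the binary cross-entropy loss and the unweighted sum used for the total loss are all assumptions made for illustration; the text only states which inputs each branch receives and which label each loss is computed against.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def bce(pred, label, eps=1e-7):
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def predict(doc_assoc, position_feat, size_feat, weights):
    """Return (first, second, third) predicted click rates for one sample result."""
    # Splice the position bias feature and the result size feature into the bias feature.
    bias_feat = position_feat + size_feat
    p_video_nobias = sigmoid(dot(weights["video_nobias"], doc_assoc))
    p_card_nobias = sigmoid(dot(weights["card_nobias"], doc_assoc))
    # Only the card-bias branch sees the bias feature.
    p_card_bias = sigmoid(dot(weights["card_bias"], doc_assoc + bias_feat))
    return p_video_nobias, p_card_nobias, p_card_bias


def total_loss(preds, first_label, second_label):
    """Sum of the three branch losses (an unweighted sum is assumed)."""
    p1, p2, p3 = preds
    first_loss = bce(p1, first_label)    # video-nobias vs. first standard click rate
    second_loss = bce(p2, second_label)  # card-nobias vs. second standard click rate
    third_loss = bce(p3, second_label)   # card-bias vs. second standard click rate
    return first_loss + second_loss + third_loss
```

In this sketch the card-bias branch is a training-time aid only: it absorbs position and size effects so that the card-nobias branch, which never sees the bias feature, can be used on its own at prediction time.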
It should be noted that the target predicted click rate determined from the first predicted click rate and the second predicted click rate output by the trained click rate prediction model has a distribution close to that of the target click rate determined by the following formula:
CTR = (clickA + clickB + clickC) / impreA    (Formula I)
where CTR denotes the target click rate. When the second result includes three result documents, clickA denotes the total number of clicks on the first result document in the second result, that is, the total number of times that result document was clicked across the multiple historical searches performed with the search word within a preset historical time period; clickB denotes the total number of clicks on the second result document in the second result, counted over the same historical searches; and clickC denotes the total number of clicks on the third result document in the second result, counted over the same historical searches. impreA denotes the total number of impressions of the first result document in the second result, that is, the number of times the second result was obtained across the multiple historical searches performed with the search word within the preset historical time period.
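The target click rate formula can be checked with a small worked example. The function name `target_ctr` and the counts used below are hypothetical and only illustrate the arithmetic.

```python
def target_ctr(clicks_per_document, impressions):
    """Target click rate of a second result (card).

    clicks_per_document: historical click counts of each result document
        in the card, e.g. [clickA, clickB, clickC].
    impressions: number of times the card was shown (impreA).
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return sum(clicks_per_document) / impressions
```

For instance, with 30, 15 and 5 clicks on the three result documents and 1000 impressions of the card, `target_ctr([30, 15, 5], 1000)` gives 50 / 1000 = 0.05. The numerator sums clicks over all documents in the card, while the denominator counts each display of the card once.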
In one embodiment, as can be seen from the foregoing embodiments, the intent score may be output by using a pre-trained intent prediction model, so that the embodiments of the present disclosure further provide another model training method to obtain a trained intent prediction model. As shown in fig. 6, a flowchart of a method for training an intent prediction model provided by an embodiment of the present disclosure may include the following steps:
S601: Acquire a sample search word and a sample result matched with the sample search word; the sample result has a second genre.
Here, the sample search word that trains the intent prediction model to be trained may be the sample search word that trains the click rate prediction model to be trained. The sample result of training the intention prediction model to be trained can also be the sample result of training the click rate prediction model to be trained.
For the specific implementation steps of S601, reference may be made to S501, which is not described herein.
S602: Input the sample search word into the intent prediction model to be trained, to obtain a predicted intent score of the sample search word under the second genre corresponding to the sample result.
Here, the predicted intent score is the intent score, output by the intent prediction model to be trained, of the sample search word under the second genre that the sample result has.
In specific implementation, the intent prediction model to be trained can determine the sample word segments corresponding to the sample search word, and then determine the sample target feature corresponding to the sample search word and the sample word segmentation feature corresponding to each sample word segment. The predicted intent score of the sample search word under the second genre corresponding to the sample result is then determined using the sample target feature and each sample word segmentation feature.
S603: Determine a third loss according to the label score corresponding to the sample result and the predicted intent score; the label score is determined according to whether the sample result is clicked.
Here, if the sample result is clicked, the label score is 1, that is, the label corresponding to the predicted intent score is 1; if the sample result is not clicked, the label score is 0, that is, the label corresponding to the predicted intent score is 0.
In specific implementation, the label score corresponding to the sample result and the predicted intent score can be used to calculate the third loss of the intent prediction model to be trained.
S604: Iteratively train the intent prediction model to be trained using the third loss until a preset training cut-off condition is met, to obtain a trained intent prediction model.
Here, the preset training cutoff condition may be that the number of rounds of iterative training reaches a preset number of rounds, and/or the prediction accuracy of the trained model reaches a preset accuracy.
Optionally, when the intent prediction model to be trained is trained, it can output the predicted intent score of the sample search word under each second genre. Since the label score is either 0 or 1: if the sample result is clicked, the label score of the sample search word under the second genre that the sample result has may be 1, and the label scores under the other second genres may be 0; if the sample result is not clicked, the label score under the second genre that the sample result has may be 0, and the label scores under the other second genres may be 1. In this case, when calculating the third loss, a first sub-loss may be determined according to the label score under the second genre that the sample result has and the predicted intent score under that second genre, and a second sub-loss may be determined according to the label scores under the other second genres and the predicted intent scores under the other second genres. The third loss is then determined using the first sub-loss and the second sub-loss, and the intent prediction model to be trained may be iteratively trained using the third loss.
For example, in the case that the second genres include the user card genre, the music card genre and the topic label card genre, if the sample result has the music card genre and the sample result is clicked, the label score under the music card genre is 1, and the label scores under the user card genre and the topic label card genre are 0. The first sub-loss may then be determined according to the predicted intent score under the music card genre output by the intent prediction model to be trained and the label score 1. One second sub-loss is determined according to the predicted intent score under the user card genre output by the model and the label score 0, and another second sub-loss is determined according to the predicted intent score under the topic label card genre output by the model and the label score 0. A third loss may then be determined based on the first sub-loss and the two second sub-losses. Finally, the intent prediction model to be trained is iteratively trained using the third loss.
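The per-genre labelling and the resulting third loss can be sketched as follows. The genre identifiers, the function names, the use of binary cross-entropy and the plain summation of sub-losses are assumptions for illustration; the labelling rule itself follows the description above.

```python
import math

# Hypothetical identifiers for the three second genres used in the example.
GENRES = ("user_card", "music_card", "topic_label_card")


def bce(pred, label, eps=1e-7):
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def intent_labels(sample_genre, clicked, genres=GENRES):
    """Label score per genre.

    Clicked: the sample result's genre gets 1, the other genres get 0.
    Not clicked: the sample result's genre gets 0, the other genres get 1.
    """
    return {g: 1 if (g == sample_genre) == clicked else 0 for g in genres}


def third_loss(pred_scores, sample_genre, clicked, genres=GENRES):
    """Combine the first sub-loss and the second sub-losses (summation assumed)."""
    labels = intent_labels(sample_genre, clicked, genres)
    first_sub_loss = bce(pred_scores[sample_genre], labels[sample_genre])
    second_sub_losses = [bce(pred_scores[g], labels[g])
                         for g in genres if g != sample_genre]
    return first_sub_loss + sum(second_sub_losses)
```

With a clicked music card, `intent_labels("music_card", True)` yields 1 for the music card genre and 0 for the other two genres, which is the labelling used in the example above.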
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a search result sorting device corresponding to the search result sorting method. Since the principle by which the device solves the problem is similar to that of the search result sorting method in the embodiments of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated descriptions are omitted.
As shown in fig. 7, a schematic diagram of a search result sorting apparatus according to an embodiment of the disclosure includes:
an obtaining module 701, configured to obtain search results matched with a search term, where the search results include first results of a first genre and at least one second result of a second genre other than the first genre; the second result comprises at least one result document;
a first determining module 702, configured to determine, for any one of the result documents in the second result, a document feature corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search term, and source information of the search term;
a second determining module 703, configured to determine, according to the document feature of each of the result documents in the second result, a document associated feature corresponding to the second result;
a ranking module 704, configured to rank the first results and the second results according to the document associated features of the second results and the result click rate of the first results.
In a possible implementation manner, the second determining module 703 is configured to, when determining, according to the document feature of each of the result documents in the second result, a document associated feature corresponding to the second result:
determining context features of each result document according to the document features of each result document in the second result, and determining the document associated feature corresponding to the second result according to the average of the context features; the context features are used to characterize association information between a result document and each of the result documents in the second result.
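One way to picture the context features and their average is the sketch below. The text does not say how the association between result documents is computed; scaled dot-product attention over the document features is assumed here purely for illustration, and the function names are hypothetical.

```python
import math


def context_features(doc_features):
    """Context feature of each document: an attention-weighted mix of all documents."""
    dim = len(doc_features[0])
    contexts = []
    for query in doc_features:
        # Similarity of this document to every document in the card (itself included).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in doc_features]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        norm = sum(exps)
        weights = [e / norm for e in exps]
        contexts.append([sum(w * doc[i] for w, doc in zip(weights, doc_features))
                         for i in range(dim)])
    return contexts


def document_association_feature(doc_features):
    """Average the context features into one feature for the whole second result."""
    contexts = context_features(doc_features)
    count = len(contexts)
    return [sum(c[i] for c in contexts) / count for i in range(len(contexts[0]))]
```

Averaging gives a feature of fixed size regardless of how many result documents the second result contains, which matters because different second results may contain different numbers of documents.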
In a possible implementation manner, the ranking module 704 is configured, when ranking the first results and the second results according to the document associated feature of each of the second results and the result click rate of each of the first results, to:
determining a first click rate of the second result on a first genre and a second click rate of the second result on the second genre corresponding to the second result according to the document association characteristics of the second result;
determining a target click rate of the second result according to the first click rate and the second click rate;
and sorting the first results and the second results according to the target click rate of each second result and the result click rate of each first result.
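The ranking step can be sketched as below. How the first click rate and the second click rate are combined into the target click rate is not specified in this passage, so a weighted sum with hypothetical weights `alpha` and `beta` is assumed; the result identifiers in the usage note are likewise made up.

```python
def target_click_rate(first_ctr, second_ctr, alpha=1.0, beta=1.0):
    """Combine the two click rates of a second result (weighted sum assumed)."""
    return alpha * first_ctr + beta * second_ctr


def rank_results(first_results, second_results):
    """Merge and sort results by click rate, highest first.

    first_results: list of (result_id, result_click_rate)
    second_results: list of (result_id, first_click_rate, second_click_rate)
    """
    scored = list(first_results)
    scored += [(rid, target_click_rate(first, second))
               for rid, first, second in second_results]
    ordered = sorted(scored, key=lambda item: item[1], reverse=True)
    return [rid for rid, _ in ordered]
```

For example, given two videos with result click rates 0.30 and 0.10 and a music card with first and second click rates 0.05 and 0.15, the card's target click rate is about 0.20, so it is placed between the two videos.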
In one possible implementation manner, there are a plurality of pieces of dimension information in any one of the preset feature dimensions;
the first determining module 702 is configured to, when determining, for any one of the result documents in the second result, the document feature corresponding to the result document according to the text attribute information of the result document, the dimension information of the result document in a plurality of preset feature dimensions, the search word, and the source information of the search word:
determining a plurality of target features in any preset feature dimension according to a plurality of dimension information of the result document in the preset feature dimension;
splicing the target features in the preset feature dimension to obtain a first spliced feature in the preset feature dimension;
splicing the source characteristics corresponding to the source information, the text attribute characteristics corresponding to the text attribute information, the target characteristics under a plurality of preset characteristic dimensions and the search word characteristics corresponding to the search word to obtain second spliced characteristics;
and determining the document feature corresponding to the result document according to the first spliced feature and the second spliced feature.
In a possible implementation manner, the search word features include target word features corresponding to the search words and word segmentation features corresponding to each word segmentation in the search words;
the first determining module 702 is configured to, when determining the document feature corresponding to the result document according to the first stitching feature and the second stitching feature:
determining a first matching feature according to the matching degree between the target word feature and the text attribute feature;
determining a second matching feature according to the matching degree between the word segmentation feature and the text attribute feature;
and performing full connection processing on the first splicing characteristic, the second splicing characteristic, the first matching characteristic and the second matching characteristic to obtain the document characteristic corresponding to the result document.
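The feature construction described above (splicing, matching, then full connection) might look like the following sketch. Cosine similarity as the matching degree, the mean over word segments, a single ReLU layer as the full connection processing, and all function and parameter names are assumptions for illustration only.

```python
import math


def splice(*features):
    """Concatenate feature vectors end to end."""
    return [value for feature in features for value in feature]


def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def fully_connected(x, weights, bias):
    """One dense layer with ReLU; weights holds one row per output unit."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def document_feature(targets_by_dim, source_feat, text_feat,
                     word_feat, segment_feats, weights, bias):
    # First spliced feature: target features spliced within each preset dimension.
    first_spliced = splice(*[splice(*targets) for targets in targets_by_dim.values()])
    # Second spliced feature: source, text attribute, target and search word features.
    second_spliced = splice(source_feat, text_feat, first_spliced,
                            word_feat, *segment_feats)
    # First matching feature: whole search word vs. text attributes.
    first_match = [cosine(word_feat, text_feat)]
    # Second matching feature: word segments vs. text attributes (mean assumed).
    second_match = [sum(cosine(s, text_feat) for s in segment_feats) / len(segment_feats)]
    fc_input = splice(first_spliced, second_spliced, first_match, second_match)
    return fully_connected(fc_input, weights, bias)
```

The explicit matching features give the dense layer a direct signal of how well the search word and its segments agree with the document text, in addition to the raw spliced features.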
In one possible embodiment, there are a plurality of second genres, and the apparatus further includes:
a third determining module 705, configured to determine, according to the target word feature and the word segmentation features, the intent score of the search word under each of the second genres;
and determine, according to the intent scores corresponding to the respective second genres, a target intent corresponding to the search word, where the target intent indicates the second genre with the highest degree of match with the search word.
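Selecting the target intent from the per-genre intent scores amounts to taking the highest-scoring genre, as in this minimal sketch; the function name and the genre identifiers in the usage note are hypothetical.

```python
def target_intent(intent_scores):
    """Return the second genre whose intent score is highest.

    intent_scores: mapping from a second-genre name to the intent score
        of the search word under that genre.
    """
    if not intent_scores:
        raise ValueError("at least one second genre is required")
    return max(intent_scores, key=intent_scores.get)
```

For example, with scores of 0.2 for a user card, 0.7 for a music card and 0.1 for a topic label card, the target intent is the music card genre.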
In one possible implementation, the first click rate and the second click rate are output using a pre-trained click rate prediction model, and the apparatus further comprises:
the first training module 706 is configured to train to obtain the click rate prediction model according to the following steps:
acquiring a sample search word and a sample result matched with the sample search word; the sample result having the second genre and comprising at least one sample result document;
inputting each sample result document, sample source information of the sample search word and the sample search word in the sample result into a click rate prediction model to be trained to obtain a first predicted click rate of the sample result on a first genre and a second predicted click rate of the sample result on a second genre;
determining a first loss according to the first predicted click rate and a first standard click rate corresponding to the sample result, and determining a second loss according to the second predicted click rate and a second standard click rate corresponding to the sample result; the first standard click rate is determined according to whether a target result matched with the sample result is clicked, where the target result has the first genre and includes key information in the sample result; the second standard click rate is determined according to whether the sample result is clicked;
and iteratively training the click rate prediction model to be trained using the first loss and the second loss until a preset training cut-off condition is met, to obtain a trained click rate prediction model.
In one possible embodiment, the intent score is output using a pre-trained intent prediction model, the apparatus further comprising:
a second training module 707 for training to obtain the intent prediction model according to the following steps:
acquiring a sample search word and a sample result matched with the sample search word; the sample result having the second genre;
inputting the sample search word into the intent prediction model to be trained, to obtain a predicted intent score of the sample search word under the second genre corresponding to the sample result;
determining a third loss according to the label score corresponding to the sample result and the prediction intention score; the label score is determined according to whether the sample result is clicked or not;
and carrying out iterative training on the intention prediction model to be trained by utilizing the third loss until a preset training cut-off condition is met, so as to obtain a trained intention prediction model.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
Based on the same technical conception, the embodiment of the application also provides computer equipment. Referring to fig. 8, a schematic structural diagram of a computer device according to an embodiment of the present application includes:
a processor 81, a memory 82 and a bus 83. The memory 82 stores machine-readable instructions executable by the processor 81, and the processor 81 is configured to execute the machine-readable instructions stored in the memory 82. When the machine-readable instructions are executed by the processor 81, the processor 81 performs the following steps: S101: obtaining search results matched with a search word, where the search results include first results of a first genre and at least one second result of a second genre other than the first genre, and the second result includes at least one result document; S102: for any result document in the second result, determining the document feature corresponding to the result document according to the text attribute information of the result document, the dimension information of the result document in a plurality of preset feature dimensions, the search word and the source information of the search word; S103: determining the document associated feature corresponding to the second result according to the document feature of each result document in the second result; and S104: ranking the first results and the second results according to the document associated features of the second results and the result click rate of the first results.
The memory 82 includes an internal memory 821 and an external memory 822. The internal memory 821 is used for temporarily storing operation data in the processor 81 and data exchanged with the external memory 822, such as a hard disk; the processor 81 exchanges data with the external memory 822 through the internal memory 821. When the computer device is running, the processor 81 and the memory 82 communicate with each other through the bus 83, so that the processor 81 executes the instructions mentioned in the above method embodiments.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the search result ranking method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The computer program product of the search result sorting method provided in the embodiments of the present disclosure includes a computer readable storage medium storing program code, where the program code includes instructions for executing the steps of the search result sorting method described in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding process in the foregoing method embodiment for the specific working process of the apparatus described above, which is not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be additional divisions in actual implementation, and for example, multiple units or components may be combined, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
If the technical solution of the application involves personal information, a product applying the technical solution clearly informs the individual of the personal information processing rules and obtains the individual's voluntary consent before processing the personal information. If the technical solution of the application involves sensitive personal information, a product applying the technical solution obtains the individual's separate consent before processing the sensitive personal information, and also meets the requirement of "explicit consent". For example, a clear and conspicuous sign is set at a personal information collection device such as a camera to inform individuals that they are entering the personal information collection range and that personal information will be collected; if an individual voluntarily enters the collection range, the individual is deemed to consent to the collection of his or her personal information. Alternatively, on a device that processes personal information, where the personal information processing rules are communicated by means of a conspicuous sign or notice, personal authorization is obtained through a pop-up message or by asking the individual to upload his or her personal information. The personal information processing rules may include information such as the personal information processor, the purpose of the personal information processing, the processing method, and the types of personal information processed.
Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that anyone familiar with the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method for ranking search results, comprising:
obtaining search results matched with the search word, wherein the search results comprise first results of a first genre and at least one second result of a second genre except the first genre; the second result comprises at least one result document;
for any one of the result documents in the second result, determining the document feature corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search word and source information of the search word;
determining the document association characteristics corresponding to the second result according to the document characteristics of each result document in the second result;
and sorting the first results and the second results according to the document associated features of the second results and the result click rate of the first results.
2. The method of claim 1, wherein determining the document associated feature corresponding to the second result based on the document feature of each of the result documents in the second result comprises:
determining context features of each result document according to the document features of each result document in the second result, and determining the document associated feature corresponding to the second result according to the average of the context features; the context features are used to characterize association information between a result document and each of the result documents in the second result.
3. The method of claim 1, wherein ranking the first results and the second results according to the document association feature of each second result and the result click rate of each first result comprises:
determining, according to the document association feature of the second result, a first click rate of the second result under the first genre and a second click rate of the second result under the second genre corresponding to the second result;
determining a target click rate of the second result according to the first click rate and the second click rate;
and ranking the first results and the second results according to the target click rate of each second result and the result click rate of each first result.
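The ranking step of claim 3 can be sketched as follows. The claim does not specify how the first click rate and the second click rate are fused into the target click rate; the sketch assumes a convex combination with a hypothetical `weight` parameter. `RankedItem`, `target_click_rate`, and `rank_results` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class RankedItem:
    name: str
    click_rate: float


def target_click_rate(first_ctr, second_ctr, weight=0.5):
    """Assumed fusion rule: a convex combination of the two predicted click rates."""
    return weight * first_ctr + (1.0 - weight) * second_ctr


def rank_results(first_results, second_results, weight=0.5):
    """first_results: list of (name, result_click_rate).
    second_results: list of (name, first_ctr, second_ctr)."""
    items = [RankedItem(name, ctr) for name, ctr in first_results]
    items += [
        RankedItem(name, target_click_rate(c1, c2, weight))
        for name, c1, c2 in second_results
    ]
    # Results of both genres are placed in one list, highest click rate first.
    return sorted(items, key=lambda item: item.click_rate, reverse=True)
```

Because both genres end up on a single click-rate scale, a second result can be interleaved between first results rather than being pinned to a fixed slot.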
4. The method of claim 1, wherein there are a plurality of pieces of dimension information in any one of the preset feature dimensions;
the determining, for any result document in the second result, the document feature corresponding to the result document according to the text attribute information of the result document, the dimension information of the result document in the plurality of preset feature dimensions, the search word, and the source information of the search word comprises:
determining a plurality of target features in any preset feature dimension according to the plurality of pieces of dimension information of the result document in that preset feature dimension;
splicing the target features in the preset feature dimension to obtain a first spliced feature in the preset feature dimension;
splicing a source feature corresponding to the source information, a text attribute feature corresponding to the text attribute information, the target features in the plurality of preset feature dimensions, and a search word feature corresponding to the search word to obtain a second spliced feature;
and determining the document feature corresponding to the result document according to the first spliced feature and the second spliced feature.
5. The method of claim 4, wherein the search word feature comprises a target word feature corresponding to the search word and word segmentation features corresponding to the respective word segments of the search word;
the determining the document feature corresponding to the result document according to the first spliced feature and the second spliced feature comprises:
determining a first matching feature according to a matching degree between the target word feature and the text attribute feature;
determining a second matching feature according to a matching degree between the word segmentation features and the text attribute feature;
and performing full connection processing on the first spliced feature, the second spliced feature, the first matching feature, and the second matching feature to obtain the document feature corresponding to the result document.
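The splicing and full connection steps of claims 4 and 5 can be sketched as below, reading "splicing" as vector concatenation. The matching degree is not defined in the claims; cosine similarity is assumed here, and the target word feature stands in for the search word feature in the second spliced feature. All function names, the single dense layer, and the ReLU activation are hypothetical.

```python
import math


def concat(*vectors):
    """Splice (concatenate) feature vectors end to end."""
    out = []
    for vec in vectors:
        out.extend(vec)
    return out


def cosine(a, b):
    """Assumed matching degree: cosine similarity, 0.0 for a zero vector."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_input(dim_targets, source_feat, text_feat, target_word_feat, segment_feats):
    # First spliced feature: one concatenation per preset feature dimension.
    first_spliced = [concat(*targets) for targets in dim_targets.values()]
    all_targets = concat(*first_spliced)
    # Second spliced feature: source, text attribute, target, and search word features.
    second_spliced = concat(source_feat, text_feat, all_targets, target_word_feat)
    # Matching features between the query side and the text attribute feature.
    first_match = [cosine(target_word_feat, text_feat)]
    second_match = [cosine(seg, text_feat) for seg in segment_feats]
    return concat(*first_spliced, second_spliced, first_match, second_match)


def fully_connected(x, weights, bias):
    """One dense layer with ReLU; weights has one row per output unit."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]
```

The document feature would then be `fully_connected(build_input(...), weights, bias)`, with `weights` and `bias` learned during training in an actual model.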
6. The method of claim 5, wherein there are a plurality of second genres, the method further comprising:
determining, according to the target word feature and the word segmentation features, intention scores of the search word respectively corresponding to the second genres;
and determining a target intention corresponding to the search word according to the intention scores respectively corresponding to the second genres, wherein the target intention indicates the second genre with the highest matching degree with the search word.
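The selection of the target intention in claim 6 amounts to taking the second genre with the highest intention score. A minimal sketch follows; how the scores are computed is not specified in the claim, so a mean-pooled query feature scored by one hypothetical linear weight vector per genre is assumed. The genre names in the usage example are placeholders.

```python
def mean_pool(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def intention_scores(target_word_feat, segment_feats, genre_weights):
    """Assumed scorer: pool the query-side features, then apply one linear
    weight vector per second genre."""
    pooled = mean_pool([target_word_feat] + list(segment_feats))
    return {
        genre: sum(w * x for w, x in zip(weights, pooled))
        for genre, weights in genre_weights.items()
    }


def target_intention(scores):
    """Return the second genre with the highest intention score."""
    return max(scores, key=scores.get)
```

For example, `target_intention(intention_scores([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], {"video": [1.0, 0.0], "encyclopedia": [0.0, 1.0]}))` selects `"video"`, because the pooled feature leans toward the first component.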
7. The method of claim 3, wherein the first click rate and the second click rate are output by a pre-trained click rate prediction model, the click rate prediction model being trained according to the following steps:
acquiring a sample search word and a sample result matching the sample search word; the sample result having the second genre and comprising at least one sample result document;
inputting each sample result document in the sample result, sample source information of the sample search word, and the sample search word into a click rate prediction model to be trained to obtain a first predicted click rate of the sample result under the first genre and a second predicted click rate of the sample result under the second genre;
determining a first loss according to the first predicted click rate and a first standard click rate corresponding to the sample result, and determining a second loss according to the second predicted click rate and a second standard click rate corresponding to the sample result; the first standard click rate is determined according to whether a target result matching the sample result is clicked, wherein the target result has the first genre and comprises key information in the sample result; the second standard click rate is determined according to whether the sample result is clicked;
and iteratively training the click rate prediction model to be trained by using the first loss and the second loss until a preset training cut-off condition is met, so as to obtain a trained click rate prediction model.
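The two-loss training of claim 7 can be illustrated with a toy sketch. The claim fixes neither the loss form nor the model architecture; binary cross-entropy and two independent logistic heads over a shared feature vector are assumed here, and the cut-off condition is assumed to be a step budget or a loss threshold. The function names and the hyperparameters `lr`, `max_steps`, and `tol` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(pred, label, eps=1e-7):
    """Binary cross-entropy for one prediction (assumed loss form)."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train(samples, dim, lr=0.5, max_steps=500, tol=0.05):
    """samples: list of (features, first_standard_ctr, second_standard_ctr)."""
    w1, w2 = [0.0] * dim, [0.0] * dim  # one linear head per click rate
    loss = float("inf")
    for _ in range(max_steps):  # cut-off condition 1: step budget
        loss, g1, g2 = 0.0, [0.0] * dim, [0.0] * dim
        for x, y1, y2 in samples:
            p1, p2 = sigmoid(dot(w1, x)), sigmoid(dot(w2, x))
            loss += bce(p1, y1) + bce(p2, y2)  # first loss + second loss
            for i in range(dim):
                g1[i] += (p1 - y1) * x[i]
                g2[i] += (p2 - y2) * x[i]
        loss /= len(samples)
        if loss < tol:  # cut-off condition 2: loss small enough
            break
        w1 = [w - lr * g / len(samples) for w, g in zip(w1, g1)]
        w2 = [w - lr * g / len(samples) for w, g in zip(w2, g2)]
    return w1, w2, loss
```

In this sketch the first label records whether the matching first-genre target result was clicked and the second label records whether the sample result itself was clicked, mirroring the two standard click rates of the claim.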
8. The method of claim 6, wherein the intent score is output using a pre-trained intent prediction model trained according to the steps of:
acquiring a sample search word and a sample result matched with the sample search word; the sample result having the second genre;
inputting the sample search word into an intention prediction model to be trained to obtain a predicted intention score of the sample search word under the second genre corresponding to the sample result;
determining a third loss according to the label score corresponding to the sample result and the prediction intention score; the label score is determined according to whether the sample result is clicked or not;
and carrying out iterative training on the intention prediction model to be trained by utilizing the third loss until a preset training cut-off condition is met, so as to obtain a trained intention prediction model.
9. A search result ordering apparatus, comprising:
an acquisition module, configured to acquire search results matching a search word, wherein the search results comprise first results of a first genre and at least one second result of a second genre other than the first genre; each second result comprises at least one result document;
a first determining module, configured to determine, for any result document in the second result, a document feature corresponding to the result document according to text attribute information of the result document, dimension information of the result document in a plurality of preset feature dimensions, the search word, and source information of the search word;
a second determining module, configured to determine a document association feature corresponding to the second result according to the document feature of each result document in the second result;
and a ranking module, configured to rank the first results and the second results according to the document association feature of each second result and the result click rate of each first result.
10. A computer device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the processor being configured to execute the machine-readable instructions stored in the memory, wherein the machine-readable instructions, when executed by the processor, cause the processor to perform the steps of the search result ordering method of any one of claims 1 to 8.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when run by a computer device, causes the computer device to perform the steps of the search result ordering method of any one of claims 1 to 8.
CN202310539712.XA 2023-05-15 2023-05-15 Search result ordering method and device, computer equipment and storage medium Pending CN116578725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310539712.XA CN116578725A (en) 2023-05-15 2023-05-15 Search result ordering method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310539712.XA CN116578725A (en) 2023-05-15 2023-05-15 Search result ordering method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116578725A true CN116578725A (en) 2023-08-11

Family

ID=87542671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310539712.XA Pending CN116578725A (en) 2023-05-15 2023-05-15 Search result ordering method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116578725A (en)

Similar Documents

Publication Publication Date Title
CN111444428B (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
US10217058B2 (en) Predicting interesting things and concepts in content
CN106649818B (en) Application search intention identification method and device, application search method and server
Shi et al. Functional and contextual attention-based LSTM for service recommendation in mashup creation
US9846836B2 (en) Modeling interestingness with deep neural networks
CN112148889A (en) Recommendation list generation method and device
Kanwal et al. A review of text-based recommendation systems
US20100191758A1 (en) System and method for improved search relevance using proximity boosting
CN113254711B (en) Interactive image display method and device, computer equipment and storage medium
CN110888990A (en) Text recommendation method, device, equipment and medium
US11194963B1 (en) Auditing citations in a textual document
CN111557000B (en) Accuracy Determination for Media
KR101450453B1 (en) Method and apparatus for recommending contents
CN112328889A (en) Method and device for determining recommended search terms, readable medium and electronic equipment
Xu et al. Learning to annotate via social interaction analytics
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN116361428A (en) Question-answer recall method, device and storage medium
CN111753199B (en) User portrait construction method and device, electronic device and medium
Kumar et al. Classification of Mobile Applications with rich information
CN116578725A (en) Search result ordering method and device, computer equipment and storage medium
US12001462B1 (en) Method and system for multi-level artificial intelligence supercomputer design
CN112148702B (en) File retrieval method and device
Che et al. A feature and deep learning model recommendation system for mobile application
CN110147488A (en) The processing method of content of pages, calculates equipment and storage medium at processing unit
Basile et al. Augmenting a content-based recommender system with tags for cultural heritage personalization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination