CN116306964A - Sample data generation method and device - Google Patents

Info

Publication number
CN116306964A
Authority
CN
China
Prior art keywords
search result
search
result record
sample data
records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310108190.8A
Other languages
Chinese (zh)
Inventor
秦泽民
董大祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310108190.8A
Publication of CN116306964A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and a device for generating sample data, relates to the technical fields of data processing and artificial intelligence, and particularly relates to the technical fields of big data, intelligent search, deep learning and the like. The specific implementation scheme is as follows: the method comprises the steps of obtaining a search result record list corresponding to a search request, determining search parameters corresponding to each search result record in the search result record list according to the obtained historical search record, and generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list, wherein the sample data can be generated from the dimension with stronger objectivity, so that the efficiency and the reliability of generating the sample data are improved.

Description

Sample data generation method and device
Technical Field
The disclosure relates to the technical fields of data processing and artificial intelligence, in particular to the technical fields of big data, intelligent search, deep learning and the like, and particularly relates to a method and a device for generating sample data.
Background
The sample data may be applied to training and optimization of the model. For example, applied in a search scenario, sample data may be used for training and optimization of a search model.
In some embodiments, the sample data is obtained mainly by means of manual labeling, and accordingly, the sample data may also be referred to as labeling data.
Disclosure of Invention
The present disclosure provides a method and apparatus for generating sample data for improving the validity of the sample data.
According to a first aspect of the present disclosure, there is provided a method for generating sample data, including:
obtaining a search result record list corresponding to the search request;
determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records;
and generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list.
According to a second aspect of the present disclosure, there is provided a generation apparatus of sample data, including:
the acquisition unit is used for acquiring a search result record list corresponding to the search request;
the first determining unit is used for determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records;
and the generating unit is used for generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which it can be read by at least one processor of an electronic device, the at least one processor executing the computer program causing the electronic device to perform the method of the first aspect.
The present disclosure provides a method and an apparatus for generating sample data, including: obtaining a search result record list corresponding to a search request; determining, according to the acquired historical search records, the search parameters corresponding to each search result record in the search result record list; and generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list. Because the search parameters corresponding to each search result record are determined from the historical search records, and the sample data is generated in combination with those search parameters, the sample data can be generated from a more objective dimension, which improves the efficiency and reliability of sample data generation.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an editing process of an embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing a method of generating sample data of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Search engine technology is a widely used information technology, applied in fields such as the mobile internet and enterprise services.
When the search engine technology is applied to the mobile internet scene, the search engine technology can index web page contents and can realize refined services such as recall, sorting and the like according to search (query) requests of users.
For example, the search system is a system implemented based on search engine technology, and the search system may receive a search request initiated by a user to obtain and feed back search results corresponding to the search request based on the search engine technology.
When the search engine technology is applied to the enterprise service scenario, the search engine technology is mainly used for searching internal information of an enterprise on one hand and providing searching capability in an external informatization application of the enterprise on the other hand.
Especially when the search engine technology is applied to an enterprise service scenario, the architecture of the search system is required to be lightweight, easy to maintain, extensible and the like. Meanwhile, the search system is a product function tied to specific data, and as users' usage habits evolve and the database is continuously updated, the search effect supported by the search system is a factor that must be considered.
Therefore, both the construction of a search system and the evaluation of the search effect of a search system are important parts of search engine technology.
With the development of artificial intelligence technology, in some embodiments, a search model (which may also be referred to as a query model, etc.) may be constructed to implement recall, sort output, etc. of search requests based on the search model.
For example, sample data may be obtained to train the underlying network model based on the sample data to obtain a search model, and a search system may be built based on the search model to implement recall of a search request, and search effects of the search model may be evaluated based on the sample data to implement optimization of the search model.
That is, both training and optimizing the search model are strongly dependent on the sample data, and the quality of the sample data determines the search effect of the search model to a certain extent and also determines the optimization effect of the search model to a certain extent.
In some embodiments, the sample data is obtained primarily by way of labeling, and thus, the sample data may also be referred to as labeling data. And sample data can be obtained by manual labeling.
For example, a search result list is obtained for a search request; the search result list includes search result records, and the search result records are output one by one so that an annotator can perform a labeling operation on the currently output search result record.
Correspondingly, in response to the annotator's score for each search result record, the search system records and retains the score of each search result record, and the sample data is exported.
The sample data comprises a plurality of pieces of data, one search result record corresponds to one piece of sample data in the sample data, and one piece of data in the sample data comprises a search request, a search result record and a score.
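The structure of one piece of sample data described above can be sketched as follows. This is a minimal illustration only; the class name `SampleRecord`, its field names, and the helper `build_manual_samples` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SampleRecord:
    """One piece of sample data: a search request, one search result record, and its score.

    The class and field names are assumptions made for illustration.
    """
    search_request: str
    search_result_record: str
    score: float


def build_manual_samples(
    search_request: str,
    scored_records: Sequence[Tuple[str, float]],
) -> List[SampleRecord]:
    """Turn (record, annotator score) pairs into one sample per search result record."""
    return [SampleRecord(search_request, record, score) for record, score in scored_records]


samples = build_manual_samples(
    "wireless mouse",
    [("Wireless Mouse X1", 3.0), ("Mouse pad", 1.0)],
)
```

Each search result record yields exactly one `SampleRecord`, which mirrors the one-to-one correspondence stated in the text.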
However, when sample data is obtained in this way, it is difficult for annotators to accurately quantify the score of each search result record. In particular, scoring the search results of massive numbers of search requests consumes considerable resources and cost and is difficult to deploy at scale; moreover, the influence of human subjective factors cannot be avoided, so the accuracy and reliability of the sample data are relatively low.
In order to avoid the technical problems described above, the inventors of the present disclosure arrived, through creative work, at the following technical concept: combining the historical search record, the search request and the search result record list to generate sample data.
Based on the technical conception, the present disclosure provides a method and an apparatus for generating sample data, which are applied to the technical fields of data processing and artificial intelligence, and in particular relates to the technical fields of big data, intelligent search, deep learning and the like, so as to improve the effectiveness and reliability of sample data generation.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, and as shown in fig. 1, a method for generating sample data according to an embodiment of the present disclosure includes:
S101: and obtaining a search result record list corresponding to the search request.
The execution body of the present embodiment may be, for example, a generating device of sample data (hereinafter simply referred to as a generating device), where the generating device may be a server, may be a terminal device, may be a processor, may be a chip, or the like, and is not listed here.
If the generating device is a server, the generating device may be a local server, a cloud server, a server cluster, or an independent server, which is not limited in this embodiment.
The search result record list includes one or more search result records, the content of which may be different for different search scenarios, e.g., the search result records may be title records (which may also be referred to as name records, etc.).
For example, the search system may recall one or more search result records for one search request, while the content of the search result records may be different for different search scenarios, such as the search result records may be title records. Accordingly, one or more title records may be included in the search result record list.
For example, for a search request, the search system may recall a search result record, including the search result record in the search result record list; the search system may recall multiple search result records, and the search result record list includes multiple search result records, and one search result record may be one title record.
The search result records in the search result record list may be full search result records or partial search result records.
For example, if the number of search result records is N (N is a positive integer greater than or equal to 1), the search result record list may include N search result records, or may include a portion of the search result records obtained from the N search result records.
S102: and determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records.
By way of example, a historical search record may be understood as a record produced when the search system performed search tasks in the past. The historical search record comprises one or more of a historical search request, a historical recall result corresponding to the historical search request, and a browsing record for the historical recall result.
Wherein the search task may include a search task for a search request.
The present embodiment does not limit the manner of acquiring the history search record. For example, the generating means may be communicatively coupled to a search system, which may store the historical search record and transmit the historical search record to the generating means, and the generating means may obtain the historical search record accordingly.
The search parameter may be understood as a browsing-related parameter corresponding to the search result record, such as a click rate of the search result record, etc.
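As a minimal sketch of how a click rate could be derived from historical search records, the example below counts, for each search result record, how often it was recalled and how often it was clicked for the same search request. The tuple layout of a history entry, the function name `click_rates`, and the choice of clicks divided by recalls are all assumptions for illustration; the disclosure does not fix a data format or formula for the click rate.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical history entry: (historical search request,
#                              historical recall result,
#                              records clicked in the browsing record)
HistoryEntry = Tuple[str, List[str], List[str]]


def click_rates(
    history: Iterable[HistoryEntry],
    search_request: str,
    result_list: Sequence[str],
) -> Dict[str, float]:
    """Estimate a click rate (clicks / recalls) for each record in result_list."""
    recalls: Counter = Counter()
    clicks: Counter = Counter()
    for request, recalled, clicked in history:
        if request != search_request:
            continue  # only history for the same search request is used here
        recalled_set = set(recalled)
        recalls.update(recalled_set)
        clicks.update(recalled_set & set(clicked))
    # Records never recalled in the history get a click rate of 0.0 (an assumption).
    return {r: (clicks[r] / recalls[r] if recalls[r] else 0.0) for r in result_list}


history = [
    ("q", ["a", "b"], ["a"]),
    ("q", ["a", "b"], ["a", "b"]),
    ("other", ["a"], []),
]
rates = click_rates(history, "q", ["a", "b", "c"])
```

Matching history entries by exact request string is the simplest possible choice; a real system might group similar requests, which is outside what the text specifies.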
S103: and generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list.
By combining the search parameters to generate sample data, the low accuracy of sample data caused by manual labeling in the above embodiments can be avoided, and the effectiveness and reliability of the sample data are improved.
Based on the above analysis, the present disclosure provides a method for generating sample data, including: the method comprises the steps of obtaining a search result record list corresponding to a search request, determining search parameters corresponding to each search result record in the search result record list according to the obtained historical search record, and generating sample data according to the search parameters corresponding to each search result record, the search request and the search result record list.
As can be seen from the above examples, the search parameter may be a click rate, and for convenience of readers to understand the implementation principle of the present disclosure, the method for generating sample data of the present disclosure will be described in more detail with reference to fig. 2 from the dimension of the click rate. Wherein fig. 2 is a schematic diagram according to a second embodiment of the disclosure, and as shown in fig. 2, a method for generating sample data according to an embodiment of the disclosure includes:
S201: and obtaining a search result record list corresponding to the search request.
It should be understood that, in order to avoid the cumbersome statement, the technical features of this embodiment that are the same as those of the above embodiment are not repeated.
For example, regarding the implementation principle of S201, reference may be made to the description of S101, which is not repeated here.
S202: and determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records.
For example, regarding the implementation principle of S202, reference may be made to the description of S102, which is not repeated here.
S203: and calculating to obtain the scores corresponding to the search result records according to the search parameters corresponding to the search result records.
In combination with the above analysis, in some embodiments, the score corresponding to each search result record may be determined based on the scoring operation of the labeling personnel, however, the manner of determining the score is inefficient and has low accuracy.
Therefore, in this embodiment, the score is determined in combination with the search parameters. Because the search parameters objectively characterize how the search result records were searched and browsed, determining the score based on the search parameters avoids the influence of human factors on the accuracy of the score, thereby improving the effectiveness and reliability of the score.
In some embodiments, the search parameter is click through rate; s203 may include the steps of:
a first step of: and acquiring the click rate of the first search result record in the search result record list.
The click rate may be specifically understood as a click rate of an actual search display position, and the click rate may be a normalized click rate. For example, when the search system recalls a search result record, the search result record may be output by a display device communicatively coupled to the search system, and the location at which the search result record is output may be referred to as the actual search presentation location.
And a second step of: for each search result record, determining a score for the search result record based on the click rate of the search result record and the click rate of the first search result record.
Illustratively, the search result record list includes n (n is a positive integer greater than or equal to 1) search result records, and is referred to as search result record 1, search result record 2, and up to search result record n, respectively.
Correspondingly, aiming at any one search result record i (i is more than or equal to 1 and less than or equal to n) in the n search result records, the score of the search result record i can be calculated based on the click rate of the search result record 1 and the click rate of the search result record i.
In this embodiment, by combining the click rate of the first search result record and the click rates of other search result records, the scores of the other search result records are determined, so that a high correlation between the scores and the click rates can be achieved, the scores have higher objectivity, and the effectiveness and reliability of the scores are further improved.
In some embodiments, the second step may include: for each search result record, a ratio of the click rate of the search result record to the click rate of the first search result record is calculated and determined as a score of the search result record.
Illustratively, in combination with the above example, the score $rel_i$ of search result record i may be calculated based on Formula 1:
$$rel_i = \frac{C_i}{C_1}$$
wherein $C_i$ is the click rate of search result record i, and $C_1$ is the click rate of the first search result record.
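The ratio-based score can be sketched in a few lines. The function name `scores_from_click_rates` is hypothetical, and raising an error when the first record's click rate is zero is an assumption, since the text does not say how that case is handled.

```python
from typing import List, Sequence


def scores_from_click_rates(click_rates: Sequence[float]) -> List[float]:
    """Score each record as its click rate divided by the first record's click rate.

    click_rates[0] is the click rate of the first search result record (C_1).
    """
    if not click_rates:
        return []
    first = click_rates[0]
    if first <= 0:
        # Assumption: the disclosure does not define the score when C_1 is zero.
        raise ValueError("click rate of the first search result record must be positive")
    return [c / first for c in click_rates]


scores = scores_from_click_rates([0.4, 0.2, 0.1])
```

By construction the first search result record always receives a score of 1, and a record clicked half as often as the first receives 0.5.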
S204: and generating sample data according to the scores, the search requests and the search result record list corresponding to each search result record.
Accordingly, because the scores are more objective, effective and reliable, the sample data generated based on the scores is also more effective and reliable.
In other embodiments, sample data may also be generated in connection with editing operations. The method of generating sample data of an embodiment of the present disclosure will now be described in detail with reference to fig. 3 from the dimension of the editing operation. Wherein, fig. 3 is a schematic diagram according to a third embodiment of the disclosure, and as shown in fig. 3, a method for generating sample data according to an embodiment of the disclosure includes:
S301: and obtaining a search result record list corresponding to the search request.
Similarly, in order to avoid the cumbersome statement, the technical features of this embodiment that are the same as those of the above embodiment are not repeated.
For example, for the implementation principle of S301, reference may be made to the description of S101, which is not repeated here.
S302: and outputting a search request and a search result record list.
The generating means may comprise a display, and the search request and the list of search result records may be output and displayed via the display.
Accordingly, the annotator can see the search request and the search result record list through the display.
S303: and responding to the editing operation aiming at the search result record list, and editing the search result record list according to the editing operation to obtain the search result record list after editing.
For example, the labeling personnel can edit the search result record list through a display and an editing device (such as a mouse and/or a keyboard, etc.), and correspondingly, the generating device can edit the search result record corresponding to the editing operation based on the editing operation of the labeling personnel, so as to obtain the search result record list after the editing operation.
In this embodiment, the editing operation is combined to edit the search result record list, so as to score the search result record list after the editing operation, so that the click rate is considered and the editing operation is considered in the generation of the sample data, thereby further improving the effectiveness and reliability of the sample data.
In some embodiments, the editing process includes at least one of a new process, a delete process, and a sequence adjustment process.
For example, the editing process may include a new process, or a deletion process, or a sequence adjustment process. Of course, the editing process may also include a new process and a deletion process, or a new process and a sequence adjustment process, or a deletion process and a sequence adjustment process. Of course, the editing process may include a new process, a deletion process, and a sequence adjustment process.
For example, taking the newly added process as an example, as shown in fig. 4, the search result record n+1 may be added after the search result record n.
As another example, taking the deletion process as an example, as shown in fig. 4, the search result record 2 may be deleted from the search result record list.
As another example, taking the sequence adjustment process as an example, as shown in fig. 4, the search result record 2 may be adjusted to the position of the first search result record of the search result record list.
It should be understood that fig. 4 is merely for exemplary purposes and is not to be construed as limiting the editing process.
For example, the number of times of the addition process, the deletion process, and the sequence adjustment process is not limited, and the positions of the addition process, the deletion process, and the sequence adjustment process are not limited in this embodiment.
For example, if the editing process includes multiple modes, for example, two or three modes, the sequence of the editing processes of the multiple modes is not limited.
In some embodiments, if the editing process includes multiple modes, and the multiple modes include the sequence adjustment process, then the editing process of other modes may be performed first, and then the sequence adjustment process may be performed.
For example, if the editing process includes a new addition process and a sequence adjustment process, the generating device may execute the new addition process first and then execute the sequence adjustment process.
As another example, if the editing process includes a deletion process and a sequence adjustment process, the generating device may execute the deletion process first and then execute the sequence adjustment process.
For another example, if the editing process includes a new process, a deletion process, and a sequence adjustment process, the generating device may execute the new process and the deletion process first (may execute the new process first and then the deletion process, or may execute the deletion process first and then the new process), and then execute the sequence adjustment process.
In the present embodiment, by implementing the editing process of the search result record list from one or more of addition, deletion, and sequence adjustment, flexibility and diversity of the editing process can be implemented.
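The three kinds of editing process, with the sequence adjustment applied last as in the embodiments above, can be sketched as follows. The function name `edit_result_list`, its parameters, and the representation of a sequence adjustment as a full new ordering are assumptions for illustration only.

```python
from typing import List, Optional, Sequence, Tuple


def edit_result_list(
    records: Sequence[str],
    added: Sequence[str] = (),
    deleted: Sequence[str] = (),
    new_order: Optional[Sequence[str]] = None,
) -> Tuple[List[str], List[str]]:
    """Apply addition, deletion, then sequence adjustment.

    Returns (edited list, records removed by the deletion process).
    """
    to_delete = set(deleted)
    # Addition and deletion come first; their relative order is not constrained.
    edited = list(records) + [r for r in added if r not in records]
    removed = [r for r in edited if r in to_delete]
    edited = [r for r in edited if r not in to_delete]
    # Sequence adjustment is performed after the other editing processes.
    if new_order is not None:
        if sorted(new_order) != sorted(edited):
            raise ValueError("new_order must be a permutation of the remaining records")
        edited = list(new_order)
    return edited, removed


edited, removed = edit_result_list(
    ["r1", "r2", "r3"],
    added=["r4"],
    deleted=["r2"],
    new_order=["r3", "r1", "r4"],
)
```

Returning the removed records alongside the edited list keeps the information needed later to treat deleted records separately from the remaining ones.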
S304: and determining the search result record subjected to the deletion processing in the search result record list as negative example data. Wherein the sample data includes negative example data.
For example, if a search result record is deleted, this indicates that the degree of match between the search result record and the search request is relatively low, that is, a user is unlikely to click and browse the search result record recalled by the search system according to the search request. Therefore, the search result record is determined to be negative example data, so that the data volume of the sample data can be increased, and the effectiveness and reliability of the negative example data can be improved.
In some embodiments, a piece of negative example data includes: the search request and the search result record subjected to the deletion process may also include a score corresponding to the search result record subjected to the deletion process, where the score may be implemented based on the foregoing example, or may be implemented based on other manners, and the embodiment is not limited thereto.
S305: and determining search parameters corresponding to each search result record in the search result record list after editing processing according to the acquired historical search records.
For example, regarding the implementation principle of S305, reference may be made to the description of S102, which is not repeated here.
S306: and generating positive example data according to the search parameters corresponding to each search result record, the search request and the search result record list after editing processing. Wherein the sample data includes positive example data.
Illustratively, in combination with the above examples, this step may be understood as: generating positive example data according to the remaining search result records. Wherein the remaining search result records are the search result records other than the search result records on which the deletion process is performed.
In this embodiment, the positive example data is generated from the remaining search result records, so that the positive example data has a relatively high degree of match with the search request, and thus has relatively high reliability and effectiveness.
In some embodiments, a piece of positive example data includes: search requests, one remaining search result record, search parameters corresponding to the remaining search result record (which may specifically be scores determined based on the search parameters as described in the above examples).
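A minimal sketch of splitting the records into negative and positive example data is given below, assuming the click rates are already available as a mapping and that scores are the ratio to the first remaining record's click rate. The function name `build_samples`, the dictionary keys, and the `label` field are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_samples(
    search_request: str,
    original: Sequence[str],
    edited: Sequence[str],
    click_rates: Dict[str, float],
) -> Tuple[List[dict], List[dict]]:
    """Return (positive example data, negative example data).

    Records present in `original` but absent from `edited` are treated as deleted.
    """
    kept = set(edited)
    negatives = [
        {"request": search_request, "record": r, "label": "negative"}
        for r in original
        if r not in kept
    ]
    if not edited:
        return [], negatives
    first_rate = click_rates.get(edited[0], 0.0)
    if first_rate <= 0:
        # Assumption: the text does not define scores when the first click rate is zero.
        raise ValueError("click rate of the first remaining record must be positive")
    positives = [
        {
            "request": search_request,
            "record": r,
            "score": click_rates.get(r, 0.0) / first_rate,
            "label": "positive",
        }
        for r in edited
    ]
    return positives, negatives


positives, negatives = build_samples(
    "q",
    original=["r1", "r2", "r3"],
    edited=["r3", "r1"],
    click_rates={"r1": 0.2, "r2": 0.05, "r3": 0.4},
)
```

In this example `r2` becomes the only piece of negative example data, while `r3` and `r1` become positive example data with scores relative to `r3`, the first record after editing.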
In some embodiments, the normalized discounted cumulative gain (Normalized Discounted Cumulative Gain, NDCG) of the edited search result record list may be calculated based on the respective scores of the search result records, so as to evaluate the edited search result record list.
For example, the NDCG may be calculated based on Formula 2:
$$NDCG_n = \frac{DCG_n}{IDCG_n}$$
wherein,
$$DCG_n = rel_1 + \sum_{i=2}^{n} \frac{rel_i}{\log_2 i}$$
or
$$DCG_n = \sum_{i=1}^{n} \frac{2^{rel_i} - 1}{\log_2 (i + 1)}$$
$IDCG_n$ is the $DCG_n$ of the preset best arrangement, $rel_1$ is the score corresponding to the first search result record in the edited search result record list, n is the number of search result records in the edited search result record list, and i denotes the i-th search result record in the edited search result record list.
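As a hedged sketch of the NDCG evaluation, the example below assumes the common DCG form in which the first score is not discounted and the score at position i (i ≥ 2) is divided by log2(i). It also assumes that the best arrangement is the list sorted by descending score, whereas the text only says the best arrangement is preset. The function names `dcg` and `ndcg` are hypothetical.

```python
import math
from typing import Sequence


def dcg(scores: Sequence[float]) -> float:
    """DCG_n = rel_1 + sum over i = 2..n of rel_i / log2(i)."""
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(scores, start=1)
    )


def ndcg(scores: Sequence[float]) -> float:
    """NDCG_n = DCG_n / IDCG_n, with IDCG_n taken from the descending-score ordering.

    Using the sorted list as the best arrangement is an assumption.
    """
    ideal = dcg(sorted(scores, reverse=True))
    return dcg(scores) / ideal if ideal > 0 else 0.0


well_ordered = ndcg([1.0, 0.5, 0.25])   # already in descending order of score
badly_ordered = ndcg([0.25, 0.5, 1.0])  # best record placed last
```

A list already in the best arrangement evaluates to 1, and the value falls below 1 as higher-scored records are pushed toward later positions.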
S307: the search model is trained and/or optimized based on the sample data.
For example, for a training scenario of a search model, a search model may be trained based on sample data to recall a received search request and feed back a list of search result records based on the search model.
Because the sample data has higher effectiveness and reliability and comprises the positive example data and the negative example data, the search model obtained based on sample data training has stronger recall capability, so that the effectiveness and reliability of the search model can be improved.
In an optimization scenario, if the search model has been trained in advance, the search model can be optimized based on the sample data to obtain an optimized search model, thereby improving the recall effectiveness and reliability of the optimized search model.
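The difference between the training scenario and the optimization scenario can be sketched with a toy model. The disclosure does not specify the search model or its training procedure, so everything below is an assumption for illustration: a linear scorer over hypothetical feature vectors, fitted with stochastic gradient descent on squared error, where positive examples carry their score as the target and negative examples carry a target of 0. Passing `weights=None` corresponds to training from scratch; passing existing weights corresponds to optimizing a model that was trained in advance.

```python
from typing import List, Optional, Sequence, Tuple

# (feature vector, target score); this layout is a hypothetical simplification.
Sample = Tuple[Sequence[float], float]


def fit(
    samples: Sequence[Sample],
    weights: Optional[Sequence[float]] = None,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Fit a linear scorer with SGD on squared error.

    weights=None  -> train a new model from zero-initialized weights.
    weights=[...] -> continue from a previously trained model (optimization).
    """
    if not samples:
        return list(weights) if weights is not None else []
    dim = len(samples[0][0])
    w = list(weights) if weights is not None else [0.0] * dim
    for _ in range(epochs):
        for features, target in samples:
            prediction = sum(wi * xi for wi, xi in zip(w, features))
            error = prediction - target
            # Gradient step for 0.5 * error**2 with respect to each weight.
            w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w
```

In practice a ranking model would more likely be trained with a pairwise or listwise objective; the pointwise loss here is chosen only to keep the sketch short.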
Fig. 5 is a schematic diagram of a fourth embodiment of the present disclosure. As shown in fig. 5, an apparatus 500 for generating sample data according to an embodiment of the present disclosure includes:
an obtaining unit 501, configured to obtain a search result record list corresponding to the search request.
The first determining unit 502 is configured to determine, according to the obtained historical search records, search parameters corresponding to each search result record in the search result record list.
The generating unit 503 is configured to generate sample data according to the search parameters respectively corresponding to the search result records, the search request, and the search result record list.
Fig. 6 is a schematic diagram of a fifth embodiment of the present disclosure. As shown in fig. 6, an apparatus 600 for generating sample data according to an embodiment of the present disclosure includes:
an obtaining unit 601, configured to obtain a search result record list corresponding to the search request.
The first determining unit 602 is configured to determine, according to the obtained historical search records, search parameters corresponding to each search result record in the search result record list.
The generating unit 603 is configured to generate sample data according to the search parameters respectively corresponding to the search result records, the search request, and the search result record list.
As can be seen in connection with fig. 6, in some embodiments, the generating unit 603 comprises:
a computing subunit 6031, configured to calculate, according to the search parameters corresponding to each search result record, a score corresponding to each search result record.
In some embodiments, the search parameter is the click-through rate, and the computing subunit 6031 includes:
an acquisition module, configured to acquire the click-through rate of the first search result record in the search result record list.
a determining module, configured to determine, for each search result record, the score of the search result record according to the click-through rate of the search result record and the click-through rate of the first search result record.
In some embodiments, the determining module is configured to, for each search result record, calculate the ratio of the click-through rate of the search result record to the click-through rate of the first search result record, and determine the ratio as the score of the search result record.
The generating subunit 6032 is configured to generate sample data according to the scores respectively corresponding to the search result records, the search request, and the search result record list.
An output unit 604, configured to output the search request and the search result record list.
An editing unit 605, configured to, in response to an editing operation on the search result record list, edit the search result record list according to the editing operation to obtain an edited search result record list.
In some embodiments, the editing process includes at least one of an addition process, a deletion process, and an order adjustment process.
In some embodiments, the editing process includes a deletion process, and the apparatus for generating sample data further includes:
a second determining unit 606, configured to determine, as negative example data, the search result record subjected to the deletion process in the search result record list, where the sample data includes the negative example data.
In some embodiments, the generating unit 603 is configured to generate the positive example data according to the remaining search result records, the search parameters respectively corresponding to the remaining search result records, and the search request.
Here, the remaining search result records are the search result records in the search result record list other than the search result record subjected to the deletion process, and the sample data includes the positive example data.
In some embodiments, the sample data is used to train and/or optimize a search model.
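The click-through-rate scoring performed by the computing subunit can be sketched as follows. The layout of a historical search record (search request, record identifier, clicked flag), the function names, and the handling of a first record with a click-through rate of zero are all assumptions made for illustration; the disclosure states only that each score is the ratio of a record's click-through rate to that of the first record.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# (search request, record identifier, whether the record was clicked);
# this layout of a historical search record is hypothetical.
HistoryRow = Tuple[str, str, bool]


def click_through_rates(
    history: Iterable[HistoryRow], request: str, records: Sequence[str]
) -> Dict[str, float]:
    """Estimate each record's click-through rate as clicks / impressions."""
    shown: Counter = Counter()
    clicked: Counter = Counter()
    for req, rec, was_clicked in history:
        if req == request:
            shown[rec] += 1
            if was_clicked:
                clicked[rec] += 1
    # Assumption: a record with no impressions gets a click-through rate of 0.0.
    return {rec: (clicked[rec] / shown[rec] if shown[rec] else 0.0) for rec in records}


def ratio_scores(records: Sequence[str], ctr: Dict[str, float]) -> Dict[str, float]:
    """Score each record as its click-through rate divided by the first record's."""
    if not records:
        return {}
    first = ctr[records[0]]
    # Assumption: if the first record was never clicked, every score is 0.0.
    return {rec: (ctr[rec] / first if first > 0 else 0.0) for rec in records}
```

With this definition the first record in the list always receives a score of 1.0 whenever its click-through rate is positive, and the other scores are expressed relative to it.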
Fig. 7 is a schematic diagram according to a sixth embodiment of the present disclosure. As shown in fig. 7, an electronic device 700 in the present disclosure may include: a processor 701 and a memory 702.
The memory 702 is configured to store a program. The memory 702 may include a volatile memory, for example a random-access memory (RAM) such as a static random-access memory (SRAM) or a double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include a non-volatile memory, such as a flash memory. The memory 702 is used to store computer programs (e.g., application programs or functional modules that implement the methods described above), computer instructions, and the like, which may be stored in partitions in one or more memories 702. The computer programs, computer instructions, data, and the like may be called by the processor 701.
The processor 701 is configured to execute the computer program stored in the memory 702 to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description of the embodiments of the method described above.
The processor 701 and the memory 702 may be separate structures or may be integrated structures integrated together. When the processor 701 and the memory 702 are separate structures, the memory 702 and the processor 701 may be coupled by a bus 703.
The electronic device in this embodiment may execute the technical scheme in the above method, and the specific implementation process and the technical principle are the same, which are not described herein again.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the electronic device to perform the solution provided by any one of the embodiments described above.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, the method of generating sample data. For example, in some embodiments, the method of generating sample data may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the method of generating sample data described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the method of generating sample data in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service ("Virtual Private Server" or simply "VPS") are overcome. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. A method of generating sample data, comprising:
obtaining a search result record list corresponding to the search request;
determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records;
and generating sample data according to the search parameters respectively corresponding to the search result records, the search request, and the search result record list.
2. The method of claim 1, wherein generating sample data from each search result record's respective corresponding search parameter, the search request, and the list of search result records comprises:
according to the search parameters corresponding to each search result record, calculating to obtain the scores corresponding to each search result record;
and generating the sample data according to the scores corresponding to the search result records, the search request and the search result record list.
3. The method of claim 2, wherein the search parameter is click through rate; according to the search parameters corresponding to each search result record, calculating to obtain the scores corresponding to each search result record, including:
acquiring the click rate of a first search result record in the search result record list;
and determining the score of each search result record according to the click rate of the search result record and the click rate of the first search result record.
4. A method according to claim 3, wherein for each search result record, determining a score for that search result record based on the click rate of that search result record and the click rate of the first search result record comprises:
for each search result record, calculating the ratio of the click rate of the search result record to the click rate of the first search result record, and determining the ratio as the score of the search result record.
5. The method of any of claims 1-4, wherein after obtaining the search result record list corresponding to the search request, the method further comprises:
outputting the search request and the search result record list;
and responding to the editing operation aiming at the search result record list, and editing the search result record list according to the editing operation to obtain the search result record list after editing.
6. The method of claim 5, wherein the editing process comprises at least one of an addition process, a deletion process, and an order adjustment process.
7. The method of claim 6, the editing process comprising the deletion process; the method further comprises the steps of:
and determining the search result record subjected to the deleting process in the search result record list as negative example data, wherein the sample data comprise the negative example data.
8. The method of claim 7, wherein generating sample data from each search result record's respective corresponding search parameter, the search request, and the list of search result records comprises:
generating positive example data according to the remaining search result records, the search parameters respectively corresponding to the remaining search result records, and the search request;
wherein, in the search result record list, the search result records other than the search result record subjected to the deletion processing are the remaining search result records; the sample data includes the positive example data.
9. The method according to any of claims 1-8, wherein the sample data is used for training and/or optimizing a search model.
10. A sample data generating apparatus comprising:
the acquisition unit is used for acquiring a search result record list corresponding to the search request;
the first determining unit is used for determining search parameters corresponding to each search result record in the search result record list according to the acquired historical search records;
and the generating unit is used for generating sample data according to the search parameters respectively corresponding to the search result records, the search request, and the search result record list.
11. The apparatus of claim 10, wherein the generating unit comprises:
the calculating subunit is used for calculating to obtain the scores corresponding to the search result records according to the search parameters corresponding to the search result records;
and the generation subunit is used for generating the sample data according to the scores respectively corresponding to the search result records, the search request, and the search result record list.
12. The apparatus of claim 11, wherein the search parameter is click through rate; the computing subunit includes:
the acquisition module is used for acquiring the click rate of the first search result record in the search result record list;
and the determining module is used for determining the score of each search result record according to the click rate of the search result record and the click rate of the first search result record.
13. The apparatus of claim 12, wherein the determining module is configured to, for each search result record, calculate a ratio of the click rate of the search result record to the click rate of the first search result record, and determine the ratio as a score of the search result record.
14. The apparatus of any one of claims 10-13, the apparatus further comprising:
an output unit configured to output the search request and the search result record list;
and the editing unit is used for responding to the editing operation aiming at the search result record list, and carrying out editing processing on the search result record list according to the editing operation to obtain an edited search result record list.
15. The apparatus of claim 14, wherein the editing process comprises at least one of an addition process, a deletion process, and an order adjustment process.
16. The apparatus of claim 15, the editing process comprising the deletion process; the apparatus further comprises:
and the second determining unit is used for determining the search result record subjected to the deleting process in the search result record list as negative example data, wherein the sample data comprises the negative example data.
17. The apparatus of claim 16, wherein the generating unit is configured to generate positive example data according to the remaining search result records, the search parameters respectively corresponding to the remaining search result records, and the search request;
wherein, in the search result record list, the search result records other than the search result record subjected to the deletion processing are the remaining search result records; the sample data includes the positive example data.
18. The apparatus of any of claims 10-17, wherein the sample data is used to train and/or optimize a search model.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-9.
CN202310108190.8A 2023-02-01 2023-02-01 Sample data generation method and device Pending CN116306964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310108190.8A CN116306964A (en) 2023-02-01 2023-02-01 Sample data generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310108190.8A CN116306964A (en) 2023-02-01 2023-02-01 Sample data generation method and device

Publications (1)

Publication Number Publication Date
CN116306964A true CN116306964A (en) 2023-06-23

Family

ID=86780637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310108190.8A Pending CN116306964A (en) 2023-02-01 2023-02-01 Sample data generation method and device

Country Status (1)

Country Link
CN (1) CN116306964A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination