CN116521906A

CN116521906A - Meta description generation method, device, equipment and medium thereof

Info

Publication number: CN116521906A
Application number: CN202310493080.8A
Authority: CN
Inventors: 郭志伟
Original assignee: Guangzhou Shangyan Network Technology Co ltd
Current assignee: Guangzhou Shangyan Network Technology Co ltd
Priority date: 2023-04-28
Filing date: 2023-04-28
Publication date: 2023-08-01
Anticipated expiration: 2043-04-28
Also published as: CN116521906B

Abstract

The application relates to a meta description generation method, together with a corresponding device, equipment and medium, in the technical field of electronic commerce. The method comprises the following steps: acquiring the commodity category and commodity title of a commodity and its target keywords, and constructing a generation basis text, wherein the target keywords are commodity search keywords which are semantically related to the commodity title and whose search results meet preset conditions; inputting the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions; determining the matching degree between each candidate meta description and the commodity title by adopting a preset information retrieval algorithm; and screening out candidate meta descriptions whose matching degree meets preset conditions, constructing a recommended meta description list and pushing the list to a user. The application can generate high-quality meta descriptions from the commodity category, commodity title and target keywords of a commodity.

Description

Meta description generation method, device, equipment and medium thereof

Technical Field

The present disclosure relates to the field of electronic commerce technologies, and in particular, to a meta description generating method, and a corresponding apparatus, computer device, and computer readable storage medium thereof.

Background

With the continuous development of e-commerce, merchants can construct meta descriptions for the commodities they sell so that the descriptions appear in search engine results and attract buyers' attention, thereby drawing traffic. In the conventional technology, a deep learning model suitable for text generation is generally used to generate a meta description from a few keywords provided by the merchant or from text describing the commodity, so the quality of the resulting meta description is difficult to guarantee.

In view of the defects of the conventional technology, the applicant, who has long been engaged in research in the related field, has developed a new approach to solve this problem in the field of electronic commerce.

Disclosure of Invention

It is a primary object of the present application to solve at least one of the above problems and provide a meta description generation method and corresponding apparatus, computer device, computer readable storage medium.

In order to meet the purposes of the application, the application adopts the following technical scheme:

A meta description generation method provided in accordance with one of the objects of the present application comprises the following steps:

acquiring commodity class, commodity title and target keywords of the commodity, and constructing a generation basis text, wherein the target keywords are commodity search keywords which are semantically related to the commodity title and have search results meeting preset conditions;

inputting the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions;

determining the matching degree between the candidate meta-descriptions and the commodity titles by adopting a preset information retrieval algorithm;

and screening candidate meta-descriptions with the matching degree meeting preset conditions, constructing a recommended meta-description list and pushing the list to a user.
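The steps above leave the "preset information retrieval algorithm" open. As a minimal sketch, assuming Okapi BM25 is the chosen algorithm and text is tokenized on whitespace, the matching and screening steps might look like the following. The function names, the parameters `k1` and `b`, and the threshold are illustrative assumptions, not taken from the application.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency of each term across the candidate set.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recommend(title, candidates, threshold):
    """Keep candidates whose matching degree with the title meets the threshold, best first."""
    scored = zip(candidates, bm25_scores(title, candidates))
    kept = [(c, s) for c, s in scored if s >= threshold]
    return [c for c, _ in sorted(kept, key=lambda cs: cs[1], reverse=True)]
```

Here the commodity title plays the role of the query and the candidate meta descriptions play the role of the documents, so a candidate that shares no words with the title receives a matching degree of zero and is screened out.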

In a further embodiment, before obtaining the commodity class, the commodity title and the target keyword of the commodity, the method includes the following steps:

determining the corresponding semantic similarity between the commodity title and each commodity search keyword in a preset commodity search keyword set by adopting a preset text similarity model;

and screening out target keywords of which the semantic similarity exceeds a preset threshold value in the commodity search keyword set.
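The keyword screening described above can be sketched as follows. The application does not name the text similarity model, so this example assumes a hypothetical `embed` function that returns a simple bag-of-words count vector as a stand-in, and compares vectors by cosine similarity; the threshold value is likewise an assumption.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for the preset text similarity model: a bag-of-words count vector.
    # Assumption: a real system would use a trained sentence encoder here.
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def select_target_keywords(title, keyword_set, threshold=0.3):
    """Return the keywords whose similarity to the title exceeds the threshold."""
    title_vec = embed(title)
    return [kw for kw in keyword_set if cosine(title_vec, embed(kw)) > threshold]
```

For a title such as "women summer cotton dress", a keyword like "cotton dress" clears the threshold while an unrelated keyword like "gaming laptop" scores zero and is dropped.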

In a further embodiment, before determining the corresponding semantic similarity between the commodity title and each commodity search keyword in the preset commodity search keyword set by using a preset text similarity model, the method includes the following steps:

acquiring a plurality of commodity search keywords and user historical behavior data thereof;

determining corresponding search results according to the user history behavior data of each commodity search keyword;

and screening out commodity search keywords whose search results meet preset conditions to construct the commodity search keyword set.

In a further embodiment, determining a corresponding search result according to the user history behavior data of each commodity search keyword includes the following steps:

determining corresponding click rate and conversion rate according to the user history behavior data of each commodity search keyword;

and weighting the click rate and the conversion rate with their corresponding weights and summing the weighted values to calculate the search result.
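As a rough illustration of the two steps above, the search result of a keyword can be computed as a weighted sum of its click rate and conversion rate. The `KeywordStats` fields, the way each rate is derived from raw counts, the weights and the threshold are all assumptions made for this sketch; the application only states that the two rates are weighted and summed.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    keyword: str
    searches: int      # times the keyword was searched
    clicks: int        # searches followed by a click on a commodity page
    conversions: int   # searches followed by a purchase or similar action


def search_result(stats, w_click=0.4, w_conv=0.6):
    """Weighted sum of click rate and conversion rate (both relative to searches)."""
    if stats.searches == 0:
        return 0.0
    click_rate = stats.clicks / stats.searches
    conversion_rate = stats.conversions / stats.searches
    return w_click * click_rate + w_conv * conversion_rate


def build_keyword_set(all_stats, threshold=0.2):
    """Keep the keywords whose search result exceeds the threshold."""
    return [s.keyword for s in all_stats if search_result(s) > threshold]
```

With these assumed weights, a keyword searched 100 times with 40 clicks and 10 conversions scores 0.4 × 0.4 + 0.6 × 0.1 = 0.22 and is kept at a threshold of 0.2.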

In a further embodiment, before acquiring the commodity category, the commodity title and the target keywords of the commodity and constructing the generation basis text, the method comprises the following steps:

acquiring commodity class, commodity title and target keywords of a training commodity, constructing a generation basis text as a training sample, and marking meta-description of the training commodity as a supervision tag of the training sample;

inputting the training sample into a meta description generation model, extracting deep semantic information of the training sample, and generating meta description;

and determining a loss value of the meta description by adopting a supervision tag of the training sample, updating the weight of the meta description generation model when the loss value does not reach a preset threshold, and continuously calling other training samples to perform iterative training until the meta description generation model converges.
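The control flow of this training procedure (forward pass, loss against the supervision tag, weight update while the loss has not reached the threshold, repeat over further samples until convergence) can be sketched with a toy model. The one-parameter model `y = w * x`, the squared-error loss, the learning rate and the stopping rule below are assumptions chosen only to keep the example runnable; they stand in for the language model and whatever loss the implementer selects.

```python
def train(samples, lr=0.05, loss_threshold=1e-6, max_epochs=1000):
    """Iterate over (input, label) samples until no sample's loss exceeds the threshold."""
    w = 0.0  # the single trainable weight of the toy model y = w * x
    for _ in range(max_epochs):
        updated = False
        for x, label in samples:
            pred = w * x                      # forward pass
            loss = (pred - label) ** 2        # loss against the supervision label
            if loss > loss_threshold:         # threshold not reached: update the weight
                w -= lr * 2 * (pred - label) * x
                updated = True
        if not updated:                       # every sample is within the threshold: converged
            break
    return w
```

Calling `train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])` drives `w` toward 2, the value that reproduces every label.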

In a further embodiment, before acquiring the commodity class, the commodity title and the target keyword of the training commodity, constructing the generation basis text as a training sample, and labeling the meta description of the training commodity as a supervision tag of the training sample, the method comprises the following steps:

acquiring commodity titles and meta descriptions of a plurality of commodities, determining the reproduction ratio at which the meta description of each commodity reproduces text from its commodity title, and screening out, as primary-screened meta descriptions, the meta descriptions whose reproduction ratio meets a preset threshold;

identifying entity texts and their entity types in each primary-screened meta description by adopting a preset named entity recognition model, wherein the entity types comprise entity types to be detected and entity types not to be detected;

determining the semantic similarity between each entity text of an entity type to be detected in the primary-screened meta description and the commodity title of the corresponding commodity by adopting a preset text similarity model;

screening out, as qualified entity types, the entity types in the primary-screened meta description whose semantic similarity meets a preset threshold, and calculating a corresponding quality score based on the preset sub-item scores of the qualified entity types and of the entity types not to be detected in the primary-screened meta description;

and screening out, as training commodities, the commodities corresponding to the fine-screened meta descriptions whose quality scores meet preset conditions.
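The two scoring quantities used in this screening can be sketched as below. The application does not give formulas, so this example assumes the reproduction ratio is the share of title tokens that reappear in the meta description, and the quality score is the sum of preset sub-item scores over the qualified entity types plus the entity types not to be detected. The entity type names and score values in the usage are hypothetical.

```python
def reproduction_ratio(title, meta_description):
    """Share of title tokens that reappear in the meta description."""
    title_tokens = title.lower().split()
    meta_tokens = set(meta_description.lower().split())
    if not title_tokens:
        return 0.0
    return sum(t in meta_tokens for t in title_tokens) / len(title_tokens)


def quality_score(entities, qualified, sub_scores, types_to_detect):
    """Sum sub-item scores over qualified detected types and all non-detected types present.

    entities: list of (entity_text, entity_type) pairs from the NER model.
    qualified: set of detected entity types that passed the similarity threshold.
    """
    score = 0.0
    for entity_type in {etype for _, etype in entities}:
        if entity_type in types_to_detect and entity_type not in qualified:
            continue  # compared against the title and failed the threshold
        score += sub_scores.get(entity_type, 0.0)
    return score
```

For example, with entities of types "material", "brand" and "promotion", where "material" and "brand" are to be detected and only "material" qualifies, the score counts the sub-item scores of "material" and "promotion" but not "brand".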

In a further embodiment, before acquiring the commodity titles and meta descriptions of a plurality of commodities, determining the reproduction ratio at which the meta description of each commodity reproduces text from its commodity title, and screening out the primary-screened meta descriptions whose reproduction ratio meets a preset threshold, the method comprises the following steps:

acquiring meta description of a commodity as a training sample, and labeling entity text and entity types in the meta description of the commodity as supervision labels of the training sample;

inputting the training sample into a named entity recognition model to predict entity text and entity type in the training sample;

and determining a loss value corresponding to the predicted entity text and the entity type by adopting the supervision label of the training sample, updating the weight of the named entity recognition model when the loss value does not reach a preset threshold, and continuously calling other training samples to perform iterative training until the named entity recognition model converges.

In another aspect, a meta description generating device provided in accordance with one of the objects of the present application comprises a generation basis construction module, a meta description generation module, a matching degree determining module and a meta description screening module. The generation basis construction module is used for acquiring the commodity category and commodity title of a commodity and its target keywords, and constructing a generation basis text, wherein the target keywords are commodity search keywords which are semantically related to the commodity title and whose search results meet preset conditions; the meta description generation module is used for inputting the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions; the matching degree determining module is used for determining the matching degree between each candidate meta description and the commodity title by adopting a preset information retrieval algorithm; and the meta description screening module is used for screening out candidate meta descriptions whose matching degree meets the preset condition, constructing a recommended meta description list and pushing the list to a user.

In a further embodiment, preceding the generation basis construction module, the device comprises: a first semantic similarity determining sub-module, used for determining the corresponding semantic similarity between the commodity title and each commodity search keyword in a preset commodity search keyword set by adopting a preset text similarity model; and a keyword screening sub-module, used for screening out target keywords whose semantic similarity exceeds a preset threshold in the commodity search keyword set.

In a further embodiment, preceding the first semantic similarity determining sub-module, the device comprises: a data acquisition sub-module, used for acquiring a plurality of commodity search keywords and their user historical behavior data; a search result determining sub-module, used for determining corresponding search results according to the user historical behavior data of each commodity search keyword; and a keyword set construction sub-module, used for screening out commodity search keywords whose search results meet preset conditions to construct the commodity search keyword set.

In a further embodiment, the search result determining sub-module includes: a click rate and conversion rate determining unit, used for determining the corresponding click rate and conversion rate according to the user historical behavior data of each commodity search keyword; and a search result calculation unit, used for weighting the click rate and the conversion rate with their corresponding weights and summing the weighted values to calculate the search result.

In a further embodiment, preceding the search result determining sub-module, the device includes: a first training preparation sub-module, used for acquiring the commodity category, commodity title and target keywords of a training commodity, constructing a generation basis text as a training sample, and labeling the meta description of the training commodity as the supervision tag of the training sample; a first feedforward reasoning sub-module, used for inputting the training sample into a meta description generation model, extracting deep semantic information of the training sample and generating a meta description; and a first iteration convergence sub-module, used for determining the loss value of the meta description by adopting the supervision tag of the training sample, updating the weights of the meta description generation model when the loss value does not reach a preset threshold, and continuing to call other training samples for iterative training until the meta description generation model converges.

In a further embodiment, preceding the first training preparation sub-module, the device further comprises: a meta description primary screening sub-module, used for acquiring the commodity titles and meta descriptions of a plurality of commodities, determining the reproduction ratio at which the meta description of each commodity reproduces text from its commodity title, and screening out primary-screened meta descriptions whose reproduction ratio meets a preset threshold; a named entity recognition sub-module, used for identifying entity texts and their entity types in each primary-screened meta description by adopting a preset named entity recognition model, wherein the entity types comprise entity types to be detected and entity types not to be detected; a second semantic similarity determining sub-module, used for determining the semantic similarity between each entity text of an entity type to be detected in the primary-screened meta description and the commodity title of the corresponding commodity by adopting a preset text similarity model; a quality score calculation sub-module, used for screening out qualified entity types whose semantic similarity meets a preset threshold in the primary-screened meta description, and calculating a corresponding quality score based on the preset sub-item scores of the qualified entity types and of the entity types not to be detected in the primary-screened meta description; and a commodity screening sub-module, used for screening out, as training commodities, the commodities corresponding to the fine-screened meta descriptions whose quality scores meet preset conditions.

In a further embodiment, preceding the meta description primary screening sub-module, the device comprises: a second training preparation sub-module, used for acquiring the meta description of a commodity as a training sample and labeling the entity texts and entity types in the meta description of the commodity as the supervision label of the training sample; a second feedforward reasoning sub-module, used for inputting the training sample into a named entity recognition model to predict the entity texts and their entity types in the training sample; and a second iteration convergence sub-module, used for determining a loss value for the predicted entity texts and entity types by adopting the supervision label of the training sample, updating the weights of the named entity recognition model when the loss value does not reach a preset threshold, and continuing to call other training samples for iterative training until the named entity recognition model converges.

In yet another aspect, a computer device is provided, adapted for one of the objects of the present application, comprising a central processor and a memory, said central processor being adapted to invoke the execution of a computer program stored in said memory for performing the steps of the meta description generation method described herein.

In yet another aspect, a computer readable storage medium is provided, adapted for another object of the present application, which stores, in the form of computer readable instructions, a computer program implementing the meta description generation method; when invoked by a computer, the computer program performs the steps comprised by the method.

The technical solution of the present application has various advantages, including but not limited to the following aspects:

The application constructs a generation basis text from the commodity category and commodity title of a commodity and its target keywords, inputs the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions, determines the matching degree between each candidate meta description and the commodity title by adopting a preset information retrieval algorithm, screens out the candidate meta descriptions whose matching degree meets preset conditions, constructs a recommended meta description list and pushes the list to a user. On the one hand, because the generation basis text is given in advance and contains target keywords with good search results, the candidate meta descriptions generated from it are semantically closely related to the target keywords. This ensures that users can find the corresponding meta descriptions by entering the target keywords or the meta-words of the target keywords, which attracts their attention and improves the exposure rate; users who find the meta descriptions are also more likely to perform further operations that produce search results, thereby bringing traffic. On the other hand, the meta descriptions in the recommended meta description list are diverse and highly relevant to the commodity, which gives merchant users more choices or references, improves user experience and increases user stickiness.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow diagram of an exemplary embodiment of a meta-description generation method of the present application;

FIG. 2 is a schematic flow chart of determining target keywords matching with a product title in a product search keyword set in an embodiment of the present application;

FIG. 3 is a schematic flow chart of constructing a commodity search keyword set in an embodiment of the present application;

FIG. 4 is a schematic flow chart of determining search results of a commodity search keyword in an embodiment of the present application;

FIG. 5 is a flow chart of a training process for meta-description generation model in an embodiment of the present application;

FIG. 6 is a schematic flow chart of determining training commodities in an embodiment of the present application;

FIG. 7 is a flowchart illustrating a training process of named entity recognition model in an embodiment of the present application;

FIG. 8 is a functional block diagram of a meta description generation device of the present application;

FIG. 9 is a schematic structural diagram of a computer device used in the present application.

Detailed Description

Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for the purpose of illustrating the present application and are not to be construed as limiting the present application.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, "client", "terminal" and "terminal device" are understood by those skilled in the art to include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such a device may include: a cellular or other communication device, such as a personal computer or tablet, with a single-line display, with a multi-line display, or without a multi-line display; a PCS (Personal Communications Service) device that may combine voice, data processing, facsimile and/or data communication capabilities; a PDA (Personal Digital Assistant) that may include a radio frequency receiver, pager, internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other appliance that has and/or includes a radio frequency receiver. As used herein, a "client", "terminal" or "terminal device" may be portable, transportable, installed in a vehicle (aeronautical, maritime and/or land-based), or adapted and/or configured to operate locally and/or in a distributed fashion at any other location on earth and/or in space. A "client", "terminal" or "terminal device" may also be a communication terminal, an internet terminal, or a music/video playing terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with a music/video playing function, or a device such as a smart TV or a set-top box.

The hardware referred to by the names "server", "client", "service node" and the like in the present application is essentially an electronic device with the capabilities of a personal computer. It is a hardware device having the necessary components of the von Neumann architecture, such as a central processing unit (including an arithmetic unit and a controller), a memory, an input device and an output device. A computer program is stored in the memory; the central processing unit loads the program stored in external storage into memory to run, executes the instructions in the program, and interacts with the input and output devices, thereby completing a specific function.

It should be noted that the concept of "server" as referred to in this application is equally applicable to a server cluster. According to network deployment principles understood by those skilled in the art, the servers should be logically partitioned; physically, they may be separate from each other but callable through interfaces, or integrated into one physical computer or one group of computers. Those skilled in the art will understand such variations, which should not be construed as limiting the network deployment of the present application.

Unless expressly stated otherwise, one or several technical features of the present application may either be deployed on a server, with a client accessing them by remotely invoking an online service interface provided by the server, or be deployed and run directly on the client.

Unless expressly stated otherwise, a neural network model cited or possibly cited in the application can be deployed on a remote server and called remotely from a client, or can be deployed on a client with sufficient device capability and called directly. In some embodiments, when the neural network model runs on the client, its capabilities can be obtained through transfer learning, so as to reduce the demand on the client's hardware resources and avoid occupying them excessively.

Unless expressly stated otherwise, the various data referred to in the present application may be stored either remotely on a server or on a local terminal device, as long as they can be invoked by the technical solution of the present application.

Those skilled in the art will appreciate that although the various methods of the present application are described on the basis of the same concepts and thus share common ground, each method may be performed independently unless otherwise indicated. Similarly, all embodiments disclosed herein are based on the same inventive concept; therefore, concepts expressed in the same terms, and concepts whose wording differs only through convenient and appropriate adaptation, should be understood equivalently.

Unless the embodiments disclosed herein are expressly stated to be mutually exclusive, the technical features of the various embodiments may be cross-combined to flexibly construct new embodiments, so long as such a combination does not depart from the inventive spirit of the present application and can satisfy needs in the art or remedy deficiencies in the prior art. Those skilled in the art will be aware of such variations.

The meta description generation method of the present application may be programmed as a computer program product and deployed for execution on a client or a server. For example, in an exemplary application scenario of the present application, it may be deployed on a server of an e-commerce platform, so that the method can be executed by accessing the interface opened once the computer program product is running and interacting with the process of the computer program product through a graphical user interface.

Referring to FIG. 1, the meta description generation method of the present application, in an exemplary embodiment thereof, includes the following steps:

Step S1100, acquiring commodity categories of commodities, commodity titles and target keywords thereof, and constructing a generation basis text, wherein the target keywords are commodity search keywords which are semantically related to the commodity titles and have search results meeting preset conditions;

The commodity category and the commodity title are texts describing the commodity. In concise wording they directly and clearly state the attributes, type, features and the like of the commodity, so that a buyer can quickly and accurately understand the commodity and be attracted to it. The commodity category and commodity title are generally stored in a commodity database in association with the corresponding commodity and can be acquired from the commodity database through a corresponding data acquisition interface, which can be flexibly implemented by those skilled in the art.

The commodity search keywords may be text entered by users when searching on the e-commerce platform or text entered by users when searching in a search engine. The e-commerce platform or the search engine can monitor whether a user, after searching, performs a further operation such as clicking into the corresponding commodity page, purchasing the commodity, liking the commodity, adding the commodity to a shopping cart, adding the commodity to favorites or sharing the commodity. From these further operations a quantified search result can be determined for the keyword, which may be embodied as a click rate and/or a conversion rate. Whether the search result is up to standard is determined by whether it exceeds a preset threshold, which can be set by those skilled in the art according to service requirements. When the search result exceeds the preset threshold, it is up to standard, indicating that the corresponding commodity search keyword is commonly used by users for searching and attracts them to perform further operations, and thus brings traffic to the commodities on sale. When the search result does not exceed the preset threshold, the corresponding commodity search keyword does not attract users to perform further operations and is not commonly used for searching, and thus cannot bring traffic to the commodities on sale.

Further, the commodity search keywords whose search results are up to standard and which are semantically related to the commodity title are determined as the target keywords. Specifically, the semantic similarity between the feature representation of each such commodity search keyword and the feature representation of the commodity title is calculated; when the semantic similarity is greater than a preset threshold, the corresponding commodity search keyword is considered semantically related to the commodity title and is taken as a target keyword. The preset threshold can be set by those skilled in the art according to service requirements, and the feature representations can be obtained with any deep learning model that has been trained to convergence in advance and is suitable for extracting text features and outputting corresponding vectors.

The commodity category, the commodity title and the target keywords are spliced together to construct the generation basis text.
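The splicing step can be sketched in a few lines. The field order follows the text, but the separator strings and the function name `build_generation_basis_text` are assumptions; the application does not specify a prompt format.

```python
def build_generation_basis_text(category, title, target_keywords, sep=" | "):
    """Splice category, title and target keywords into one input string."""
    parts = [
        category.strip(),
        title.strip(),
        ", ".join(k.strip() for k in target_keywords),
    ]
    # Skip empty fields so the separator is never doubled.
    return sep.join(p for p in parts if p)
```

For example, `build_generation_basis_text("Dresses", "Women Summer Cotton Dress", ["cotton dress", "summer dress"])` yields `"Dresses | Women Summer Cotton Dress | cotton dress, summer dress"`.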

Step S1200, inputting the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions;

The preset meta description generation model is an end-to-end language model. It may be selected from, for example, a GPT-series model, a BERT model, an encoder-decoder model or a Transformer model, as those skilled in the art see fit. The meta description generation model is trained to a convergence state in advance, so that it acquires the ability to generate a corresponding meta description from the generation basis text. The specific training process is disclosed in later embodiments and is not described here.

In an embodiment, the preset meta description generation model adopts a GPT-3 model. The generation basis text is used as the input sequence of the model and is segmented to obtain a corresponding word segmentation sequence, which is then input to the encoder of the Transformer model. Each token in the word segmentation sequence is encoded by a stack of multi-head self-attention layers and fully connected layers. Specifically, when passing through a multi-head self-attention layer, each token undergoes multi-head attention calculation, so that self-attention weighting is performed over different dimensions of the token to obtain a corresponding weighted vector representation; after passing through the fully connected layer, the encoded vector representation of the token is obtained. The encoded representations of all tokens in the word segmentation sequence are input to the decoder of the Transformer model and decoded to generate a corresponding meta description as a candidate meta description. Specifically, at each step the probability distribution of the next word is calculated from the words already generated, the current position and the encoded representations of the tokens; the word with the maximum probability is selected as the word generated at that step, and the words are generated one by one in this way to form the meta description.
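The word-by-word selection of the maximum-probability word described above is commonly called greedy decoding. The sketch below illustrates only that selection loop; `toy_next_word_probs` is a hypothetical stand-in for the trained model's next-word distribution, and the `<eos>` end marker is an assumption.

```python
def greedy_decode(next_word_probs, max_len=20, eos="<eos>"):
    """Generate words one at a time, always taking the most probable one."""
    generated = []
    for _ in range(max_len):
        # The callable maps the words generated so far to a
        # {word: probability} distribution over the next word.
        probs = next_word_probs(generated)
        word = max(probs, key=probs.get)
        if word == eos:
            break
        generated.append(word)
    return generated


def toy_next_word_probs(prefix):
    """Hypothetical fixed distributions standing in for a trained model."""
    table = {
        0: {"comfortable": 0.6, "cheap": 0.4},
        1: {"summer": 0.7, "winter": 0.3},
        2: {"slippers": 0.8, "<eos>": 0.2},
    }
    return table.get(len(prefix), {"<eos>": 1.0})


words = greedy_decode(toy_next_word_probs)
print(" ".join(words))
```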

In this step, candidate meta descriptions are generated in a diversified manner: the meta description generation model is called multiple times, and a plurality of corresponding candidate meta descriptions are generated from the generation basis text. This ensures that rich meta descriptions are generated, describing the commodity from different angles and in different language styles, from which better meta descriptions can then be screened out.
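Calling the model several times and collecting distinct outputs might look like the sketch below. The application does not say how diversity between calls is obtained; varying a random seed per call is an assumption, and `toy_generate` is a hypothetical placeholder for the trained generation model.

```python
import random


def generate_candidates(generate_fn, basis_text, n_calls=5):
    """Call the generator n_calls times and keep distinct outputs in order."""
    candidates = []
    for seed in range(n_calls):
        text = generate_fn(basis_text, seed)
        if text not in candidates:
            candidates.append(text)
    return candidates


def toy_generate(basis_text, seed):
    """Hypothetical generator: picks a template at random per seed."""
    rng = random.Random(seed)
    templates = [
        "Shop {t} today.",
        "Discover {t}, made for comfort.",
        "{t}: style for every occasion.",
    ]
    return rng.choice(templates).format(t=basis_text)


candidates = generate_candidates(toy_generate, "summer slippers", n_calls=6)
for c in candidates:
    print(c)
```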

Step S1300, determining the matching degree between the plurality of candidate meta descriptions and the commodity title by adopting a preset information retrieval algorithm;

In one embodiment, the information retrieval algorithm may be a retrieval algorithm based on keyword matching, on a vector space model, on machine learning, or on a probabilistic retrieval model, as needed by a person skilled in the art. For example, the information retrieval algorithm is implemented with the probabilistic retrieval model algorithm BM25, and the degree of relevance between each candidate meta description and the commodity title is calculated as the matching degree. Specifically, each candidate meta description and the commodity title are segmented to obtain corresponding word segmentation sequences; the words of the commodity title's word segmentation sequence that appear in the candidate meta description are matched; the word frequency with which each such word appears in the candidate meta description is counted; and the IDF value of each word is calculated. An exemplary formula, written here in the standard BM25 form, is:

$$\mathrm{IDF} = \log\frac{N - n + 0.5}{n + 0.5}$$

Wherein: $N$ represents the number of candidate meta descriptions, and $n$ represents the number of candidate meta descriptions containing the word. A corresponding BM25 weight is calculated based on the IDF value of each word. An exemplary formula, again in the standard BM25 form, is:

$$w = \mathrm{IDF}\cdot\frac{tf\,(k_1 + 1)}{tf + k_1\left(1 - b + b\cdot\dfrac{DL}{avgDL}\right)}$$

Wherein: $w$ represents the BM25 weight of the word, $tf$ represents the word frequency with which the word occurs in the candidate meta description, $DL$ represents the length of the candidate meta description, $avgDL$ represents the average length of the candidate meta descriptions, and $k_1$ and $b$ are both adjustable parameters, which can be set as desired by a person skilled in the art based on prior knowledge or experimental data.
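The BM25 matching described above can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: it uses the classic BM25 IDF, log((N - n + 0.5) / (n + 0.5)), which can turn negative for words contained in more than half of the candidates; it sums the per-word weights to obtain the matching degree; and the defaults `k1=1.5` and `b=0.75` are commonly used values chosen here only for the example.

```python
import math
from collections import Counter


def bm25_matching_degrees(title_tokens, candidates_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per candidate, using the title as the query."""
    N = len(candidates_tokens)
    if N == 0:
        return []
    # Average candidate length; fall back to 1.0 if all candidates are empty.
    avg_dl = sum(len(c) for c in candidates_tokens) / N or 1.0
    # n for each word: how many candidates contain it at least once.
    doc_freq = Counter()
    for tokens in candidates_tokens:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in candidates_tokens:
        tf = Counter(tokens)
        dl = len(tokens)
        score = 0.0
        for term in set(title_tokens):
            if tf[term] == 0:
                continue  # title word absent from this candidate
            n = doc_freq[term]
            idf = math.log((N - n + 0.5) / (n + 0.5))
            norm = tf[term] + k1 * (1 - b + b * dl / avg_dl)
            # Summing word weights into one score is an assumption.
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


title = ["summer", "slippers"]
candidates = [
    ["summer", "slippers", "for", "women"],
    ["winter", "boots", "for", "men"],
    ["comfortable", "shoes"],
]
scores = bm25_matching_degrees(title, candidates)
print(scores)
```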

Step S1400, screening out candidate meta descriptions whose matching degree meets a preset condition, constructing a recommended meta description list and pushing the list to a user.

The candidate meta descriptions are sorted in descending order of their matching degrees, and the top-ranked candidate meta descriptions are screened out as recommended meta descriptions. All recommended meta descriptions are collected to construct a recommended meta description list, which is pushed to the user so that the user can select a desired recommended meta description from the list or edit a meta description with reference to it.
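The ranking and screening step can be sketched as below. The function name `build_recommended_list` and the cutoff `top_k` are hypothetical; the application leaves the number of recommended meta descriptions open.

```python
def build_recommended_list(candidates, matching_degrees, top_k=3):
    """Sort candidates by matching degree (high to low) and keep the top_k."""
    ranked = sorted(
        zip(candidates, matching_degrees),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


recommended = build_recommended_list(
    ["description A", "description B", "description C"],
    [0.2, 0.9, 0.5],
    top_k=2,
)
print(recommended)
```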

As can be appreciated from the exemplary embodiments of the present application, the technical solution of the present application has various advantages, including but not limited to the following aspects:

In a further embodiment, before step S1100 of obtaining the commodity class, the commodity title and the target keywords of the commodity, the method includes the following steps:

Step S1010, determining the corresponding semantic similarity between the commodity title and each commodity search keyword in a preset commodity search keyword set by adopting a preset text similarity model;

The text similarity model may be a two-tower model comprising two processing branches with an identical network structure, each of which contains a text feature extraction layer; the outputs of the two processing branches are then fed into a linear layer to calculate the semantic similarity. The text feature extraction layer can be implemented with Text Transformer, RoBERTa, XLM-RoBERTa, MPNet, BERT and the like, as needed by a person skilled in the art. The linear layer may be an MLP (multi-layer perceptron), an FC (fully connected) layer, etc., and may likewise be chosen as needed. Since the training processes of RoBERTa+CRF, Text Transformer, RoBERTa, XLM-RoBERTa, MPNet and BERT are known in the art, they are not described in detail here.

The commodity title and a commodity search keyword form a text data pair that is input to the text similarity model. One processing branch extracts the deep semantic features of the commodity title in the text data pair to obtain a corresponding vectorized feature representation, and simultaneously the other processing branch extracts the deep semantic features of the commodity search keyword in the text data pair to obtain a corresponding vectorized feature representation. The linear layer then calculates the vector distance between the feature representations of the commodity title and the commodity search keyword as the semantic similarity. The vector distance can be calculated with any available vector distance algorithm, including but not limited to any one of the following: cosine similarity, vector dot product, Manhattan distance, Euclidean distance, Pearson correlation coefficient, etc.
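For reference, the vector measures listed above can be written in a few lines of plain Python. The vectors `u` and `v` below are made-up feature representations; in the described method they would come from the two processing branches of the text similarity model.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_correlation(u, v):
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    # Pearson correlation is the cosine similarity of the centered vectors.
    return cosine_similarity([a - mu_u for a in u], [b - mu_v for b in v])


u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
print(cosine_similarity(u, v), dot(u, v), manhattan_distance(u, v))
print(euclidean_distance(u, v), pearson_correlation(u, v))
```

Note that cosine similarity, dot product and Pearson correlation grow with similarity, whereas Manhattan and Euclidean distance shrink, so a threshold comparison has to be oriented accordingly.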

The commodity search keyword set is constructed by acquiring commodity search keywords that users entered on an e-commerce platform and whose search results meet preset conditions, and/or commodity search keywords that users entered in a search engine and whose search results meet preset conditions. The specific implementation is disclosed in a later embodiment.

Step S1020, screening out, from the commodity search keyword set, target keywords whose semantic similarity exceeds a preset threshold.

Commodity search keywords whose semantic similarity exceeds the preset threshold are screened out from the commodity search keyword set as target keywords; the target keywords are semantically related to the commodity title. The preset threshold can be set by a person skilled in the art as required.
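A sketch of the threshold screening in step S1020 follows. The two-dimensional vectors and the threshold of 0.8 are invented for illustration, and cosine similarity is just one of the permitted measures; in practice the vectors would be produced by the text similarity model.

```python
import math


def cosine_similarity(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0


def select_target_keywords(title_vec, keyword_vecs, threshold=0.8):
    """Keep keywords whose similarity to the title exceeds the threshold."""
    return [
        keyword
        for keyword, vec in keyword_vecs.items()
        if cosine_similarity(title_vec, vec) > threshold
    ]


# Hypothetical feature representations of the title and three keywords.
title_vec = [1.0, 0.0]
keyword_vecs = {
    "summer slippers": [0.9, 0.1],
    "beach sandals": [0.7, 0.7],
    "phone case": [0.0, 1.0],
}
target_keywords = select_target_keywords(title_vec, keyword_vecs)
print(target_keywords)
```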

In this embodiment, the text similarity model is used to determine, within the commodity search keyword set, the target keywords that are semantically related to the commodity title. This is efficient to implement and ensures the accuracy of the semantic similarity, so that the degree of semantic similarity between two texts is accurately represented.

In a further embodiment, before step S1010 of determining, with a preset text similarity model, the semantic similarity between the commodity title and each commodity search keyword in the preset commodity search keyword set, the method includes the following steps:

Step S1000, acquiring a plurality of commodity search keywords and user history behavior data thereof;

The text entered by users when searching on the e-commerce platform and/or the text entered by users when searching in the search engine can be obtained as commodity search keywords. In addition, the e-commerce platform or the search engine can monitor whether users perform further operations after searching, such as clicking through to a corresponding commodity page, purchasing the commodity, liking the commodity, adding the commodity to a shopping cart, collecting the commodity or sharing the commodity, and corresponding user history behavior data can be generated.

Step S1001, determining a corresponding search result according to the user history behavior data of each commodity search keyword;

For each commodity search keyword that users searched with, after the corresponding search results are returned, the ratio of the number of times users performed a click action to the number of times users performed no action is calculated to obtain the corresponding click rate; and the ratio of the total number of times users performed any one or more of adding to a shopping cart, purchasing, collecting, liking and sharing to the number of times users performed no action is calculated to obtain the corresponding conversion rate.
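The two rates can be derived from a behavior log roughly as sketched below. The `(keyword, action)` event format and the action names are hypothetical. The denominators follow the text literally (the count of searches followed by no action); flooring that count at 1 to avoid division by zero is an added assumption.

```python
from collections import defaultdict

# Actions counted toward conversion, named hypothetically for this sketch.
CONVERSION_ACTIONS = {"add_to_cart", "purchase", "collect", "like", "share"}


def rates_per_keyword(events):
    """Map each keyword to (click_rate, conversion_rate)."""
    counts = defaultdict(lambda: {"click": 0, "convert": 0, "none": 0})
    for keyword, action in events:
        if action == "click":
            counts[keyword]["click"] += 1
        elif action in CONVERSION_ACTIONS:
            counts[keyword]["convert"] += 1
        else:
            counts[keyword]["none"] += 1  # search followed by no action
    rates = {}
    for keyword, c in counts.items():
        denom = max(c["none"], 1)  # assumption: guard against division by zero
        rates[keyword] = (c["click"] / denom, c["convert"] / denom)
    return rates


events = [
    ("slippers", "click"), ("slippers", "click"), ("slippers", "purchase"),
    ("slippers", None), ("slippers", None), ("slippers", None), ("slippers", None),
]
rates = rates_per_keyword(events)
print(rates)
```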

Because the click rate and the conversion rate have different reference value for the search result, their respective contributions are weighed so that the search result is reasonable and accurate: the click rate and the conversion rate of each commodity search keyword are each multiplied by a matched weight, and the products are added to give the search result. The two weights sum to 1 and can be set flexibly by a person skilled in the art; the recommended values are 0.4 for the click rate and 0.6 for the conversion rate.

Step S1002, screening out commodity search keywords with search results meeting preset conditions to construct a commodity search keyword set.

It will be understood that the higher the search result, the more the commodity search keyword attracts users' attention and the more commonly it is used for searching. The commodity search keywords are ranked in descending order of search result, and a number of top-ranked commodity search keywords are screened out; the specific number can be set by a person skilled in the art as required. All screened commodity search keywords are collected to construct the commodity search keyword set.

In this embodiment, the search result corresponding to each commodity search keyword is determined, so that the commodity search keywords with higher search results are screened out to construct the commodity search keyword set. The commodity search keywords in this set attract users' attention and are commonly used by users for searching.

In a further embodiment, step S1001, determining a corresponding search result according to the user history behavior data of each of the commodity search keywords, includes the following steps:

Step S2001, determining a corresponding click rate and conversion rate according to the user history behavior data of each commodity search keyword;

Step S3001, weighting the click rate and the conversion rate by their corresponding weights and adding them to calculate the search result.

The click rate and the conversion rate of each commodity search keyword are each multiplied by a matched weight, and the products are added to calculate the search result. The two weights sum to 1 and can be set flexibly by a person skilled in the art; the recommended values are 0.4 for the click rate and 0.6 for the conversion rate.

In this embodiment, the click rate and the conversion rate are determined from the user history behavior data of each commodity search keyword and are weighted and summed to calculate the search result. This ensures that the search result is reasonable and accurate; that is, it reasonably and accurately represents how likely the commodity search keyword is to attract users' attention when searched and to be commonly used by users for searching.

In a further embodiment, before step S1100 of obtaining the commodity class, the commodity title and the target keywords of the commodity and constructing the generation basis text, the method includes the following steps:

Step S2100, acquiring the commodity class, the commodity title and the target keywords of a training commodity, constructing a generation basis text as a training sample, and labeling the meta description of the training commodity as the supervision label of the training sample;

The commodity class, the commodity title and the target keywords of the training commodity are spliced to construct the corresponding generation basis text as a training sample; the target keywords corresponding to the commodity title are determined according to steps S1010 to S1020.

The training commodities are screened from a plurality of commodities whose commodity titles and meta descriptions are acquired in advance, and are those commodities whose meta descriptions are highly relevant to their commodity titles and of high quality. The specific implementation is disclosed in a later embodiment and is not detailed here.

Step S2110, inputting the training sample into a meta description generation model, extracting deep semantic information of the training sample, and generating meta description;

In an embodiment, the preset meta description generation model adopts a GPT-3 model. The training sample is used as the input sequence of the model and is segmented to obtain a corresponding word segmentation sequence, which is then input to the encoder of the Transformer model. Each token in the word segmentation sequence is encoded by a stack of multi-head self-attention layers and fully connected layers. Specifically, when passing through a multi-head self-attention layer, each token undergoes multi-head attention calculation, so that self-attention weighting is performed over different dimensions of the token, the corresponding deep semantic information is extracted and a corresponding weighted vector representation is obtained; after passing through the fully connected layer, the encoded vector representation of the token is obtained. The encoded representations of all tokens in the word segmentation sequence are input to the decoder of the Transformer model and decoded to generate a corresponding meta description. Specifically, at each step the probability distribution of the next word is calculated from the words already generated, the current position and the encoded representations of the tokens; the word with the maximum probability is selected as the word generated at that step, and the words are generated one by one in this way to form the meta description.

Step S2120, determining a loss value of the generated meta description by using the supervision label of the training sample, and, when the loss value does not reach a preset threshold, updating the weights of the meta description generation model and continuing to call other training samples for iterative training until the meta description generation model converges.

A preset cross entropy loss function is invoked; it can be set flexibly by a person skilled in the art according to prior knowledge or experimental experience. The cross entropy loss value of the generated meta description is calculated against the supervision label of the training sample. When the cross entropy loss value reaches the preset threshold, the meta description generation model has been trained to a convergence state and training can be terminated. When the cross entropy loss value does not reach the preset threshold, the model has not converged: a gradient update is performed on the model according to the cross entropy loss value, the weight parameters of each part of the model are corrected through back propagation so that the model further approaches convergence, and other training samples are then called to continue iterative training until the model reaches a convergence state. The preset threshold can be set by a person skilled in the art as required.
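The train-until-threshold loop can be illustrated on a deliberately tiny model. The sketch below is not the meta description generation model: it trains a lookup table of logits with softmax cross entropy and plain gradient descent, stopping once the mean loss falls to the threshold. The learning rate, the threshold of 0.05 and the integer-coded samples are all assumptions.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    return -math.log(probs[target])


def train_until_converged(samples, n_classes, lr=0.5,
                          loss_threshold=0.05, max_steps=10000):
    """Toy model: one logit vector per input id, trained by gradient descent."""
    weights = {}
    mean_loss = float("inf")
    for step in range(1, max_steps + 1):
        total = 0.0
        for x, y in samples:
            logits = weights.setdefault(x, [0.0] * n_classes)
            probs = softmax(logits)
            total += cross_entropy(probs, y)
            # Gradient of cross entropy w.r.t. the logits: probs - one_hot(y).
            for k in range(n_classes):
                logits[k] -= lr * (probs[k] - (1.0 if k == y else 0.0))
        mean_loss = total / len(samples)
        if mean_loss <= loss_threshold:
            return weights, mean_loss, step  # converged: stop training
    return weights, mean_loss, max_steps


samples = [(0, 1), (1, 2)]  # hypothetical (input id, supervision label) pairs
weights, final_loss, steps = train_until_converged(samples, n_classes=3)
print(final_loss, steps)
```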

In this embodiment, the meta descriptions of the training commodities are used to supervise the training of the meta description generation model until convergence, so that the model learns the capability of generating a corresponding meta description from the generation basis text. Because the meta descriptions of the training commodities are highly relevant to the commodity titles and of high quality, the generation of high-quality meta descriptions can be ensured.

In a further embodiment, before step S2100 of obtaining the commodity class, the commodity title and the target keywords of the training commodity, constructing the generation basis text as the training sample, and labeling the meta description of the training commodity as the supervision label of the training sample, the method includes the following steps:

Step S3100, acquiring the commodity titles and meta descriptions of a plurality of commodities, determining the reproduction ratio with which the meta description of each commodity repeats text in its commodity title, and screening out primary-screened meta descriptions whose reproduction ratio satisfies a preset threshold;

The commodity title and the meta description of each commodity are segmented to obtain corresponding word segmentation sequences. The number of words in the meta description's word segmentation sequence that also appear in the commodity title's word segmentation sequence is counted and divided by the total number of words in the commodity title's word segmentation sequence, giving the reproduction ratio of the meta description of each commodity. Meta descriptions whose reproduction ratio exceeds a preset threshold are screened out as primary-screened meta descriptions; the preset threshold can be set by a person skilled in the art as required.
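The reproduction ratio can be sketched as below. Two simplifications are assumptions of this sketch: whitespace splitting stands in for a real word segmenter, and words are counted as distinct types so that the ratio stays between 0 and 1. The threshold of 0.5 is likewise only an example.

```python
def reproduction_ratio(title_tokens, meta_tokens):
    """Share of distinct title words that are repeated in the meta description."""
    title_vocab = set(title_tokens)
    if not title_vocab:
        return 0.0
    repeated = title_vocab & set(meta_tokens)
    return len(repeated) / len(title_vocab)


def primary_screen(items, threshold=0.5):
    """Keep (title, meta) pairs whose reproduction ratio exceeds the threshold."""
    kept = []
    for title, meta in items:
        # Assumption: lowercase whitespace splitting as a stand-in segmenter.
        ratio = reproduction_ratio(title.lower().split(), meta.lower().split())
        if ratio > threshold:
            kept.append((title, meta))
    return kept


items = [
    ("summer thick sole slippers",
     "these summer slippers have a thick and soft sole"),
    ("summer thick sole slippers", "comfortable summer shoes"),
]
kept = primary_screen(items)
print(len(kept))
```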

Step S3110, adopting a preset named entity recognition model to recognize the entity texts and their entity types in the primary-screened meta description, wherein the entity types comprise entity types to be detected and an entity type not to be detected;

The named entity recognition model is suited to the named entity recognition task. The specific model may be RoBERTa+CRF, BiLSTM+CRF, IDCNN+CRF, BERT+BiLSTM+CRF, FLAT and the like, and a person skilled in the art may select one as needed. The named entity recognition model is trained in advance to convergence and thereby acquires the capability of recognizing entity texts and their entity types in a meta description. The specific training process is disclosed in a later embodiment and is not detailed here.

In one embodiment, a BERT+BiLSTM+CRF model is adopted as the named entity recognition model, and the primary-screened meta description is used as its input. BERT serves as the embedding layer, extracting the deep semantic information of the primary-screened meta description and outputting a corresponding text feature sequence to the BiLSTM layer; the text feature sequence contains a feature vector that vectorizes each character or word in the primary-screened meta description. The BiLSTM layer outputs, for each character or word, a score for each category, which serves as the input of the CRF layer. The CRF layer outputs a category sequence containing the category of each character or word, from which the entity texts and their entity types in the primary-screened meta description can be determined. Among the entity types, theme, commodity self characteristics, commodity extension characteristics and effect language are entity types to be detected, and ending call is the entity type not to be detected.
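Turning the category sequence into entity texts and entity types can be sketched as follows, assuming BIO-style categories. The tag names `SELF` and `EXT` are hypothetical abbreviations, and joining tokens with spaces suits English text only.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, entity_type) pairs from a BIO category sequence."""
    entities = []
    current_tokens, current_type = [], None

    def flush():
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # a new entity begins; close any open one
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the open entity
        else:
            # "O", or an I- tag that does not continue the open entity,
            # is treated as outside any entity.
            flush()
            current_tokens, current_type = [], None
    flush()
    return entities


tokens = ["thick", "sole", "feels", "comfortable"]
tags = ["B-SELF", "I-SELF", "O", "B-EXT"]
entities = decode_bio(tokens, tags)
print(entities)
```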

Step S3120, determining, by adopting a preset text similarity model, the semantic similarity between each entity text belonging to an entity type to be detected in the primary-screened meta description and the commodity title of the corresponding commodity;

The entity text and the commodity title form a text data pair that is input to the text similarity model. One processing branch extracts the deep semantic features of the entity text in the text data pair to obtain a corresponding vectorized feature representation, and simultaneously the other processing branch extracts the deep semantic features of the commodity title in the text data pair to obtain a corresponding vectorized feature representation. The linear layer then calculates the vector distance between the feature representations of the entity text and the commodity title as the semantic similarity. The vector distance can be calculated with any available vector distance algorithm, including but not limited to any one of the following: cosine similarity, vector dot product, Manhattan distance, Euclidean distance, Pearson correlation coefficient, etc.

Step S3130, screening out the qualified entity types whose semantic similarity satisfies a preset threshold in the primary-screened meta description, and calculating a corresponding quality score based on the preset sub-item scores of the qualified entity types and of the entity type not to be detected in the primary-screened meta description;

A preset sub-item score can be set in advance for each entity type to be detected and for the entity type not to be detected, as required by a person skilled in the art. The preset sub-item score quantifies how much the entity type contributes to the quality of the primary-screened meta description. For example: theme: 3; commodity self characteristics: 3; commodity extension characteristics: 2; effect language: 1; ending call: 1.

An entity type to be detected is screened out as a qualified entity type when the semantic similarity of an entity text belonging to it in the primary-screened meta description exceeds a preset threshold; the preset threshold can be set by a person skilled in the art as required.

The sum of the preset sub-item scores of the qualified entity types and of the entity type not to be detected in the primary-screened meta description is calculated as the corresponding quality score.
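The quality score can be sketched as below using the example sub-item scores from the text. The short type keys, the similarity values and the threshold of 0.6 are hypothetical, and counting each entity type at most once is an assumption of this sketch.

```python
# Example sub-item scores from the text, keyed by hypothetical short names.
SUB_ITEM_SCORES = {
    "theme": 3,
    "self_characteristics": 3,
    "extension_characteristics": 2,
    "effect_language": 1,
    "ending_call": 1,
}
UNDETECTED_TYPES = {"ending_call"}  # scored without a similarity check


def quality_score(entities, similarities, threshold=0.6):
    """Sum sub-item scores of qualified types and undetected types present."""
    counted = set()
    for text, entity_type in entities:
        if entity_type in UNDETECTED_TYPES:
            counted.add(entity_type)
        elif similarities.get(text, 0.0) > threshold:
            counted.add(entity_type)  # qualified entity type
    return sum(SUB_ITEM_SCORES[t] for t in counted)


entities = [
    ("thick-soled lazy shoes", "theme"),
    ("thick sole", "self_characteristics"),
    ("fashionable", "extension_characteristics"),
    ("a style you cannot miss", "ending_call"),
]
# Hypothetical similarities between each entity text and the commodity title.
similarities = {"thick-soled lazy shoes": 0.9, "thick sole": 0.8, "fashionable": 0.3}
score = quality_score(entities, similarities)
print(score)
```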

Step S3140, screening out, as training commodities, the commodities corresponding to fine-screened meta descriptions whose quality scores meet a preset condition.

In one embodiment, the primary-screened meta descriptions are ranked in descending order of quality score, the top-ranked primary-screened meta descriptions are screened out as fine-screened meta descriptions, and the commodities corresponding to the fine-screened meta descriptions are used as training commodities.

In another embodiment, the primary-screened meta descriptions whose quality scores exceed a preset threshold are screened out as fine-screened meta descriptions, and the commodities corresponding to the fine-screened meta descriptions are used as training commodities.

In this embodiment, the reproduction ratio with which each commodity's meta description repeats text in its commodity title is determined, and primary-screened meta descriptions with a higher reproduction ratio are preliminarily screened out. The entity texts and their entity types in each primary-screened meta description are then identified, the qualified entity types corresponding to entity texts that are semantically similar to the commodity title are determined, and a quality score is calculated from the preset sub-item scores of the qualified entity types and of the entity type not to be detected in the primary-screened meta description. The commodities corresponding to fine-screened meta descriptions with higher quality scores are then screened out as training commodities. It can be understood that combining primary screening with fine screening keeps the screening efficient while accurately selecting meta descriptions that are highly relevant to the commodity and of high quality. On this basis, using the meta descriptions of the training commodities as supervision to train the meta description generation model ensures that the model generates high-quality meta descriptions.

In a further embodiment, before step S3100 of obtaining the commodity titles and meta descriptions of a plurality of commodities, determining the reproduction ratio with which the meta description of each commodity repeats text in its commodity title, and screening out the primary-screened meta descriptions whose reproduction ratio satisfies the preset threshold, the method includes the following steps:

Step S3000, acquiring meta description of the commodity as a training sample, and marking entity text and entity type in the meta description of the commodity as a supervision tag of the training sample;

The corresponding entity text is labeled for each entity type using any one of the BIO, BIOES and BMES labeling schemes. The entity types comprise theme, commodity self characteristics, commodity extension characteristics, effect language and ending call. To facilitate understanding of the relationship between an entity text and the entity type to which it belongs, an example is given below:

The meta description text is: "2023 new summer thick-soled lazy shoes, brand TELEIXI-TeLexi, make you feel comfortable underfoot and not tired when walking. The slippers have a brand-new design paired with a pedal feeling, fashionable and versatile, to be worn as you like. The sole is thick, and the lazy style is quick to put on and take off, convenient and comfortable. Once you wear this pair of shoes, you will catch the eye and become the fashion focus in the eyes of others. Suitable for wearing on different occasions, it is a style you cannot miss this summer."

Wherein:

The entity type is theme, and the corresponding entity text is: 2023 new summer thick-soled lazy shoes, brand TELEIXI-TeLexi

The entity type is commodity self characteristics, and the corresponding entity text is: thick sole; lazy style, quick to put on and take off

The entity type is commodity extension characteristics, and the corresponding entity text is: comfortable; fashionable and versatile; pedal feeling; suitable for wearing on different occasions

The entity type is effect language, and the corresponding entity text is: make you feel comfortable underfoot and not tired when walking; once you wear this pair of shoes, you will catch the eye and become the fashion focus in the eyes of others

The entity type is ending call, and the corresponding entity text is: it is a style you cannot miss this summer
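To show how such annotations become supervision labels, the sketch below converts entity spans into BIO tags, one of the labeling schemes named above. Token-index spans and the short type names `SELF` and `EXT` are hypothetical conventions chosen for the example.

```python
def bio_encode(tokens, spans):
    """Build a BIO tag sequence from (start, end_exclusive, type) spans."""
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        tags[start] = "B-" + entity_type  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + entity_type  # remaining tokens of the entity
    return tags


tokens = ["thick", "sole", ",", "fashionable"]
spans = [(0, 2, "SELF"), (3, 4, "EXT")]
tags = bio_encode(tokens, spans)
print(list(zip(tokens, tags)))
```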

Step S3010, inputting the training sample into a named entity recognition model to predict entity text and entity types in the training sample;

In one embodiment, a BERT+BiLSTM+CRF model is adopted as the named entity recognition model, and the training sample is used as its input. BERT serves as the embedding layer, extracting the deep semantic information of the training sample and outputting a corresponding text feature sequence to the BiLSTM layer; the text feature sequence contains a feature vector that vectorizes each character or word in the meta description. The BiLSTM layer outputs, for each character or word in the meta description, a score for each category, which serves as the input of the CRF layer. The CRF layer outputs a category sequence containing the category of each character or word in the meta description, from which the entity texts and their entity types in the meta description can be determined. The categories are obtained by applying any one of the BIO, BIOES and BMES labeling schemes to the entity types.

Step S3020, determining a loss value corresponding to the predicted entity text and the entity type by using the supervision label of the training sample, and when the loss value does not reach the preset threshold, updating the weight of the named entity recognition model, and continuously calling other training samples to perform iterative training until the named entity recognition model converges.

A preset cross entropy loss function is invoked; it can be set flexibly by a person skilled in the art according to prior knowledge or experimental experience. The cross entropy loss value of the predicted entity texts and their entity types is calculated against the supervision label of the training sample. When the cross entropy loss value reaches the preset threshold, the named entity recognition model has been trained to a convergence state and training can be terminated. When the cross entropy loss value does not reach the preset threshold, the model has not converged: a gradient update is performed on the model according to the cross entropy loss value, the weight parameters of each part of the model are corrected through back propagation so that the model further approaches convergence, and other training samples are then called to continue iterative training until the model reaches a convergence state. The preset threshold can be set by a person skilled in the art as required.

In this embodiment, the named entity recognition model is supervised and trained by using the entity text in the meta description and the entity type to which the entity text belongs until convergence, so that the named entity recognition model learns the capability of recognizing the entity text in the meta description and the entity type thereof, and the accuracy of recognition can be ensured.

Referring to fig. 8, a meta description generating device provided for adapting to one of the purposes of the present application is a functional implementation of the meta description generating method of the present application, where the device includes a generating basis constructing module 1100, a meta description generating module 1200, a matching degree determining module 1300, and a meta description screening module 1400, where the generating basis constructing module 1100 is configured to obtain a commodity class of a commodity, a commodity title and a target keyword thereof, and the target keyword is a commodity search keyword related to the commodity title semantically and the search result meets a preset condition, and generate a generating basis text; the meta description generation module 1200 is configured to input the generated text to a meta description generation model to generate a plurality of candidate meta descriptions; the matching degree determining module 1300 is configured to determine matching degrees between the candidate meta descriptions and the commodity titles by using a preset information retrieval algorithm; the meta description filtering module 1400 is configured to filter out candidate meta descriptions with matching degree satisfying a preset condition, and construct a recommended meta description list to be pushed to the user.

In a further embodiment, preceding the generation basis construction module 1100, the device includes: a first semantic similarity determining submodule, configured to determine, by using a preset text similarity model, the semantic similarity between the commodity title and each commodity search keyword in a preset commodity search keyword set; and a keyword screening submodule, configured to screen out, from the commodity search keyword set, the target keywords whose semantic similarity exceeds a preset threshold.
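
The two submodules can be sketched as below. The application does not specify the text similarity model here, so a bag-of-words cosine similarity is used purely as a placeholder; the function names and the threshold of 0.3 are likewise assumptions.

```python
import math
from collections import Counter
from typing import List

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity; a placeholder for the preset
    text similarity model, which is not specified here."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[token] * vec_b[token] for token in vec_a)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def screen_target_keywords(title: str,
                           keyword_set: List[str],
                           threshold: float = 0.3) -> List[str]:
    """Keep the search keywords whose similarity to the title exceeds
    the preset threshold (assumed value)."""
    return [kw for kw in keyword_set if cosine_similarity(title, kw) > threshold]

TITLE = "red cotton summer dress"
KEYWORDS = ["cotton dress", "leather boots", "red dress"]
TARGETS = screen_target_keywords(TITLE, KEYWORDS)
```

With these toy inputs, "cotton dress" and "red dress" share two of the four title tokens and pass the threshold, while "leather boots" shares none and is dropped.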

In order to solve the above technical problems, an embodiment of the present application also provides a computer device. Fig. 9 is a schematic diagram of the internal structure of the computer device. The computer device includes a processor, a computer readable storage medium, a memory, and a network interface connected by a system bus. The computer readable storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store a control information sequence, and the computer readable instructions, when executed by the processor, cause the processor to implement a meta description generation method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer readable instructions that, when executed by the processor, cause the processor to perform the meta description generation method of the present application. The network interface of the computer device is used to connect to and communicate with a terminal. It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of the part of the structure relevant to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.

The processor in this embodiment is configured to execute the specific functions of each module and its submodules in fig. 8, and the memory stores the program codes and data required to execute these modules and submodules. The network interface is used for data transmission to and from a user terminal or a server. The memory in this embodiment stores the program codes and data required to execute all modules and submodules of the meta description generation device of the present application, and the server can call these program codes and data to execute the functions of all the submodules.

The present application also provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the meta description generation method of any of the embodiments of the present application.

Those skilled in the art will appreciate that all or part of the methods of the above embodiments of the present application may be implemented by a computer program stored on a computer readable storage medium; when executed, the program may include the steps of the method embodiments described above. The storage medium may be a computer readable storage medium such as a magnetic disk, an optical disk, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In summary, the present application can generate high-quality meta descriptions from the commodity category, the commodity title and the target keywords of a commodity. Because the target keywords are search keywords whose search results meet the preset condition, the corresponding meta descriptions can be expected to attract traffic effectively when they appear as search results.

Those of skill in the art will appreciate that the various operations, methods, steps in the flows, measures, and schemes discussed in the present application may be alternated, altered, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present application may also be alternated, altered, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that involve the various operations, methods, and flows disclosed in the present application may also be alternated, altered, rearranged, decomposed, combined, or deleted.

The foregoing describes only some embodiments of the present application. It should be noted that a person skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims

1. A meta description generation method, comprising the steps of:

2. The meta description generation method according to claim 1, wherein before acquiring the commodity category of the commodity, the commodity title and the target keyword thereof, comprising the steps of:

3. The meta description generation method according to claim 1, wherein before determining a semantic similarity corresponding to each commodity keyword in the set of preset commodity keywords by using a preset text similarity model, the meta description generation method comprises the following steps:

4. A meta description generation method according to claim 3, wherein the corresponding search result is determined based on user history behavior data of each of the commodity search keywords, comprising the steps of:

5. The meta description generation method according to claim 1, wherein the steps of, before obtaining commodity category of commodity, commodity title and target keyword thereof, constructing the generated basis text, are as follows:

6. The meta description generation method according to claim 5, wherein before obtaining a commodity class of a training commodity, a commodity title and a target keyword thereof, constructing a generation basis text as a training sample, and labeling a meta description of the training commodity as a supervision tag of the training sample, comprising the steps of:

7. The meta description generation method according to claim 6, wherein obtaining commodity titles and meta descriptions of a plurality of commodities, determining a reproduction ratio corresponding to a text in the reproduction commodity title of each commodity, and before screening out a primary screen meta description that the reproduction ratio satisfies a preset threshold, comprising the steps of:

8. A meta description generation apparatus, comprising:

the generation basis construction module is used for acquiring commodity categories of commodities, commodity titles and target keywords thereof and constructing generation basis texts, wherein the target keywords are commodity search keywords which are semantically related to the commodity titles and have search results meeting preset conditions;

the meta description generation module is used for inputting the generation basis text into a meta description generation model to generate a plurality of corresponding candidate meta descriptions;

the matching degree determining module is used for determining matching degrees between the candidate meta descriptions and the commodity titles by adopting a preset information retrieval algorithm;

and the meta description screening module is used for screening candidate meta descriptions with the matching degree meeting the preset condition, constructing a recommended meta description list and pushing the list to a user.

9. A computer device comprising a central processor and a memory, characterized in that the central processor is arranged to invoke a computer program stored in the memory for performing the steps of the method according to any of claims 1 to 7.

10. A computer-readable storage medium, characterized in that it stores in the form of computer-readable instructions a computer program implemented according to the method of any one of claims 1 to 7, which, when invoked by a computer, performs the steps comprised by the corresponding method.