CN115470322A - Keyword generation system and method based on artificial intelligence - Google Patents

Keyword generation system and method based on artificial intelligence

Info

Publication number
CN115470322A
Authority
CN
China
Prior art keywords
data
similarity
commodity
value
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211294577.9A
Other languages
Chinese (zh)
Other versions
CN115470322B (en)
Inventor
张飞
周南
刘奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kuaiyun Technology Co ltd
Original Assignee
Shenzhen Kuaiyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kuaiyun Technology Co ltd
Priority to CN202211294577.9A
Publication of CN115470322A
Application granted
Publication of CN115470322B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a keyword generation system and method based on artificial intelligence. The method comprises the following steps: acquiring commodity description data, and extracting a first search term from the commodity description data; acquiring potential competitive product data of the commodity according to the first search term; processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain competitive product data; extracting competitive product topic data from the competitive product data; extracting core commodity words from the competitive product topic data; selecting, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value from the core commodity words; and generating keywords corresponding to the commodity according to the first core commodity word in combination with keyword generation rules. With this scheme, competitive product data and market data can be collected automatically and intelligently and commodity keywords can be edited automatically, which greatly reduces manual operation and improves the efficiency of generating commodity copy.

Description

Keyword generation system and method based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a keyword generation system and method based on artificial intelligence.
Background
With the rapid development of network technology, electronic commerce has also developed greatly, and merchants often use an e-commerce platform to promote their own commodities. Advertisement keywords are core parameters of the advertisement placement services that an e-commerce platform provides to merchants: a merchant sets relevant advertisement keywords and a placement strategy for a commodity, and the e-commerce platform displays the commodity, according to a certain strategy, to customers who search for those keywords. In the advertisement placement process, merchants expect to generate highly targeted advertisement keywords, so that customers obtain matching commodities when searching by those keywords and the advertisement placement effect is improved.
However, merchants currently determine advertisement keywords mainly by manually labeling the keywords of related products. As the number of commodity types grows, the workload of obtaining advertisement keywords increases, and manual labeling reduces the efficiency of keyword generation. Moreover, the keywords are labeled only from the perspective of the merchant's own commodities, so they cannot match more placement scenarios, which reduces keyword accuracy.
Disclosure of Invention
The invention provides a keyword generation system and a method based on artificial intelligence, and through the scheme of the invention, the competitive product data and the market data can be automatically and intelligently acquired, the commodity keywords can be automatically edited, manual operation is greatly reduced, and the efficiency of generating the commodity copy is improved.
In view of the above, an aspect of the present invention provides an artificial intelligence-based keyword generation system, including: an extraction module, a data processing module and a generation module;
the extraction module is configured to:
acquiring commodity description data, and extracting a first search term from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search term;
the data processing module is configured to:
processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data;
extracting competitive product topic data from the competitive product data;
extracting core commodity words from the competitive product topic data;
selecting, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value from the core commodity words;
the generation module is configured to: generating keywords corresponding to the commodity according to the first core commodity word in combination with a keyword generation rule.
Optionally, in the step of processing the potential competitive product data by using an image processing algorithm and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data, the data processing module is specifically configured to:
inputting the potential competitive product data, and setting a similarity identification value I to 0;
judging, by using a first similarity judgment model, whether a first similarity value A1 of the potential competitive product data is larger than a first threshold;
if the first similarity value A1 is larger than the first threshold, judging, by using a second similarity judgment model, whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold, and judging, by using a third similarity judgment model, whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold;
if the second similarity value A2 is smaller than the second threshold or the third similarity value A3 is smaller than the third threshold, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: first similarity S1 = a1 × A1 + a2 × A2 + a3 × A3 + b1 × I, wherein A1, A2 and A3 are the first, second and third similarity values, I is the similarity identification value, and a1, a2, a3 and b1 are all weight coefficients greater than 0 with a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging, by using a fourth similarity judgment model, whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold, and judging, by using a fifth similarity judgment model, whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: second similarity S2 = a6 × A1 + a4 × A4 + a5 × A5 + b2 × I, wherein A1, A4 and A5 are the first, fourth and fifth similarity values, I is the similarity identification value, and a4, a5, a6 and b2 are all weight coefficients greater than 0 with a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold; if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
extracting all data marked as similar in the potential competitive product data as the competitive product data.
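The two-branch similarity filter above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the similarity judgment models are not specified in the text, so the values A1 to A5 are taken as precomputed inputs; the weight values in `Weights` are hypothetical (the text only requires each to be greater than 0 and each group to sum to 1); and, because the two "if" conditions in each branch overlap as written, the sketch treats the second one as the "otherwise" case. The names `Weights`, `score_candidate` and `filter_similar` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical values: the text only requires each weight > 0,
    # a1 + a2 + a3 + b1 = 1 and a4 + a5 + a6 + b2 = 1.
    a1: float = 0.4
    a2: float = 0.2
    a3: float = 0.2
    b1: float = 0.2
    a4: float = 0.3
    a5: float = 0.3
    a6: float = 0.2
    b2: float = 0.2


def score_candidate(A, thresholds, w=Weights()):
    """Return (branch, score) for one potential competitive product.

    A maps 1..5 to the similarity values A1..A5; thresholds maps 1..5 to the
    first..fifth thresholds. How each value is produced is left to the
    (unspecified) similarity judgment models.
    """
    I = 0  # similarity identification value, initialised to 0
    if A[1] > thresholds[1]:
        # First branch: bump I if A2 or A3 falls below its threshold.
        if A[2] < thresholds[2] or A[3] < thresholds[3]:
            I += 1
        return "S1", w.a1 * A[1] + w.a2 * A[2] + w.a3 * A[3] + w.b1 * I
    # Second branch (image-based values): bump I if A4 or A5 reaches its threshold.
    if A[4] >= thresholds[4] or A[5] >= thresholds[5]:
        I += 1
    return "S2", w.a6 * A[1] + w.a4 * A[4] + w.a5 * A[5] + w.b2 * I


def filter_similar(candidates, thresholds, preset_threshold, w=Weights()):
    """Keep the names of candidates whose S1 or S2 is not smaller than the preset threshold."""
    kept = []
    for name, A in candidates:
        _, score = score_candidate(A, thresholds, w)
        if score >= preset_threshold:
            kept.append(name)
    return kept
```

Calling `filter_similar` with a list of `(name, values)` pairs returns only the names marked as similar, which corresponds to the final extraction step.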
Optionally, in the step of acquiring the commodity description data and extracting the first search term from the commodity description data, the extraction module is specifically configured to:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting feature data of the candidate search word sequence, and labeling the feature data to obtain a labeled sample set and an unlabeled sample set;
step three: taking the labeled sample set as a training set, and training a search term classification model by using a neural network;
step four: performing classification prediction on the candidate search terms in the unlabeled sample set by using the trained search term classification model, and calculating the matching degree of each unlabeled sample;
step five: selecting the unlabeled samples whose matching degree exceeds a preset matching degree value, adding them to the training set, and retraining the search term classification model;
step six: repeating step four and step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search term classification model;
step seven: inputting the feature data of the commodity description data into the final search term classification model for processing, and extracting the first search term from the processing result.
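Steps three to six describe a self-training (pseudo-labeling) loop. The sketch below illustrates only the loop structure. To stay dependency-free it swaps the neural network named in the text for a nearest-centroid classifier, and it defines the matching degree as a distance-based confidence; both are assumptions for illustration, as are the function names and the default values of `min_degree`, `min_share` and `max_rounds`.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def train(samples):
    """Fit one centroid per label. samples: list of (features, label)."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return (label, matching_degree); assumes at least two labels in the model."""
    dists = {y: math.dist(x, c) for y, c in model.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    degree = 1.0 if total == 0 else 1.0 - dists[label] / total
    return label, degree


def self_train(labeled, unlabeled, min_degree=0.8, min_share=0.9, max_rounds=10):
    training = list(labeled)
    pool = list(unlabeled)
    model = train(training)                                   # step three
    for _ in range(max_rounds):
        if not pool:
            break
        scored = [(x, *predict(model, x)) for x in pool]      # step four
        confident = [(x, label) for x, label, degree in scored if degree > min_degree]
        if not confident:
            break  # nothing confident enough to add; stop instead of looping
        training.extend(confident)                            # step five
        model = train(training)
        if len(confident) / len(pool) > min_share:            # step six stop test
            break
        pool = [x for x, _, degree in scored if degree <= min_degree]
    return model
```

The `max_rounds` cap and the early exit when no sample is confident are safeguards added in this sketch; the text itself only gives the proportion-based stopping condition.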
Optionally, in step one, namely classifying the commodity description data according to commodity names and commodity attributes and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence, the extraction module is specifically configured to:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing each sentence into a plurality of words, and recording position information of the words in the sentence;
analyzing and labeling the part of speech of each word;
deleting first words having a preset part of speech from the words to obtain a modified word set;
performing a deduplication operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
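The preprocessing steps above can be sketched as a small pipeline. This is an illustration under assumptions: the part-of-speech lookup `TOY_POS`, the set of parts of speech to delete `DROP_POS` and the attribute lexicon `ATTRIBUTE_WORDS` are hypothetical stand-ins (a real system would use a proper tagger and the word libraries the document mentions), and English whitespace-style tokenization is used only for readability.

```python
import re

# Hypothetical part-of-speech lookup; a real system would call a POS tagger.
TOY_POS = {"the": "DET", "a": "DET", "for": "PREP", "with": "PREP", "and": "CONJ"}
# Assumed "preset part of speech" values whose words are deleted.
DROP_POS = {"DET", "PREP", "CONJ"}
# Hypothetical attribute lexicon used to split attributes from names.
ATTRIBUTE_WORDS = {"red", "cotton", "waterproof"}


def candidate_words(text):
    """Number sentences, split into words with positions, tag, filter, deduplicate."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    records = []
    for sentence_no, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[A-Za-z0-9-]+", sentence.lower())
        for position, word in enumerate(words):
            tag = TOY_POS.get(word, "NOUN")  # label the part of speech
            records.append(
                {"word": word, "sentence": sentence_no, "position": position, "pos": tag}
            )
    # Delete words with a preset part of speech to get the modified word set.
    modified = [r for r in records if r["pos"] not in DROP_POS]
    # Deduplicate, keeping the first occurrence, to get the candidate word set.
    seen, candidates = set(), []
    for r in modified:
        if r["word"] not in seen:
            seen.add(r["word"])
            candidates.append(r)
    return candidates


def classify(candidates):
    """Split candidate words into commodity-name words and commodity-attribute words."""
    groups = {"name": [], "attribute": []}
    for r in candidates:
        key = "attribute" if r["word"] in ATTRIBUTE_WORDS else "name"
        groups[key].append(r["word"])
    return groups
```

Each record keeps its sentence number and in-sentence position, so the position information remains available to later feature extraction.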
Optionally, in the operation of extracting the feature data of the candidate search word sequence in the second step, the extraction module is specifically configured to:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distance between the candidate search word vector sequences;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantifying the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain the semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
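The clustering-based semantic features can be illustrated with a small k-means sketch. The assumptions are labeled in the code: `WORD_VECTORS` is a hypothetical two-dimensional word vector table standing in for a trained word vector model, k-means with a deterministic initialisation stands in for the unspecified clustering algorithm, and Euclidean distance stands in for the unspecified distance formula.

```python
import math

# Hypothetical word vector table; a trained word vector model would supply this.
WORD_VECTORS = {
    "backpack": [1.0, 0.0],
    "rucksack": [0.9, 0.1],
    "red": [0.0, 1.0],
    "crimson": [0.1, 0.9],
}


def kmeans(vectors, n, iterations=20):
    """Plain k-means; the first n vectors seed the centers (deterministic for the sketch)."""
    centers = [list(v) for v in vectors[:n]]
    for _ in range(iterations):
        clusters = [[] for _ in range(n)]
        for v in vectors:
            nearest = min(range(n), key=lambda k: math.dist(v, centers[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def semantic_features(vector, centers):
    """Distance from one word vector to every cluster center vector."""
    return [math.dist(vector, c) for c in centers]


names = list(WORD_VECTORS)
centers = kmeans([WORD_VECTORS[name] for name in names], n=2)
features = {name: semantic_features(WORD_VECTORS[name], centers) for name in names}
```

Here each candidate word receives an n-dimensional feature vector of distances to the cluster centers; the word frequency, length and position features mentioned in the text would be appended to it separately.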
Another aspect of the present invention provides a method for generating keywords based on artificial intelligence, including:
acquiring commodity description data, and extracting a first search term from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search term;
processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data;
extracting competitive product topic data from the competitive product data;
extracting core commodity words from the competitive product topic data;
selecting, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value from the core commodity words;
and generating keywords corresponding to the commodity according to the first core commodity word in combination with a keyword generation rule.
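The last two steps of the method can be sketched as follows. Two points are assumptions, because the text does not define them: the "frequency" of a core commodity word is taken here to be its count in the preset search word data set, and the keyword generation rule is a hypothetical "modifier + core word" template. The function names are illustrative.

```python
from collections import Counter


def select_core_words(core_words, search_terms, min_count):
    """Keep core words that occur more than min_count times in the search word data set."""
    counts = Counter(word for term in search_terms for word in term.lower().split())
    # dict.fromkeys removes duplicate core words while keeping their order.
    return [w for w in dict.fromkeys(core_words) if counts[w] > min_count]


def generate_keywords(first_core_words, modifiers):
    """Hypothetical generation rule: '<modifier> <core word>' for every pair."""
    return [f"{m} {w}" for w in first_core_words for m in modifiers]


search_terms = ["travel backpack", "kids backpack", "backpack waterproof", "travel bag"]
first_core = select_core_words(["backpack", "zipper", "travel"], search_terms, min_count=1)
keywords = generate_keywords(first_core, ["kids", "waterproof"])
```

With this toy data set, "zipper" never appears in the search terms and is dropped, while "backpack" and "travel" are kept and combined with each modifier.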
Optionally, the step of processing the potential competitive product data by using an image processing algorithm and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data includes:
inputting the potential competitive product data, and setting a similarity identification value I to 0;
judging, by using a first similarity judgment model, whether a first similarity value A1 of the potential competitive product data is larger than a first threshold;
if the first similarity value A1 is larger than the first threshold, judging, by using a second similarity judgment model, whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold, and judging, by using a third similarity judgment model, whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold;
if the second similarity value A2 is smaller than the second threshold or the third similarity value A3 is smaller than the third threshold, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: first similarity S1 = a1 × A1 + a2 × A2 + a3 × A3 + b1 × I, wherein A1, A2 and A3 are the first, second and third similarity values, I is the similarity identification value, and a1, a2, a3 and b1 are all weight coefficients greater than 0 with a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging, by using a fourth similarity judgment model, whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold, and judging, by using a fifth similarity judgment model, whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: second similarity S2 = a6 × A1 + a4 × A4 + a5 × A5 + b2 × I, wherein A1, A4 and A5 are the first, fourth and fifth similarity values, I is the similarity identification value, and a4, a5, a6 and b2 are all weight coefficients greater than 0 with a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold; if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
extracting all data marked as similar in the potential competitive product data as the competitive product data.
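As a worked instance of the first similarity calculation, using purely illustrative numbers (the weights, similarity values and thresholds here are assumptions, not values given in this document): take weight coefficients 0.4, 0.2, 0.2 and 0.2 for a1, a2, a3 and b1, similarity values A1 = 0.9, A2 = 0.5 and A3 = 0.7, and let the first, second and third thresholds all be 0.6. Since A1 exceeds the first threshold and A2 is below the second threshold, the similarity identification value I becomes 1, and:

```latex
\begin{aligned}
S_1 &= a_1 A_1 + a_2 A_2 + a_3 A_3 + b_1 I \\
    &= 0.4 \times 0.9 + 0.2 \times 0.5 + 0.2 \times 0.7 + 0.2 \times 1 \\
    &= 0.36 + 0.10 + 0.14 + 0.20 \\
    &= 0.80
\end{aligned}
```

With a preset threshold of, say, 0.75, this candidate would be marked as similar and kept as competitive product data.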
Optionally, the step of acquiring the commodity description data and extracting the first search term from the commodity description data includes:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting feature data of the candidate search word sequence, and labeling the feature data to obtain a labeled sample set and an unlabeled sample set;
step three: taking the labeled sample set as a training set, and training a search term classification model by using a neural network;
step four: performing classification prediction on the candidate search terms in the unlabeled sample set by using the trained search term classification model, and calculating the matching degree of each unlabeled sample;
step five: selecting the unlabeled samples whose matching degree exceeds a preset matching degree value, adding them to the training set, and retraining the search term classification model;
step six: repeating step four and step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search term classification model;
step seven: inputting the feature data of the commodity description data into the final search term classification model for processing, and extracting the first search term from the processing result.
Optionally, step one, namely classifying the commodity description data according to commodity names and commodity attributes and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence, includes:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing each sentence into a plurality of words, and recording position information of the words in the sentence;
analyzing and labeling the part of speech of each word;
deleting first words having a preset part of speech from the words to obtain a modified word set;
performing a deduplication operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
Optionally, the operation of extracting feature data of the candidate search word sequence in the second step includes:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distance between the candidate search word vector sequences;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantifying the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain the semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
According to the technical scheme, the method comprises the steps of acquiring commodity description data and extracting a first search term from the commodity description data; acquiring potential competitive product data of the commodity according to the first search term; processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data; extracting competitive product topic data from the competitive product data; extracting core commodity words from the competitive product topic data; selecting, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value from the core commodity words; and generating keywords corresponding to the commodity according to the first core commodity word in combination with keyword generation rules. With this scheme, competitive product data and market data can be collected automatically and intelligently and commodity keywords can be edited automatically, which greatly reduces manual operation and improves the efficiency of generating commodity copy.
Drawings
FIG. 1 is a schematic block diagram of an artificial intelligence based keyword generation system provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a method for generating keywords based on artificial intelligence according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for generating keywords based on artificial intelligence according to another embodiment of the present invention;
FIG. 4 is a flowchart of a keyword generation method based on artificial intelligence according to another embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention, taken in conjunction with the accompanying drawings and detailed description, is set forth below. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as specifically described herein, and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
The terms "first," "second," and the like in the description and claims of the present application and in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein may be combined with other embodiments.
An artificial intelligence based keyword generation system and method provided in accordance with some embodiments of the present invention will be described below with reference to FIGS. 1 to 4.
As shown in FIG. 1, an embodiment of the present invention provides a keyword generation system based on artificial intelligence, including: an extraction module, a data processing module and a generation module;
the extraction module is configured to:
acquiring commodity description data, and extracting a first search term from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search term;
the data processing module is configured to:
processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products whose similarity is lower than a preset threshold to obtain the competitive product data;
extracting competitive product topic data from the competitive product data;
extracting core commodity words from the competitive product topic data;
selecting, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value from the core commodity words;
the generation module is configured to: generating keywords corresponding to the commodity according to the first core commodity word in combination with a keyword generation rule.
It is understood that, in this embodiment, the extracting module (e.g., a crawler module) may be used to obtain the commodity description data (e.g., the content of the introduced commodity such as the commodity specification, the commodity scheme, etc.) from the network platform and/or the e-commerce platform and/or the network server, and extract the first search word or search sentence or search text, etc., such as the commodity identification, the commodity name, the commodity attribute, etc., from the commodity description data.
Then, potential competitive product data of the commodities are obtained according to the first search word, namely, the corresponding network platform and/or e-commerce platform and/or network server and/or service site are searched through text information, the data of the same-style commodities or similar commodities which are searched as much as possible enter a data acquisition system to serve as the potential competitive product data, and different dictionary banks can be established according to the potential competitive product data based on different dimensions.
Because too much auction data are acquired according to the first search word and need to be filtered and screened, the data processing module can process the potential auction data by using an image processing algorithm and filter the data of the bids with the similarity lower than a preset threshold value to obtain the auction data.
Then, the data processing module, in combination with a pre-established commodity word library, an attribute word library and the like, can extract the title data of the competitive products from the competitive product data and extract the core commodity words from that title data. The commodity word library holds more than one million entries, most of them multi-word terms; the attribute word library comprises word data of multiple dimensions such as brand, material, appearance, shape, color, and applicability of commodities. After a historical search word data set provided by the e-commerce platform background is obtained, it is stored in a database, and an inverted index is established to improve the efficiency of interface response.
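As an illustration of the inverted index mentioned above, the following is a minimal sketch rather than the applicant's implementation. It assumes whitespace tokenization (Chinese search words would need a word segmenter instead), and the names `build_inverted_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(search_terms):
    """Map each token to the ids of the historical search terms containing it."""
    index = defaultdict(set)
    for term_id, term in enumerate(search_terms):
        # Assumption: tokens are separated by whitespace.
        for token in term.lower().split():
            index[token].add(term_id)
    return index

def lookup(index, search_terms, query):
    """Return the stored search terms that contain every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return []
    ids = set.intersection(*(index.get(t, set()) for t in tokens))
    return [search_terms[i] for i in sorted(ids)]

# Hypothetical historical search word data set.
history = ["wireless mouse", "gaming mouse pad", "wireless keyboard"]
index = build_inverted_index(history)
matches = lookup(index, history, "mouse")
```

A query then touches only the posting sets of its own tokens instead of scanning every stored search word, which is where the improvement in interface response would come from.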
Next, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value is selected from the core commodity words. Through research and statistical analysis, the applicant has found that the commodity name is generally the most frequent word in the commodity copy and mostly appears at the head of the title. Among the search words for commodities, the words for applicable crowds and applicable scenes are highly concentrated, so they can be extracted by establishing a fixed word library (namely, the search word data set) and matching against it. Because the positions of these words in the title are relatively fixed and the words that co-occur with them are concentrated, the fixed word library can be established as follows: screen a batch of initial seed words, iteratively mine related applicable-crowd words and applicable-scene words, and establish the search word data set. Manual intervention can be added during the iteration to remove irrelevant words in time. The words for applicable crowds and for applicable scenes differ markedly in the context of the commodity name/title: for example, applicable-crowd words often appear in the names/titles of toys, gifts, clothing, jewelry and the like, whereas electronic products more often carry applicable-scene words. Accordingly, word vectors and context words can be used to assist in manually distinguishing applicable-scene words from applicable-crowd words, completing the construction of the word library.
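The frequency-based selection can be sketched as follows. This is one possible reading, stated as an assumption: the frequency is counted over the core commodity words extracted from the competitive products, and only words that also occur in the search word data set are kept. The function name and the sample data are hypothetical.

```python
from collections import Counter

def select_first_core_words(core_words, search_word_set, min_frequency):
    """Keep core words found in the search word data set whose count exceeds min_frequency."""
    counts = Counter(w for w in core_words if w in search_word_set)
    # most_common() orders by descending count; ties keep first-seen order.
    return [word for word, count in counts.most_common() if count > min_frequency]

# Hypothetical core commodity words gathered from several competitor titles.
core_words = ["mouse", "mouse", "mouse", "pad", "pad", "cable"]
search_word_set = {"mouse", "pad"}
first_core_words = select_first_core_words(core_words, search_word_set, min_frequency=1)
```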
The commodity attribute words are generally used to explain the selling points or features of the commodities, and include relatively important commodity attributes, characteristic descriptions of the commodities, and the like. Because the definition of the characteristic words is fuzzy and the tolerance for error is high, they can be extracted in combination with the attribute table of each commodity.
Finally, the generation module generates keywords corresponding to the commodity according to the first core commodity word in combination with the keyword generation rules.
Specifically, the first core commodity words extracted in the previous steps (such as core keywords, characteristic words, brand words, applicable crowds, and applicable scenes) are combined with the keyword/title generation rules of each e-commerce platform, and suitable description keywords/titles are generated for the corresponding commodity in a way that accommodates the differences in keyword/title requirements between platforms.
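The description does not specify the form of a keyword generation rule. The sketch below assumes, purely for illustration, that a platform rule consists of a slot order and a maximum title length; `generate_title`, the rule fields `order` and `max_chars`, and the sample words are all hypothetical.

```python
def generate_title(words_by_role, rule):
    """Concatenate words slot by slot until the platform's length limit is reached."""
    parts = []
    length = 0
    for role in rule["order"]:
        for word in words_by_role.get(role, []):
            extra = len(word) + (1 if parts else 0)  # +1 for the separating space
            if length + extra > rule["max_chars"]:
                return " ".join(parts)  # stop at the first word that does not fit
            parts.append(word)
            length += extra
    return " ".join(parts)

# Hypothetical first core commodity words grouped by role.
words_by_role = {
    "brand": ["Acme"],
    "core": ["wireless mouse"],
    "feature": ["silent", "rechargeable"],
    "scene": ["office"],
}
# Hypothetical rule for one platform: slot order plus a title length limit.
rule = {"order": ["brand", "core", "feature", "scene"], "max_chars": 30}
title = generate_title(words_by_role, rule)
```

Supplying a different rule per platform is one way the same extracted words could yield titles that fit each platform's requirements.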
By adopting the technical scheme of this embodiment, competitive product data and market data can be acquired automatically and intelligently and commodity keywords can be edited automatically, which greatly reduces manual operation and improves the efficiency of generating commodity copy.
It should be understood that the block diagram of the artificial intelligence based keyword generation system shown in fig. 1 is only schematic, and the number of the modules shown is not intended to limit the scope of the present invention.
In some possible embodiments of the present invention, in the step of processing the potential competitive product data by using an image processing algorithm, and filtering out data of competitive products with a similarity lower than a preset threshold to obtain the competitive product data, the data processing module is specifically configured to:
inputting the potential competitive product data, and recording a similarity identification value I as 0;
judging whether a first similarity value A1 of the potential competitive product data is larger than a first threshold value by using a first similarity judgment model;
if the first similarity value A1 is larger than the first threshold value, judging whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold value by using a second similarity judgment model, and judging whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold value by using a third similarity judgment model;
if the second similarity value A2 is smaller than the second threshold value or the third similarity value A3 is smaller than the third threshold value, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: first similarity S1 = a1 × first similarity value A1 + a2 × second similarity value A2 + a3 × third similarity value A3 + b1 × similarity identification value I, wherein a1, a2, a3, b1 are all weighting coefficients greater than 0 and a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold value, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold value by using a fourth similarity judgment model, and judging whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold value by using a fifth similarity judgment model;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: second similarity S2 = a6 × first similarity value A1 + a4 × fourth similarity value A4 + a5 × fifth similarity value A5 + b2 × similarity identification value I, wherein a4, a5, a6, b2 are all weighting coefficients greater than 0 and a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold value, if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
and extracting all data marked as similar in the potential competitive product data as the competitive product data.
It can be understood that from two dimensions of text and images, a plurality of models can be constructed according to respective characteristics of the text and the images to calculate similarity, and finally, weighted summation is carried out on results given by the plurality of models to judge whether the competitive products are truly similar.
In this embodiment, a first similarity determination model is used for a preliminary determination (which may be a similarity determination performed on text data). When the obtained first similarity value A1 is greater than a first threshold (e.g., 80%), a secondary determination is made using models of other dimensions/precisions or models trained by other algorithms to improve accuracy: a second similarity determination model determines whether a second similarity value A2 of the potential competitive product data is less than a second threshold, and/or a third similarity determination model determines whether a third similarity value A3 of the potential competitive product data is less than a third threshold. The second and third similarity determination models may be models for performing similarity determination on text data (or models of other dimensions). That is, potential competitive product data with a high text similarity are checked further with determination models of other precisions/dimensions. When the obtained second similarity value A2 is less than the second threshold (e.g., 60%) or the third similarity value A3 is less than the third threshold (e.g., 50%), the preliminary determination may have been a misjudgment; the similarity identification value I is therefore incremented by 1 to reduce the weights of the preceding three determination models, and the first similarity S1 is calculated by the first similarity calculation method.
If the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, the first similarity S1 is calculated by the first similarity calculation method. In some embodiments, the second and third similarity determination models may both be models for performing similarity determination on image data, or one may be a model for image data and the other a model for text data (or another type of model).
It can be understood that, if the first similarity value A1 is not greater than the first threshold, the image data in the potential competitive product data are processed by using an image processing algorithm to obtain potential competitive product image data; a fourth similarity determination model determines whether a fourth similarity value A4 of the potential competitive product image data is less than a fourth threshold, and a fifth similarity determination model determines whether a fifth similarity value A5 of the potential competitive product image data is less than a fifth threshold. If the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, the preliminary determination may have been a misjudgment; the similarity identification value I is incremented by 1 to reduce the weight of the outputs of the first, fourth, and fifth similarity determination models, and a second similarity S2 is calculated by the second similarity calculation method. If the fourth similarity value A4 is less than the fourth threshold or the fifth similarity value A5 is less than the fifth threshold, the second similarity S2 is calculated by the second similarity calculation method. By adding two image similarity determination models of different precisions (or trained by different algorithms), this embodiment improves the determination accuracy and avoids missing genuine competitive product data merely because the text comparison result is poor.
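The two-branch determination described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the five similarity values are taken as precomputed inputs, the fourth and fifth thresholds, the preset threshold, and all weight values are invented for the example, and the branch that leaves I unchanged is treated as the complement of the branch that increments it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Thresholds:
    first: float = 0.80   # example value given in the description
    second: float = 0.60  # example value given in the description
    third: float = 0.50   # example value given in the description
    fourth: float = 0.60  # assumed for illustration
    fifth: float = 0.50   # assumed for illustration
    preset: float = 0.70  # assumed for illustration

TEXT_WEIGHTS = (0.3, 0.3, 0.3, 0.1)   # a1, a2, a3, b1 (assumed; sum to 1)
IMAGE_WEIGHTS = (0.3, 0.3, 0.3, 0.1)  # a4, a5, a6, b2 (assumed; sum to 1)

def judge(scores, th=Thresholds()):
    """Return (is_similar, S) for one item of potential competitive product data.

    scores maps "A1".."A5" to similarity values in [0, 1]; only the values
    needed by the branch taken are read.
    """
    flag = 0  # similarity identification value I
    if scores["A1"] > th.first:
        # Text branch: second and third models re-check the preliminary result.
        if scores["A2"] < th.second or scores["A3"] < th.third:
            flag += 1
        a1, a2, a3, b1 = TEXT_WEIGHTS
        s = a1 * scores["A1"] + a2 * scores["A2"] + a3 * scores["A3"] + b1 * flag
    else:
        # Image branch: fourth and fifth models compare the product images.
        if scores["A4"] >= th.fourth or scores["A5"] >= th.fifth:
            flag += 1
        a4, a5, a6, b2 = IMAGE_WEIGHTS
        s = a6 * scores["A1"] + a4 * scores["A4"] + a5 * scores["A5"] + b2 * flag
    return s >= th.preset, s

text_case = judge({"A1": 0.9, "A2": 0.8, "A3": 0.7})
image_case = judge({"A1": 0.5, "A4": 0.9, "A5": 0.9})
```

In the second call the text score is low but both image scores are high, so the item is still marked as similar, which matches the stated aim of not missing genuine competitive products because of a poor text comparison.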
In some possible embodiments of the present invention, in the step of obtaining the commodity description data and extracting the first search term from the commodity description data, the extraction module is specifically configured to:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting feature data of the candidate search word sequence, and labeling the feature data to obtain a labeled sample set and an unlabeled sample set;
step three: taking the labeled sample set as a training set, and training a search word classification model by using a neural network;
step four: performing classification prediction on the candidate search words in the unlabeled sample set by using the trained search word classification model, and calculating the matching degree of each unlabeled sample;
step five: selecting the unlabeled samples whose matching degree exceeds a preset matching degree value, adding them to the training set, and retraining the search word classification model;
step six: repeating step four to step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search word classification model;
step seven: inputting the feature data of the commodity description data into the final search word classification model for processing, and extracting the first search word from the processing result.
It can be understood that, in this embodiment, after the feature data of the candidate search word sequence are extracted, part of the feature data is labeled to form a labeled sample set and the remainder forms an unlabeled sample set. The labeled sample set is used to train the search word classification model through the neural network, and the unlabeled sample set is then used to train the model further, until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds the preset proportion, thereby improving the performance of the search word classification model.
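The iterative training loop of steps three to six can be sketched as below. To keep the example self-contained, a nearest-centroid classifier is swapped in for the neural network named in the description, and the matching degree is defined ad hoc from relative distances; both choices, along with every function name and parameter value, are assumptions made for illustration.

```python
import math

def train(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    groups = {}
    for vector, label in samples:
        groups.setdefault(label, []).append(vector)
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in groups.items()
    }

def predict(model, vector):
    """Return (label, matching degree); the degree is an assumed confidence score."""
    dists = {label: math.dist(vector, c) for label, c in model.items()}
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    degree = 1.0 if total == 0 else 1.0 - dists[label] / total
    return label, degree

def self_train(labeled, unlabeled, min_degree=0.8, target_ratio=0.9, max_rounds=10):
    training = list(labeled)
    added = set()
    model = train(training)                      # step three
    for _ in range(max_rounds):
        preds = [predict(model, v) for v in unlabeled]   # step four
        confident = [i for i, (_, d) in enumerate(preds) if d > min_degree]
        if len(confident) / len(unlabeled) > target_ratio:
            break                                # step six: stop criterion met
        new = [i for i in confident if i not in added]
        if not new:
            break                                # no further progress possible
        for i in new:                            # step five: grow the training set
            training.append((unlabeled[i], preds[i][0]))
            added.add(i)
        model = train(training)
    return model

# Hypothetical two-dimensional feature vectors.
labeled = [([0.0, 0.0], "name"), ([10.0, 10.0], "attribute")]
unlabeled = [[1.0, 1.0], [9.0, 9.0], [0.5, 0.0], [10.0, 9.0], [4.0, 4.0]]
model = self_train(labeled, unlabeled)
```

The `max_rounds` cap and the no-progress check are safeguards added in this sketch so the loop always terminates; the description itself only states the proportion-based stop criterion.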
In some possible embodiments of the present invention, in the operation of step one of classifying the commodity description data according to commodity names and commodity attributes and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence, the extraction module is specifically configured to:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing the sentence into a plurality of words, and recording the position information of the words in the sentence;
analyzing and labeling the part of speech of the word;
deleting a first word with a preset part of speech from the words to obtain a modified word set;
carrying out duplicate removal operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
It can be understood that, to improve the accuracy of text recognition and judgment, in this embodiment text data are extracted from the commodity description data and all sentences in the text data are counted and numbered; each sentence is divided into a plurality of words, and the position of each word in its sentence is recorded; the part of speech of each word is analyzed and labeled; first words that have a preset part of speech (such as adjectives, adverbs, pronouns, and auxiliary words) and carry no meaning for the keywords are deleted to obtain a modified word set; the modified word set is deduplicated to obtain a candidate word set; the candidate word set is classified according to the commodity name and the commodity attribute; and text preprocessing is performed on the classified candidate word set to generate the candidate search word sequence.
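The sentence numbering, word splitting, part-of-speech filtering, and deduplication steps can be sketched as follows. The part-of-speech lookup table `POS_TABLE` is a hypothetical stand-in for a real tagger, whitespace tokenization is assumed, and the later classification by commodity name and attribute is omitted.

```python
import re

# Hypothetical part-of-speech lookup; a real system would use a POS tagger.
POS_TABLE = {"this": "pronoun", "very": "adverb", "soft": "adjective"}
# Preset parts of speech to delete, following the examples in the description.
DROPPED_POS = {"adjective", "adverb", "pronoun", "auxiliary"}

def candidate_words(text):
    """Number sentences, record word positions, drop preset POS, deduplicate."""
    sentences = [s.strip() for s in re.split(r"[.!?。！？]", text) if s.strip()]
    seen = set()
    candidates = []
    for sentence_no, sentence in enumerate(sentences, start=1):
        for position, word in enumerate(sentence.lower().split()):
            pos = POS_TABLE.get(word, "noun")  # assumption: unknown words are nouns
            if pos in DROPPED_POS or word in seen:
                continue
            seen.add(word)
            candidates.append(
                {"word": word, "sentence": sentence_no, "position": position, "pos": pos}
            )
    return candidates

result = candidate_words("This very soft cotton towel. Cotton towel bath set!")
words = [c["word"] for c in result]
```

Each surviving word keeps the sentence number and position of its first occurrence, so position features can be derived from it later.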
In some possible embodiments of the present invention, in the operation of extracting the feature data of the candidate search term sequence in the second step, the extraction module is specifically configured to:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distances between the vectors in the candidate search word vector sequence;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantifying the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain the semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
It can be understood that, in this embodiment, to improve the efficiency and accuracy of feature data extraction, the candidate search word sequence is vectorized and processed with vector operations: a candidate search word vector sequence corresponding to the candidate search word sequence is generated; the candidate search word vector sequence is divided into n clusters according to the distances between its vectors; clustering center vectors of the n clusters are generated according to a clustering algorithm; the relation between the candidate search word sequence and the clustering center vectors is quantified according to a Euclidean distance formula to obtain the semantic features of the candidate search word sequence; and language features, word frequency features, length features and position features are extracted from the semantic features as the feature data.
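The clustering and distance step can be sketched as follows. The description does not name the clustering algorithm, so plain k-means with a deterministic initialization is assumed here; the word vector table is a toy stand-in for the output of a trained word vector model, and the function names are hypothetical.

```python
import math

def kmeans(vectors, n, iterations=20):
    """Plain k-means; the first n vectors serve as initial centers (assumption)."""
    centers = [list(v) for v in vectors[:n]]
    for _ in range(iterations):
        clusters = [[] for _ in range(n)]
        for v in vectors:
            nearest = min(range(n), key=lambda k: math.dist(v, centers[k]))
            clusters[nearest].append(v)
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else centers[k]
            for k, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers

def semantic_features(vectors, centers):
    """Euclidean distance from each word vector to every clustering center vector."""
    return [[math.dist(v, c) for c in centers] for v in vectors]

# Toy word vector table standing in for a trained word vector model.
word_vectors = {
    "mouse": [0.0, 0.0],
    "dress": [5.0, 5.0],
    "keyboard": [0.0, 1.0],
    "skirt": [5.0, 6.0],
}
candidate_sequence = ["mouse", "dress", "keyboard", "skirt"]
vectors = [word_vectors[w] for w in candidate_sequence]
centers = kmeans(vectors, n=2)
features = semantic_features(vectors, centers)
```

Each row of `features` holds one distance per cluster, so a candidate search word close to a cluster of commodity names is distinguishable from one close to a cluster of attribute words.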
Referring to fig. 2, another embodiment of the present invention provides a keyword generation method based on artificial intelligence, including:
acquiring commodity description data, and extracting a first search word from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search word;
processing the potential competitive product data by using an image processing algorithm, and filtering out data of the competitive products with the similarity lower than a preset threshold value to obtain the competitive product data;
extracting title data of the competitive products from the competitive product data;
extracting core commodity words from the title data of the competitive products;
selecting a first core commodity word with the frequency higher than a preset frequency value from the core commodity words by combining a preset search word data set;
and generating keywords corresponding to the commodities by combining keyword generation rules according to the first core commodity words.
It can be understood that, in this embodiment, an extraction module (e.g., a crawler module) may obtain the commodity description data (e.g., content introducing the commodity, such as a commodity specification or a commodity scheme) from a network platform and/or an e-commerce platform and/or a network server, and extract from it the first search word, search sentence, or search text, such as a commodity identification, a commodity name, or a commodity attribute.
Then, potential competitive product data of the commodity are obtained according to the first search word. That is, the corresponding network platform and/or e-commerce platform and/or network server and/or service site is searched using text information, and the data of as many same-style or similar commodities as can be found are fed into a data acquisition system as the potential competitive product data. Different dictionary banks can be established from the potential competitive product data based on different dimensions.
Because the search based on the first search word returns too much competitive product data, the data must be filtered and screened: the potential competitive product data can be processed by using an image processing algorithm, and the data of competitive products whose similarity is lower than a preset threshold value are filtered out to obtain the competitive product data.
Then, in combination with a pre-established commodity word library, an attribute word library and the like, the title data of the competitive products can be extracted from the competitive product data, and the core commodity words can be extracted from that title data. The commodity word library holds more than one million entries, most of them multi-word terms; the attribute word library comprises word data of multiple dimensions such as brand, material, appearance, shape, color, and applicability of commodities. After a historical search word data set provided by the e-commerce platform background is obtained, it is stored in a database, and an inverted index is established to improve the efficiency of interface response.
Next, in combination with a preset search word data set, a first core commodity word whose frequency is higher than a preset frequency value is selected from the core commodity words. Through research and statistical analysis, the applicant has found that the commodity name is generally the most frequent word in the commodity copy and mostly appears at the head of the title. Among the search words for commodities, the words for applicable crowds and applicable scenes are highly concentrated, so they can be extracted by establishing a fixed word library (namely, the search word data set) and matching against it. Because the positions of these words in the title are relatively fixed and the words that co-occur with them are concentrated, the fixed word library can be established as follows: screen a batch of initial seed words, iteratively mine related applicable-crowd words and applicable-scene words, and establish the search word data set. Manual intervention can be added during the iteration to remove irrelevant words in time. The words for applicable crowds and for applicable scenes differ markedly in the context of the commodity name/title: for example, applicable-crowd words often appear in the names/titles of toys, gifts, clothing, jewelry and the like, whereas electronic products more often carry applicable-scene words. Accordingly, word vectors and context words can be used to assist in manually distinguishing applicable-scene words from applicable-crowd words, completing the construction of the word library.
The commodity attribute words are generally used to explain the selling points or features of the commodities, and include relatively important commodity attributes, characteristic descriptions of the commodities, and the like. Because the definition of the characteristic words is fuzzy and the tolerance for error is high, they can be extracted in combination with the attribute table of each commodity.
Finally, keywords corresponding to the commodity are generated according to the first core commodity word in combination with the keyword generation rules.
Specifically, the first core commodity words extracted in the previous steps (such as core keywords, characteristic words, brand words, applicable crowds, and applicable scenes) are combined with the keyword/title generation rules of each e-commerce platform, and suitable description keywords/titles are generated for the corresponding commodity in a way that accommodates the differences in keyword/title requirements between platforms.
By adopting the technical scheme of this embodiment, commodity description data are obtained and a first search word is extracted from them; potential competitive product data of the commodity are acquired according to the first search word; the potential competitive product data are processed by using an image processing algorithm, and data of competitive products with a similarity lower than a preset threshold value are filtered out to obtain the competitive product data; title data of the competitive products are extracted from the competitive product data; core commodity words are extracted from the title data; a first core commodity word with a frequency higher than a preset frequency value is selected from the core commodity words in combination with a preset search word data set; and keywords corresponding to the commodity are generated according to the first core commodity word in combination with keyword generation rules. In this way, competitive product data and market data are acquired automatically and intelligently and commodity keywords are edited automatically, which greatly reduces manual operation and improves the efficiency of generating commodity copy.
In some possible embodiments of the present invention, the step of processing the potential competitive product data by using an image processing algorithm and filtering out data of competitive products with a similarity lower than a preset threshold to obtain the competitive product data includes:
inputting the potential competitive product data, and recording a similarity identification value I as 0;
judging whether a first similarity value A1 of the potential competitive product data is larger than a first threshold value by using a first similarity judgment model;
if the first similarity value A1 is larger than the first threshold, judging whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold by using a second similarity judgment model, and judging whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold by using a third similarity judgment model;
if the second similarity value A2 is smaller than the second threshold value or the third similarity value A3 is smaller than the third threshold value, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: first similarity S1 = a1 × first similarity value A1 + a2 × second similarity value A2 + a3 × third similarity value A3 + b1 × similarity identification value I, wherein a1, a2, a3, b1 are all weighting coefficients greater than 0 and a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold value, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold value by using a fourth similarity judgment model, and judging whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold value by using a fifth similarity judgment model;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: second similarity S2 = a6 × first similarity value A1 + a4 × fourth similarity value A4 + a5 × fifth similarity value A5 + b2 × similarity identification value I, wherein a4, a5, a6, b2 are all weighting coefficients greater than 0 and a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold value, if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
extracting all data marked as similar in the potential competitive product data as the competitive product data.
It can be understood that, starting from two dimensions of text and image, a plurality of models can be constructed according to their respective characteristics to calculate the similarity, and finally the results given by the plurality of models are weighted and summed to determine whether the competitive products are truly similar.
In this embodiment, a first similarity determination model is used for a preliminary determination (which may be a similarity determination performed on text data). When the obtained first similarity value A1 is greater than a first threshold (e.g., 80%), a secondary determination is made using models of other dimensions/precisions or models trained by other algorithms to improve accuracy: a second similarity determination model determines whether a second similarity value A2 of the potential competitive product data is less than a second threshold, and/or a third similarity determination model determines whether a third similarity value A3 of the potential competitive product data is less than a third threshold. The second and third similarity determination models may be models for performing similarity determination on text data (or models of other dimensions). That is, potential competitive product data with a high text similarity are checked further with determination models of other precisions/dimensions. When the obtained second similarity value A2 is less than the second threshold (e.g., 60%) or the third similarity value A3 is less than the third threshold (e.g., 50%), the preliminary determination may have been a misjudgment; the similarity identification value I is therefore incremented by 1 to reduce the weights of the preceding three determination models, and the first similarity S1 is calculated by the first similarity calculation method.
If the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, the first similarity S1 is calculated by the first similarity calculation method. In some embodiments, the second and third similarity determination models may both be models for performing similarity determination on image data, or one may be a model for image data and the other a model for text data (or another type of model).
It can be understood that, if the first similarity value A1 is not greater than the first threshold, the image data in the potential competitive product data are processed by using an image processing algorithm to obtain potential competitive product image data; a fourth similarity determination model determines whether a fourth similarity value A4 of the potential competitive product image data is less than a fourth threshold, and a fifth similarity determination model determines whether a fifth similarity value A5 of the potential competitive product image data is less than a fifth threshold. If the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, the preliminary determination may have been a misjudgment; the similarity identification value I is incremented by 1 to reduce the weight of the outputs of the first, fourth, and fifth similarity determination models, and a second similarity S2 is calculated by the second similarity calculation method. If the fourth similarity value A4 is less than the fourth threshold or the fifth similarity value A5 is less than the fifth threshold, the second similarity S2 is calculated by the second similarity calculation method. By adding two image similarity determination models of different precisions (or trained by different algorithms), this embodiment improves the determination accuracy and avoids missing genuine competitive product data merely because the text comparison result is poor.
Referring to fig. 3, in some possible embodiments of the present invention, the step of obtaining the commodity description data and extracting the first search term from the commodity description data includes:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting characteristic data of the candidate search word sequence, and labeling the characteristic data to obtain a labeled sample set and a non-labeled sample set;
step three: taking the labeled sample set as a training set, and training a search term classification model by utilizing a neural network;
step four: carrying out classification prediction on candidate search words in the label-free sample set by using the trained search word classification model, and calculating the matching degree of each label-free sample;
step five: selecting the corresponding unlabeled sample with the matching degree exceeding a preset matching degree value, adding the unlabeled sample into the training set, and retraining the search term classification model;
step six: repeating step four and step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search term classification model;
step seven: and inputting the characteristic data of the commodity description data into the final search term classification model for processing, and extracting the first search term from a processing result.
It can be understood that, in this embodiment, after the feature data of the candidate search word sequence is extracted, one part of the feature data is labeled to obtain the labeled sample set, and the remaining part forms the unlabeled sample set. A search term classification model is first trained through a neural network on the labeled sample set and is then further trained with the unlabeled sample set, until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds the preset proportion, thereby improving the performance of the search term classification model.
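The iterative training of steps three to six may be sketched as a self-training loop. The sketch below is not the disclosed model: to stay self-contained it replaces the neural network with a nearest-centroid classifier, and it assumes that the matching degree is a normalized score in (0, 1). The function names, the threshold values, the `max_rounds` limit and the early exit when no new sample qualifies are all hypothetical additions for illustration.

```python
import math

def train_centroids(samples):
    """Stand-in for the neural-network classifier: one mean vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return (label, matching degree); the degree is a normalized exp(-distance)."""
    scores = {label: math.exp(-math.dist(features, c)) for label, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total

def self_train(labeled, unlabeled, match_threshold=0.7, target_ratio=0.8, max_rounds=10):
    training_set = list(labeled)
    model = train_centroids(training_set)                      # step three
    added = set()                                              # indices already pseudo-labeled
    for _ in range(max_rounds):
        predictions = [predict(model, x) for x in unlabeled]   # step four
        confident = [i for i, (_, degree) in enumerate(predictions)
                     if degree > match_threshold]
        if unlabeled and len(confident) / len(unlabeled) > target_ratio:
            break                                              # stopping rule of step six
        new = [i for i in confident if i not in added]
        if not new:
            break                                              # safeguard: no further progress
        for i in new:                                          # step five
            training_set.append((unlabeled[i], predictions[i][0]))
            added.add(i)
        model = train_centroids(training_set)                  # retrain on the enlarged set
    return model
```

In this sketch the unlabeled set is kept unchanged between rounds, so the proportion in step six is always measured over all unlabeled samples; whether selected samples are removed from the unlabeled set is not specified in the text.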
Referring to fig. 4, in some possible embodiments of the present invention, step one (classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence) comprises:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing the sentence into a plurality of words, and recording the position information of the words in the sentence;
analyzing and labeling the part of speech of the word;
deleting a first word with a preset part of speech from the words to obtain a modified word set;
carrying out duplication removal operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
It can be understood that, in order to improve the accuracy of text recognition and judgment, in this embodiment, text data is extracted from the commodity description data, and all sentences in the text data are counted and numbered; each sentence is then divided into a plurality of words, and the position information of the words in the sentence is recorded; the part of speech of each word is analyzed and labeled; first words that have a preset part of speech (such as adjectives, adverbs, pronouns, auxiliary words and the like) and carry no meaning for the keywords are deleted from the words to obtain a modified word set; a duplication removal operation is performed on the modified word set to obtain a candidate word set; the candidate word set is classified according to the commodity name and the commodity attribute; and text preprocessing is performed on the classified candidate word set to generate the candidate search word sequence.
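The preprocessing sequence above (sentence numbering, word segmentation with position recording, part-of-speech labeling, deletion of preset parts of speech, and duplication removal) may be sketched as follows. The sketch uses English text and a tiny hand-written part-of-speech lexicon purely for illustration; `POS_LEXICON`, `REMOVED_POS` and `candidate_words` are hypothetical names, a practical system would rely on a trained segmenter and tagger, and the classification by commodity name and commodity attribute is omitted.

```python
import re

# Hypothetical part-of-speech lexicon; a real system would use a trained tagger.
POS_LEXICON = {
    "soft": "ADJ", "very": "ADV", "it": "PRON", "the": "DET",
    "cotton": "NOUN", "shirt": "NOUN", "men": "NOUN", "fits": "VERB",
}
REMOVED_POS = {"ADJ", "ADV", "PRON", "DET"}  # assumed "preset parts of speech"

def candidate_words(text):
    """Number sentences, split them into words, tag, filter and de-duplicate."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    records = []
    for sentence_no, sentence in enumerate(sentences, start=1):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        for position, word in enumerate(words):
            pos = POS_LEXICON.get(word, "NOUN")  # unknown words default to NOUN here
            records.append({"word": word, "sentence": sentence_no,
                            "position": position, "pos": pos})
    # Delete the words whose part of speech is in the preset set.
    kept = [r for r in records if r["pos"] not in REMOVED_POS]
    # De-duplicate, keeping the first occurrence of each word.
    seen, candidates = set(), []
    for r in kept:
        if r["word"] not in seen:
            seen.add(r["word"])
            candidates.append(r)
    return candidates
```

Each returned record keeps the sentence number and in-sentence position of the first occurrence, so that position features can still be derived later.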
In some possible embodiments of the present invention, the operation of extracting the feature data of the candidate search word sequence in the second step includes:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distance between the candidate search word vector sequences;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantizing the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
It can be understood that, in this embodiment, in order to improve the efficiency and accuracy of feature data extraction, the candidate search word sequence is vectorized and processed by vector operations: a first word vector table is generated by using the trained word vector model, and a candidate search word vector sequence corresponding to the candidate search word sequence is generated according to the first word vector table; the candidate search word vector sequence is divided into n clusters according to the distances between the candidate search word vectors; clustering center vectors of the n clusters are generated according to a clustering algorithm, and the relation between the candidate search word sequence and the clustering center vectors is quantized according to a Euclidean distance formula to obtain semantic features of the candidate search word sequence; and language features, word frequency features, length features and position features are extracted from the semantic features as the feature data.
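The clustering and distance quantization described above may be sketched as follows. The text only requires "a clustering algorithm" and a Euclidean distance formula; plain k-means is assumed here as one possible choice, and the two-dimensional word vectors are made-up values standing in for the first word vector table. The function names `kmeans` and `semantic_features` are hypothetical.

```python
import math

def kmeans(vectors, n, iterations=20):
    """Plain k-means; the first n vectors seed the centers (an arbitrary choice)."""
    centers = [list(v) for v in vectors[:n]]
    for _ in range(iterations):
        clusters = [[] for _ in range(n)]
        for v in vectors:
            nearest = min(range(n), key=lambda k: math.dist(v, centers[k]))
            clusters[nearest].append(v)
        for k, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def semantic_features(vectors, centers):
    """For each word vector, the Euclidean distance to every cluster center."""
    return [[math.dist(v, c) for c in centers] for v in vectors]

# Hypothetical word vector table for four candidate search words.
word_vectors = {
    "cotton": [0.9, 0.1], "linen": [0.8, 0.2],
    "zipper": [0.1, 0.9], "button": [0.2, 0.8],
}
vectors = list(word_vectors.values())
centers = kmeans(vectors, n=2)
features = semantic_features(vectors, centers)
```

Each row of `features` holds n distances, one per clustering center, which is one way to quantize the relation between a candidate search word and the clusters.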
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the above-described units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of some interfaces, devices or units, and may be an electric or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer readable memory if it is implemented in the form of a software functional unit and sold or used as a stand-alone product. Based on such understanding, the technical solutions of the present application, which are essential or part of the technical solutions contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the above methods of the embodiments of the present application. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by a program, which is stored in a computer-readable memory, the memory including: flash memory disks, read-only memories (ROMs), random access memories (RAMs), magnetic or optical disks, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Any person skilled in the art can easily think of changes or substitutions without departing from the spirit and scope of the invention, and all changes and modifications can be made, including different combinations of functions, implementation steps, software and hardware implementations, all of which are included in the scope of the invention.

Claims (10)

1. A keyword generation system based on artificial intelligence, characterized by comprising: an extraction module, a data processing module and a generation module;
the extraction module is configured to:
acquiring commodity description data, and extracting a first search term from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search word;
the data processing module configured to:
processing the potential competitive product data by using an image processing algorithm, and filtering out data of the competitive products with the similarity lower than a preset threshold value to obtain the competitive product data;
extracting subject data of the competitive products from the competitive product data;
extracting core commodity words from the subject data of the competitive products;
selecting a first core commodity word with the frequency higher than a preset frequency value from the core commodity words by combining a preset search word data set;
the generation module is configured to: and generating keywords corresponding to the commodities by combining keyword generation rules according to the first core commodity words.
2. The artificial intelligence based keyword generation system of claim 1, wherein in the step of processing the potential competitive product data by using an image processing algorithm and filtering out data of competitive products with a similarity lower than a preset threshold to obtain the competitive product data, the data processing module is specifically configured to:
inputting the potential competitive product data, and recording a similarity identification value I as 0;
judging whether a first similarity value A1 of the potential competitive product data is larger than a first threshold value by using a first similarity judgment model;
if the first similarity value A1 is larger than the first threshold value, judging whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold value by using a second similarity judgment model, and judging whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold value by using a third similarity judgment model;
if the second similarity value A2 is smaller than the second threshold value or the third similarity value A3 is smaller than the third threshold value, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: the first similarity S1 = a1 × first similarity value A1 + a2 × second similarity value A2 + a3 × third similarity value A3 + b1 × similarity identification value I, wherein a1, a2, a3 and b1 are all weighting coefficients greater than 0 and a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold value, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold value by using a fourth similarity judgment model, and judging whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold value by using a fifth similarity judgment model;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: the second similarity S2 = a6 × first similarity value A1 + a4 × fourth similarity value A4 + a5 × fifth similarity value A5 + b2 × similarity identification value I, wherein a4, a5, a6 and b2 are all weighting coefficients greater than 0 and a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold value, if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
extracting all data marked as similar in the potential competitive product data as the competitive product data.
3. The artificial intelligence based keyword generation system of claim 2, wherein in the step of obtaining commodity description data and extracting a first search term from the commodity description data, the extraction module is specifically configured to:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting characteristic data of the candidate search word sequence, and labeling the characteristic data to obtain a labeled sample set and a non-labeled sample set;
step three: taking the labeled sample set as a training set, and training a search term classification model by utilizing a neural network;
step four: carrying out classification prediction on candidate search words in the label-free sample set by using the trained search word classification model, and calculating the matching degree of each label-free sample;
step five: selecting the corresponding unlabeled sample with the matching degree exceeding a preset matching degree value, adding the unlabeled sample into the training set, and retraining the search term classification model;
step six: repeating step four and step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search term classification model;
step seven: and inputting the characteristic data of the commodity description data into the final search term classification model for processing, and extracting the first search term from a processing result.
4. The artificial intelligence based keyword generation system of claim 3, wherein in step one, namely classifying the commodity description data according to commodity names and commodity attributes and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence, the extraction module is specifically configured to:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing the sentence into a plurality of words, and recording the position information of the words in the sentence;
analyzing and labeling the part of speech of the word;
deleting a first word with a preset part of speech from the words to obtain a modified word set;
carrying out duplication removal operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
5. The artificial intelligence based keyword generation system of claims 1 to 4, wherein in the operation of extracting the feature data of the candidate search word sequence in the second step, the extraction module is specifically configured to:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distance between the candidate search word vector sequences;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantifying the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain the semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
6. A keyword generation method based on artificial intelligence, characterized by comprising the following steps:
acquiring commodity description data, and extracting a first search term from the commodity description data;
acquiring potential competitive product data of the commodity according to the first search word;
processing the potential competitive product data by using an image processing algorithm, and filtering the data of the competitive products with the similarity lower than a preset threshold value to obtain the competitive product data;
extracting subject data of the competitive products from the competitive product data;
extracting core commodity words from the subject data of the competitive products;
selecting a first core commodity word with the frequency higher than a preset frequency value from the core commodity words by combining a preset search word data set;
and generating keywords corresponding to the commodities by combining keyword generation rules according to the first core commodity words.
7. The method for generating keywords based on artificial intelligence according to claim 6, wherein the step of processing the potential competitive product data by using an image processing algorithm and filtering out the data of competitive products with a similarity lower than a preset threshold to obtain the competitive product data comprises:
inputting the potential competitive product data, and recording a similarity identification value I as 0;
judging whether a first similarity value A1 of the potential competitive product data is larger than a first threshold value by using a first similarity judgment model;
if the first similarity value A1 is larger than the first threshold, judging whether a second similarity value A2 of the potential competitive product data is smaller than a second threshold by using a second similarity judgment model, and judging whether a third similarity value A3 of the potential competitive product data is smaller than a third threshold by using a third similarity judgment model;
if the second similarity value A2 is smaller than the second threshold value or the third similarity value A3 is smaller than the third threshold value, adding 1 to the similarity identification value I, and calculating a first similarity S1 by using a first similarity calculation method;
if the second similarity value A2 is not less than the second threshold or the third similarity value A3 is not less than the third threshold, calculating the first similarity S1 by using the first similarity calculation method;
the first similarity calculation method includes: the first similarity S1 = a1 × first similarity value A1 + a2 × second similarity value A2 + a3 × third similarity value A3 + b1 × similarity identification value I, wherein a1, a2, a3 and b1 are all weighting coefficients greater than 0 and a1 + a2 + a3 + b1 = 1;
if the first similarity value A1 is not larger than the first threshold value, processing image data in the potential competitive product data by using an image processing algorithm to obtain potential competitive product image data;
judging whether a fourth similarity value A4 of the potential competitive product image data is smaller than a fourth threshold value by using a fourth similarity judgment model, and judging whether a fifth similarity value A5 of the potential competitive product image data is smaller than a fifth threshold value by using a fifth similarity judgment model;
if the fourth similarity value A4 is not less than the fourth threshold or the fifth similarity value A5 is not less than the fifth threshold, adding 1 to the similarity identification value I, and calculating a second similarity S2 by using a second similarity calculation method;
if the fourth similarity value A4 is smaller than the fourth threshold or the fifth similarity value A5 is smaller than the fifth threshold, calculating the second similarity S2 by using the second similarity calculation method;
the second similarity calculation method includes: the second similarity S2 = a6 × first similarity value A1 + a4 × fourth similarity value A4 + a5 × fifth similarity value A5 + b2 × similarity identification value I, wherein a4, a5, a6 and b2 are all weighting coefficients greater than 0 and a4 + a5 + a6 + b2 = 1;
judging whether the first similarity S1 or the second similarity S2 is not smaller than the preset threshold value, if so, marking the potential competitive product data as similar, and if not, marking the potential competitive product data as dissimilar;
and extracting all data marked as similar in the potential competitive product data as the competitive product data.
8. The artificial intelligence based keyword generation method according to claim 7, wherein the step of obtaining commodity description data and extracting the first search term from the commodity description data comprises:
step one: classifying the commodity description data according to commodity names and commodity attributes, and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence;
step two: extracting characteristic data of the candidate search word sequence, and labeling the characteristic data to obtain a labeled sample set and a non-labeled sample set;
step three: taking the labeled sample set as a training set, and training a search term classification model by utilizing a neural network;
step four: carrying out classification prediction on candidate search words in the label-free sample set by using the trained search word classification model, and calculating the matching degree of each label-free sample;
step five: selecting the corresponding unlabeled sample with the matching degree exceeding a preset matching degree value, adding the unlabeled sample into the training set, and retraining the search term classification model;
step six: repeating step four and step five until the proportion of unlabeled samples whose matching degree is higher than the preset matching degree value exceeds a preset proportion, so as to obtain a final search term classification model;
step seven: and inputting the characteristic data of the commodity description data into the final search term classification model for processing, and extracting the first search term from a processing result.
9. The method for generating keywords based on artificial intelligence of claim 8, wherein step one, namely classifying the commodity description data according to commodity names and commodity attributes and performing text preprocessing on the classified commodity description data to generate a candidate search word sequence, comprises:
extracting text data from the commodity description data;
counting and numbering all sentences in the text data;
dividing the sentence into a plurality of words, and recording the position information of the words in the sentence;
analyzing and labeling the part of speech of the word;
deleting a first word with a preset part of speech from the words to obtain a modified word set;
carrying out duplicate removal operation on the modified word set to obtain a candidate word set;
classifying the candidate word set according to the commodity name and the commodity attribute;
and performing text preprocessing on the classified candidate word set to generate the candidate search word sequence.
10. The method for generating keywords based on artificial intelligence according to claims 6-9, wherein the operation of extracting the feature data of the candidate search word sequence in the second step comprises:
generating a first word vector table by using the trained word vector model;
generating a candidate search word vector sequence corresponding to the candidate search word sequence according to the first word vector table;
dividing the candidate search word vector sequence into n clusters according to the distance between the candidate search word vector sequences;
generating clustering center vectors of the n clusters according to a clustering algorithm;
quantifying the relation between the candidate search word sequence and the clustering center vector according to a distance formula to obtain the semantic features of the candidate search word sequence;
and extracting language features, word frequency features, length features and position features from the semantic features as the feature data.
CN202211294577.9A 2022-10-21 2022-10-21 Keyword generation system and method based on artificial intelligence Active CN115470322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211294577.9A CN115470322B (en) 2022-10-21 2022-10-21 Keyword generation system and method based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN115470322A true CN115470322A (en) 2022-12-13
CN115470322B CN115470322B (en) 2023-05-05

Family

ID=84336356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211294577.9A Active CN115470322B (en) 2022-10-21 2022-10-21 Keyword generation system and method based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN115470322B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160260033A1 (en) * 2014-05-09 2016-09-08 Peter Keyngnaert Systems and Methods for Similarity and Context Measures for Trademark and Service Mark Analysis and Repository Searches
CN108984554A (en) * 2017-06-01 2018-12-11 北京京东尚科信息技术有限公司 Method and apparatus for determining keyword
CN111191022A (en) * 2019-12-27 2020-05-22 苏宁云计算有限公司 Method and device for generating short titles of commodities
CN113343684A (en) * 2021-06-22 2021-09-03 广州华多网络科技有限公司 Core product word recognition method and device, computer equipment and storage medium
CN113468414A (en) * 2021-06-07 2021-10-01 广州华多网络科技有限公司 Commodity searching method and device, computer equipment and storage medium
CN113570413A (en) * 2021-07-28 2021-10-29 杭州王道控股有限公司 Method and device for generating advertisement keywords, storage medium and electronic equipment
CN114579896A (en) * 2022-03-04 2022-06-03 拉扎斯网络科技(上海)有限公司 Generation method and display method of recommended label, corresponding device and electronic equipment
CN114663164A (en) * 2022-04-12 2022-06-24 广州欢聚时代信息科技有限公司 E-commerce site popularization and configuration method and device, equipment, medium and product thereof
WO2022134759A1 (en) * 2020-12-21 2022-06-30 深圳壹账通智能科技有限公司 Keyword generation method and apparatus, and electronic device and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘加新 (Liu Jiaxin): "Research on Data-Driven User Profile Construction and *** Design" *

Also Published As

Publication number Publication date
CN115470322B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN108509465B (en) Video data recommendation method and device and server
CN110472090B (en) Image retrieval method based on semantic tags, related device and storage medium
CN102855268B (en) Image ranking method and system based on attribute correlation
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
CN110633373A (en) Automobile public opinion analysis method based on knowledge graph and deep learning
CN112667794A (en) Intelligent question-answer matching method and system based on twin network BERT model
CN107833082B (en) Commodity picture recommendation method and device
JP2004038606A (en) Method for evaluating specificity of document
CA3166094A1 (en) Commodity short title generation method and apparatus
CN111090763A (en) Automatic picture labeling method and device
Homoceanu et al. Will I like it? Providing product overviews based on opinion excerpts
CN114238573A (en) Information pushing method and device based on text countermeasure sample
CN113177102B (en) Text classification method and device, computing equipment and computer readable medium
CN113570413A (en) Method and device for generating advertisement keywords, storage medium and electronic equipment
CN112527958A (en) User behavior tendency identification method, device, equipment and storage medium
CN116737922A (en) Tourist online comment fine granularity emotion analysis method and system
CN114943285B (en) Intelligent auditing system for internet news content data
CN108717637B (en) Automatic mining method and system for E-commerce safety related entities
CN116579351A (en) Analysis method and device for user evaluation information
CN113065329A (en) Data processing method and device
CN115033799B (en) Commodity searching method, system and storage medium
CN114048294B (en) Similar population extension model training method, similar population extension method and device
CN115470322B (en) Keyword generation system and method based on artificial intelligence
CN115017264A (en) Model effect verification method and device
Hoiriyah et al. Lexicon-Based and Naive Bayes Sentiment Analysis for Recommending the Best Marketplace Selection as a Marketing Strategy for MSMEs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant