CN112183069B - Keyword construction method and system based on historical keyword delivery data - Google Patents

Keyword construction method and system based on historical keyword delivery data

Info

Publication number
CN112183069B
Authority
CN
China
Prior art keywords
keywords
keyword
candidate
construction method
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011079017.2A
Other languages
Chinese (zh)
Other versions
CN112183069A (en)
Inventor
陈嘉真
徐凯波
张琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority to CN202011079017.2A
Publication of CN112183069A
Application granted
Publication of CN112183069B
Legal status: Active (Current)
Anticipated expiration

Classifications

    • G (PHYSICS) > G06 (COMPUTING; CALCULATING OR COUNTING) > G06F (ELECTRIC DIGITAL DATA PROCESSING)
    • G06F40/216: Parsing using statistical methods (under G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/205 Parsing)
    • G06F16/3344: Query execution using natural language analysis (under G06F16/00 Information retrieval > G06F16/30 Unstructured textual data > G06F16/33 Querying > G06F16/3331 Query processing > G06F16/334 Query execution)
    • G06F16/35: Clustering; Classification (under G06F16/30 Unstructured textual data)
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking (under G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a keyword construction method and system based on historical keyword delivery data. The keyword construction method comprises the steps of: acquiring given data; constructing a plurality of candidate keywords from the given data according to a preset rule; performing feature processing on the candidate keywords through a word2vec model to obtain the feature of each candidate keyword; evaluating and ranking the candidate keywords according to the feature of each candidate keyword and the given data; and outputting recommended keywords from the ranked candidate keywords. The method learns the expected display amount and click volume of a keyword's structure from the keywords' historical delivery data and uses the model to guide how keywords are composed, ensuring that newly constructed keywords are reasonable and deliver well.

Description

Keyword construction method and system based on historical keyword delivery data
Technical Field
The invention relates to the technical field of data processing, in particular to a keyword construction method and system based on historical keyword delivery data.
Background
With the rapid development of network technology and information products, various network platforms attract more and more users, and the Internet has become one of the most important information media. In shopping-platform campaigns, e-commerce merchants can attract large numbers of customers by purchasing keywords. Keyword construction extracts words or characters from a known corpus and combines them into new vocabulary, thereby forming concise and accurate summaries of text information.
The current common keyword construction method mainly constructs keywords at random according to manually designed patterns: first, high-popularity vocabulary is divided into long-tail words and core words (at present this is mostly done by manual screening, or by labeling a small sample and then classifying with a classification model); then keywords are constructed according to patterns such as "brand word + category word + core word", "brand word + category word + long-tail word + core word", and so on.
The above method can barely guarantee rationality, but because the patterns are designed manually, only a small number of reasonable keywords can be covered; moreover, the delivery performance of the constructed keywords cannot be judged.
Disclosure of Invention
To address the technical problem that constructed keywords lack good delivery performance, the invention provides a keyword construction method and system based on historical keyword delivery data.
In a first aspect, an embodiment of the present application provides a keyword construction method based on historical keyword delivery data, including:
S1, acquiring given data;
S2, constructing a plurality of candidate keywords from the given data according to a preset rule;
S3, performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword;
S4, evaluating and ranking the plurality of candidate keywords according to the feature of each candidate keyword and the given data;
S5, outputting recommended keywords from the ranked candidate keywords.
In the above keyword construction method based on historical keyword delivery data, the given data comprise scene environment variables, candidate hot roots, evaluation indexes and the number of recommended keywords.
In the above keyword construction method based on historical keyword delivery data, the step S2 includes:
Step S21: randomly constructing a plurality of keywords from the candidate hot roots;
Step S22: screening a plurality of candidate keywords from the plurality of keywords according to a preset rule.
In the above keyword construction method based on historical keyword delivery data, the step S3 further includes: performing word segmentation on the candidate keywords in advance using jieba.
In the above keyword construction method based on historical keyword delivery data, the step S3 further includes: pre-training the segmented candidate keywords with the word2vec model to obtain word vectors, and taking the average of a candidate keyword's word vectors as its feature.
In the above keyword construction method based on historical keyword delivery data, the step S4 includes:
Step S41: obtaining predicted index performance through a prediction model according to the features of the candidate keywords, the scene environment variables and the evaluation indexes;
Step S42: ranking the plurality of candidate keywords through the evaluation model according to the index performance.
In the above keyword construction method based on historical keyword delivery data, the step S5 further includes outputting recommended keywords according to the number of recommended keywords.
In a second aspect, an embodiment of the present application provides a keyword construction system based on historical keyword delivery data, including:
an input unit for inputting given data;
a keyword construction unit for randomly constructing a plurality of keywords according to the given data;
a primary screening unit for screening a plurality of candidate keywords from the plurality of keywords according to a preset rule;
a keyword feature acquisition unit for performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword;
a secondary screening unit for evaluating and ranking the plurality of candidate keywords through a prediction model and an evaluation model according to the feature of each candidate keyword, the scene environment variables and the evaluation indexes;
and an output unit for outputting recommended keywords from the ranked candidate keywords.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the keyword construction method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the keyword construction method as described in the first aspect above.
Compared with the prior art, the invention has the advantages and positive effects that:
1. A large number of keywords are first constructed at random, and a reasonable subset is then screened out by manual rules, which avoids the data-volume limitation that would arise in modeling if historically delivered keywords made up only a small proportion of the randomly constructed keywords.
2. The expected display amount or click volume of a keyword's structure is learned from the keywords' historical data, so the prediction model can evaluate and rank the large number of relatively reasonable constructed keywords and screen the best-performing ones as the final result, ensuring that newly constructed keywords have good delivery performance.
3. Each recommended keyword comes with a corresponding prediction of its display amount or return on investment (Return On Investment, abbreviated ROI), which improves the interpretability of the model and, in turn, the user experience.
4. Both the rationality-based keyword construction and the model-based keyword evaluation can be modularized, and each part can use different models. For example, the models that evaluate keywords can be traditional statistical methods or other commonly used machine-learning models such as neural networks and tree models; rationality can be evaluated by training a model that takes historically delivered keywords as references, scoring the constructed keywords, performing a primary screening, and then feeding the surviving keywords into the prediction model and the evaluation model for secondary screening.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a keyword construction method based on historical keyword delivery data according to an embodiment of the present application;
FIG. 2 is a block diagram of a keyword construction system based on historical keyword delivery data according to an embodiment of the present application;
FIG. 3 is a block diagram of a computer device according to an embodiment of the present application.
Wherein, the reference numerals are as follows:
81. A processor; 82. a memory; 83. a communication interface; 80. a bus.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill in the art having the benefit of this disclosure.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
The present invention will be described in detail below with reference to the embodiments shown in the drawings, but it should be understood that the present invention is not limited to these embodiments, and functional, methodological, or structural equivalents and substitutions made by those skilled in the art according to these embodiments fall within the scope of protection of the present invention.
Before the various embodiments of the invention are explained in detail, the core inventive concept of the invention is summarized and then described in detail through the following examples.
The technical scheme performs keyword construction based on historical keyword delivery data: a large number of reasonable keywords are constructed from candidate roots according to a preset rule, and the keywords to be retained are then screened through an index prediction model. This solves the problem that building new words from patterns alone cannot fully capture the rationality between the roots inside a keyword, and therefore cannot guarantee the keyword's expected performance.
Embodiment one:
Referring to FIG. 1, this example discloses a specific embodiment of a keyword construction method (hereinafter referred to as the "method") based on historical keyword delivery data.
Specifically, the method disclosed in this embodiment mainly includes the following steps:
S1, acquiring given data.
Specifically, the given data include scene environment variables, candidate hot roots, evaluation indexes and the number of recommended keywords.
The scene environment variables include brand words, category words, time points, campaign type names and the like; the evaluation indexes include ROI, display amount, click-through rate and the like.
ROI stands for return on investment (Return On Investment), the value that an investment should return, i.e., the economic return an enterprise receives from an investment activity. Return on investment reflects the overall profitability of investment centers: it removes the non-comparable differences in profit caused by different investment amounts, makes investment centers comparable with one another, and helps judge the operating performance of each investment center; in addition, it can serve as a basis for choosing investment opportunities, which helps optimize resource allocation.
When a user searches, if a keyword in the account that matches the user's search requirement is triggered, the creative corresponding to that keyword appears on the search-result page; this is called one associated display of the keyword and the creative. The number of displays (impressions) obtained over a period of time is referred to as the "display amount". The display amount reflects the quality of the keywords and of the creative. For a website, the display amount is the number of times the website is shown when users search for related keywords; the total number of displays over a period of time is collectively called the display amount, and a website's display amount reflects the quality of its keywords and of its optimization.
The display amount helps gauge how many users are reached by the promotion and is a measure of quantity. The display-amount data provided in statistical reports show which keywords and creatives obtain more display opportunities and bring repeated exposure every day, from which the number of potential customers covered by the promotion can be estimated.
If users are interested in the promotion and want to learn more about the product or service, they may click the promotion when it is displayed and visit the website. The number of clicks obtained over a period of time is referred to as the "click volume"; in short, the click volume is the number of times the promotion is clicked. The click-through rate is the percentage of displays that result in a click, computed as click volume / display amount = click-through rate; a website's click-through rate reflects its title and description and whether its creative is attractive to customers.
Data such as spend, average price, clicks, displays, click-through rate and cost per thousand impressions can be seen in the background of every advertising platform and are the basis for comprehensively evaluating promotion effect and carrying out in-depth promotion optimization.
However, a wide promotion reach does not mean the enterprise is reaching its clear target customers; only large-scale promotion aimed at clear target customers yields reasonable displays, and blindly increasing the display amount only raises promotion costs. Promoting to the target population means choosing reasonable regions for delivery, selecting keywords precisely, and setting keyword matching modes reasonably. Through careful operation, potential target customers can more accurately find the enterprise's website through various related search terms, and a final transaction can be reached.
S2, constructing a plurality of candidate keywords from the given data according to a preset rule.
The step S2 specifically includes the following:
Step S21: randomly constructing a plurality of keywords from the candidate hot roots;
Step S22: screening a plurality of candidate keywords from the plurality of keywords according to a preset rule.
Specifically, a plurality of candidate keywords each containing at most 5 roots are constructed according to a preset rule. The preset rule may be, for example, "brand word/category word + 1 to 4 candidate hot roots" or "brand word + category word + 1 to 3 candidate hot roots", and the candidate hot roots within a keyword must not repeat.
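A minimal Python sketch of this random construction step, assuming hypothetical brand words, category words and candidate hot roots, might look as follows:

```python
import random

# Hypothetical inputs for illustration; in practice these come from the given data.
brand_words = ["BrandA"]
category_words = ["running shoes"]
hot_roots = ["lightweight", "breathable", "cushioned", "discount", "new-season"]

def build_candidates(n_keywords, max_roots=4):
    """Randomly build keywords of the form
    'brand word/category word + 1 to max_roots non-repeating hot roots'."""
    candidates = set()
    while len(candidates) < n_keywords:
        head = random.choice(brand_words + category_words)
        # random.sample never repeats a root within one keyword
        roots = random.sample(hot_roots, random.randint(1, max_roots))
        if 1 + len(roots) <= 5:  # guard: total length stays within 5 components
            candidates.add(" ".join([head] + roots))
    return sorted(candidates)

print(build_candidates(10))
```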
The method for generating candidate keywords in the invention is not limited to pattern-based construction. The purpose of pattern-based construction is to ensure the rationality of the keywords, and rationality can also be modeled separately: for example, words that appear in the historical keyword library can be taken as positive samples and randomly generated words as negative samples for classification modeling, using a common machine-learning classification model. The model is then used to screen the randomly constructed keywords, and the higher-ranked keywords are kept and fed into the subsequent index prediction model.
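A minimal sketch of such a rationality classifier, assuming a hypothetical historical keyword library and hypothetical randomly constructed keywords, and using character n-grams with logistic regression as one possible common machine-learning classification model:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical samples: positive = keywords from the historical library,
# negative = randomly constructed keywords.
positive = ["BrandA lightweight running shoes", "BrandA running shoes discount"]
negative = ["BrandA new-season new-season", "running shoes discount breathable cushioned new-season"]

X_text = positive + negative
y = [1] * len(positive) + [0] * len(negative)

# Character n-grams as a simple stand-in for keyword features.
vec = CountVectorizer(analyzer="char_wb", ngram_range=(2, 3))
X = vec.fit_transform(X_text)

clf = LogisticRegression().fit(X, y)

# Screen randomly constructed keywords: keep the highest-scoring ones.
new_keywords = ["BrandA breathable running shoes", "discount discount BrandA"]
scores = clf.predict_proba(vec.transform(new_keywords))[:, 1]
ranked = sorted(zip(new_keywords, scores), key=lambda t: -t[1])
print(ranked)
```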
S3, performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword.
The step S3 further includes: performing word segmentation on the candidate keywords in advance using jieba.
Specifically, "jieba" is a Python Chinese word-segmentation component that supports word segmentation, part-of-speech tagging, keyword extraction and other functions, as well as custom dictionaries. It supports three segmentation modes: precise mode, which tries to cut sentences as accurately as possible and is suitable for text analysis; full mode, which scans all the words in a sentence that can form words and is very fast but cannot resolve ambiguity; and search-engine mode, which further splits long words on top of precise mode to improve recall and is suitable for search engines.
The step S3 further includes: pre-training the segmented candidate keywords with the word2vec model to obtain word vectors, and taking the average of a candidate keyword's word vectors as its feature.
Specifically, for a computer to process natural language, the natural language must first be modeled. Natural-language modeling has moved from rule-based methods to statistics-based methods, and the resulting models are called statistical language models. Modeling natural language raises problems such as the curse of dimensionality, word similarity, model generalization and model performance, and solving these problems has been the driving force behind the development of statistical language models. Against this background, Google open-sourced the Word2vec training tool in 2013. The Word2Vec model converts words into vector representations; it evolved from neural probabilistic language models and is a typical distributed encoding scheme that improves on those models and raises computational efficiency. There are two main implementations: the continuous bag-of-words model (CBOW) and the skip-gram model, which Word2vec relies on to build neural word embeddings. Given a corpus, word2vec can quickly and effectively express a word in vector form through an optimized training procedure, providing a new tool for applied research in natural language processing.
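A minimal sketch of this step using the gensim implementation of word2vec, assuming a hypothetical corpus of segmented candidate keywords, with the keyword feature taken as the average of its word vectors:

```python
import numpy as np
from gensim.models import Word2Vec

# Hypothetical corpus of segmented candidate keywords (lists of roots/words).
segmented_keywords = [
    ["品牌A", "轻便", "跑鞋"],
    ["品牌A", "透气", "跑鞋", "折扣"],
    ["品牌A", "新款", "跑鞋"],
]

# Pre-train word vectors on the segmented keywords (sg=0 selects CBOW, sg=1 skip-gram).
w2v = Word2Vec(sentences=segmented_keywords, vector_size=50, window=2, min_count=1, sg=0)

def keyword_feature(tokens, model):
    """Average of the word vectors of a keyword's tokens, used as its feature."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.wv.vector_size)

print(keyword_feature(["品牌A", "轻便", "跑鞋"], w2v))
```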
S4, evaluating and ranking the plurality of candidate keywords according to the feature of each candidate keyword and the given data.
The step S4 specifically includes the following:
Step S41: obtaining predicted index performance through a prediction model according to the features of the candidate keywords, the scene environment variables and the evaluation indexes;
Step S42: ranking the plurality of candidate keywords through the evaluation model according to the index performance.
Specifically, the index performance is mainly ROI, display amount, click volume, or a combination thereof. The prediction model that evaluates keyword performance may be a tree model, which is common in machine learning, but other models can also be used, such as a neural network model (the keyword feature may still be the average of the root-word vectors, or the keyword's feature vector may be generated by weighting the roots against the environment variables with an Attention mechanism), other statistical models, and so on.
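A minimal sketch of such an index prediction model, here a gradient-boosted tree regressor trained on synthetic placeholder data standing in for historically delivered keywords (averaged keyword vectors concatenated with encoded scene environment variables, with the display amount as the target index):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic placeholder training data: each row stands for a historically
# delivered keyword, i.e. its averaged word vector (50 dims) concatenated
# with encoded scene environment variables (4 dims); the target stands for
# the observed display amount of that keyword.
X_train = rng.normal(size=(200, 54))
y_train = rng.poisson(lam=100.0, size=200)

# A tree model, as suggested above, predicting index performance.
model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
model.fit(X_train, y_train)

# Predicted index performance for new candidate keywords (also placeholders).
X_candidates = rng.normal(size=(10, 54))
predicted_index = model.predict(X_candidates)
print(predicted_index)
```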
S5, outputting recommended keywords from the ranked candidate keywords.
The step S5 further includes outputting recommended keywords according to the number of recommended keywords.
Specifically, after the keywords are ranked by their predicted index performance, the top "number of recommended keywords" best-performing keywords are output.
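A minimal sketch of this ranking and top-N output step, with hypothetical candidate names and predicted index values:

```python
# Hypothetical predicted index performance for each candidate keyword.
candidates = ["keyword_A", "keyword_B", "keyword_C", "keyword_D"]
predicted_index = [120.5, 340.2, 87.1, 210.9]

n_recommended = 2  # the "number of recommended keywords" from the given data

# Sort by predicted performance (descending) and take the top-N.
ranked = sorted(zip(candidates, predicted_index), key=lambda t: -t[1])
recommended = [kw for kw, _ in ranked[:n_recommended]]
print(recommended)  # ['keyword_B', 'keyword_D']
```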
Embodiment two:
In combination with the method for constructing keywords based on the historical keyword delivery data disclosed in the first embodiment, the present embodiment discloses a specific implementation example of a keyword construction system (hereinafter referred to as "system") based on the historical keyword delivery data.
Referring to fig. 2, the system includes:
an input unit for inputting given data;
a keyword construction unit for randomly constructing a plurality of keywords according to the given data;
a primary screening unit for screening a plurality of candidate keywords from the plurality of keywords according to a preset rule;
a keyword feature acquisition unit for performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword;
a secondary screening unit for evaluating and ranking the plurality of candidate keywords through a prediction model and an evaluation model according to the feature of each candidate keyword, the scene environment variables and the evaluation indexes;
and an output unit for outputting recommended keywords from the ranked candidate keywords.
Specifically, the input unit receives given data including the scene environment variables, candidate hot roots, evaluation indexes and the number of recommended keywords.
Specifically, the keyword construction unit randomly constructs a large number of keywords from the candidate hot roots.
Specifically, the primary screening unit screens a plurality of candidate keywords from the constructed keywords according to a preset rule; the preset rule may be, for example, "brand word/category word + 1 to 4 candidate hot roots" or "brand word + category word + 1 to 3 candidate hot roots".
Specifically, the keyword feature acquisition unit first segments the candidate keywords with jieba, then pre-trains the segmented candidate keywords with the word2vec model to obtain word vectors, and takes the average of a candidate keyword's word vectors as its feature.
Specifically, the secondary screening unit first obtains predicted index performance through the prediction model according to the features of the candidate keywords, the scene environment variables and the evaluation indexes, and then ranks the candidate keywords through the evaluation model according to the index performance.
Specifically, the output unit outputs recommended keywords according to the number of recommended keywords, that is, after the keywords are ranked by their predicted index performance, the top "number of recommended keywords" best-performing keywords are output.
For the parts of the keyword construction system based on historical keyword delivery data disclosed in this embodiment that are the same as the keyword construction method based on historical keyword delivery data disclosed in the first embodiment, refer to the description in the first embodiment; they are not repeated here.
Embodiment III:
In connection with FIG. 3, this embodiment discloses a specific implementation of a computer device. The computer device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may comprise a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a Non-Volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable Programmable ROM (EPROM), an Electrically Erasable Programmable ROM (EEPROM), an Electrically Alterable ROM (EAROM), or FLASH memory, or a combination of two or more of these. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode DRAM (FPMDRAM), an Extended Data Out DRAM (EDODRAM), a Synchronous DRAM (SDRAM), or the like, where appropriate.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 implements any of the keyword construction methods of the above embodiments by reading and executing the computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 3, the processor 81, the memory 82, and the communication interface 83 are connected to each other through the bus 80 and perform communication with each other.
The communication interface 83 is used to enable communication between modules, apparatuses, units and/or devices in embodiments of the application. The communication interface 83 may also enable data communication with other components such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although embodiments of the application have been described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
In addition, in combination with the keyword construction method in the above embodiment, the embodiment of the present application may be implemented by providing a computer readable storage medium. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the keyword construction methods of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
In summary, the method learns the expected display amount or click volume of a keyword's structure from the keywords' historical data, so that the prediction model can evaluate and rank a large number of relatively reasonable constructed keywords and screen the best-performing ones as the final result, ensuring that newly constructed keywords have good delivery performance. Both the rationality-based keyword construction and the model-based keyword evaluation can be modularized, and each part can use different models: for example, the models that evaluate keywords can be traditional statistical methods or other commonly used machine-learning models such as neural networks and tree models, while rationality can be evaluated by training a model on historically delivered keywords as references, scoring and primarily screening the constructed keywords, and then feeding them into the prediction model and the evaluation model for secondary screening.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (9)

1. A keyword construction method based on historical keyword delivery data, characterized by comprising the following steps:
S1, acquiring given data, the given data comprising scene environment variables, candidate hot roots, evaluation indexes and a number of recommended keywords;
S2, constructing a plurality of candidate keywords from the given data according to a preset rule, wherein the construction of the candidate keywords takes words appearing in a historical keyword library as positive samples and randomly generated words as negative samples for classification modeling, then uses a machine-learning classification model to screen the randomly constructed keywords, and keeps the higher-ranked keywords to be input into an index prediction model in a subsequent step for prediction;
S3, performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword;
S4, evaluating and ranking the plurality of candidate keywords according to the feature of each candidate keyword and the given data;
Step S41: obtaining predicted index performance through a prediction model according to the features of the candidate keywords, the scene environment variables and the evaluation indexes;
S5, outputting recommended keywords from the ranked candidate keywords.
2. The keyword construction method according to claim 1, wherein the step S2 includes:
Step S21: randomly constructing a plurality of keywords from the candidate hot roots;
Step S22: screening a plurality of candidate keywords from the plurality of keywords according to a preset rule.
3. The keyword construction method according to claim 1, wherein the step S3 further comprises: performing word segmentation on the candidate keywords in advance using jieba.
4. The keyword construction method according to claim 3, wherein the step S3 further comprises: pre-training the segmented candidate keywords with the word2vec model to obtain word vectors, and taking the average of a candidate keyword's word vectors as its feature.
5. The keyword construction method according to claim 1, wherein the step S4 further comprises:
Step S42: ranking the plurality of candidate keywords through an evaluation model according to the index performance.
6. The keyword construction method according to claim 1, wherein the step S5 further comprises outputting recommended keywords according to the number of recommended keywords.
7. A keyword construction system based on historical keyword delivery data, characterized by comprising:
an input unit for inputting given data, the given data comprising scene environment variables, candidate hot roots, evaluation indexes and a number of recommended keywords;
a keyword construction unit for randomly constructing a plurality of keywords according to the given data, wherein the construction of the candidate keywords takes words appearing in a historical keyword library as positive samples and randomly generated words as negative samples for classification modeling, then uses a machine-learning classification model to screen the randomly constructed keywords, and keeps the higher-ranked keywords to be input into an index prediction model in a subsequent step for prediction;
a primary screening unit for screening a plurality of candidate keywords from the plurality of keywords according to a preset rule;
a keyword feature acquisition unit for performing feature processing on the plurality of candidate keywords through a word2vec model to obtain the feature of each candidate keyword;
a secondary screening unit for evaluating and ranking the plurality of candidate keywords through a prediction model and an evaluation model according to the feature of each candidate keyword, the scene environment variables and the evaluation indexes, wherein predicted index performance is obtained through the prediction model according to the features of the candidate keywords, the scene environment variables and the evaluation indexes;
and an output unit for outputting recommended keywords from the ranked candidate keywords.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the keyword construction method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium having stored thereon a computer program, which when executed by a processor implements the keyword construction method as claimed in any one of claims 1 to 6.
CN202011079017.2A 2020-10-10 2020-10-10 Keyword construction method and system based on historical keyword put-in data Active CN112183069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011079017.2A CN112183069B (en) 2020-10-10 2020-10-10 Keyword construction method and system based on historical keyword put-in data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011079017.2A CN112183069B (en) 2020-10-10 2020-10-10 Keyword construction method and system based on historical keyword put-in data

Publications (2)

Publication Number Publication Date
CN112183069A CN112183069A (en) 2021-01-05
CN112183069B true CN112183069B (en) 2024-06-28

Family

ID=73947586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011079017.2A Active CN112183069B (en) 2020-10-10 2020-10-10 Keyword construction method and system based on historical keyword put-in data

Country Status (1)

Country Link
CN (1) CN112183069B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158136A (en) * 2021-04-23 2021-07-23 北京明略软件***有限公司 Keyword recommendation effect evaluation method and system, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778161A (en) * 2015-04-30 2015-07-15 车智互联(北京)科技有限公司 Keyword extracting method based on Word2Vec and Query log
CN108255881A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 It is a kind of to generate the method and device for launching keyword
CN111368171A (en) * 2020-02-27 2020-07-03 腾讯科技(深圳)有限公司 Keyword recommendation method, related device and storage medium
CN111581495A (en) * 2020-04-08 2020-08-25 西窗科技(苏州)有限公司 Keyword generation recommendation method and device based on search engine advertisement data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013185300A1 (en) * 2012-06-12 2013-12-19 Google Inc. Obtaining alternative keywords
CN108229991B (en) * 2016-12-15 2022-04-29 北京奇虎科技有限公司 Method and device for displaying aggregation promotion information, browser and terminal equipment
CN110991180A (en) * 2019-11-28 2020-04-10 同济人工智能研究院(苏州)有限公司 Command identification method based on keywords and Word2Vec

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778161A (en) * 2015-04-30 2015-07-15 车智互联(北京)科技有限公司 Keyword extracting method based on Word2Vec and Query log
CN108255881A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 It is a kind of to generate the method and device for launching keyword
CN111368171A (en) * 2020-02-27 2020-07-03 腾讯科技(深圳)有限公司 Keyword recommendation method, related device and storage medium
CN111581495A (en) * 2020-04-08 2020-08-25 西窗科技(苏州)有限公司 Keyword generation recommendation method and device based on search engine advertisement data

Also Published As

Publication number Publication date
CN112183069A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN106649818B (en) Application search intention identification method and device, application search method and server
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
WO2020108608A1 (en) Search result processing method, device, terminal, electronic device, and storage medium
CN108304439B (en) Semantic model optimization method and device, intelligent device and storage medium
CN111414479B (en) Label extraction method based on short text clustering technology
CN106709040B (en) Application search method and server
CN112163424B (en) Data labeling method, device, equipment and medium
CN112508609B (en) Crowd expansion prediction method, device, equipment and storage medium
CN110674312B (en) Method, device and medium for constructing knowledge graph and electronic equipment
CN112347778A (en) Keyword extraction method and device, terminal equipment and storage medium
CN105095210A (en) Method and apparatus for screening promotional keywords
CN113590796B (en) Training method and device for ranking model and electronic equipment
CN111444304A (en) Search ranking method and device
CN112199602B (en) Post recommendation method, recommendation platform and server
CN110910175B (en) Image generation method for travel ticket product
CN115495555A (en) Document retrieval method and system based on deep learning
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN115048505A (en) Corpus screening method and device, electronic equipment and computer readable medium
CN113806510B (en) Legal provision retrieval method, terminal equipment and computer storage medium
CN115905489A (en) Method for providing bid and bid information search service
CN116737922A (en) Tourist online comment fine granularity emotion analysis method and system
CN112183069B (en) Keyword construction method and system based on historical keyword put-in data
CN112215629A (en) Multi-target advertisement generation system and method based on construction countermeasure sample
CN108428234B (en) Interactive segmentation performance optimization method based on image segmentation result evaluation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant