CN116362236A - Target word mining method and device and storage medium - Google Patents

Publication number
CN116362236A
Authority
CN
China
Prior art keywords
commodity
word
target
words
store
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310331970.9A
Other languages
Chinese (zh)
Inventor
李志平
赵贤宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202310331970.9A priority Critical patent/CN116362236A/en
Publication of CN116362236A publication Critical patent/CN116362236A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a target word mining method and device and a storage medium, wherein the method comprises the following steps: acquiring a plurality of search words when historically searching for commodities; determining a commodity store having an associated operational behavior with the plurality of search terms; and determining the target word according to commodity information corresponding to the commodity store, the sales volume of the commodity and the page browsing volume of the commodity.

Description

Target word mining method and device and storage medium
Technical Field
The application relates to the technical field of electronic commerce, in particular to a target word mining method and device and a storage medium.
Background
In e-commerce scenarios, commodity feature words are descriptive words for sets of commodities that are distinctive within a market segment and meet the personalized demands of niche groups of users. Once identified, commodity feature words can be used in many scenarios, such as selecting feature words for platform marketing campaigns, selecting feature words for search and recommendation traffic, and using feature words to guide merchants in optimizing supply. Operating on commodity feature words in these scenarios improves commodity richness and user experience.
In the related art, information such as commodity information, search words, user comments, and user inquiries is obtained, and the words that occur most frequently in this information are taken as the commodity feature words of the corresponding commodities. When the number of commodities is large, however, the number of words in the corresponding information is also large, and many of these words are not feature words of the commodities. This reduces the accuracy of determining commodity feature words, that is, the accuracy of determining target words.
Disclosure of Invention
In order to solve the technical problems, it is desirable in the embodiments of the present application to provide a target word mining method, a target word mining device, and a storage medium, which can improve accuracy in determining a target word.
The technical scheme of the application is realized as follows:
the embodiment of the application provides a target word mining method, which comprises the following steps:
acquiring a plurality of search words when historically searching for commodities;
determining a commodity store having an associated operational behavior with the plurality of search terms;
and determining the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing quantity of the commodity.
The embodiment of the application provides a target word mining device, which comprises:
an acquisition unit configured to acquire a plurality of search terms when historically searching for a commodity;
a determining unit configured to determine a commodity store having an associated operation behavior with the plurality of search terms; and determining the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing quantity of the commodity.
The embodiment of the application provides a target word mining device, which comprises:
the system comprises a memory, a processor and a communication bus, wherein the memory is communicated with the processor through the communication bus, the memory stores a target word mining program executable by the processor, and the target word mining method is executed by the processor when the target word mining program is executed.
The embodiment of the application provides a storage medium, on which a computer program is stored and which is applied to a target word mining device, and is characterized in that the computer program is executed by a processor to realize the target word mining method.
The embodiment of the application provides a target word mining method and device and a storage medium, wherein the target word mining method comprises the following steps: acquiring a plurality of search words when historically searching for commodities; determining a commodity store having an associated operational behavior with the plurality of search terms; and determining the target word according to commodity information corresponding to the commodity store, the sales volume of the commodity and the page browsing volume of the commodity. With this method, the target word mining device first determines the commodity store having associated operation behaviors with the plurality of search words, and then determines the target word corresponding to the commodity according to the commodity information corresponding to that store, the sales volume of the commodity and the page browsing volume of the commodity. In other words, the commodity feature words (target words) are mined in depth using the sales volume and page browsing volume of the commodity, rather than being chosen simply because they occur frequently in the information corresponding to the commodity. This improves the accuracy of determining commodity feature words, that is, the accuracy of determining target words.
Drawings
FIG. 1 is a flowchart of a target word mining method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary waist tail requirement recognition module for recognizing a requirement word according to an embodiment of the present application;
FIG. 3 is a schematic diagram of automatic mining of exemplary commodity feature words according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a composition structure of a target word mining device according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a second component structure of the target word mining apparatus according to the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The embodiment of the application provides a target word mining method, which is applied to a target word mining device. FIG. 1 is a flowchart of the target word mining method provided by the embodiment of the application. As shown in FIG. 1, the target word mining method may include:
S101, acquiring a plurality of search words used in historical commodity searches.
The target word mining method is suitable for a scene of mining target words corresponding to commodities.
In the embodiments of the present application, the target word mining apparatus may be implemented in various forms. For example, the target word mining devices described in the present application may include devices such as cell phones, cameras, tablet computers, notebook computers, palm computers, personal digital assistants (Personal Digital Assistant, PDA), portable media players (Portable Media Player, PMP), navigation devices, wearable devices, smart bracelets, pedometers, and the like, as well as devices such as digital TVs, desktop computers, servers, and the like.
It should be noted that the search term is information input by the user in the commodity search box in the e-commerce shopping platform, such as lace cardigan, one-piece dress, bowknot hairpin, etc.
In the embodiment of the application, the target word mining device can acquire the plurality of search words used in historical commodity searches from a database, from a log file, or in other ways; this may be determined according to the actual situation and is not limited in the embodiment of the present application.
In the embodiment of the application, the target word mining device can acquire the plurality of search words when a target word mining instruction is received, at regular intervals, or when another occasion or event is triggered; this may be determined according to the actual situation and is not limited in the embodiment of the present application.
It should be noted that the target word mining device may acquire the search words used in historical commodity searches within a preset time period, or may acquire a preset number of search words, thereby obtaining the plurality of search words; this may be determined according to the actual situation and is not limited in the embodiment of the present application.
It should be further noted that the preset time period may be the 3 months ending at the current time point, the 1 year ending at the current time point, or another time period; this may be determined according to the actual situation and is not limited in the embodiment of the present application.
S102, determining a commodity shop with associated operation behaviors with a plurality of search words.
In the embodiment of the application, after the target word mining device acquires the plurality of search words used in historical commodity searches, it can determine the commodity stores that have associated operation behaviors with the plurality of search words.
It should be noted that, after the user inputs a search word in the commodity search box of the e-commerce shopping platform, the platform displays commodities related to the search word. The user may click on some of these commodities, add them to the shopping cart, purchase them, or perform other operations on them.
It should be noted that operations such as clicking, adding to the shopping cart, and purchasing some of these commodities are the operation behaviors associated with the search word. When the user performs such an operation on a commodity, the store to which that commodity belongs is a commodity store having an associated operation behavior with the search word.
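As a minimal sketch of how stores with associated operation behaviors might be collected, the following assumes a hypothetical behavior log of `(search_word, action, store_id)` tuples. The log layout, the action names and the function name are illustrative assumptions and are not specified in the application.

```python
from collections import defaultdict

# Assumption: these action names stand for click, add-to-cart and purchase.
ASSOCIATED_ACTIONS = {"click", "add_to_cart", "purchase"}

def stores_by_search_word(events):
    """Map each search word to the stores whose commodities received
    an associated operation (click, add-to-cart, purchase) after that search."""
    stores = defaultdict(set)
    for search_word, action, store_id in events:
        if action in ASSOCIATED_ACTIONS:
            stores[search_word].add(store_id)
    return dict(stores)

# Hypothetical log entries; "impression" is not an associated operation.
events = [
    ("lace cardigan", "click", "store_a"),
    ("lace cardigan", "purchase", "store_b"),
    ("lace cardigan", "impression", "store_c"),
    ("one-piece dress", "add_to_cart", "store_a"),
]
result = stores_by_search_word(events)
```

Here `store_c` is excluded for "lace cardigan" because its commodity was only displayed, not operated on.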
In the embodiment of the present application, the process of obtaining a plurality of search terms when searching for goods in history by the target word mining device further includes: acquiring a plurality of search frequencies of a plurality of search words; accordingly, a process for determining a commodity store having an associated operational behavior with a plurality of search terms, comprising: grouping the plurality of search words according to the distance among the plurality of search words, the plurality of search frequencies and a preset search frequency threshold value to obtain a first group and a second group; determining the requirements corresponding to the search words in the first group to obtain first requirement words; determining the requirements corresponding to the search words in the second group to obtain second requirement words; determining a first store to which the commodity corresponding to the first demand word belongs and a second store to which the commodity corresponding to the second demand word belongs; and screening the commodity stores from the first store and the second store.
It should be noted that, the plurality of search words corresponds to the plurality of search frequencies one by one, and specifically, one search word corresponds to one search frequency.
In the embodiment of the application, the target word mining device acquires a plurality of search words and simultaneously acquires a plurality of search frequencies corresponding to the plurality of search words.
In this embodiment of the present application, the preset search frequency threshold may be a threshold configured in the target word mining device, may also be a threshold that is transmitted to the target word mining device by other devices, may also be a threshold that is obtained by the target word mining device in other manners, and a manner in which the specific target word mining device obtains the preset search frequency threshold may be determined according to an actual situation, which is not limited in this embodiment of the present application.
In this embodiment of the present application, the first requirement word is a head requirement word, and the second requirement word is a waist and tail requirement word.
The number of the first requirement words is at least one, and the number of the second requirement words is at least one.
In the embodiment of the application, the search word corresponding to the first requirement word can be determined, and then the commodity store having an associated operation behavior with that search word is determined, thereby obtaining the first store. Likewise, the search word corresponding to the second requirement word is determined, and the commodity store having an associated operation behavior with that search word is determined, thereby obtaining the second store.
In the embodiment of the present application, the process of grouping a plurality of search words according to a distance between the plurality of search words, a plurality of search frequencies, and a preset search frequency threshold by a target word mining device to obtain a first group and a second group includes: grouping the plurality of search words according to the distance between the plurality of search words to obtain at least two groups of search words; determining at least two search frequencies corresponding to at least two groups of search words according to the plurality of search frequencies; and grouping at least two groups of search words according to the at least two search frequencies and a preset search frequency threshold value to obtain a first group and a second group.
In the embodiment of the present application, the distance between the plurality of search words may be a similarity between the plurality of search words; the distance between the plurality of search terms may also be determined in other ways; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
In this embodiment of the present application, the at least two search frequencies corresponding to the at least two groups of search words may be determined from the plurality of search frequencies as follows. For the first group of search words among the at least two groups, the search frequency of each search word in that group is obtained from the plurality of search frequencies (so the number of frequencies obtained equals the number of search words in the group), and their sum is taken as the first search frequency. For the second group of search words, the search frequency of each search word in that group is obtained in the same way, and their sum is taken as the second search frequency. The remaining groups are handled likewise, until the at least two search frequencies are obtained.
Illustratively, the at least two sets of search terms are specifically three sets of search terms, the first set of search terms including search term 1, search term 2, and search term 3; the second set of search terms includes search term 4 and search term 5; the third set of search terms includes search term 6, search term 7, search term 8, and search term 9. Among the plurality of search frequencies, the search frequency of the search word 1 is 2 times; the search frequency of the search word 2 is 3 times; the search frequency of the search word 3 is 1 time; the search frequency of the search word 4 is 2 times; the search frequency of the search word 5 is 1 time; the search frequency of the search word 6 is 4 times; the search frequency of the search word 7 is 3 times; the search frequency of the search word 8 is 2 times; the search frequency of the search word 9 is 4 times; the first search frequency corresponding to the first group of search words is the sum of the search frequency (2 times) of the search word 1, the search frequency (3 times) of the search word 2 and the search frequency (1 time) of the search word 3, namely the first search frequency is 6 times; the second search frequency corresponding to the second group of search words is the sum of the search frequency (2 times) of the search word 4 and the search frequency (1 time) of the search word 5, namely the second search frequency is 3 times; the third search frequency corresponding to the third group of search words is the sum of the search frequency (4 times) of the search word 6, the search frequency (3 times) of the search word 7, the search frequency (2 times) of the search word 8 and the search frequency (4 times) of the search word 9, namely the third search frequency is 13 times.
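The arithmetic in the example above can be reproduced with a short sketch. The function name and the dictionary layout are illustrative assumptions; only the frequencies and the group membership come from the example.

```python
def group_search_frequencies(groups, frequency):
    """Return, for each group, the sum of its members' search frequencies."""
    return [sum(frequency[word] for word in group) for group in groups]

# Per-word search frequencies from the example.
frequency = {
    "search word 1": 2, "search word 2": 3, "search word 3": 1,
    "search word 4": 2, "search word 5": 1,
    "search word 6": 4, "search word 7": 3, "search word 8": 2, "search word 9": 4,
}
# The three groups of search words from the example.
groups = [
    ["search word 1", "search word 2", "search word 3"],
    ["search word 4", "search word 5"],
    ["search word 6", "search word 7", "search word 8", "search word 9"],
]
group_frequencies = group_search_frequencies(groups, frequency)
print(group_frequencies)  # [6, 3, 13]
```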
In the embodiment of the application, each of at least two groups of search words is a search word expressing the same requirement.
Illustratively, if any one of the at least two groups of search words includes two near-identical search words for a primer shirt, the two search words belong to the same requirement, and the requirement representative word (i.e., the requirement word) is the primer shirt; if any group of search words includes a lace primer shirt and a primer shirt lace, the two search words also belong to the same requirement, and the requirement representative word (i.e., the requirement word) is the lace primer shirt; if any group of search words includes a lady sand coat primer shirt, a lady yarn coat primer shirt and a yarn coat primer shirt, the three search words also belong to the same requirement, and the requirement representative word (i.e., the requirement word) is the lady sand coat primer shirt.
In this embodiment of the present application, the at least two groups of search words may be grouped according to the at least two search frequencies and the preset search frequency threshold as follows: each target search frequency that is greater than or equal to the preset search frequency threshold is selected from the at least two search frequencies; the target group of search words corresponding to each target search frequency is determined from the at least two groups of search words and assigned to the first group; the remaining groups among the at least two groups of search words form the second group.
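The threshold-based split described here might be sketched as follows. The function name and the threshold value of 5 are hypothetical; the group frequencies 6, 3 and 13 are taken from the three-group example earlier in this description.

```python
def split_by_threshold(groups, group_frequencies, threshold):
    """Put every group whose summed search frequency is >= threshold
    into the first grouping; all other groups go into the second grouping."""
    first, second = [], []
    for group, freq in zip(groups, group_frequencies):
        if freq >= threshold:
            first.append(group)
        else:
            second.append(group)
    return first, second

groups = ["group 1", "group 2", "group 3"]
group_frequencies = [6, 3, 13]
# Assumption: a preset search frequency threshold of 5, chosen only for illustration.
first, second = split_by_threshold(groups, group_frequencies, threshold=5)
print(first)   # ['group 1', 'group 3']
print(second)  # ['group 2']
```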
In an embodiment of the present application, a process for grouping a plurality of search words according to a distance between the plurality of search words by a target word mining device to obtain at least two groups of search words includes: sorting the plurality of search words according to the plurality of search frequencies to obtain a search word sequence; determining the similarity between a first word in the search word sequence and a first group of phrases in at least two groups of phrases in the aggregation area; and under the condition that the similarity is greater than or equal to a preset similarity threshold value, adding the first word into the first group of word groups until each word in the search word sequence is added into the aggregation area, so as to obtain at least two groups of search words in the aggregation area.
In this embodiment of the present application, the target word mining device includes a waist-tail requirement recognition module, where the waist-tail requirement recognition module uses a zipper type aggregation algorithm to divide a processing space into a processing area and an aggregation area, and after a plurality of search words are ordered according to a plurality of search frequencies to obtain a search word sequence, the search word sequence may be placed in the processing area, and then a similarity between a first word in the search word sequence and a first group of words in at least two groups of words in the aggregation area is determined, and a plurality of search words are placed in the aggregation area in turn according to the similarity.
It should be noted that, the first word may be a word at the first position in the search word sequence, or may be a word at the last position in the search word sequence, which may be specifically determined according to the actual situation, which is not limited in the embodiment of the present application.
It should be noted that the first group of phrases is any one of at least two groups of phrases.
In the embodiment of the present application, the manner of sorting the plurality of search words according to the plurality of search frequencies to obtain the search word sequence may be to sort the plurality of search words according to the plurality of search frequencies in order from high to low search frequencies to obtain the search word sequence.
In this embodiment of the present application, a plurality of search words may be ranked according to a plurality of search frequencies according to a sequence from low to high of the search frequencies, to obtain a search word sequence; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
Specifically, if the search words are ordered according to the search frequencies from high to low to obtain a search word sequence, the first word is the word at the first position in the search word sequence. And if the search words are sequenced according to the search frequencies from low to high to obtain a search word sequence, the first word is the word at the last position in the search word sequence.
In the embodiment of the present application, the similarity between the first word in the search word sequence and the first group of phrases in the at least two groups of phrases in the aggregation area may be determined by subtracting from 1 the minimum of the Jaccard distance, the edit distance, the Hamming distance, and the like; the similarity between the first word and the first group of phrases may also be determined in other manners, which may be determined according to the actual situation and is not limited in the embodiment of the present application.
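A minimal sketch of a "1 minus the minimum distance" similarity between two strings is given below. Several choices are assumptions not stated in the application: the Jaccard distance is computed over character sets, the edit distance is normalized by the longer string's length so that it lies in [0, 1], and the Hamming distance is omitted because it is only defined for strings of equal length. How a word is compared with a whole group of phrases is also left open here.

```python
def jaccard_distance(a, b):
    """Jaccard distance over the character sets of two strings (assumption)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)

def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        cur = [i]
        for j, char_b in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                      # deletion
                           cur[j - 1] + 1,                   # insertion
                           prev[j - 1] + (char_a != char_b)))  # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length (normalization is an assumption)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def similarity(a, b):
    """1 minus the smallest of the available distances."""
    return 1.0 - min(jaccard_distance(a, b), normalized_edit_distance(a, b))
```

Taking the minimum means two words count as similar if any one distance judges them close. For instance, "lace shirt" and "shirt lace" share the same character set, so their Jaccard distance is 0 and the similarity is 1 even though the edit distance between them is large.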
It should be noted that, the preset similarity threshold may be a threshold configured in the target word mining device, the preset similarity threshold may also be a threshold transmitted to the target word mining device by other devices, or may be a threshold obtained in other manners in the target word mining device, and a specific manner of obtaining the preset similarity threshold may be determined according to an actual situation, which is not limited in this embodiment of the present application.
In the embodiment of the application, after the target word mining device determines the similarity between the first word in the search word sequence and the first group of phrases in the at least two groups of phrases in the aggregation area, under the condition that the similarity is smaller than a preset similarity threshold value and the similarity between the first word and other groups of phrases in the at least two groups of phrases is smaller than the preset similarity threshold value, creating a target group of phrases according to the first word; and adding the target group of phrases into at least two groups of phrases.
In the embodiments of the present application, the other groups of phrases are the rest groups of phrases except the first group of phrases in at least two groups of phrases.
It should be noted that, when the similarity is smaller than the preset similarity threshold, and the similarity between the first word and the other groups of the at least two groups of words is smaller than the preset similarity threshold, the target group of words is a word group including the first word, and specifically, is composed of words having a similarity with the first word greater than or equal to the preset similarity threshold.
Here, the search term is used as a representative form of the user's demand. If the search word is directly used as a user demand representative, dirty data exists, and the dirty data cannot be directly used, but the problems can be optimally relieved by utilizing a zipper type aggregation algorithm. Generally, the probability of searching high-frequency words is relatively low, and the words are more suitable to be used as representative words required by users. The construction mode of the zipper type aggregation algorithm in the application comprises the following steps: the process space is divided into a process zone and an aggregation zone. And sorting the plurality of search words from top to bottom according to the search frequency, and placing the search words into a processing area. 1 word a is fetched in the processing area. If no phrase exists in the aggregation area at this time, the word a is directly put into the aggregation area, and a phrase is generated. If the word group exists in the aggregation area, calculating the similarity between the word a and the word group in the aggregation area from front to back. In the case that the similarity between the word a and the group of groups of the aggregation area (e.g., the first group of groups) is higher than a specified threshold (a preset similarity threshold), the word a is incorporated into the group of groups (the first group of groups). After the similarity of all the phrases in the aggregation area is calculated, the threshold condition (namely, the similarity between the word a and any group of phrases in the aggregation area is smaller than the preset similarity threshold value) is not met, and then a new phrase is created by using the word a and is placed at the tail end of the phrases in the aggregation area. And the method reciprocates until the phrase in the treatment area is treated. I.e. the search terms with any similarity are put together as much as possible. 
Finally, the phrase groups are taken out of the aggregation area, and each phrase group is one typical demand. In this way, user demand words are normalized on the basis of search words. At this point, similar user demands are brought together, and the representative word of each group of user demand words can be seen. For example: the primer shirt and the primer shirt belong to the same demand, whose representative word is the primer shirt; the lace primer coat and the primer coat lace belong to one demand, whose representative word is the lace primer coat; women's sand coat primer, women's yarn coat primer and women's yarn coat primer belong to one demand, whose representative word is women's sand coat primer.
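The zipper-type aggregation described above can be sketched as follows. This is a minimal illustration, not the implementation of the application: the text does not specify the similarity measure, so the sketch assumes the character-level ratio of Python's `difflib.SequenceMatcher`, and it assumes that a word is compared with the first (highest-frequency) word of each group. The names `similarity` and `zipper_aggregate` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Assumption: difflib's ratio stands in for the unspecified similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def zipper_aggregate(search_freq: dict[str, int], threshold: float) -> list[list[str]]:
    # Processing area: search words sorted from high to low search frequency.
    processing = sorted(search_freq, key=search_freq.get, reverse=True)
    aggregation: list[list[str]] = []  # aggregation area, one list per phrase group
    for word in processing:
        for group in aggregation:  # existing groups are checked from front to back
            # Assumption: the group's first word acts as its representative.
            if similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            # No group met the threshold: open a new group at the tail.
            aggregation.append([word])
    return aggregation


groups = zipper_aggregate(
    {"lace shirt": 100, "lace shrit": 5, "wool coat": 50}, threshold=0.8
)
```

Because words are processed in descending frequency order, the first word of each group is its highest-frequency member, which matches the idea that high-frequency words serve as representative words.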
After the zipper-type aggregation algorithm has normalized the plurality of search words into user demand words, the phrase groups in the aggregation area are the at least two groups of search words, which then need to be split into head user demand words (the first grouping) and waist-tail user demand words (the second grouping). The prior-art approach directly takes the top x% of the search words under a category, ranked by search frequency, as the head demand words, where x is a proportion parameter. The problem with this approach is that the concentration of search words varies from category to category, so the granularity of the head demand words differs between categories. The present application instead uses a cumulative proportion method: the search frequencies of the user demand phrase groups are accumulated, and the point at which the accumulated value reaches the first x% of the total search frequency is taken as the cut point, where x is a proportion parameter; the part before the cut point belongs to the head demand words and the part after it belongs to the waist-tail demand words. In this way, the head demand words (first grouping) and the waist-tail demand words (second grouping) can be split adaptively.
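The cumulative proportion split can be illustrated with a short sketch. It assumes that the groups are accumulated in descending order of search frequency and that a group counts as head while the share accumulated before it is still below x; the text does not state how the group that straddles the cut point is treated, so that boundary rule is an assumption. The function name `split_head_and_tail` is hypothetical.

```python
def split_head_and_tail(group_freq: list[tuple[str, int]], x: float):
    """Split demand groups where cumulative search frequency reaches share x (0..1)."""
    ordered = sorted(group_freq, key=lambda item: item[1], reverse=True)
    total = sum(freq for _, freq in ordered)
    head, tail, running = [], [], 0
    for name, freq in ordered:
        # Assumption: a group is "head" while the share accumulated before it is below x.
        if running < x * total:
            head.append(name)
        else:
            tail.append(name)
        running += freq
    return head, tail


head, tail = split_head_and_tail([("a", 60), ("b", 25), ("c", 10), ("d", 5)], x=0.8)
```

With a fixed top-x% rule, a category with four groups and x = 50 would always yield two head groups; with the cumulative rule, the number of head groups depends on how concentrated the search frequency is.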
As shown in fig. 2, the target word mining device may identify at least two groups of search words through the waist tail requirement identification module, that is, sort the plurality of search words according to the plurality of search frequencies to obtain a search word sequence; determining the similarity between a first word in the search word sequence and a first group of phrases in at least two groups of phrases in the aggregation area; and under the condition that the similarity is greater than or equal to a preset similarity threshold value, adding the first word into the first group of word groups until each word in the search word sequence is added into the aggregation area, so as to obtain at least two groups of search words in the aggregation area. Under the condition that the similarity is smaller than a preset similarity threshold value and the similarity between the first word and other groups of words in at least two groups of words is smaller than the preset similarity threshold value, creating a target group of words according to the first word; and adding the target group of phrases into at least two groups of phrases. Thereby determining a first demand word (head demand word) and a second demand word (waist demand word) from at least two groups of search words.
It can be understood that the waist-tail demand identification method, based on the zipper-type aggregation algorithm and the cumulative proportion method, groups and sorts user demands (search words) well. It solves the problem that user search words cannot directly represent user demand because they contain wrongly written characters, misordered expressions and the like. Because the zipper-type aggregation algorithm aggregates preferentially, the search word that best represents the user demand within a group can be selected adaptively and used as the representative demand word of the aggregated group. In addition, search words under different categories, whose head effects differ, can be split systematically and adaptively into head and waist-tail groups (a first grouping and a second grouping) of more consistent granularity. In this way, feature words (target words) are mined from the perspective of demand, which improves the accuracy of feature word recognition.
In the embodiment of the application, there are a plurality of first stores and a plurality of second stores; the process in which the target word mining device screens out commodity stores from the first stores and the second stores includes: determining a plurality of operation numbers for operating a plurality of target commodities in the target store through the first demand word and the second demand word, and a plurality of total operation numbers for operating the plurality of target commodities; determining a plurality of first commodity numbers in the target store, and determining a plurality of proportion parameters between the plurality of first commodity numbers and a plurality of total commodity numbers in the target store; screening a first-type commodity store and a second-type commodity store from the target store according to the plurality of operation numbers, the plurality of total operation numbers and the plurality of proportion parameters; and taking the first-type commodity store and the second-type commodity store as the commodity stores.
It should be noted that, the plurality of operation numbers and the plurality of total operation numbers are in one-to-one correspondence, and a specific one operation number corresponds to one total operation number. The plurality of target commodities are in one-to-one correspondence with the plurality of operation numbers, and specifically, one target commodity corresponds to one operation number.
In an embodiment of the present application, the target store includes a plurality of first stores and a plurality of second stores.
The plurality of first commodity numbers correspond one-to-one to the plurality of total commodity numbers, and one target store corresponds to one first commodity number.
In this embodiment of the present application, the target word mining apparatus further includes a waist-tail store generating module, configured to determine the first-type commodity store and the second-type commodity store according to the first demand word and the second demand word.
In the embodiment of the present application, the plurality of operation numbers include the number of click operations, the number of add-to-cart operations (operations of adding commodities to the shopping cart), the number of purchase operations, ….
It should be noted that, the first commodity includes a waist-tail commodity and a head commodity, and when the waist-tail demand ratio corresponding to a part of commodities reaches a preset threshold value, the commodity is confirmed to be the commodity required by the waist-tail, namely the first commodity; and under the condition that the head demand ratio corresponding to part of commodities reaches a preset threshold value, confirming that the commodity is the commodity with the head demand, namely the first commodity.
In this embodiment of the present application, the process of screening the first-type commodity store and the second-type commodity store from the target store according to the plurality of operation numbers, the plurality of total operation numbers and the plurality of proportion parameters may be as follows: obtaining, from the plurality of operation numbers, a plurality of first store operation numbers corresponding to the first stores and a plurality of second store operation numbers corresponding to the second stores; obtaining, from the plurality of total operation numbers, a plurality of first store total operation numbers corresponding to the first stores and a plurality of second store total operation numbers corresponding to the second stores; obtaining, from the plurality of proportion parameters, a plurality of first proportion parameters corresponding to the first stores and a plurality of second proportion parameters corresponding to the second stores; obtaining a plurality of preset operation weights; determining a plurality of first store ratios between the plurality of first store operation numbers and the plurality of first store total operation numbers; determining a plurality of product sums among the plurality of preset operation weights, the plurality of first proportion parameters and the plurality of first store ratios to obtain a plurality of first store sum parameters (one first store corresponds to one first store sum parameter); obtaining, from the plurality of first store sum parameters, a first target sum parameter whose value is greater than or equal to a preset parameter threshold; and taking the store corresponding to the first target sum parameter as a first-type commodity store (head store).
The second stores are handled in the same way: determining a plurality of second store ratios between the plurality of second store operation numbers and the plurality of second store total operation numbers; determining a plurality of product sums among the plurality of preset operation weights, the plurality of second proportion parameters and the plurality of second store ratios to obtain a plurality of second store sum parameters (one second store corresponds to one second store sum parameter); obtaining, from the plurality of second store sum parameters, a second target sum parameter whose value is greater than or equal to the preset parameter threshold; and taking the store corresponding to the second target sum parameter as a second-type commodity store (waist-tail store).
It should be noted that, the plurality of preset operation weights may be weights configured in the target word mining apparatus; the plurality of preset operation weights can also be weights transmitted to the target word mining device by other equipment; the weights obtained by other modes for the target word mining device can also be obtained; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
In an embodiment of the present application, any one of the plurality of first store ratios includes a first ratio, a second ratio, and a third ratio. The first ratio is the ratio between the number of click operations in the first store and the total number of click operations in the first store; the second ratio is the ratio between the number of add-to-cart operations in the first store and the total number of add-to-cart operations in the first store; the third ratio is the ratio between the number of purchase operations in the first store and the total number of purchase operations in the first store.
In this embodiment of the present application, the process of determining the plurality of product sums among the plurality of preset operation weights, the plurality of first proportion parameters and the plurality of first store ratios, so as to obtain the plurality of first store sum parameters, may be as follows: obtaining, from the plurality of preset operation weights, a plurality of first preset operation weights corresponding to the plurality of first ratios in the plurality of first store ratios, a plurality of second preset operation weights corresponding to the plurality of second ratios, a plurality of third preset operation weights corresponding to the plurality of third ratios, and a plurality of fourth preset operation weights corresponding to the plurality of first proportion parameters; determining a first product of one of the first ratios and the corresponding first preset operation weight, a second product of the corresponding second ratio and the corresponding second preset operation weight, a third product of the corresponding third ratio and the corresponding third preset operation weight, and a fourth product of the corresponding first proportion parameter and the corresponding fourth preset operation weight; and taking the sum of the first product, the second product, the third product and the fourth product as one first store sum parameter, until the plurality of first store sum parameters are obtained.
In an embodiment of the present application, any one of the plurality of second store ratios includes a fourth ratio, a fifth ratio, and a sixth ratio. The fourth ratio is the ratio between the number of click operations in the second store and the total number of click operations in the second store; the fifth ratio is the ratio between the number of add-to-cart operations in the second store and the total number of add-to-cart operations in the second store; the sixth ratio is the ratio between the number of purchase operations in the second store and the total number of purchase operations in the second store.
In this embodiment of the present application, the process of determining the plurality of product sums among the plurality of preset operation weights, the plurality of second proportion parameters and the plurality of second store ratios, so as to obtain the plurality of second store sum parameters, may be as follows: obtaining, from the plurality of preset operation weights, a plurality of fifth preset operation weights corresponding to the plurality of fourth ratios in the plurality of second store ratios, a plurality of sixth preset operation weights corresponding to the plurality of fifth ratios, a plurality of seventh preset operation weights corresponding to the plurality of sixth ratios, and a plurality of eighth preset operation weights corresponding to the plurality of second proportion parameters; determining a fifth product of one of the fourth ratios and the corresponding fifth preset operation weight, a sixth product of the corresponding fifth ratio and the corresponding sixth preset operation weight, a seventh product of the corresponding sixth ratio and the corresponding seventh preset operation weight, and an eighth product of the corresponding second proportion parameter and the corresponding eighth preset operation weight; and taking the sum of the fifth product, the sixth product, the seventh product and the eighth product as one second store sum parameter, until the plurality of second store sum parameters are obtained.
The plurality of first store sum parameters and the plurality of second store sum parameters together constitute the plurality of sum parameters.
It should be noted that, the preset parameter threshold may be a threshold configured in the target word mining apparatus; the preset parameter threshold value can also be a threshold value transmitted to the target word mining device by other equipment; the threshold value obtained by other modes for the target word mining device can also be obtained; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
In an embodiment of the present application, the process in which the target word mining apparatus determines the plurality of operation numbers for operating the plurality of target commodities in the target store through the first demand word and the second demand word, and the plurality of total operation numbers for operating the plurality of target commodities, includes: determining a plurality of first operation numbers for performing click operations on a plurality of first target commodities in the target store through the first demand word and the second demand word, and a plurality of first total operation numbers for performing click operations on the plurality of first target commodities; determining a plurality of second operation numbers for performing add-to-cart operations on a plurality of second target commodities in the target store through the first demand word and the second demand word, and a plurality of second total operation numbers for performing add-to-cart operations on the plurality of second target commodities; determining a plurality of third operation numbers for performing purchase operations on a plurality of third target commodities in the target store through the first demand word and the second demand word, and a plurality of third total operation numbers for performing purchase operations on the plurality of third target commodities; taking the plurality of first operation numbers, the plurality of second operation numbers and the plurality of third operation numbers as the plurality of operation numbers; and taking the plurality of first total operation numbers, the plurality of second total operation numbers and the plurality of third total operation numbers as the plurality of total operation numbers.
In the embodiment of the application, the waist-tail store generating module constructs a waist-tail store set based on the waist-tail demand words produced by the waist-tail demand identification module. First, according to behavior link data such as user searches, clicks, add-to-cart operations and purchases, the user's waist-tail demand words are mapped to specific stores (a first store to which the commodity corresponding to the first demand word belongs and a second store to which the commodity corresponding to the second demand word belongs are determined). There are four key indicators: the store-entry (click) proportion, the add-to-cart proportion, the purchase proportion, and the commodity quantity proportion of waist-tail demand. The four key indicators are weighted by the weights (the plurality of preset operation weights) to generate a comprehensive waist-tail demand proportion index (the plurality of sum parameters). The larger the index, the better the store satisfies waist-tail demand. The top n% of stores ranked by the index are selected as the waist-tail store set, where n is a proportion parameter. Correspondingly, a head store set can be generated in the same manner based on the head demand words. This completes the generation of the waist-tail store set and the head store set.
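The comprehensive index for one store can be sketched as a weighted sum of the four indicators. This is a hedged illustration only: the field names, the `StoreStats` container, and the equal weights in the usage lines are hypothetical, and the selection shown uses the preset parameter threshold variant rather than the top-n% variant.

```python
from dataclasses import dataclass


@dataclass
class StoreStats:
    demand_clicks: int      # clicks reached through the demand words
    total_clicks: int
    demand_carts: int       # add-to-cart operations through the demand words
    total_carts: int
    demand_purchases: int   # purchases through the demand words
    total_purchases: int
    demand_items: int       # first commodity number
    total_items: int        # total commodity number


def ratio(part: float, whole: float) -> float:
    # Guard against stores with no recorded operations.
    return part / whole if whole else 0.0


def sum_parameter(s: StoreStats, weights: tuple[float, float, float, float]) -> float:
    w_click, w_cart, w_buy, w_items = weights
    return (
        w_click * ratio(s.demand_clicks, s.total_clicks)
        + w_cart * ratio(s.demand_carts, s.total_carts)
        + w_buy * ratio(s.demand_purchases, s.total_purchases)
        + w_items * ratio(s.demand_items, s.total_items)
    )


def select_stores(stats: dict[str, StoreStats], weights, threshold: float) -> list[str]:
    # Keep stores whose sum parameter reaches the preset parameter threshold.
    return [name for name, s in stats.items() if sum_parameter(s, weights) >= threshold]


stats = {
    "s1": StoreStats(50, 100, 10, 40, 5, 10, 30, 60),
    "s2": StoreStats(1, 100, 0, 40, 0, 10, 1, 60),
}
weights = (0.25, 0.25, 0.25, 0.25)  # hypothetical preset operation weights
selected = select_stores(stats, weights, threshold=0.4)
```

Running the same scoring with head demand words in place of waist-tail demand words would give the head store set.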
S103, determining target words according to commodity information corresponding to commodity shops, sales of commodities and page browsing quantity of the commodities.
In the embodiment of the application, after determining the commodity store having the related operation behaviors with the plurality of search words, the target word mining device may determine the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing amount of the commodity.
In the embodiment of the present application, the target word may be a commodity feature word (that is, a word describing a set of commodities that is distinctive within a market segment and meets the personalized demands of niche groups); the target word may also be another word, which may be determined according to the actual situation.
In the embodiment of the present application, the target word mining device determines a target word according to commodity information corresponding to a commodity store, sales of the commodity, and page browsing of the commodity, including: acquiring commodity information of commodities in a commodity store; constructing a plurality of initial target words according to commodity information; and acquiring sales volume and page browsing volume of the commodity, and determining target words from a plurality of initial target words by utilizing the sales volume and the page browsing volume.
In this embodiment of the present application, the commodity information includes information such as a commodity title, an industrial attribute of the commodity, a marketing attribute, a selling point, a brand, a product word category, and the like, and the commodity information may also include information such as a color, an appearance, and the like of the commodity, where specific commodity information may be determined according to an actual situation, which is not limited in this embodiment of the present application.
In the embodiment of the application, the commodity information can be obtained from a commodity information base, and the commodity information can also be obtained from an electronic commerce shopping platform; the specific manner of acquiring the commodity information may be determined according to the actual situation, which is not limited in the embodiment of the present application.
The commodity information is obtained from all commodities in the commodity store.
In the embodiment of the application, there are a plurality of commodity stores and a plurality of commodities; the process in which the target word mining device obtains the sales and page view counts of the commodities and determines the target word from the plurality of initial target words by using the sales and page view counts includes: obtaining a plurality of sales and a plurality of page view counts of the plurality of commodities corresponding to the plurality of initial target words in the plurality of commodity stores; determining a plurality of initial weights of the plurality of initial target words in the plurality of commodity stores according to the plurality of sales and the plurality of page view counts; determining a plurality of weights corresponding to the plurality of initial target words according to the plurality of initial weights; obtaining, from the plurality of weights, a plurality of first weights corresponding to the first-type commodity stores and a plurality of second weights corresponding to the second-type commodity stores; determining a plurality of distribution deviations of the plurality of initial target words between the first-type commodity stores and the second-type commodity stores according to the plurality of first weights and the plurality of second weights; and, when a first distribution deviation among the plurality of distribution deviations is greater than or equal to a preset deviation threshold, determining the first initial word corresponding to the first distribution deviation as the target word.
In the embodiment of the application, an initial target word corresponds to an initial weight in a commodity store.
In the embodiment of the application, the first-type commodity stores and the second-type commodity stores are stores screened from the commodity stores according to a plurality of search words.
In the embodiment of the present application, the first initial word is one of the plurality of initial target words.
It should be noted that, the preset deviation threshold may be a threshold configured in the target word mining apparatus; the preset deviation threshold value can also be a threshold value transmitted to the target word mining device by other equipment; the target word mining device can also acquire a preset deviation threshold value in other modes; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
The amount of sales of the plurality of products corresponding to the plurality of initial target words in the plurality of product stores is specifically a total amount (Gross Merchandise Volume, GMV) of transactions of the plurality of products corresponding to the plurality of initial target words in the plurality of product stores.
In the embodiment of the application, the sales and the page browsing amounts of the commodities corresponding to the initial target words in the commodity stores can be obtained from the database, and the sales and the page browsing amounts of the commodities corresponding to the initial target words in the commodity stores can be obtained in other manners; the determination may be specifically determined according to actual situations, which is not limited in the embodiment of the present application.
In an embodiment of the present application, the process in which the target word mining apparatus determines the plurality of distribution deviations of the plurality of initial target words between the first-type commodity stores and the second-type commodity stores according to the plurality of first weights and the plurality of second weights includes: obtaining, from the plurality of first weights, a plurality of first target weights of the first initial word in the plurality of first-type commodity stores; determining the total of the plurality of first target weights to obtain a first value; determining the sum of the plurality of first weights to obtain a first parameter; obtaining, from the plurality of second weights, a plurality of second target weights of the first initial word in the plurality of second-type commodity stores; determining the total of the plurality of second target weights to obtain a second value; determining the sum of the plurality of second weights to obtain a second parameter; and determining the first distribution deviation corresponding to the first initial word according to the first value, the second value, the first parameter and the second parameter, until the plurality of distribution deviations are determined according to the plurality of first weights and the plurality of second weights.
In this embodiment of the present application, the process of determining the first distribution deviation corresponding to the first initial word according to the first value, the second value, the first parameter and the second parameter may be as follows: determining the quotient of the first value and the first parameter to obtain a first quotient; determining the quotient of the second value and the second parameter to obtain a second quotient; and determining the difference between the first quotient and the second quotient to obtain the first distribution deviation.
In this embodiment of the present application, after determining the first distribution deviation corresponding to the first initial word in the plurality of initial target words, a plurality of distribution deviations of the plurality of initial target words in the first type commodity store and the second type commodity store may be determined according to the plurality of first weights and the plurality of second weights in a manner of determining the first distribution deviation corresponding to the first initial word.
In this embodiment of the present application, a manner of determining a plurality of distribution deviations of a plurality of initial target words in a first type of merchandise store and a second type of merchandise store according to a plurality of first weights and a plurality of second weights is as shown in formula 1:
Figure BDA0004158491500000171
It should be noted that $i$ denotes keyword (initial target word) $i$ and $j$ denotes store (commodity store) $j$; $k$ is the number of second-type commodity stores or of first-type commodity stores. The set of waist-tail stores (second-type commodity stores) is group A and the set of head stores (first-type commodity stores) is group B; the difference between the word distributions of the two store groups is calculated to produce the distribution deviation $W_i$. $N_{i,j,a}$ denotes $N_{i,j}$ in group A, namely the plurality of second weights corresponding to the second-type commodity stores; $N_{i,j,b}$ denotes $N_{i,j}$ in group B, namely the plurality of first weights corresponding to the first-type commodity stores. $\sum_{k} N_{i,k,a}$ is the second value and $\sum_{m}\sum_{n} N_{m,n,a}$ is the second parameter. $\sum_{k} N_{i,k,b}$ is the first value and $\sum_{m}\sum_{n} N_{m,n,b}$ is the first parameter.
Here, words with a higher $W_i$ may be considered feature words (target words).
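The distribution deviation can be illustrated with a short sketch built only from the quantities defined above. Formula 1 itself is available here only as a figure reference, so the direction of the difference is an assumption: the sketch takes the share of word i in group A (waist-tail stores) minus its share in group B (head stores), on the reading that a higher deviation marks a feature word. The nested-dictionary layout and the function names are hypothetical.

```python
Weights = dict[str, dict[str, float]]  # word -> {store: weight N_{i,j}}


def share(weights: Weights, word: str) -> float:
    # Sum of the word's weights over the group's stores, divided by the
    # sum of all weights of all words over the group's stores.
    word_total = sum(weights.get(word, {}).values())
    grand_total = sum(sum(per_store.values()) for per_store in weights.values())
    return word_total / grand_total if grand_total else 0.0


def distribution_deviation(group_a: Weights, group_b: Weights) -> dict[str, float]:
    # Assumption: deviation = share in group A minus share in group B.
    words = set(group_a) | set(group_b)
    return {w: share(group_a, w) - share(group_b, w) for w in words}


group_a = {"lace": {"s1": 3, "s2": 1}, "basic": {"s1": 4}}   # waist-tail stores
group_b = {"lace": {"s3": 1}, "basic": {"s3": 3}}            # head stores
deviation = distribution_deviation(group_a, group_b)
targets = [w for w, d in deviation.items() if d >= 0.2]      # hypothetical threshold
```

In this toy data, "lace" carries half of the weight in group A but only a quarter in group B, so it is the word that stands out in the waist-tail stores.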
In the embodiment of the present application, the process of determining, by the target word mining device, a plurality of weights corresponding to a plurality of initial target words according to the plurality of initial weights includes: determining a plurality of initial word frequencies of a plurality of initial target words in a plurality of commodity stores according to the plurality of initial weights; determining the number of stores of the plurality of commodity stores; determining first quantity of stores when the initial weights of each initial target word in the plurality of initial target words are greater than preset weights in the plurality of commodity stores, and obtaining a plurality of first quantities; determining a plurality of word frequencies corresponding to a plurality of initial target words according to the number of stores and a plurality of first numbers; a plurality of weights is determined based on the plurality of initial word frequencies and the plurality of word frequencies.
It should be noted that an initial target word corresponds to an initial word frequency in a commodity store.
It should be noted that, the plurality of initial weights are in one-to-one correspondence with the plurality of initial word frequencies, and a specific one of the initial weights corresponds to one of the initial word frequencies. The plurality of initial target words corresponds to the plurality of first numbers one-to-one.
In this embodiment of the present application, the process of determining, according to the plurality of initial weights, the plurality of initial word frequencies of the plurality of initial target words in the plurality of commodity stores may be as follows. First, the first initial weight corresponding to the first initial target word in the first commodity store is obtained from the plurality of initial weights; then the sum of the initial weights corresponding to the plurality of initial target words in the first commodity store is determined; finally, the ratio of the first initial weight to that sum is taken as the initial word frequency of the first initial target word in the first commodity store. Likewise, the second initial weight corresponding to the second initial target word in the first commodity store is obtained from the plurality of initial weights, the sum of the initial weights corresponding to the plurality of initial target words in the first commodity store is determined, and the ratio of the second initial weight to that sum is taken as the initial word frequency of the second initial target word in the first commodity store; …; the last initial weight corresponding to the last initial target word in the first commodity store is obtained from the plurality of initial weights, the sum of the initial weights corresponding to the plurality of initial target words in the first commodity store is determined, and the ratio of the last initial weight to that sum is taken as the initial word frequency of the last initial target word in the first commodity store. The other commodity stores are handled in the same way: the initial weight corresponding to the first initial target word in the second commodity store is obtained from the plurality of initial weights, the sum of the initial weights corresponding to the plurality of initial target words in the second commodity store is determined, and the ratio of that initial weight to that sum is taken as the initial word frequency of the first initial target word in the second commodity store; …; the initial weight corresponding to the first initial target word in the last commodity store is obtained from the plurality of initial weights, the sum of the initial weights corresponding to the plurality of initial target words in the last commodity store is determined, and the ratio of that initial weight to that sum is taken as the initial word frequency of the first initial target word in the last commodity store. A plurality of initial word frequencies is thereby obtained.
In the embodiment of the present application, a manner of determining a plurality of initial word frequencies of a plurality of initial target words in a plurality of commodity stores according to a plurality of initial weights is as shown in formula 2:
$$PTF_{i,j} = \frac{p_{i,j}}{\sum_{i'=1}^{k} p_{i',j}} \quad \text{(formula 2)}$$
Note that i denotes keyword (initial target word) i and j denotes store (commodity store) j; $p_{i,j}$ is the weight of each word (initial target word) on the corresponding matched items; k is the number of initial target words; and $PTF_{i,j}$ is the initial word frequency corresponding to each initial target word.
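Illustratively, the ratio described above (the weight of a word in a store divided by the sum of the weights of all initial target words in that store) can be sketched in Python. This is a minimal sketch, not the implementation of the present application: the function name `initial_word_frequencies`, the dictionary layout and the sample weights are assumptions introduced only for illustration.

```python
from collections import defaultdict


def initial_word_frequencies(weights):
    """Map {(word, store): p_ij} to {(word, store): PTF_ij}.

    PTF_ij is the weight of word i in store j divided by the sum of the
    weights of all initial target words in store j.
    """
    store_totals = defaultdict(float)
    for (_word, store), p in weights.items():
        store_totals[store] += p
    return {
        (word, store): p / store_totals[store]
        for (word, store), p in weights.items()
        if store_totals[store] > 0  # skip stores whose weights sum to zero
    }


# Hypothetical initial weights for two stores.
weights = {
    ("striped", "store_a"): 3.0,
    ("black", "store_a"): 1.0,
    ("striped", "store_b"): 2.0,
    ("half-collar", "store_b"): 2.0,
}
ptf = initial_word_frequencies(weights)
print(ptf[("striped", "store_a")])  # 0.75
```

Within each store the resulting frequencies sum to 1, so a word's frequency reflects its share of the store's total weight rather than a raw count.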
It should be noted that, the preset weight may be a weight configured in the target word mining apparatus; the preset weight can also be the weight transmitted to the target word mining device by other equipment; the preset weight can also be the weight obtained by the target word mining device in other modes; the specific manner of obtaining the preset weight may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In this embodiment of the present application, the first number of stores in which the initial weight of each initial target word is greater than the preset weight may be determined as follows: for each initial target word, the initial weights corresponding to that word in the plurality of commodity stores are obtained, and the number of those initial weights that are greater than the preset weight is counted as the first number corresponding to that word; doing so for every initial target word yields the plurality of first numbers.
It should be noted that, one initial target word corresponds to a plurality of initial commodity weights in a plurality of commodity stores, and a plurality of initial target words corresponds to a plurality of sets of initial commodity weights, that is, a plurality of initial weights, in a plurality of commodity stores.
In this embodiment of the present application, the plurality of word frequencies corresponding to the plurality of initial target words may be determined from the number of stores and the plurality of first numbers as follows: for each initial target word, the sum of its first number and a preset value is determined, and the ratio of the number of stores to that sum is then determined, thereby obtaining the plurality of word frequencies.
It should be noted that, the preset value may be 1, or may be another value, and the specific preset value may be determined according to an actual situation, which is not limited in this embodiment of the present application.
It should be noted that a first number corresponds to an initial target word.
In this embodiment of the present application, according to the number of stores and the first number, a manner of determining a plurality of word frequencies corresponding to a plurality of initial target words is as shown in formula 3:
$$PIDF_i = \frac{|S|}{|\{j : p_{i,j} > \alpha\}| + 1} \quad \text{(formula 3)}$$
Note that |S| denotes the total number of stores (store number); α is a threshold parameter (preset weight); $|\{j : p_{i,j} > \alpha\}|$ denotes the number of stores j in which the weight $p_{i,j}$ of keyword i is greater than α (first number); and $PIDF_i$ is the weighted inverse document frequency index of each initial target word, that is, the word frequency of the initial target word.
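Illustratively, the counting-and-ratio step described above can be sketched as follows. The sketch follows the textual description only (number of stores divided by the first number plus a preset value); a conventional inverse document frequency would additionally take a logarithm of this ratio, and the text here does not state whether one is applied. The function name `store_level_frequencies`, the data layout and the sample values are assumptions.

```python
def store_level_frequencies(weights, store_count, alpha, preset_value=1):
    """Map {(word, store): p_ij} to {word: PIDF_i}.

    For each word, count the stores whose weight exceeds alpha (the
    "first number"), then divide the store count by that number plus
    the preset value.
    """
    first_numbers = {}
    for (word, _store), p in weights.items():
        first_numbers.setdefault(word, 0)
        if p > alpha:
            first_numbers[word] += 1
    return {
        word: store_count / (n + preset_value)
        for word, n in first_numbers.items()
    }


# Hypothetical weights across three stores.
weights = {
    ("striped", "store_a"): 3.0,
    ("striped", "store_b"): 2.0,
    ("striped", "store_c"): 0.1,
    ("black", "store_a"): 1.0,
}
pidf = store_level_frequencies(weights, store_count=3, alpha=0.5)
print(pidf["striped"])  # 1.0  (3 stores / (2 stores above alpha + 1))
print(pidf["black"])    # 1.5  (3 stores / (1 store above alpha + 1))
```

A word that carries a meaningful weight in fewer stores receives a larger value, which is what lets store-specific words stand out.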
In the embodiment of the present application, the determining the plurality of weights according to the plurality of initial word frequencies and the plurality of word frequencies may be a method of determining products of the plurality of initial word frequencies and the plurality of word frequencies, so as to obtain the plurality of weights.
In the embodiment of the present application, the manner of determining the weights according to the initial word frequencies and the word frequencies is as shown in formula 4:
$$PTFIDF_{i,j} = PTF_{i,j} \cdot PIDF_i \quad \text{(formula 4)}$$
Note that $PTFIDF_{i,j}$ is the weighted TF-IDF of each of the plurality of initial target words.
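As a worked illustration of formula 4, the product can be computed per (word, store) pair. The input values below are hypothetical, and the function name `weighted_tfidf` is an assumption made for this sketch.

```python
def weighted_tfidf(ptf, pidf):
    """Multiply each PTF_ij by the PIDF_i of the same word (formula 4)."""
    return {
        (word, store): tf * pidf[word]
        for (word, store), tf in ptf.items()
    }


# Hypothetical inputs: initial word frequencies and per-word frequencies.
ptf = {("striped", "store_a"): 0.75, ("black", "store_a"): 0.25}
pidf = {"striped": 1.0, "black": 1.5}

ptfidf = weighted_tfidf(ptf, pidf)
print(ptfidf[("striped", "store_a")])  # 0.75
print(ptfidf[("black", "store_a")])    # 0.375
```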
In an embodiment of the present application, a process for obtaining, by a target word mining device, a plurality of first weights corresponding to a first type of commodity store and a plurality of second weights corresponding to a second type of commodity store from a plurality of weights includes: grouping the weights according to a plurality of initial target words contained in a plurality of commodity stores to obtain a plurality of groups of initial weights; respectively carrying out normalization processing on a plurality of groups of initial weights to obtain a plurality of groups of weights; and acquiring a plurality of first weights corresponding to the first type commodity stores and a plurality of second weights corresponding to the second type commodity stores from the plurality of groups of weights.
The commodity stores are in one-to-one correspondence with the multiple sets of weights.
In the embodiment of the present application, the manner of normalizing the multiple sets of initial weights respectively to obtain the multiple sets of weights is as shown in formula 5:
$$N_{i,j} = \frac{PTFIDF_{i,j}}{\sum_{i'=1}^{k} PTFIDF_{i',j}} \quad \text{(formula 5)}$$
Note that i denotes keyword (initial target word) i and j denotes store (commodity store) j; $N_{i,j}$ is the weight of initial target word i in commodity store j; $PTFIDF_{i,j}$ is the weighted TF-IDF of initial target word i in store j; and k is the number of initial target words.
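Illustratively, the grouping and normalization steps can be sketched as follows. The sketch assumes that each group corresponds to one commodity store and that normalization divides each weighted TF-IDF value by the sum of the values in its group; the exact normalization is an assumption here, as are the function name `normalize_per_store` and the sample values.

```python
from collections import defaultdict


def normalize_per_store(ptfidf):
    """Group {(word, store): PTFIDF_ij} by store and normalize each group.

    ASSUMPTION: sum normalization, so the weights of each store add up to 1.
    Returns {store: {word: N_ij}}.
    """
    groups = defaultdict(dict)
    for (word, store), value in ptfidf.items():
        groups[store][word] = value

    normalized = {}
    for store, group in groups.items():
        total = sum(group.values())
        normalized[store] = {
            word: (value / total if total else 0.0)
            for word, value in group.items()
        }
    return normalized


# Hypothetical weighted TF-IDF values for two stores.
ptfidf = {
    ("striped", "store_a"): 1.5,
    ("black", "store_a"): 0.5,
    ("half-collar", "store_b"): 2.0,
}
normalized = normalize_per_store(ptfidf)
print(normalized["store_a"]["striped"])  # 0.75
```

Normalizing per store makes the weights of stores with very different sales and traffic levels comparable before the first type and second type commodity stores are compared.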
In the embodiment of the present application, the process of acquiring, from the plurality of sets of weights, the plurality of first weights corresponding to the first type commodity stores and the plurality of second weights corresponding to the second type commodity stores includes: acquiring, from the plurality of sets of weights, a first store weight corresponding to a first initial target word of the plurality of initial target words in the first type commodity stores and a second store weight corresponding to the first initial target word in the second type commodity stores; then acquiring a third store weight corresponding to a second initial target word of the plurality of initial target words in the first type commodity stores and a fourth store weight corresponding to the second initial target word in the second type commodity stores; …; and finally acquiring the second-to-last store weight, corresponding to the last initial target word in the first type commodity stores, and the last store weight, corresponding to the last initial target word in the second type commodity stores, thereby obtaining the plurality of first weights corresponding to the first type commodity stores and the plurality of second weights corresponding to the second type commodity stores.
In the embodiment of the present application, the process of determining, by the target word mining apparatus, a plurality of initial weights of a plurality of initial target words in a plurality of commodity stores according to a plurality of sales volumes and a plurality of page browsing volumes, includes: acquiring a first sales amount of a first initial word in a plurality of initial target words in a first commodity store in a plurality of commodity stores from a plurality of sales amounts; acquiring a first browsing amount of a first initial word in a first commodity store from a plurality of browsing amounts; determining a target quantity of goods including a first initial word in a first goods store; and determining a first weight of the first initial word in the first commodity store according to the first sales volume, the first browsing volume and the target number until a plurality of initial weights of a plurality of initial target words in a plurality of commodity stores are respectively determined according to the plurality of sales volumes and the plurality of page browsing volumes.
In this embodiment of the present application, the first weight of the first initial word in the first commodity store may be determined from the first sales volume, the first browsing volume and the target number as follows: a first weight adjustment coefficient corresponding to the first browsing volume and a second weight adjustment coefficient corresponding to the target number are obtained; a first product of the first browsing volume and the first weight adjustment coefficient and a second product of the target number and the second weight adjustment coefficient are determined; and the sum of the first product, the second product and the first sales volume is determined, thereby obtaining the first weight. As formula 6 shows, each of the three quantities enters the sum as its logarithm.
In this embodiment of the present application, the manner of determining the first weight of the first initial word in the first commodity store according to the first sales volume, the first browsing volume and the target volume is as shown in formula 6:
$$p_{i,j} = \log G_{i,j} + \lambda \log V_{i,j} + \mu \log C_{i,j} \quad \text{(formula 6)}$$
Note that i denotes an initial target word and j denotes a commodity store; $G_{i,j}$ is the total GMV of the commodities in store j that contain keyword i; $V_{i,j}$ is the total page views (PV) of the commodities in store j that contain keyword i; $C_{i,j}$ is the total number of commodities in store j that contain keyword i; λ and μ are weighting coefficients; and $p_{i,j}$ is the weight of each word on the corresponding matched items.
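Illustratively, formula 6 can be sketched as a small Python function. The logarithm base and the values of λ and μ are not specified in the text, so the natural logarithm and the default coefficients below are assumptions, as are the function name `initial_weight` and the sample figures.

```python
import math


def initial_weight(gmv, page_views, item_count, lam=0.5, mu=0.5):
    """Compute p_ij = log G_ij + lam * log V_ij + mu * log C_ij (formula 6).

    ASSUMPTIONS: natural logarithm; lam and mu defaults are placeholders.
    All three inputs must be positive, otherwise math.log raises ValueError.
    """
    return math.log(gmv) + lam * math.log(page_views) + mu * math.log(item_count)


# Hypothetical totals for one keyword in one store.
p = initial_weight(gmv=12000.0, page_views=3500.0, item_count=8.0)
print(round(p, 4))
```

Taking logarithms compresses the very different scales of GMV, page views and commodity counts, so no single signal dominates the weight. How zero sales or zero page views are handled is not described in the text; the sketch simply requires positive inputs.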
In this embodiment of the present application, the target word mining apparatus further includes a feature word recognition module for fine-grained feature word recognition, that is, for determining the target words from the initial target words. Each commodity store has many coarsely extracted feature words (initial target words), but many of them merely appear in the commodities of the store and are not true feature words, so further selection is needed. Exploiting the fact that feature words (target words) in waist-tail stores are skewed in GMV and PV, the method designs a weighted TF-IDF algorithm that better captures differences in commodity sales and traffic when feature words are extracted. Furthermore, by analyzing the bias between the feature word distribution of waist-tail stores and that of head stores, feature words that are more prominent in waist-tail stores are preferred and head-word bias is eliminated, thereby achieving fine-grained recognition of feature words.
It can be understood that the feature word recognition module implements a weighted TF-IDF algorithm module and, on that basis, a module for analyzing the distribution bias of feature words between head stores and waist-tail stores. The frequency-only processing of feature words is optimized into combined processing based on sales, traffic and frequency. The module can effectively identify feature words (target words) that have a certain sales volume and traffic in a store and distinguish them from related words that show no performance, which avoids selecting cold words or biased words that merely appear in a large number of commodities. The distribution bias analysis module for waist-tail store feature words identifies, from the supply side, the bias of waist-tail stores toward particular feature words. This solves the earlier problem that supply-side signals could not be analyzed in depth when only supply-side quantity statistics were used. By deeply mining the distribution bias of feature words from the supply-side perspective, the present application improves the recognition accuracy of feature words.
In the embodiment of the present application, the target word mining device constructs a plurality of initial target words according to commodity information, including: acquiring commodity title information from commodity information; dividing the commodity title information into a plurality of words to obtain a plurality of divided words; and combining the rest information except the commodity title information in the commodity information with a plurality of segmentation words according to a preset combination mode to obtain a plurality of initial target words.
In the embodiment of the present application, the commodity title information may be segmented with a barker word segmentation tool to obtain the plurality of segmented words, or another word segmentation method may be used; the specific manner of segmenting the commodity title information may be determined according to the actual situation, which is not limited in the embodiment of the present application.
The preset combination mode may be a combination mode configured in the target word mining device, or may be a combination mode transmitted to the target word mining device by other devices, and a mode that the specific target word mining device obtains the preset combination mode may be determined according to actual situations, which is not limited in the embodiment of the present application.
For example, if the commodity information includes a commodity title, industrial attributes of the commodity, marketing attributes, selling points, a brand, a product word category and the like, the preset combination mode (word forming strategy) may be commodity style + product word, attribute (industrial attribute or marketing attribute of the commodity) + product word, brand + product word, and so on.
In this embodiment of the present application, the method of combining the remaining information except the title information of the commodity with the plurality of word segments according to the preset combination manner to obtain the plurality of initial target words may be a method of combining the plurality of word segments with the remaining information in pairs to obtain the plurality of initial target words.
The number of pieces of remaining information is plural.
Illustratively, the combined coarse feature words (initial target words) include netting, striped, half-collar, black, and so on.
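Illustratively, the segmentation and pairwise combination steps can be sketched as follows. The whitespace tokenizer is only a stand-in for whatever word segmentation tool is actually used, and the order of the combined parts, the function names and the sample data are all assumptions made for this sketch.

```python
from itertools import product


def segment_title(title):
    """Stand-in tokenizer (ASSUMPTION): split the title on whitespace."""
    return title.split()


def build_initial_target_words(title, remaining_info):
    """Pair every segmented word with every piece of remaining information.

    remaining_info holds values such as brand or attribute strings; each
    pair is joined as "<info> <segment>" (the order is an assumption).
    """
    segments = segment_title(title)
    return [
        f"{info} {segment}"
        for segment, info in product(segments, remaining_info)
    ]


# Hypothetical commodity: a title plus a brand and an attribute.
candidates = build_initial_target_words("striped shirt", ["BrandX", "cotton"])
print(candidates)
# ['BrandX striped', 'cotton striped', 'BrandX shirt', 'cotton shirt']
```

In practice the preset combination mode would restrict which fields may be paired, for example brand + product word, rather than pairing everything with everything.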
In the embodiment of the application, the target word mining device further comprises a feature word extraction module, wherein the feature word extraction module is used for acquiring commodity information of commodities in a commodity store and constructing a plurality of initial target words according to the commodity information.
It can be appreciated that the present application converts the problem of mining commodity feature words (target words) into the problem of identifying the difference between the distribution of waist-tail user demand and that of waist-tail store supply. The waist-tail user demand set (first demand words and second demand words) and the waist-tail store set (first type commodity stores and second type commodity stores) are explicitly mined by the waist-tail demand recognition module and the waist-tail store generation module. The feature word extraction module and the feature word recognition module then extract commodity keywords and compare their distributions between waist-tail stores and head stores, finally mining a refined set of feature words (target words).
Illustratively, as shown in fig. 3, the target word mining apparatus comprises four parts: waist-tail demand recognition (waist-tail demand recognition module), waist-tail store generation (waist-tail store generation module), feature word extraction (feature word extraction module) and feature word recognition (feature word recognition module). The plurality of search words used when commodities were historically searched are processed by these four parts in sequence to obtain the feature words (target words). Specifically, the plurality of search words are processed in sequence by a zipper-type aggregation algorithm and a cumulative-proportion segmentation method to obtain the waist-tail demand words (second demand words) and the first demand words. The first demand words and the second demand words are processed in sequence by store waist-tail demand mapping and store waist-tail demand distribution to obtain the waist-tail stores (second type commodity stores) and the first type commodity stores. Commodity information of the commodities in the second type commodity stores and the first type commodity stores is acquired, and the commodity title information in the commodity information is segmented to obtain a plurality of segmented words. The remaining information in the commodity information other than the commodity title information is combined with the plurality of segmented words according to a word forming strategy to perform coarse feature word selection, obtaining a plurality of initial target words. Finally, the plurality of initial target words are processed in sequence by the weighted TF-IDF algorithm and a store distribution comparison method to perform fine feature word selection, obtaining the feature words (target words).
It can be understood that the target word mining device determines the commodity stores that have associated operation behaviors with the plurality of search words, and determines the target word corresponding to a commodity according to the commodity information, the sales of the commodity and the page browsing volume of the commodity corresponding to those stores. That is, the commodity feature word (target word) is deeply mined using the sales and page browsing volume of the commodity, rather than being determined solely from words that occur with high frequency in the information corresponding to the commodity, which improves the accuracy of determining the commodity feature word, that is, the accuracy of determining the target word.
Based on the same inventive concept as the above-described target word mining method, the present embodiment provides a target word mining apparatus 1, corresponding to a target word mining method; fig. 4 is a schematic diagram of a composition structure of a target word mining device provided in an embodiment of the present application, where the target word mining device 1 may include:
an acquisition unit 11 for acquiring a plurality of search terms at the time of historically searching for a commodity;
a determining unit 12 for determining a commodity store having an associated operation behavior with the plurality of search words; and determining the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing quantity of the commodity.
In some embodiments of the present application, the apparatus further comprises an establishing unit;
the acquiring unit 11 is configured to acquire commodity information of a commodity in the commodity store; acquiring sales volume and page browsing volume of the commodity;
the establishing unit is used for establishing a plurality of initial target words according to the commodity information;
the determining unit 12 is configured to determine the target word from the plurality of initial target words by using the sales amount and the page browsing amount.
In some embodiments of the present application, the number of commodity stores is a plurality; the number of the commodities is a plurality;
the acquiring unit 11 is configured to acquire a plurality of sales amounts and a plurality of page browsing amounts of a plurality of commodities corresponding to the plurality of initial target words in a plurality of commodity stores; acquiring a plurality of first weights corresponding to a first type of commodity store and a plurality of second weights corresponding to a second type of commodity store from the plurality of weights; the first type commodity stores and the second type commodity stores are stores which are obtained by screening from the commodity stores according to the search words;
the determining unit 12 is configured to determine a plurality of initial weights of the plurality of initial target words in the plurality of commodity stores according to the plurality of sales volumes and the plurality of page browsing volumes, respectively; an initial target word corresponds to an initial weight in a commodity store; determining a plurality of weights corresponding to the plurality of initial target words according to the plurality of initial weights; determining a plurality of distribution deviations of the initial target words in the first type commodity stores and the second type commodity stores according to the first weights and the second weights; determining a first initial word corresponding to a first distribution deviation as the target word under the condition that the first distribution deviation in the plurality of distribution deviations is larger than or equal to a preset deviation threshold; the first initial word is a partial word of the plurality of initial words.
In some embodiments of the present application, the obtaining unit 11 is configured to obtain, from the plurality of first weights, a plurality of first target weights of a first initial word of the plurality of initial target words in a plurality of first type commodity stores; and to obtain, from the plurality of second weights, a plurality of second target weights of the first initial word in a plurality of second type commodity stores;
the determining unit 12 is configured to determine a total value of the plurality of first target weights, to obtain a first numerical value; determining the sum of the first weights to obtain a first parameter; determining the total value of the second target weights to obtain a second value; determining the sum of the second weights to obtain a second parameter; and determining a first distribution deviation corresponding to the first initial word according to the first numerical value, the second numerical value, the first parameter and the second parameter until the distribution deviations are determined according to the first weights and the second weights.
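Illustratively, the four quantities named above (first numerical value, first parameter, second numerical value, second parameter) can be computed as in the following sketch. How they are combined into the distribution deviation is not stated in this passage, so the difference between the word's share in the second type commodity stores and its share in the first type commodity stores is used purely as an assumption; the function name `distribution_deviation` and the sample weights are likewise hypothetical.

```python
def distribution_deviation(word, first_weights, second_weights):
    """Return a deviation score for one word.

    first_weights / second_weights: {store: {word: weight}} for the first
    type and second type commodity stores respectively.
    """
    # First numerical value: total weight of the word in first type stores.
    first_value = sum(ws.get(word, 0.0) for ws in first_weights.values())
    # First parameter: sum of all first weights.
    first_param = sum(sum(ws.values()) for ws in first_weights.values())
    # Second numerical value and second parameter, for second type stores.
    second_value = sum(ws.get(word, 0.0) for ws in second_weights.values())
    second_param = sum(sum(ws.values()) for ws in second_weights.values())
    # ASSUMPTION: deviation = share in second type minus share in first type.
    return second_value / second_param - first_value / first_param


# Hypothetical normalized weights.
first_weights = {"head_1": {"striped": 0.5, "black": 0.5}}
second_weights = {"waist_1": {"striped": 0.75, "black": 0.25}}

print(distribution_deviation("striped", first_weights, second_weights))  # 0.25
print(distribution_deviation("black", first_weights, second_weights))    # -0.25
```

Under this assumed scoring, a word whose deviation is greater than or equal to the preset deviation threshold would be kept as a target word.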
In some embodiments of the present application, the determining unit 12 is configured to determine, according to the plurality of initial weights, a plurality of initial word frequencies of the plurality of initial target words in the plurality of commodity stores, where one initial target word corresponds to one initial word frequency in one commodity store; determining a store number of the plurality of commodity stores; determining first quantity of stores when the initial weights of each initial target word in the plurality of commodity stores are greater than preset weights, so as to obtain a plurality of first quantities; the initial target words are in one-to-one correspondence with the first quantity; determining a plurality of word frequencies corresponding to the plurality of initial target words according to the store quantity and the plurality of first quantities; and determining the weights according to the initial word frequencies and the word frequencies.
In some embodiments of the present application, the apparatus further includes a grouping unit, a processing unit;
the grouping unit is used for grouping the weights according to the initial target words contained in the commodity stores to obtain a plurality of groups of initial weights; the commodity stores are in one-to-one correspondence with the multiple groups of weights;
the processing unit is used for respectively carrying out normalization processing on the plurality of groups of initial weights to obtain a plurality of groups of weights;
the acquiring unit 11 is configured to acquire, from the multiple sets of weights, the multiple first weights corresponding to the first type of commodity stores and the multiple second weights corresponding to the second type of commodity stores.
In some embodiments of the present application, the obtaining unit 11 is configured to obtain, from the plurality of sales volumes, a first sales volume of a first initial word of the plurality of initial target words in a first commodity store of the plurality of commodity stores; acquiring a first browsing amount of the first initial word in the first commodity store from the plurality of browsing amounts;
the determining unit 12 is configured to determine, in the first commodity store, a target number of commodities including the first initial word; and determining a first weight of the first initial word in the first commodity store according to the first sales volume, the first browsing volume and the target volume until a plurality of initial weights of the initial target words in the commodity stores are respectively determined according to the sales volumes and the page browsing volumes.
In some embodiments of the present application, the apparatus further includes a word segmentation unit and a combination unit;
the acquiring unit 11 is configured to acquire commodity title information from the commodity information;
the word segmentation unit is used for segmenting the commodity title information to obtain a plurality of segmented words;
the combination unit is used for combining the rest information except the commodity title information in the commodity information with the plurality of segmented words according to a preset combination mode to obtain the plurality of initial target words.
In some embodiments of the present application, the obtaining unit 11 is configured to obtain a plurality of search frequencies of the plurality of search terms;
correspondingly, the device also comprises a screening unit;
the grouping unit is used for grouping the plurality of search words according to the distance among the plurality of search words, the plurality of search frequencies and a preset search frequency threshold value to obtain a first grouping and a second grouping;
the determining unit 12 is configured to determine a requirement corresponding to the search term in the first group, so as to obtain a first requirement term; determining the requirements corresponding to the search words in the second group to obtain second requirement words; determining a first store to which the commodity corresponding to the first demand word belongs and a second store to which the commodity corresponding to the second demand word belongs;
The screening unit is used for screening the commodity shops from the first shops and the second shops.
In some embodiments of the present application, the grouping unit is configured to group the plurality of search terms according to distances between the plurality of search terms, to obtain at least two groups of search terms; grouping the at least two groups of search words according to the at least two search frequencies and a preset search frequency threshold value to obtain a first group and a second group;
the determining unit 12 is configured to determine at least two search frequencies corresponding to the at least two groups of search terms according to the plurality of search frequencies.
In some embodiments of the present application, the apparatus further comprises a sorting unit and an adding unit;
the ordering unit is used for ordering the plurality of search words according to the plurality of search frequencies to obtain a search word sequence;
the determining unit 12 is configured to determine a similarity between a first word in the search word sequence and a first group of at least two groups of phrases in the aggregation area;
the adding unit is configured to add the first word to the first group of words until each word in the search word sequence is added to the aggregation area to obtain the at least two groups of search words in the aggregation area when the similarity is greater than or equal to a preset similarity threshold.
In some embodiments of the present application, the establishing unit is configured to establish a target group phrase according to the first word when the similarity is smaller than the preset similarity threshold and the similarity between the first word and other groups of at least two groups of phrases is smaller than the preset similarity threshold;
the adding unit is used for adding the target group word group into the at least two groups of word groups.
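Illustratively, the sorting, similarity comparison and group creation performed by these units can be sketched as follows. The sketch makes several assumptions that the text does not fix: search words are sorted by descending search frequency, a word is compared with the first word of each existing group, and similarity is measured with `difflib.SequenceMatcher` from the Python standard library. The function names, the threshold value and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity measure (ASSUMPTION): difflib ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def aggregate_search_terms(frequencies, threshold=0.6):
    """Group search words in an aggregation area.

    Words are visited in descending order of search frequency. A word joins
    the first group whose leading word is similar enough; if no group
    qualifies, a new group is created from the word.
    """
    ordered = sorted(frequencies, key=frequencies.get, reverse=True)
    aggregation_area = []
    for term in ordered:
        for group in aggregation_area:
            if similarity(term, group[0]) >= threshold:
                group.append(term)
                break
        else:
            aggregation_area.append([term])
    return aggregation_area


# Hypothetical search words with their search frequencies.
frequencies = {"red dress": 100, "red dresses": 40, "phone case": 80}
groups = aggregate_search_terms(frequencies)
print(groups)  # [['red dress', 'red dresses'], ['phone case']]
```

Because high-frequency words are visited first, each group is seeded by its most searched word and lower-frequency variants attach to it.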
In some embodiments of the present application, the determining unit 12 is configured to determine a plurality of operation amounts for operating the plurality of target commodities in the target store by the first requirement word and the second requirement word, and a plurality of total operation amounts for operating the plurality of target commodities; the plurality of operation numbers and the plurality of total operation numbers are in one-to-one correspondence; the plurality of target commodities are in one-to-one correspondence with the plurality of operation quantities; the target store includes the plurality of first stores and the plurality of second stores; determining a plurality of first commodity numbers in the target store, and determining a plurality of proportion parameters between the plurality of first commodity numbers and a plurality of total commodity numbers in the target store; the first commodity numbers correspond to the total commodity numbers one by one, and one of the target stores corresponds to one first commodity number;
The screening unit is used for screening a first type commodity shop and a second type commodity shop from the target shops according to the operation numbers, the total operation numbers and the proportion parameters; and taking the first type commodity store and the second type commodity store as the commodity stores.
In some embodiments of the present application, the determining unit 12 is configured to determine a plurality of first operation amounts of performing clicking operations on a plurality of first target commodities in a target store by using the first requirement word and the second requirement word, and a plurality of first total operation amounts of performing clicking operations on the plurality of first target commodities; determining a plurality of second operation numbers for performing additional purchase operation on a plurality of second target commodities in a target store through the first requirement word and the second requirement word, and a plurality of second total operation numbers for performing additional purchase operation on the plurality of second target commodities; determining a plurality of third operation amounts for performing purchase operations on a plurality of third target commodities in a target store by the first demand word and the second demand word, and a plurality of third total operation amounts for performing purchase operations on the plurality of third target commodities; taking the plurality of first operation numbers, the plurality of second operation numbers and the plurality of third operation numbers as the plurality of operation numbers; the first total operation number, the second total operation number, and the third total operation number are set as the total operation numbers.
It should be noted that, in practical applications, the acquiring unit 11 and the determining unit 12 may be implemented by the processor 13 of the target word mining device 1, specifically by a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), a field programmable gate array (FPGA), or the like; the data storage described above may be implemented by the memory 14 of the target word mining device 1.
The embodiment of the application also provides a target word mining device 1, as shown in fig. 5, where the target word mining device 1 includes: a processor 13, a memory 14 and a communication bus 15, the memory 14 being in communication with the processor 13 via the communication bus 15, and the memory 14 storing a program executable by the processor 13; when the program is executed by the processor 13, the target word mining method described above is performed.
In practical applications, the memory 14 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above kinds of memory, and provides instructions and data to the processor 13.
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by the processor 13, implements the target word mining method as described above.
It can be understood that the target word mining device determines the commodity stores that have associated operation behaviors with the plurality of search words, and determines the target word corresponding to a commodity according to the commodity information, the commodity sales volume and the page browsing volume of the commodities in those commodity stores. In other words, the commodity feature word (the target word) is mined in depth by using the sales volume and the page browsing volume of the commodity, rather than being determined from the words that occur with high frequency in the information corresponding to the commodity, which improves the accuracy of the determined commodity feature word, that is, the accuracy of the determined target word.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application.

Claims (17)

1. A method of target word mining, the method comprising:
acquiring a plurality of search words when historically searching for commodities;
determining a commodity store having an associated operational behavior with the plurality of search terms;
and determining the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing quantity of the commodity.
2. The method of claim 1, wherein the determining the target word according to the commodity information corresponding to the commodity store, the sales amount of the commodity, and the page view amount of the commodity comprises:
acquiring commodity information of commodities in the commodity store;
constructing a plurality of initial target words according to the commodity information;
and acquiring sales volume and page browsing volume of the commodity, and determining the target words from the plurality of initial target words by utilizing the sales volume and the page browsing volume.
3. The method of claim 2, wherein the number of commodity stores is a plurality; the number of the commodities is a plurality; the obtaining the sales volume and the page browsing volume of the commodity, and determining the target word from the plurality of initial target words by using the sales volume and the page browsing volume comprises the following steps:
acquiring a plurality of sales volumes and a plurality of page browsing volumes of a plurality of commodities corresponding to the plurality of initial target words in a plurality of commodity stores; determining a plurality of initial weights of the initial target words in the commodity stores respectively according to the sales volumes and the page browsing volumes; an initial target word corresponds to an initial weight in a commodity store;
determining a plurality of weights corresponding to the plurality of initial target words according to the plurality of initial weights;
acquiring a plurality of first weights corresponding to a first type of commodity store and a plurality of second weights corresponding to a second type of commodity store from the plurality of weights; the first type commodity stores and the second type commodity stores are stores which are obtained by screening from the commodity stores according to the search words;
determining a plurality of distribution deviations of the initial target words in the first type commodity stores and the second type commodity stores according to the first weights and the second weights;
determining a first initial word corresponding to a first distribution deviation as the target word in the case that the first distribution deviation among the plurality of distribution deviations is greater than or equal to a preset deviation threshold; the first initial word is one of the plurality of initial target words.
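As a non-limiting illustration of the threshold step recited in claim 3, the following Python sketch keeps every initial target word whose distribution deviation is greater than or equal to the preset deviation threshold. The function name `select_target_words`, the dictionary layout and the example values are assumptions made for illustration only and do not appear in the application.

```python
def select_target_words(deviations, deviation_threshold):
    """Return the initial target words whose deviation meets the threshold.

    deviations: hypothetical mapping {initial_target_word: distribution_deviation}.
    deviation_threshold: the preset deviation threshold (value chosen by the caller).
    """
    return [word for word, deviation in deviations.items()
            if deviation >= deviation_threshold]


# Illustrative values only; they are not taken from the application.
example = {"waterproof": 0.5, "cheap": 0.125, "hiking": 0.25}
selected = select_target_words(example, 0.25)
```

With these illustrative values, "waterproof" and "hiking" are retained because their deviations are at least 0.25.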
4. The method of claim 3, wherein determining a plurality of distribution deviations of the plurality of initial target words in the first type of merchandise store and the second type of merchandise store based on the plurality of first weights and the plurality of second weights comprises:
acquiring a plurality of first target weights of a first initial word in the plurality of initial target words in a plurality of first type commodity stores from the plurality of first weights; determining the total value of the first target weights to obtain a first numerical value;
determining the sum of the first weights to obtain a first parameter;
acquiring a plurality of second target weights of the first initial word in a plurality of second type commodity stores from the plurality of second weights; determining the total value of the plurality of second target weights to obtain a second numerical value;
determining the sum of the plurality of second weights to obtain a second parameter;
and determining a first distribution deviation corresponding to the first initial word according to the first numerical value, the second numerical value, the first parameter and the second parameter until the distribution deviations are determined according to the first weights and the second weights.
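Claim 4 names the four quantities (first numerical value, first parameter, second numerical value, second parameter) without stating in this passage how they are combined. The following Python sketch assumes, purely for illustration, that the deviation is the difference between the word's share of weight in the first type commodity stores and its share in the second type commodity stores. The function name and the combining formula are assumptions, not the formula of the application.

```python
def distribution_deviation(first_target_weights, first_weights,
                           second_target_weights, second_weights):
    """Hypothetical deviation: share in first type stores minus share in second type stores."""
    first_value = sum(first_target_weights)    # first numerical value
    first_param = sum(first_weights)           # first parameter
    second_value = sum(second_target_weights)  # second numerical value
    second_param = sum(second_weights)         # second parameter
    # Guard against empty store groups so the sketch never divides by zero.
    first_share = first_value / first_param if first_param else 0.0
    second_share = second_value / second_param if second_param else 0.0
    return first_share - second_share


# Illustrative weights only.
deviation = distribution_deviation([0.5, 0.25], [0.5, 0.25, 0.25],
                                   [0.25], [0.25, 0.75])
```

A ratio or log-ratio of the two shares would fit the claim language equally well; the difference is used here only because it stays defined when one share is zero.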
5. The method of claim 3, wherein the determining a plurality of weights corresponding to the plurality of initial target words from the plurality of initial weights comprises:
determining a plurality of initial word frequencies of the initial target words in the commodity stores according to the initial weights, wherein one initial target word corresponds to one initial word frequency in one commodity store;
determining a store quantity of the plurality of commodity stores; determining, for each initial target word, a first quantity of commodity stores in which the initial weight of that initial target word is greater than a preset weight, so as to obtain a plurality of first quantities; the plurality of initial target words are in one-to-one correspondence with the plurality of first quantities;
determining a plurality of word frequencies corresponding to the plurality of initial target words according to the store quantity and the plurality of first quantities;
and determining the plurality of weights according to the plurality of initial word frequencies and the plurality of word frequencies.
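The structure of claim 5 resembles a TF-IDF style weighting, with commodity stores playing the role of documents. The following Python sketch is written under that assumption: the initial word frequency is taken as the word's share of the initial weights within a store, and the word frequency derived from the store quantity and the first quantity is taken as a smoothed logarithm. The function name, the smoothing and the data layout are illustrative assumptions and are not stated in this passage.

```python
import math


def tfidf_style_weights(initial_weights, preset_weight=0.0):
    """initial_weights: hypothetical mapping {store: {initial_target_word: initial_weight}}."""
    store_quantity = len(initial_weights)

    # First quantity: number of stores in which the word's initial weight exceeds the preset weight.
    first_quantity = {}
    for words in initial_weights.values():
        for word, weight in words.items():
            first_quantity.setdefault(word, 0)
            if weight > preset_weight:
                first_quantity[word] += 1

    # Assumed inverse-store-frequency term with add-one smoothing.
    inverse_frequency = {
        word: math.log((1 + store_quantity) / (1 + count))
        for word, count in first_quantity.items()
    }

    weights = {}
    for store, words in initial_weights.items():
        total = sum(words.values())
        weights[store] = {
            # Assumed initial word frequency: the word's share of the store's initial weights.
            word: (weight / total if total else 0.0) * inverse_frequency[word]
            for word, weight in words.items()
        }
    return weights


# Illustrative input only.
result = tfidf_style_weights({"s1": {"a": 2.0, "b": 2.0}, "s2": {"a": 1.0}})
```

Under this assumption, a word that appears with a meaningful weight in every store (here "a") receives a weight of zero, while a word concentrated in few stores (here "b") is emphasised.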
6. The method of claim 3, wherein the obtaining, from the plurality of weights, a plurality of first weights corresponding to a first type of merchandise store and a plurality of second weights corresponding to a second type of merchandise store comprises:
grouping the plurality of weights according to the initial target words contained in the commodity stores to obtain a plurality of groups of initial weights; the plurality of commodity stores are in one-to-one correspondence with the plurality of groups of initial weights;
respectively carrying out normalization processing on the plurality of groups of initial weights to obtain a plurality of groups of weights;
and acquiring the first weights corresponding to the first type commodity stores and the second weights corresponding to the second type commodity stores from the plurality of groups of weights.
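A minimal Python sketch of the grouping and normalization in claim 6 follows. It assumes that normalization means scaling each store's group of weights so that the group sums to 1, and that the first type and second type commodity stores are supplied as lists of store identifiers. Both function names and the data layout are hypothetical.

```python
def normalize_by_store(weights):
    """weights: hypothetical mapping {store: {initial_target_word: weight}}, one group per store."""
    normalized = {}
    for store, words in weights.items():
        total = sum(words.values())
        # Assumed normalization: divide by the group total so each store's weights sum to 1.
        normalized[store] = {
            word: (value / total if total else 0.0) for word, value in words.items()
        }
    return normalized


def split_by_store_type(normalized, first_type_stores, second_type_stores):
    """Pick out the groups of weights belonging to each type of commodity store."""
    first = {s: normalized[s] for s in first_type_stores if s in normalized}
    second = {s: normalized[s] for s in second_type_stores if s in normalized}
    return first, second


# Illustrative input only.
groups = normalize_by_store({"s1": {"a": 1.0, "b": 3.0}, "s2": {"a": 2.0}})
first_weights, second_weights = split_by_store_type(groups, ["s1"], ["s2"])
```

Normalizing per store keeps a store with many commodities from dominating the later comparison between the two store types.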
7. The method of claim 3, wherein the determining a plurality of initial weights of the plurality of initial target words in the plurality of merchandise stores based on the plurality of sales volumes and the plurality of page views, respectively, comprises:
acquiring a first sales volume of a first initial word in the plurality of initial target words in a first commodity store in the plurality of commodity stores from the plurality of sales volumes;
acquiring a first page browsing volume of the first initial word in the first commodity store from the plurality of page browsing volumes;
determining a target quantity of commodities including the first initial word in the first commodity store;
and determining a first initial weight of the first initial word in the first commodity store according to the first sales volume, the first page browsing volume and the target quantity, until the plurality of initial weights of the plurality of initial target words in the plurality of commodity stores are respectively determined according to the plurality of sales volumes and the plurality of page browsing volumes.
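Claim 7 lists three inputs for the initial weight of a word in a store (sales volume, page browsing volume, and the quantity of commodities containing the word) but this passage does not give the combining formula. The following Python sketch assumes, for illustration only, a conversion rate (sales volume divided by page browsing volume) multiplied by the quantity of commodities. The function name and the formula are assumptions.

```python
def initial_weight(sales_volume, page_view_volume, commodity_quantity):
    """Hypothetical initial weight of one initial target word in one commodity store."""
    if page_view_volume <= 0:
        # No page views: no evidence about the word in this store.
        return 0.0
    conversion_rate = sales_volume / page_view_volume
    # Assumption: words carried by more commodities in the store weigh proportionally more.
    return conversion_rate * commodity_quantity


# Illustrative numbers only: 50 sales, 200 page views, 4 commodities containing the word.
weight = initial_weight(50, 200, 4)
```

Using sales relative to page views, rather than raw word counts, reflects the stated aim of mining feature words from sales and browsing behaviour instead of word frequency alone.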
8. The method of claim 2, wherein constructing a plurality of initial target words from the merchandise information comprises:
acquiring commodity title information from the commodity information;
performing word segmentation on the commodity title information to obtain a plurality of segmented words;
and combining the remaining information in the commodity information, other than the commodity title information, with the plurality of segmented words according to a preset combination mode to obtain the plurality of initial target words.
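The construction of initial target words in claim 8 can be sketched in Python as follows. The sketch assumes whitespace splitting as a stand-in for a real word segmenter, and assumes that the preset combination mode prefixes each remaining attribute value to each segmented word. The function name, the `title` key and the combination mode are hypothetical and are not specified in this passage.

```python
def build_initial_target_words(commodity_info, segment=str.split):
    """commodity_info: hypothetical mapping with a 'title' key plus other attributes."""
    segmented_words = segment(commodity_info["title"])
    other_values = [str(value) for key, value in commodity_info.items() if key != "title"]

    candidates = list(segmented_words)
    # Assumed combination mode: attribute value followed by a segmented word.
    for value in other_values:
        for word in segmented_words:
            candidates.append(f"{value} {word}")

    # Remove duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


# Illustrative commodity only.
words = build_initial_target_words({"title": "wireless mouse", "brand": "Acme"})
```

For Chinese commodity titles, a dedicated segmenter would be passed through the `segment` parameter in place of `str.split`.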
9. The method of claim 1, wherein the acquiring a plurality of search words when historically searching for commodities further comprises:
acquiring a plurality of search frequencies of the plurality of search words;
accordingly, the determining a commodity store having an associated operational behavior with the plurality of search terms includes:
grouping the plurality of search words according to the distances among the plurality of search words, the plurality of search frequencies and a preset search frequency threshold value to obtain a first group and a second group;
determining the demands corresponding to the search words in the first group to obtain a first demand word; determining the demands corresponding to the search words in the second group to obtain a second demand word;
determining a first store to which the commodity corresponding to the first demand word belongs and a second store to which the commodity corresponding to the second demand word belongs;
and screening the commodity stores from the first store and the second store.
10. The method of claim 9, wherein grouping the plurality of search terms according to the distance between the plurality of search terms, the plurality of search frequencies, and a preset search frequency threshold value, results in a first group and a second group, comprising:
grouping the plurality of search words according to the distance between the plurality of search words to obtain at least two groups of search words;
determining at least two search frequencies corresponding to the at least two groups of search words according to the plurality of search frequencies;
grouping the at least two groups of search words according to the at least two search frequencies and a preset search frequency threshold value to obtain a first group and a second group.
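The second grouping step of claim 10 can be illustrated with the following Python sketch. It assumes that the search frequency of a group is the sum of the search frequencies of its members, and that groups at or above the preset search frequency threshold form the first group while the rest form the second group. The function name and both assumptions are illustrative only.

```python
def split_groups_by_frequency(word_groups, search_frequencies, frequency_threshold):
    """word_groups: list of lists of search words; search_frequencies: {search_word: frequency}."""
    first_group, second_group = [], []
    for group in word_groups:
        # Assumed group frequency: total frequency of the search words in the group.
        group_frequency = sum(search_frequencies[word] for word in group)
        if group_frequency >= frequency_threshold:
            first_group.append(group)
        else:
            second_group.append(group)
    return first_group, second_group


# Illustrative input only.
first, second = split_groups_by_frequency(
    [["a", "b"], ["c"]], {"a": 5, "b": 4, "c": 2}, 5)
```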
11. The method of claim 10, wherein grouping the plurality of search terms according to the distance between the plurality of search terms results in at least two groups of search terms, comprising:
sorting the plurality of search words according to the plurality of search frequencies to obtain a search word sequence;
determining the similarity between a first word in the search word sequence and a first group of phrases among at least two groups of phrases in an aggregation area; and in the case that the similarity is greater than or equal to a preset similarity threshold, adding the first word to the first group of phrases, until each word in the search word sequence has been added to the aggregation area, so as to obtain the at least two groups of search words in the aggregation area.
12. The method of claim 11, wherein after determining the similarity between the first word in the sequence of search words and the first group of at least two groups of phrases in the aggregation area, the method further comprises:
in the case that the similarity is less than the preset similarity threshold and the similarity between the first word and each other group of phrases in the at least two groups of phrases is also less than the preset similarity threshold, creating a target group of phrases from the first word;
and adding the target group of phrases to the at least two groups of phrases.
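The aggregation described in claims 11 and 12 can be sketched as a greedy, single-pass grouping. The following Python sketch sorts the search words by search frequency, compares each word with the existing groups, and opens a new group when no similarity reaches the threshold. The similarity measure (the `difflib.SequenceMatcher` ratio against the first member of each group) and the default threshold are assumptions; this passage does not state how similarity is computed.

```python
from difflib import SequenceMatcher


def similarity(word, group):
    """Assumed similarity: string ratio between the word and the group's first member."""
    return SequenceMatcher(None, word, group[0]).ratio()


def aggregate_search_words(search_frequencies, similarity_threshold=0.6):
    """search_frequencies: hypothetical mapping {search_word: search_frequency}."""
    # Search word sequence: most frequently searched words first.
    sequence = sorted(search_frequencies, key=search_frequencies.get, reverse=True)
    aggregation_area = []
    for word in sequence:
        for group in aggregation_area:
            if similarity(word, group) >= similarity_threshold:
                group.append(word)
                break
        else:
            # No existing group is similar enough: create a new group from this word.
            aggregation_area.append([word])
    return aggregation_area


# Illustrative search log only.
groups = aggregate_search_words({"red dress": 10, "red dresses": 7, "laptop": 5})
```

Processing words in descending order of frequency means that each group is seeded by its most frequently searched member.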
13. The method of claim 9, wherein the number of first stores is a plurality and the number of second stores is a plurality; the screening the commodity shop from the first shop and the second shop includes:
determining a plurality of operation numbers of operations performed on a plurality of target commodities in a target store through the first demand word and the second demand word, and a plurality of total operation numbers of operations performed on the plurality of target commodities; the plurality of operation numbers and the plurality of total operation numbers are in one-to-one correspondence; the plurality of target commodities are in one-to-one correspondence with the plurality of operation numbers; the target store includes the plurality of first stores and the plurality of second stores;
determining a plurality of first commodity numbers in the target store, and determining a plurality of proportion parameters between the plurality of first commodity numbers and a plurality of total commodity numbers in the target store; the first commodity numbers correspond to the total commodity numbers one by one, and one of the target stores corresponds to one first commodity number;
screening a first type commodity shop and a second type commodity shop from the target shops according to the operation numbers, the total operation numbers and the proportion parameters;
and taking the first type commodity store and the second type commodity store as the commodity stores.
14. The method of claim 13, wherein the determining a plurality of operation numbers of operations performed on a plurality of target commodities in a target store through the first demand word and the second demand word, and a plurality of total operation numbers of operations performed on the plurality of target commodities comprises:
determining a plurality of first operation numbers of click operations performed on a plurality of first target commodities in the target store through the first demand word and the second demand word, and a plurality of first total operation numbers of click operations performed on the plurality of first target commodities;
determining a plurality of second operation numbers of add-to-cart operations performed on a plurality of second target commodities in the target store through the first demand word and the second demand word, and a plurality of second total operation numbers of add-to-cart operations performed on the plurality of second target commodities;
determining a plurality of third operation numbers of purchase operations performed on a plurality of third target commodities in the target store through the first demand word and the second demand word, and a plurality of third total operation numbers of purchase operations performed on the plurality of third target commodities;
taking the plurality of first operation numbers, the plurality of second operation numbers and the plurality of third operation numbers as the plurality of operation numbers; and taking the plurality of first total operation numbers, the plurality of second total operation numbers and the plurality of third total operation numbers as the plurality of total operation numbers.
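Claims 13 and 14 name the inputs to the store screening (operation numbers reached through the demand words, total operation numbers, and proportion parameters) but this passage does not state the screening rule. The following Python sketch is a simplified illustration under several assumptions: operation numbers are aggregated per store rather than per commodity, the score is the proportion parameter multiplied by the mean share of clicks, add-to-cart operations and purchases that came through the demand words, and stores scoring at or above an upper threshold are treated as first type while stores scoring at or below a lower threshold are treated as second type. All names, keys and thresholds are hypothetical.

```python
def share(part, total):
    """Fraction of `total` accounted for by `part`; 0.0 when total is zero."""
    return part / total if total else 0.0


def store_score(store):
    """store: hypothetical per-store counters, see the example below for the keys."""
    behaviours = ("clicks", "cart_adds", "purchases")
    shares = [share(store[b], store["total_" + b]) for b in behaviours]
    # Proportion parameter: first commodity number relative to the total commodity number.
    proportion = share(store["first_commodities"], store["total_commodities"])
    return proportion * sum(shares) / len(shares)


def screen_stores(stores, upper_threshold, lower_threshold):
    """Assumed rule: high scores give first type stores, low scores give second type stores."""
    first_type = [name for name, s in stores.items() if store_score(s) >= upper_threshold]
    second_type = [name for name, s in stores.items() if store_score(s) <= lower_threshold]
    return first_type, second_type


# Illustrative counters only.
stores = {
    "A": {"clicks": 80, "total_clicks": 100, "cart_adds": 40, "total_cart_adds": 50,
          "purchases": 8, "total_purchases": 10,
          "first_commodities": 9, "total_commodities": 10},
    "B": {"clicks": 5, "total_clicks": 100, "cart_adds": 1, "total_cart_adds": 50,
          "purchases": 0, "total_purchases": 10,
          "first_commodities": 1, "total_commodities": 10},
}
first_type, second_type = screen_stores(stores, upper_threshold=0.5, lower_threshold=0.05)
```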
15. A target word mining apparatus, the apparatus comprising:
an acquisition unit configured to acquire a plurality of search terms when historically searching for a commodity;
a determining unit configured to determine a commodity store having an associated operation behavior with the plurality of search terms; and determining the target word according to commodity information corresponding to the commodity store, sales of the commodity and page browsing quantity of the commodity.
16. A target word mining apparatus, the apparatus comprising:
a memory, a processor, and a communication bus, the memory being in communication with the processor through the communication bus, the memory storing a target word mining program executable by the processor, wherein the target word mining program, when executed by the processor, performs the method of any one of claims 1 to 14.
17. A storage medium having stored thereon a computer program for use in a target word mining apparatus, the computer program, when executed by a processor, implementing the method of any one of claims 1 to 14.
CN202310331970.9A 2023-03-30 2023-03-30 Target word mining method and device and storage medium Pending CN116362236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310331970.9A CN116362236A (en) 2023-03-30 2023-03-30 Target word mining method and device and storage medium

Publications (1)

Publication Number Publication Date
CN116362236A true CN116362236A (en) 2023-06-30

Family

ID=86921402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310331970.9A Pending CN116362236A (en) 2023-03-30 2023-03-30 Target word mining method and device and storage medium

Country Status (1)

Country Link
CN (1) CN116362236A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271869A (en) * 2023-11-22 2023-12-22 深圳市灵智数字科技有限公司 User search word recommendation method and device and electronic equipment
CN117271869B (en) * 2023-11-22 2024-03-29 深圳市灵智数字科技有限公司 User search word recommendation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination