WO2018040762A1 - Data mining method and apparatus

Info

Publication number
WO2018040762A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
attribute
attributes
category
data
Prior art date
Application number
PCT/CN2017/093145
Other languages
French (fr)
Chinese (zh)
Inventor
党白璐
马添
宋丕宇
刘俊
郑超
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2018040762A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions

Definitions

  • The present invention relates to the field of big data, and in particular to a data mining method and apparatus.
  • In the prior art, the classification of objects relies mainly on the subjective management experience of managers.
  • When classification relies on subjective experience alone, its accuracy is difficult to guarantee. How to classify objects accurately is a problem that needs to be solved.
  • One technical problem solved by the present invention is how to classify objects accurately.
  • According to one aspect of the embodiments of the present invention, a data mining method is provided, including: extracting the attribute values of an object attribute from user browsing data; classifying the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data; and classifying the objects according to the attribute value categories of the object attribute.
  • In some embodiments, the method further includes selecting a plurality of attributes of the object; classifying the objects according to the attribute value categories of the object attribute then includes classifying the objects according to the attribute value categories of the plurality of attributes of the object.
  • In some embodiments, the correlation between the selected attributes of the object is below a preset value.
  • In some embodiments, classifying the attribute values of the object attribute according to the correlation between the attribute values in the user browsing data includes: determining the correlation coefficients between the attribute values of the object attribute in the user browsing data, and placing attribute values whose correlation coefficient is higher than a preset value in one category.
  • In some embodiments, classifying the attribute values of the object attribute according to the correlation between the attribute values in the user browsing data includes: clustering the user browsing data of the individual attribute values of the object attribute with a clustering method, and classifying the attribute values of the object attribute according to the clustering result.
  • In some embodiments, the method further includes: determining, from the data on the objects the user has browsed, the object category corresponding to the user; and sending push information about objects of the corresponding category to the user.
  • In some embodiments, determining the object category corresponding to the user includes: for the browsing data of objects of a given category, determining that the user corresponds to the category if the user's number of views is greater than the average number of views of all users; and/or determining that the user corresponds to the category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
  • According to another aspect of the embodiments of the present invention, a data mining apparatus is provided, including: an attribute value extraction module, configured to extract the attribute values of an object attribute from user browsing data; an attribute value classification module, configured to classify the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data; and an object classification module, configured to classify the objects according to the attribute value categories of the object attribute.
  • In some embodiments, the apparatus further includes an attribute selection module for selecting a plurality of attributes of the object; the attribute value classification module is further configured to classify the objects according to the attribute value categories of the plurality of attributes of the object.
  • In some embodiments, the correlation between the attributes of the object selected by the attribute selection module is lower than a preset value.
  • In some embodiments, the attribute value classification module includes: a correlation coefficient determining unit, configured to determine the correlation coefficients between the attribute values of the object attribute in the user browsing data; and a classifying unit, configured to place attribute values whose correlation coefficient is higher than a preset value in one category.
  • In some embodiments, the attribute value classification module includes: a clustering unit, configured to cluster the user browsing data of the individual attribute values of the object attribute with a clustering method; and a classifying unit, configured to classify the attribute values of the object attribute according to the clustering result.
  • In some embodiments, the apparatus further includes: an object category determining module, configured to determine the object category corresponding to the user from the data on the objects the user has browsed; and an information sending module, configured to send push information about objects of the corresponding category to the user.
  • In some embodiments, the object category determining module is configured to: for the browsing data of objects of a given category, determine that the user corresponds to the category if the user's number of views is greater than the average number of views of all users; and/or determine that the user corresponds to the category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
  • According to yet another aspect of the embodiments of the present invention, a data mining apparatus is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute the above data mining method based on instructions stored in the memory.
  • The invention classifies the attribute values of an object attribute according to the correlation between those attribute values in the user browsing data, and classifies objects according to the attribute value categories of the object attribute, thereby achieving accurate classification of objects.
  • FIG. 1 is a flow chart showing an embodiment of a data mining method of the present invention.
  • FIG. 2 is a flow chart showing another embodiment of the data mining method of the present invention.
  • FIG. 3 is a flow chart showing still another embodiment of the data mining method of the present invention.
  • FIG. 4 is a block diagram showing the structure of an embodiment of the data mining apparatus of the present invention.
  • FIG. 5 is a block diagram showing the structure of another embodiment of the data mining device of the present invention.
  • FIG. 6 is a block diagram showing the structure of still another embodiment of the data mining device of the present invention.
  • FIG. 7 is a block diagram showing the structure of still another embodiment of the data mining device of the present invention.
  • This embodiment mines the category information of the object from the user browsing data of the object, thereby realizing accurate classification of the object.
  • FIG. 1 is a flow chart showing an embodiment of a data mining method of the present invention. This embodiment can accurately classify objects according to user browsing data. As shown in FIG. 1, the data mining method of this embodiment includes:
  • Step S102: extract the attribute values of the object attributes from the user browsing data.
  • Object attributes differ from object to object and can be selected according to the nature of the object to be mined.
  • An example of user browsing data about television objects is shown in Table 1.
  • The object attributes in the user browsing data include brand and size.
  • The attribute values of the brand of the object include brand 1, brand 2, ... brand i; the attribute values of the size of the object include size 1, size 2, ... size j.
  • Step S104: classify the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data.
  • Attribute values whose correlation is higher than the preset value can be placed in one category.
  • For example, if brand 1 has a strong correlation with brand 2, brand 1 and brand 2 are placed in one category.
  • Likewise, if size 1 has a strong correlation with size 2, while size 1 and size 2 each have a weak correlation with size i, size 1 and size 2 are placed in one category.
  • Each category forms an exclusive attribute of the object. Accordingly, a category of attribute values, or a combination of attribute value categories across a plurality of object attributes, may be referred to as an exclusive attribute.
  • For an example in which a combination of attribute value categories of a plurality of object attributes forms an exclusive attribute, see the flat-panel television example in the embodiment shown in FIG. 2.
  • Step S106: classify the objects according to the attribute value categories of the object attributes.
  • For example, objects of brand 1 and brand 2 can be classified into one category.
  • In the above embodiment, data mining is used to determine how the objects should be classified. The resulting classification is objective, avoids the drawbacks of classifying objects by subjective, experience-based judgment, and achieves accurate classification of the objects.
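  • Steps S102 and S106 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the log format (pairs of user ID and object ID), the catalog structure, and the function names are all assumptions made for the example, and the mapping from attribute value to category is taken as the output of step S104.

```python
from collections import defaultdict


def extract_view_counts(views, catalog, attribute):
    """S102 (sketch): for each attribute value, count views per user.

    views   -- iterable of (user_id, object_id) events (assumed log format)
    catalog -- {object_id: {attribute_name: attribute_value}} (assumed)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, object_id in views:
        value = catalog[object_id][attribute]
        counts[value][user_id] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {value: dict(per_user) for value, per_user in counts.items()}


def classify_objects(catalog, attribute, value_to_category):
    """S106 (sketch): give each object the category of its attribute value."""
    return {
        object_id: value_to_category[attrs[attribute]]
        for object_id, attrs in catalog.items()
    }
```

  • With a value-to-category mapping that places brand 1 and brand 2 in the same category, `classify_objects` puts all objects of those two brands into one class, which mirrors the brand example above.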
  • The present invention also provides two exemplary methods of classifying the attribute values of object attributes.
  • In the first method, the correlation coefficients between the attribute values of the object attributes in the user browsing data are determined, and attribute values whose correlation coefficient is higher than the preset value are placed in one category.
  • The correlation coefficient between attribute values can be calculated, for example, as a Pearson correlation coefficient or a Spearman rank correlation coefficient.
  • The Pearson correlation coefficient $\rho_{XY}$ of the attributes X and Y, treated as variables, is calculated as follows: $\rho_{XY} = \dfrac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$, where $\mu_X$, $\mu_Y$ are the means and $\sigma_X$, $\sigma_Y$ the standard deviations of X and Y.
  • The correlation coefficients between the values of the respective object attributes calculated by the above method are shown in Table 2.
  • The Pearson correlation coefficient ranges from -1 to 1; its absolute value, which ranges from 0 to 1, indicates the strength of the correlation.
  • If the correlation coefficient between two attributes is less than a certain threshold, for example 0.3, the two attributes may be considered unrelated or only weakly related.
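  • The first method can be sketched as follows. The sketch assumes each attribute value is represented by a vector of per-user view counts (the same users in the same order for every value); that representation, the function names, and the transitive merging of correlated pairs are assumptions for illustration and are not specified in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def group_by_correlation(vectors, threshold):
    """Place attribute values whose coefficient exceeds `threshold` in one category.

    vectors -- {attribute_value: [views by user 0, views by user 1, ...]}
    Pairs above the threshold are merged transitively (union-find).
    """
    names = list(vectors)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(vectors[a], vectors[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

  • The threshold passed to `group_by_correlation` plays the role of the preset value; its appropriate magnitude depends on the data and is left to the caller.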
  • In the second method, a clustering method is used to cluster the user browsing data of the individual attribute values of the object attribute, and the attribute values of the object attribute are classified according to the clustering result.
  • For example, the K-means or KNN algorithm is used to cluster the user browsing data of the individual attribute values of the object attribute, and the attribute values of the attribute are classified according to the clustering result. If clustering the attribute values of an object attribute yields three clusters, the attribute values of that attribute are divided into three categories.
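  • The second method can be sketched with a plain K-means loop. As before, representing each attribute value by a vector of per-user view counts is an assumption, as are the function names and the deterministic seeding with the first k vectors; only K-means is shown, not KNN.

```python
def kmeans(points, k, iterations=20):
    """Plain K-means on equal-length numeric vectors; returns a cluster index per point."""
    centroids = [list(p) for p in points[:k]]  # assumption: seed with the first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_attribute_values(vectors, k):
    """Classify attribute values by the cluster their browsing vector falls into."""
    names = list(vectors)
    labels = kmeans([vectors[name] for name in names], k)
    categories = {}
    for name, label in zip(names, labels):
        categories.setdefault(label, []).append(name)
    return sorted(categories.values())
```

  • The number of clusters k determines the number of attribute value categories; the text does not say how k is chosen, so it is a caller-supplied parameter here.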
  • FIG. 2 is a flow chart showing another embodiment of the data mining method of the present invention.
  • The method of this embodiment includes:
  • Step S201: select a plurality of attributes of the object.
  • Object attributes include, for example, brand, size, price, material, and resolution.
  • A plurality of attributes of the object whose mutual correlation is lower than the preset value are selected.
  • If the attribute value correlation coefficients of two attributes are consistent with each other, the two attributes are considered highly correlated and collinear, so only one of them needs to be selected.
  • For example, size and resolution are highly correlated (the larger the size, the higher the resolution), so only one of the two, size, is selected; brand and size have a low correlation, so both are selected to subdivide the objects.
  • Selecting attributes whose mutual correlation is lower than the preset value improves the efficiency of dividing objects into categories, reduces collinearity, and makes the classification more comprehensive.
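  • One way to pick mutually low-correlated attributes is a greedy pass over the candidates, sketched below. The function name, the pairwise correlation table keyed by `frozenset`, and the greedy strategy are assumptions for illustration; the text only requires that the selected attributes have a correlation below the preset value.

```python
def select_attributes(candidates, correlation, threshold):
    """Greedily keep attributes whose |correlation| with every kept one is below threshold.

    candidates  -- attribute names in priority order (earlier ones win ties)
    correlation -- {frozenset((attr_a, attr_b)): coefficient} for every pair
    """
    selected = []
    for attr in candidates:
        if all(abs(correlation[frozenset((attr, kept))]) < threshold for kept in selected):
            selected.append(attr)
    return selected
```

  • Because the pass is greedy, the result depends on the order of `candidates`: with size listed before resolution, size is kept and resolution is dropped, matching the example above.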
  • Then step S202 and step S204 are performed.
  • Step S202: extract the attribute values of the object attributes from the user browsing data. For the specific implementation, refer to step S102.
  • Step S204: classify the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data. For the specific implementation, refer to step S104.
  • Step S206 may then be performed.
  • Step S206: classify the objects according to the attribute value categories of the plurality of attributes of the object.
  • For example, flat-panel TVs can be divided into high-end, mid-range, and low-end categories.
  • They can also be divided into four size categories: two large living room types, a small living room type, and a bedroom type.
  • Combining the two divisions, all flat-panel TVs can be divided into 12 categories (3 × 4): each of the high-end, mid-range, and low-end categories paired with each of the four size categories, for example high-end small living room, mid-range bedroom, and low-end large living room.
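  • The 12 combined categories are the Cartesian product of the two category lists, which can be sketched as follows. The function name is hypothetical, and the suffixes "(a)" and "(b)" are placeholders used only to tell apart the two large living room types, which this text does not name separately.

```python
from itertools import product


def combine_categories(*attribute_categories):
    """Return every combination of one category per attribute, joined into a label."""
    return [" ".join(combo) for combo in product(*attribute_categories)]


grades = ["high-end", "mid-range", "low-end"]
# Placeholder labels: "(a)" and "(b)" only distinguish the two large living room types.
room_types = [
    "large living room (a)",
    "large living room (b)",
    "small living room",
    "bedroom",
]
combined = combine_categories(grades, room_types)
```

  • Here `combined` holds 3 × 4 = 12 labels; adding a third attribute with n categories would multiply the count by n.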
  • FIG. 3 is a flow chart showing still another embodiment of the data mining method of the present invention. After determining the category of the object, the data mining method in this embodiment further includes:
  • Step S308: determine the object category corresponding to the user from the data on the objects the user has browsed. For example, the following three methods can be used to determine the object category corresponding to the user:
  • If the ratio of the number of times the user views objects of a certain category to the number of times the user views objects of all categories is greater than a preset ratio, it is determined that the user corresponds to that category. For example, if, in the user's browsing data for flat-panel TVs, views of high-end large living room flat-panel TVs make up a large share, it can be determined that the user corresponds to the high-end large living room category of flat-panel TVs.
  • In addition, some abnormal data can be selectively deleted so that it is not used as a basis for subsequent operations.
  • For example, some store merchants may view the objects they themselves sell an excessive number of times, so their browsing data can be deleted.
  • Step S310: send push information about objects of the corresponding category to the user. In the example above, information about high-end large living room flat-panel TVs can be pushed to the user.
  • In the above embodiment, the object category corresponding to the user is determined and targeted pushing is performed, which can save system resources and improve the conversion rate of promotions.
  • A data mining apparatus according to an embodiment of the present invention will now be described with reference to FIG. 4.
  • The data mining device 40 of this embodiment includes:
  • The attribute value extraction module 402 is configured to extract the attribute values of the object attributes from the user browsing data.
  • The attribute value classification module 404 is configured to classify the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data.
  • The object classification module 406 is configured to classify the objects according to the attribute value categories of the object attributes.
  • In the above embodiment, data mining is used to determine how the objects should be classified. The resulting classification is objective, avoids the drawbacks of classifying objects by subjective, experience-based judgment, and achieves accurate classification of the objects.
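  • The module structure of device 40 can be pictured as three pluggable components run in sequence. The class below is only a structural sketch: the class name, the `run` method, and the callable signatures are assumptions, and the modules are passed in as functions rather than implemented here.

```python
class DataMiningDevice:
    """Structural sketch of device 40: three modules wired in sequence."""

    def __init__(self, extract, classify_values, classify_objects):
        self.extract = extract                    # attribute value extraction module 402
        self.classify_values = classify_values    # attribute value classification module 404
        self.classify_objects = classify_objects  # object classification module 406

    def run(self, browsing_data, catalog):
        """Extract attribute values, classify them, then classify the objects."""
        values = self.extract(browsing_data, catalog)
        value_categories = self.classify_values(values)
        return self.classify_objects(catalog, value_categories)
```

  • Passing the modules in as callables keeps the sketch neutral about how each one works, so either the correlation-based or the clustering-based classification could be supplied as `classify_values`.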
  • A data mining apparatus according to another embodiment of the present invention will be described below with reference to FIG. 5.
  • FIG. 5 is a block diagram showing the structure of an embodiment of the data mining device of the present invention. As shown in FIG. 5, the data mining device 50 of this embodiment includes:
  • The attribute selection module 501 is configured to select a plurality of attributes of the object.
  • The attribute value classification module 404 is further configured to classify the objects according to the attribute value categories of the plurality of attributes of the object.
  • Selecting multiple attributes of an object and classifying the objects according to the attribute value categories of those attributes allows a finer-grained classification, further improving the accuracy of object classification.
  • The correlation between the attributes of the object selected by the attribute selection module 501 is lower than a preset value.
  • The attribute value classification module 404 includes: a correlation coefficient determining unit 4042, configured to determine the correlation coefficients between the attribute values of the object attributes in the user browsing data; and a classifying unit 4044, configured to place attribute values whose correlation coefficient is higher than a preset value in one category.
  • Alternatively, the attribute value classification module 404 includes: a clustering unit 4046, configured to cluster the user browsing data of the individual attribute values of the object attribute with a clustering method; and a classifying unit 4048, configured to classify the attribute values of the object attribute according to the clustering result.
  • The data mining device 50 may further include:
  • The object category determining module 508 is configured to determine the object category corresponding to the user from the data on the objects the user has browsed.
  • The information sending module 510 is configured to send push information about objects of the corresponding category to the user.
  • In the above embodiment, the object category corresponding to the user is determined and targeted pushing is performed, which can save system resources and improve the conversion rate of promotions.
  • The object category determining module 508 is configured to: for the browsing data of objects of a given category, determine that the user corresponds to the category if the user's number of views is greater than the average number of views of all users; and/or determine that the user corresponds to the category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
  • FIG. 6 is a block diagram of an embodiment of a data mining device of the present invention.
  • As shown in FIG. 6, the apparatus 60 of this embodiment includes a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform the data mining method of any of the foregoing embodiments based on instructions stored in the memory 610.
  • The memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, or the like.
  • The system memory stores, for example, an operating system, an application, a boot loader, and other programs.
  • FIG. 7 is a structural diagram of still another embodiment of a data mining apparatus according to the present invention.
  • The apparatus 70 of this embodiment includes a memory 610 and a processor 620, and may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, the memory 610, and the processor 620 can be connected, for example, via a bus 760.
  • The input/output interface 730 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • The network interface 740 provides a connection interface for various networked devices.
  • The storage interface 750 provides a connection interface for an external storage device such as an SD card or a USB flash drive.
  • Embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The present invention also includes a computer readable storage medium having stored thereon computer instructions that, when executed by a processor, implement the data mining method of any of the foregoing embodiments.
  • The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed are a data mining method and apparatus, which relate to the field of big data. The method comprises: extracting attribute values of object attributes in user browsing data (S102); classifying, according to a correlation between the attribute values of the object attributes in the user browsing data, the attribute values of the object attributes (S104); and classifying objects according to classifications of the attribute values of the object attributes (S106). The method mines classification information of objects in user browsing data of the objects, thereby realizing accurate classification of the objects.

Description

Data mining method and apparatus
Technical field
The present invention relates to the field of big data, and in particular to a data mining method and apparatus.
Background
As times change, more and more objects appear in people's lives. While these objects make life more convenient, they also bring corresponding object management problems.
When managers manage objects, classifying them is particularly important. Classifying objects more rationally improves the efficiency of managing them.
In the prior art, objects are classified mainly on the basis of managers' subjective management experience. However, when classification relies on subjective management experience alone, its accuracy is difficult to guarantee. How to classify objects accurately is a problem that needs to be solved.
Summary of the invention
One technical problem solved by the present invention is how to classify objects accurately.
According to one aspect of the embodiments of the present invention, a data mining method is provided, including: extracting the attribute values of an object attribute from user browsing data; classifying the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data; and classifying the objects according to the attribute value categories of the object attribute.
In some embodiments, the method further includes selecting a plurality of attributes of the object; classifying the objects according to the attribute value categories of the object attribute then includes classifying the objects according to the attribute value categories of the plurality of attributes of the object.
In some embodiments, the correlation between the selected attributes of the object is below a preset value.
In some embodiments, classifying the attribute values of the object attribute according to the correlation between the attribute values in the user browsing data includes: determining the correlation coefficients between the attribute values of the object attribute in the user browsing data, and placing attribute values whose correlation coefficient is higher than a preset value in one category.
In some embodiments, classifying the attribute values of the object attribute according to the correlation between the attribute values in the user browsing data includes: clustering the user browsing data of the individual attribute values of the object attribute with a clustering method, and classifying the attribute values of the object attribute according to the clustering result.
In some embodiments, the method further includes: determining, from the data on the objects the user has browsed, the object category corresponding to the user; and sending push information about objects of the corresponding category to the user.
In some embodiments, determining the object category corresponding to the user includes: for the browsing data of objects of a given category, determining that the user corresponds to the category if the user's number of views is greater than the average number of views of all users; and/or determining that the user corresponds to the category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
According to another aspect of the embodiments of the present invention, a data mining apparatus is provided, including: an attribute value extraction module configured to extract the attribute values of an object attribute from user browsing data; an attribute value classification module configured to classify the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data; and an object classification module configured to classify objects according to the attribute value categories of the object attribute.
In some embodiments, the apparatus further includes an attribute selection module configured to select a plurality of attributes of the object; the attribute value classification module is further configured to classify the objects according to the attribute value categories of the plurality of attributes of the object.
In some embodiments, the correlation between the plurality of attributes selected by the attribute selection module is lower than a preset value.
In some embodiments, the attribute value classification module includes: a correlation coefficient determining unit configured to determine the correlation coefficients between the attribute values of the object attribute in the user browsing data; and a classification unit configured to place attribute values whose correlation coefficient is higher than a preset value into one category.
In some embodiments, the attribute value classification module includes: a clustering unit configured to cluster the user browsing data for each attribute value of the object attribute; and a classification unit configured to classify the attribute values of the object attribute according to the clustering result.
In some embodiments, the apparatus further includes: an object category determining module configured to determine, from the data on the user's browsing of objects, the object category corresponding to the user; and an information sending module configured to send push information for objects of the corresponding category to the user.
In some embodiments, the object category determining module is configured to: for the browsing data of objects of a given category, determine that the user corresponds to that category if the user's number of views is greater than the average number of views across all users; and/or determine that the user corresponds to that category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
According to yet another aspect of the embodiments of the present invention, a data mining apparatus is provided, characterized by including: a memory; and a processor coupled to the memory, the processor being configured to execute the above data mining method based on instructions stored in the memory.
The present invention classifies the attribute values of an object attribute according to the correlation between those attribute values in user browsing data, and classifies objects according to the attribute value categories of the object attribute, thereby achieving accurate classification of objects.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of one embodiment of the data mining method of the present invention.
FIG. 2 is a schematic flowchart of another embodiment of the data mining method of the present invention.
FIG. 3 is a schematic flowchart of yet another embodiment of the data mining method of the present invention.
FIG. 4 is a schematic structural diagram of one embodiment of the data mining apparatus of the present invention.
FIG. 5 is a schematic structural diagram of another embodiment of the data mining apparatus of the present invention.
FIG. 6 is a schematic structural diagram of yet another embodiment of the data mining apparatus of the present invention.
FIG. 7 is a schematic structural diagram of still another embodiment of the data mining apparatus of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its use. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
One embodiment of the data mining method of the present invention is described below with reference to FIG. 1. This embodiment mines category information about objects from the user browsing data for those objects, so that the objects can be classified accurately.
FIG. 1 is a schematic flowchart of one embodiment of the data mining method of the present invention. This embodiment can classify objects accurately based on user browsing data. As shown in FIG. 1, the data mining method of this embodiment includes:
Step S102: extract the attribute values of an object attribute from the user browsing data.
Object attributes may differ from one kind of object to another, and the appropriate attributes can be chosen according to the nature of the objects to be mined. Table 1 shows an example of user browsing data for television objects. The object attributes in this browsing data include brand and size. The attribute values of the brand attribute include brand 1, brand 2, ..., brand i; the attribute values of the size attribute include size 1, size 2, ..., size j.
Table 1
[Table 1 is reproduced as an image in the original publication: Figure PCTCN2017093145-appb-000001]
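Step S102 can be illustrated with a minimal sketch. Because Table 1 is only available as an image, the log format below (one dict per page view, with `user`, `brand`, and `size` keys) and the function name `count_views` are assumptions made for illustration, not the layout used in the publication. The sketch counts how many times each user viewed objects having each attribute value.

```python
from collections import defaultdict


def count_views(records, attribute):
    """Count, for each value of `attribute`, how often each user viewed it.

    `records` is an iterable of dicts with a "user" key plus one key per
    object attribute (a hypothetical log format, not taken from the source).
    Returns {attribute_value: {user: view_count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for record in records:
        counts[record[attribute]][record["user"]] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {value: dict(per_user) for value, per_user in counts.items()}


# Hypothetical browsing log: one entry per page view of a television.
records = [
    {"user": "u1", "brand": "brand 1", "size": "size 1"},
    {"user": "u1", "brand": "brand 1", "size": "size 2"},
    {"user": "u1", "brand": "brand 2", "size": "size 1"},
    {"user": "u2", "brand": "brand 2", "size": "size 2"},
]

brand_views = count_views(records, "brand")
size_views = count_views(records, "size")
print(brand_views)
print(size_views)
```

The per-user counts produced this way are the kind of input the later correlation and clustering steps operate on.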
Step S104: classify the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data.
Typically, attribute values whose correlation is higher than a preset value are placed in one category. For example, for the brand attribute, if brand 1 and brand 2 are strongly correlated, brand 1 and brand 2 are placed in one category. Likewise, for the size attribute, if size 1 and size 2 are strongly correlated while size 1 and size i, and size 2 and size i, are only weakly correlated, then size 1 and size 2 are placed in one category.
Each category forms a distinctive attribute of the object. A category of attribute values, or a combination of attribute value categories across several object attributes, can therefore be called an exclusive attribute. For an example of an exclusive attribute formed by combining the attribute value categories of several object attributes, see the flat-panel television example in the embodiment shown in FIG. 2.
Step S106: classify the objects according to the attribute value categories of the object attribute.
For example, once brand 1 and brand 2 have been grouped into one sub-brand category, the objects of brand 1 and brand 2 can be placed in one category.
In the above embodiment, how to classify the objects is learned by data mining from the correlations in users' object browsing data. The classification is therefore objective, avoids the drawbacks of classifying objects by subjective judgment based on experience, and achieves accurate classification of objects.
The present invention also provides two exemplary methods for classifying the attribute values of an object attribute.
(First method of classifying the attribute values of an object attribute)
Determine the correlation coefficients between the attribute values of the object attribute in the user browsing data, and place attribute values whose correlation coefficient is higher than a preset value into one category.
The correlation coefficient between attribute values can be calculated using, for example, the Pearson correlation coefficient or the Spearman rank correlation coefficient. For example, the Pearson correlation coefficient $\rho_{XY}$ of attributes X and Y, treated as variables, is calculated as follows:
$$\rho_{XY} = \frac{\operatorname{cov}(X,Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}} = \frac{\sum_{k}(x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k}(x_k-\bar{x})^{2}}\,\sqrt{\sum_{k}(y_k-\bar{y})^{2}}}$$
where $\operatorname{cov}(X,Y)$ denotes the covariance of X and Y, $D(X)$ the variance of X, $D(Y)$ the variance of Y, $x_k$ the number of views by the k-th user of objects with attribute X, $y_k$ the number of views by the k-th user of objects with attribute Y, $\bar{x}$ the mean number of views of objects with attribute X, and $\bar{y}$ the mean number of views of objects with attribute Y.
The correlation coefficients between the object attribute values calculated by the above method are shown in Table 2. The correlation coefficients there take values between 0 and 1. When the correlation coefficient between two attributes is below a certain threshold, for example 0.3, the two attributes can be considered uncorrelated or only weakly correlated.
Table 2
[Table 2 is reproduced as an image in the original publication: Figure PCTCN2017093145-appb-000005]
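The first method can be sketched as follows. This is an illustrative sketch rather than the implementation of the publication: the view counts are invented, and the choice to merge attribute values into connected groups (so that A and C land together when A correlates with B and B with C) is an assumption, since the text does not say how chains of correlated values are handled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of per-user view counts."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # Assumption: a constant column is treated as uncorrelated.
    return cov / sqrt(var_x * var_y)


def group_by_correlation(views, threshold):
    """Group attribute values whose pairwise correlation exceeds `threshold`.

    `views` maps an attribute value to a list of view counts, one per user.
    Groups are the connected components of the "correlation > threshold"
    relation (an assumed way of resolving chains of correlated values).
    """
    values = list(views)
    parent = {v: v for v in values}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, a in enumerate(values):
        for b in values[i + 1:]:
            if pearson(views[a], views[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for v in values:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical view counts for five users.
views = {
    "brand 1": [5, 3, 0, 1, 4],
    "brand 2": [4, 3, 1, 0, 5],
    "brand 3": [0, 1, 5, 4, 0],
}
groups = group_by_correlation(views, threshold=0.5)
print(groups)
```

With these invented counts, brand 1 and brand 2 are viewed by the same users and end up in one category, while brand 3 forms its own.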
(Second method of classifying the attribute values of an object attribute)
Cluster the user browsing data for each attribute value of the object attribute, and classify the attribute values of the object attribute according to the clustering result.
For example, the KMEANS or K-N-N algorithm is used to cluster the user browsing data for each attribute value of the object attribute, and the attribute values are classified according to the clustering result. For example, if clustering the attribute values of a given object attribute yields three clusters, the attribute values of that object attribute are divided into three categories.
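The second method can be sketched with a small k-means routine. The sketch assumes that each attribute value is represented by its vector of per-user view counts; the data are invented, and the deterministic initialization (the first k points serve as starting centroids) is a simplification chosen to keep the example reproducible, not something stated in the publication.

```python
def kmeans(points, k, max_iterations=20):
    """Plain k-means over equal-length numeric vectors.

    Starting centroids are the first k points (a deterministic
    simplification; practical implementations usually randomize).
    Returns a list with one cluster index per point.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # Assignments stopped changing, so the result is stable.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(column) / len(members) for column in zip(*members)]
    return labels


# Hypothetical per-user view counts (four users) for four size values.
size_views = {
    "size 1": [5, 4, 0, 0],
    "size 2": [4, 5, 1, 0],
    "size 3": [0, 1, 5, 4],
    "size 4": [0, 0, 4, 5],
}
names = list(size_views)
labels = kmeans([size_views[name] for name in names], k=2)

# Attribute values that share a cluster index form one category.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

Here sizes 1 and 2 are viewed by the same users and fall into one cluster, and sizes 3 and 4 into the other, so the size attribute would be divided into two categories.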
The following describes, with reference to FIG. 2, the case in which objects are classified according to the attribute value categories of several object attributes.
FIG. 2 is a schematic flowchart of another embodiment of the data mining method of the present invention. The method of this embodiment includes:
Step S201: select a plurality of attributes of the object. An object has many attributes, such as brand, size, price, material, and resolution. To ensure coverage of users' browsing behavior, only the few object attributes that users use most frequently, such as brand and size, are selected.
Preferably, the selected attributes have a correlation with one another that is lower than a preset value. When the distributions of the attribute value correlation coefficients of two attributes are consistent, the two attributes are considered highly correlated and collinear, so only one of them needs to be selected. For example, for flat-panel televisions, size and resolution are highly correlated (the larger the size, the higher the resolution), so only size is selected from the two; brand and size are weakly correlated, so both can be selected for the subsequent division of objects.
Selecting attributes whose correlation is lower than a preset value helps improve the efficiency of object category division, reduces collinearity, and makes the division more comprehensive.
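The selection in step S201 can be sketched as a greedy filter. The pairwise attribute correlations below are invented, and the publication does not specify how they are computed or in what order attributes are considered; the sketch assumes the candidates are already ordered by how often users use them.

```python
def select_attributes(candidates, correlation, max_correlation):
    """Greedily keep attributes that are weakly correlated with those kept so far.

    `candidates` is ordered by preference (assumed: most frequently used first).
    `correlation` maps a frozenset of two attribute names to their correlation.
    """
    selected = []
    for attribute in candidates:
        if all(correlation[frozenset((attribute, kept))] < max_correlation
               for kept in selected):
            selected.append(attribute)
    return selected


# Hypothetical pairwise correlations between television attributes.
correlation = {
    frozenset(("brand", "size")): 0.1,
    frozenset(("brand", "resolution")): 0.2,
    frozenset(("size", "resolution")): 0.8,
}
selected = select_attributes(["brand", "size", "resolution"], correlation, 0.5)
print(selected)
```

With these values, resolution is dropped because it is strongly correlated with size, which matches the flat-panel television example in the text.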
Then, for each object attribute, steps S202 and S204 are performed.
Step S202: extract the attribute values of the object attribute from the user browsing data. For the specific implementation, see step S102.
Step S204: classify the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data. For the specific implementation, see step S104.
Once the attribute values of all the object attributes have been classified, step S206 can be performed.
Step S206: classify the objects according to the attribute value categories of the plurality of attributes of the object.
Typically, the objects are divided into as many categories as there are combinations of attribute value categories across the attributes. For example, flat-panel televisions can be divided by brand into three categories (high-end, mid-range, and low-end) and by size into four categories (extra-large living room, large living room, small living room, and bedroom). The attribute value categories of these two attributes have 12 combinations, so all flat-panel televisions can be divided into 12 categories: high-end extra-large living room, high-end large living room, high-end small living room, high-end bedroom, mid-range extra-large living room, mid-range large living room, mid-range small living room, mid-range bedroom, low-end extra-large living room, low-end large living room, low-end small living room, and low-end bedroom.
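The combination in step S206 amounts to a Cartesian product of the per-attribute categories. The sketch below enumerates the 12 categories from the example and classifies one object; the mappings from individual brands and sizes to their categories are hypothetical stand-ins for the output of step S204.

```python
from itertools import product

brand_categories = ["high-end", "mid-range", "low-end"]
size_categories = ["extra-large living room", "large living room",
                   "small living room", "bedroom"]

# Every combination of one brand category and one size category.
object_categories = [f"{brand} {size}"
                     for brand, size in product(brand_categories, size_categories)]

# Hypothetical results of step S204: attribute value -> attribute value category.
brand_to_category = {"brand 1": "high-end", "brand 2": "high-end", "brand 3": "low-end"}
size_to_category = {"size 1": "large living room", "size 2": "bedroom"}


def classify(obj):
    """Return the combined category of an object from its brand and size."""
    return f"{brand_to_category[obj['brand']]} {size_to_category[obj['size']]}"


tv = {"brand": "brand 2", "size": "size 1"}
print(len(object_categories))
print(classify(tv))
```

Adding a third attribute would only extend the product, so the number of object categories grows multiplicatively with each selected attribute.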
Selecting several attributes of the object and classifying the objects according to the attribute value categories of those attributes allows a finer division of the objects into categories and further improves the accuracy of the classification.
An application example of the data mining method of the present invention is described below with reference to FIG. 3, in which information is pushed precisely based on the data mining results.
FIG. 3 is a schematic flowchart of yet another embodiment of the data mining method of the present invention. After the object categories have been determined, the data mining method of this embodiment further includes:
Step S308: determine, from the data on the user's browsing of objects, the object category corresponding to the user. For example, any of the following three methods can be used to determine the object category corresponding to the user:
(1) For the browsing data of objects of a given category, if the user's number of views is greater than the average number of views across all users, the user is determined to correspond to that category.
(2) If the ratio of the user's views of objects of a given category to the user's views of objects of all categories is greater than a preset ratio, the user is determined to correspond to that category. For example, if most of the views in a user's flat-panel television browsing data are of televisions in the high-end large living room category, the user can be determined to correspond to the high-end large living room category.
(3) Both conditions (1) and (2) are satisfied. That is, for the browsing data of objects of a given category, if the user's number of views is greater than the average number of views across all users, and the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than the preset ratio, the user is determined to correspond to that category.
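The three rules above can be sketched in one function. The view counts, the function name `matched_categories`, and the `rule` parameter are illustrative assumptions; in particular, the average is taken over all users in the data, including those who never viewed the category, which the publication does not spell out.

```python
def matched_categories(views, user, min_share, rule="both"):
    """Return the categories a user corresponds to.

    `views` maps user -> {category: view_count}.
    rule="average" applies method (1), rule="share" applies method (2),
    and rule="both" applies method (3), which requires both conditions.
    """
    own = views[user]
    total = sum(own.values())
    matched = []
    for category, count in own.items():
        # Average views of this category over all users (assumed to include
        # users with zero views of the category).
        average = sum(v.get(category, 0) for v in views.values()) / len(views)
        above_average = count > average
        above_share = total > 0 and count / total > min_share
        if rule == "average":
            ok = above_average
        elif rule == "share":
            ok = above_share
        else:
            ok = above_average and above_share
        if ok:
            matched.append(category)
    return matched


# Hypothetical per-user view counts by object category.
views = {
    "u1": {"high-end large living room": 8, "low-end bedroom": 2},
    "u2": {"high-end large living room": 1, "low-end bedroom": 5},
    "u3": {"high-end large living room": 3, "low-end bedroom": 2},
}
print(matched_categories(views, "u1", min_share=0.5))
print(matched_categories(views, "u3", min_share=0.5, rule="share"))
```

User u3 shows why the rules can disagree: most of u3's views are of the high-end category, so method (2) matches it, but u3 views it less than the average user, so methods (1) and (3) do not.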
In addition, certain special data can optionally be deleted so that it is not used as a basis for subsequent calculations. For example, some store merchants may view the objects they sell an excessive number of times, and such data can be deleted.
Step S310: send push information for objects of the corresponding category to the user. That is, information about flat-panel televisions in the high-end large living room category can be pushed to the user.
Determining the object category corresponding to a user from the user's browsing data and pushing targeted information saves system resources and improves the conversion rate of promotions.
A data mining apparatus according to one embodiment of the present invention is described below with reference to FIG. 4.
FIG. 4 is a schematic structural diagram of one embodiment of the data mining apparatus of the present invention. As shown in FIG. 4, the data mining apparatus 40 of this embodiment includes:
an attribute value extraction module 402, configured to extract the attribute values of an object attribute from user browsing data;
an attribute value classification module 404, configured to classify the attribute values of the object attribute according to the correlation between those attribute values in the user browsing data; and
an object classification module 406, configured to classify objects according to the attribute value categories of the object attribute.
In the above embodiment, how to classify the objects is learned by data mining from the correlations in users' object browsing data. The classification is therefore objective, avoids the drawbacks of classifying objects by subjective judgment based on experience, and achieves accurate classification of objects.
A data mining apparatus according to an embodiment of the present invention is described below with reference to FIG. 5.
FIG. 5 is a schematic structural diagram of an embodiment of the data mining apparatus of the present invention. As shown in FIG. 5, the data mining apparatus 50 of this embodiment includes:
an attribute selection module 501, configured to select a plurality of attributes of the object.
The attribute value classification module 404 is further configured to classify the objects according to the attribute value categories of the plurality of attributes of the object.
Selecting several attributes of the object and classifying the objects according to the attribute value categories of those attributes allows a finer division of the objects into categories and further improves the accuracy of the classification.
Optionally, the correlation between the plurality of attributes selected by the attribute selection module 501 is lower than a preset value.
Optionally, the attribute value classification module 404 includes: a correlation coefficient determining unit 4042, configured to determine the correlation coefficients between the attribute values of the object attribute in the user browsing data; and a classification unit 4044, configured to place attribute values whose correlation coefficient is higher than a preset value into one category.
Optionally, the attribute value classification module 404 includes: a clustering unit 4046, configured to cluster the user browsing data for each attribute value of the object attribute; and a classification unit 4048, configured to classify the attribute values of the object attribute according to the clustering result.
Optionally, the data mining apparatus 50 may further include:
an object category determining module 508, configured to determine, from the data on the user's browsing of objects, the object category corresponding to the user; and
an information sending module 510, configured to send push information for objects of the corresponding category to the user.
Determining the object category corresponding to a user from the user's browsing data and pushing targeted information saves system resources and improves the conversion rate of promotions.
Optionally, the object category determining module 508 is configured to: for the browsing data of objects of a given category, determine that the user corresponds to that category if the user's number of views is greater than the average number of views across all users; and/or determine that the user corresponds to that category if the ratio of the user's views of objects of that category to the user's views of objects of all categories is greater than a preset ratio.
FIG. 6 is a structural diagram of an embodiment of the data mining apparatus of the present invention. As shown in FIG. 6, the apparatus 60 of this embodiment includes a memory 610 and a processor 620 coupled to the memory 610. The processor 620 is configured to execute the data mining method of any one of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, a system memory and a fixed non-volatile storage medium. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
FIG. 7 is a structural diagram of yet another embodiment of the data mining apparatus of the present invention. As shown in FIG. 7, the apparatus 70 of this embodiment includes the memory 610 and the processor 620, and may further include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, and 750, the memory 610, and the processor 620 can be connected, for example, by a bus 760. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 740 provides a connection interface for various networked devices. The storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention also includes a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the data mining method of any one of the foregoing embodiments.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

  1. A data mining method, characterized by comprising:
    extracting attribute values of an object attribute from user browsing data;
    classifying the attribute values of the object attribute according to the correlation between the attribute values of the object attribute in the user browsing data; and
    classifying objects according to the attribute value categories of the object attribute.
  2. The method according to claim 1, characterized by further comprising: selecting a plurality of attributes of the object;
    wherein classifying the objects according to the attribute value categories of the object attribute comprises:
    classifying the objects according to the attribute value categories of the plurality of attributes of the object.
  3. The method according to claim 2, characterized in that the correlation between the selected plurality of attributes of the object is lower than a preset value.
  4. 如权利要求1所述的方法,其特征在于,所述根据所述用户浏览数据中对象属性的属性值之间的相关性对对象属性的属性值进行分类包括:The method according to claim 1, wherein the classifying the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data comprises:
    确定所述用户浏览数据中对象属性的属性值的之间的相关系数,将相关系数高于预设值的属性值划分为一个类别。Determining a correlation coefficient between the attribute values of the object attributes in the user browsing data, and dividing the attribute values whose correlation coefficients are higher than the preset value into one category.
  5. The method of claim 1, wherein classifying the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data comprises:
    clustering the user browsing data of each attribute value of the object attributes, and classifying the attribute values of the object attributes according to the clustering result.
  6. The method of claim 1, further comprising:
    determining, according to data of a user browsing the objects, the category of objects corresponding to the user; and
    sending push information for objects of the corresponding category to the user.
  7. The method of claim 6, wherein determining the category of objects corresponding to the user comprises:
    for the browsing data of objects of a given category, determining that the user corresponds to the category if the user's number of views is greater than the average number of views across all users; and/or
    determining that the user corresponds to the category if the ratio of the user's number of views of objects of that category to the user's number of views of objects of all categories is greater than a preset ratio.
  8. A data mining apparatus, comprising:
    an attribute value extraction module, configured to extract attribute values of object attributes from user browsing data;
    an attribute value classification module, configured to classify the attribute values of the object attributes according to the correlation between the attribute values of the object attributes in the user browsing data; and
    an object classification module, configured to classify objects according to the attribute value categories of the object attributes.
  9. The apparatus of claim 8, further comprising an attribute selection module, configured to select a plurality of attributes of the object;
    wherein the attribute value classification module is further configured to classify the object according to the attribute value categories of the plurality of attributes of the object.
  10. The apparatus of claim 9, wherein the correlation among the plurality of attributes of the object selected by the attribute selection module is lower than a preset value.
  11. The apparatus of claim 8, wherein the attribute value classification module comprises:
    a correlation coefficient determining unit, configured to determine correlation coefficients between the attribute values of the object attributes in the user browsing data; and
    a classification unit, configured to assign attribute values whose correlation coefficient is higher than a preset value to the same category.
  12. The apparatus of claim 8, wherein the attribute value classification module comprises:
    a clustering unit, configured to cluster the user browsing data of each attribute value of the object attributes; and
    a classification unit, configured to classify the attribute values of the object attributes according to the clustering result.
  13. The apparatus of claim 8, further comprising:
    an object category determining module, configured to determine, according to data of a user browsing the objects, the category of objects corresponding to the user; and
    an information sending module, configured to send push information for objects of the corresponding category to the user.
  14. The apparatus of claim 13, wherein the object category determining module is configured to:
    for the browsing data of objects of a given category, determine that the user corresponds to the category if the user's number of views is greater than the average number of views across all users; and/or
    determine that the user corresponds to the category if the ratio of the user's number of views of objects of that category to the user's number of views of objects of all categories is greater than a preset ratio.
  15. A data mining apparatus, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to perform, based on instructions stored in the memory, the data mining method of any one of claims 1 to 7.
  16. A computer-readable storage medium storing computer instructions that, when executed by a processor, implement the data mining method of any one of claims 1 to 7.
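
As a rough illustration of claims 4 and 7 (not the applicant's implementation), the following Python sketch groups attribute values whose per-user view counts are strongly correlated, and then determines which categories correspond to a user. The function names, the input layout (view counts per user), the choice of the Pearson coefficient, and the transitive merging of correlated values are all assumptions made for this example; the claims themselves specify only a correlation coefficient and preset thresholds.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_attribute_values(views, threshold):
    """Claim 4 sketch. `views` maps an attribute value to its view counts per user
    (hypothetical layout). Values correlated above `threshold` share a category;
    merging is transitive (an assumption), implemented with a small union-find."""
    parent = {value: value for value in views}

    def find(value):
        while parent[value] != value:
            parent[value] = parent[parent[value]]  # path halving
            value = parent[value]
        return value

    for a, b in combinations(views, 2):
        if pearson(views[a], views[b]) > threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for value in views:
        groups[find(value)].append(value)
    return sorted(sorted(group) for group in groups.values())


def categories_for_user(user, counts, min_share, require_both=False):
    """Claim 7 sketch. `counts` maps user -> {category: number of views}
    (hypothetical layout). A category matches if the user's views exceed the
    all-user average, and/or exceed `min_share` of the user's own views."""
    own = counts[user]
    total = sum(own.values())
    matched = set()
    for category, n_views in own.items():
        average = sum(c.get(category, 0) for c in counts.values()) / len(counts)
        above_average = n_views > average
        large_share = total > 0 and n_views / total > min_share
        if require_both:
            hit = above_average and large_share
        else:
            hit = above_average or large_share
        if hit:
            matched.add(category)
    return matched
```

With `require_both=False` either condition is sufficient, which corresponds to the "or" reading of claim 7; `require_both=True` corresponds to the "and" reading.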
PCT/CN2017/093145 2016-08-30 2017-07-17 Data mining method and apparatus WO2018040762A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610771130.4 2016-08-30
CN201610771130.4A CN106327266B (en) 2016-08-30 2016-08-30 Data mining method and device

Publications (1)

Publication Number Publication Date
WO2018040762A1 (en) 2018-03-08

Family

ID=57788877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/093145 WO2018040762A1 (en) 2016-08-30 2017-07-17 Data mining method and apparatus

Country Status (2)

Country Link
CN (1) CN106327266B (en)
WO (1) WO2018040762A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339250A (en) * 2020-02-20 2020-06-26 北京百度网讯科技有限公司 Mining method of new category label, electronic equipment and computer readable medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327266B (en) * 2016-08-30 2021-05-25 北京京东尚科信息技术有限公司 Data mining method and device
CN108985806A (en) * 2017-05-31 2018-12-11 北京京东尚科信息技术有限公司 Method and apparatus for selecting item property
CN109993617A (en) * 2017-12-29 2019-07-09 北京京东尚科信息技术有限公司 Data processing method, system and computer-readable medium
CN111199437A (en) * 2018-11-16 2020-05-26 北京京东尚科信息技术有限公司 Data processing method and device
CN110321363A (en) * 2019-04-19 2019-10-11 中国工商银行股份有限公司 Data retrieval method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699702A (en) * 2013-12-09 2015-06-10 ***股份有限公司 Data mining and classifying method
US20150169758A1 (en) * 2013-12-17 2015-06-18 Luigi ASSOM Multi-partite graph database
CN105677810A (en) * 2015-12-30 2016-06-15 天津盛购科技发展有限公司 Online shopping product searching system based on keyword analysis
CN105760446A (en) * 2016-02-03 2016-07-13 杭州驭猫科技有限公司 Big data analysis method for shopping website
CN106327266A (en) * 2016-08-30 2017-01-11 北京京东尚科信息技术有限公司 Data mining method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467726B (en) * 2010-11-04 2015-07-29 阿里巴巴集团控股有限公司 A kind of data processing method based on online trade platform and device
CN103514178A (en) * 2012-06-18 2014-01-15 阿里巴巴集团控股有限公司 Searching and sorting method and device based on click rate
CN104679771B (en) * 2013-11-29 2018-09-18 阿里巴巴集团控股有限公司 A kind of individuation data searching method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339250A (en) * 2020-02-20 2020-06-26 北京百度网讯科技有限公司 Mining method of new category label, electronic equipment and computer readable medium
CN111339250B (en) * 2020-02-20 2023-08-18 北京百度网讯科技有限公司 Mining method for new category labels, electronic equipment and computer readable medium
US11755654B2 (en) 2020-02-20 2023-09-12 Beijing Baidu Netcom Science Technology Co., Ltd. Category tag mining method, electronic device and non-transitory computer-readable storage medium

Also Published As

Publication number Publication date
CN106327266A (en) 2017-01-11
CN106327266B (en) 2021-05-25

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17845065; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13.06.2019)
122 Ep: PCT application non-entry in European phase. Ref document number: 17845065; Country of ref document: EP; Kind code of ref document: A1