CN113688926B - Website behavior classification method, system, storage medium and equipment - Google Patents
- Publication number
- CN113688926B (application CN202111014054.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- membership
- class
- filtering
- dimension
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of website behavior classification and provides a website behavior classification method, system, storage medium and equipment. The method comprises: acquiring a website behavior data set, in which each attribute of a data item is treated as one dimension; screening the neighbors of each data item to determine its filtering window; randomly selecting a preset number of data items from the data set as class center data and calculating the membership of every data item to every class center; based on the filtering window of each data item, filtering the memberships with each dimension used in turn as the guide, and taking the weighted sum of the multi-dimensionally filtered memberships as the final filtered membership; updating the class center data of each class with the final filtered membership, and then updating the attribute weight of each dimension of each class; and iterating the class center update step until a termination condition is met, then outputting the website behavior classification result.
Description
Technical Field
The invention belongs to the field of website behavior classification, and particularly relates to a website behavior classification method, a system, a storage medium and equipment.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Guided filtering is an image filtering method that effectively removes noise while preserving the edge information of a guide image, and it is widely used in image segmentation, enhancement, defogging, and similar tasks. The technique generally takes the image to be processed as the guide image and filters an input image using the information of the guide image, producing a filtered image that carries the gradient information of the guide image with noise effectively removed. In recent years, because traditional clustering algorithms cannot make good use of the spatial information of an image and their segmentation results are therefore not accurate enough, many researchers have applied guided filtering to the clustering process and proposed a number of guided-filtering-based fuzzy clustering algorithms. These methods take the image to be segmented as the guide image and filter the membership obtained by fuzzy C-means, so that the membership carries more gradient information and the accuracy of image segmentation is improved.
In recent years, research on adding guided filtering to fuzzy clustering for image segmentation has attracted increasing attention. However, current guided-filtering-based fuzzy clustering algorithms are limited to image segmentation, and guided filtering is mainly used to process images, which makes it unsuitable for website behavior analysis data. Website behavior analysis data also contain spatial information, and mining the latent information in the data is important for more accurate classification. However, existing fuzzy clustering methods that use spatial information are either difficult to compute or prone to losing information during clustering.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a website behavior classification method, a system, a storage medium and equipment, which can accurately classify website behaviors.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a first aspect of the present invention provides a website behavior classification method, including:
acquiring a website behavior data set, wherein each attribute of a data item in the set is treated as one dimension;
screening neighbors of each data to determine a filtering window of the corresponding data;
randomly selecting a preset number of pieces of data from the website behavior data set to be respectively used as class center data, and calculating membership degrees of all data in the website behavior data set to all class center data;
based on the filtering window of each data, each dimension of each data is used as a guide to filter membership degrees, and weighted summation of membership degrees after multidimensional filtering is used as the membership degree after final filtering;
updating the class center data of each class by utilizing the final filtered membership degree, and further updating the attribute weight of each dimension of each class;
iterating the step of updating the class center data until the termination condition is met, and finally outputting the website behavior classification result.
Further, each data in the set contains at least two attributes.
Further, the $K$-nearest neighbor method is used to find the $K$ nearest data items for each data item in the website behavior data set; these $K$ data items are the neighbors of the corresponding data item, and $K$ is a positive integer greater than or equal to 1.
Further, the process of finding the $K$ nearest data items for each data item in the website behavior data set is as follows:
calculating a distance matrix of the data by using Euclidean distance;
for each data item, finding the $K$ nearest data items, including the item itself, as its neighbors.
Further, the process of determining the filter window of the corresponding data is:
screening the neighbors of each data point by subtractive or additive filtering so that each data point and each of its neighbors are mutual neighbors;
each data point takes its remaining, mutually symmetric neighbors as its filtering window.
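Further, the formula for filtering the membership with each dimension of each data item used in turn as a guide is
$$\tilde{u}^{(m)}_{ij}=\frac{1}{|\omega_i|}\sum_{k:\,i\in\omega_k}\left(a^{(m)}_{kj}\,x_{im}+b^{(m)}_{kj}\right),$$
where $\tilde{u}^{(m)}_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class after filtering with the $m$-th dimension as the guide, $u_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class, $x_{im}$ denotes the value of the $m$-th dimension of the $i$-th guide data item, $\omega_k$ denotes the window centered on the $k$-th guide data item, and $a^{(m)}_{kj}$ and $b^{(m)}_{kj}$ denote the linear coefficients of the $m$-th dimension in window $\omega_k$.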
Further, the termination condition of the step of updating the class center data is: the difference between the objective function values of two consecutive iterations is smaller than a set value, or the number of iterations exceeds a set threshold.
A second aspect of the present invention provides a website behavior classification system comprising:
the website behavior data acquisition module is used for acquiring a website behavior data set, wherein each attribute of a data item in the set is treated as one dimension;
a filtering window determining module, which is used for screening the neighbor of each data to determine the filtering window of the corresponding data;
the class center data initializing module is used for randomly selecting a preset number of pieces of data from the website behavior data set to serve as class center data respectively, and calculating membership degrees of all data in the website behavior data set to all class center data;
the membership calculation module is used for, based on the filtering window of each data item, filtering the membership with each dimension of each data item used in turn as a guide, and taking the weighted sum of the multi-dimensionally filtered memberships as the final filtered membership;
the attribute weight updating module is used for updating the class center data of each class by utilizing the final filtered membership degree so as to update the attribute weight of each dimension;
and the classification result output module is used for iteratively calculating and judging the termination condition of the step of updating the class center data of each class, and finally outputting the website behavior classification result.
A third aspect of the present invention provides a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps in a website behavior classification method as described above.
A fourth aspect of the invention provides a computer device.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the website behavior classification method as described above when the program is executed.
Compared with the prior art, the invention has the beneficial effects that:
the invention screens the neighbors of each data item in the website behavior data set to determine its filtering window, randomly selects a preset number of data items from the data set as class center data, calculates the membership of each data item to each class center, filters the memberships with each dimension of each data item used in turn as a guide based on the filtering window of each data item, and takes the weighted sum of the multi-dimensionally filtered memberships as the final filtered membership. By applying guided filtering in this way, the interests and preferences of users can be mined more accurately in website behavior analysis, which improves the accuracy of website behavior classification.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a diagram of data selection in a filter window according to an embodiment of the present invention;
FIG. 2 is a graph of a guided filtering versus membership filtering process in accordance with an embodiment of the present invention;
FIG. 3 is a detailed process of guided filtering on a first class of membership filtering in accordance with an embodiment of the present invention;
FIG. 4 is a flowchart of a website behavior classification method according to an embodiment of the invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
As shown in fig. 4, the embodiment provides a website behavior classification method, which specifically includes the following steps:
S101: acquiring a website behavior data set, wherein each attribute of a data item in the set is treated as one dimension.
Wherein each data in the set contains at least two attributes.
Read in the website behavior data to be clustered, $X=\{x_1,x_2,\dots,x_N\}$, where $x_i=(x_{i1},\dots,x_{iM})$; here $N$ is the number of website behavior samples to be clustered and $M$ is the number of attributes contained in each piece of website behavior data, hereinafter referred to as the dimension. The attributes include, but are not limited to, user ID, device type, gender, age, time of the event, location, duration, specific time, and specific operation. It should be noted that all website behavior data sets in this embodiment are obtained through legal channels.
S102: the neighbors of each data item are screened to determine its filtering window, as shown in FIG. 1.
In a specific implementation, for example, the number of neighbors $K$ is preset, and the $K$-nearest neighbor method is used to find the $K$ nearest data items for each data item; these $K$ data items are its neighbors, and the filtering window of each data item is determined by screening its neighbors. $K$ is a positive integer greater than or equal to 1.
The process of finding the $K$ nearest data items for each data item in the website behavior data set is as follows:
calculate the distance matrix of the data using the Euclidean distance $d(x_i,x_k)=\sqrt{\sum_{m=1}^{M}(x_{im}-x_{km})^2}$;
for each data item, find the $K$ nearest data items, including the item itself, as its neighbors.
Specifically, the method for determining the filter window includes the following steps:
Because data $x_k$ being a neighbor of data $x_i$ does not imply that $x_i$ is a neighbor of $x_k$, the neighbors of each data point are screened by subtractive (or additive) filtering so that each data point and its neighbors are mutual neighbors. If $x_k$ is a neighbor of $x_i$ but $x_i$ is not a neighbor of $x_k$, subtractive filtering deletes $x_k$ from the neighbors of $x_i$, whereas additive filtering adds $x_i$ to the neighbors of $x_k$;
each data point takes its remaining, mutually symmetric neighbors as its filtering window.
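As an illustration of S102, the neighbor search and the subtractive/additive screening can be sketched in plain Python. This is a minimal sketch, not the patented implementation: the function names `knn_neighbors` and `symmetric_windows` are hypothetical, the data are assumed to be numerically encoded already, and distance ties are broken by index, which the text does not specify.

```python
import math
from typing import List, Sequence, Set


def knn_neighbors(data: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """Return, for each item, the indices of its k nearest items (itself included)."""
    n = len(data)
    # Euclidean distance matrix of the data.
    dist = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    neighbors = []
    for i in range(n):
        # Sort by distance; on ties the item itself comes first, then lower index.
        order = sorted(range(n), key=lambda j: (dist[i][j], j != i, j))
        neighbors.append(set(order[:k]))
    return neighbors


def symmetric_windows(neighbors: List[Set[int]], mode: str = "subtract") -> List[Set[int]]:
    """Make the neighbor relation symmetric by subtractive or additive filtering."""
    if mode not in ("subtract", "add"):
        raise ValueError("mode must be 'subtract' or 'add'")
    windows = [set(nb) for nb in neighbors]
    for i, nb in enumerate(neighbors):
        for k in nb:
            if i not in neighbors[k]:  # x_k is a neighbor of x_i, but not vice versa
                if mode == "subtract":
                    windows[i].discard(k)  # delete x_k from the neighbors of x_i
                else:
                    windows[k].add(i)  # add x_i to the neighbors of x_k
    return windows
```

With `mode="subtract"` a window keeps only mutual neighbors, so an isolated point may end up with a window containing only itself; `mode="add"` instead enlarges windows so that no neighbor relation is dropped.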
S103: randomly selecting a preset number of data items from the website behavior data set as class center data, and calculating the membership of every data item in the data set to every class center.
Preset the number of clusters $C$ and randomly initialize $C$ cluster centers by selecting $C$ data items from the website behavior data to be clustered as the class center data, each data item having $M$ attributes. Set the iteration counter $t$ to 0 and the maximum number of iterations $T_{\max}$ to 150, initialize the weight $w_{jm}$ of each dimension, and set the stop threshold $\delta$ of the fuzzy clustering algorithm to $10^{-6}$.
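The initialization in S103 can be sketched as follows. The constants (150 iterations, stop threshold $10^{-6}$, guided filter parameter $10^{-4}$, fuzzy coefficient 2) are taken from this embodiment; the uniform initial weights $1/M$ and the function name `initialize` are assumptions made for illustration only.

```python
import random
from typing import List, Optional, Sequence, Tuple

MAX_ITER = 150          # maximum number of iterations
STOP_THRESHOLD = 1e-6   # stop threshold of the fuzzy clustering algorithm
GUIDE_EPS = 1e-4        # guided filter parameter that keeps a_k from growing too large
FUZZY_COEFF = 2.0       # fuzzy coefficient


def initialize(data: Sequence[Sequence[float]], c: int, seed: Optional[int] = None
               ) -> Tuple[List[List[float]], List[List[float]]]:
    """Pick c distinct data items as initial class centers; start with uniform weights."""
    if not 1 <= c <= len(data):
        raise ValueError("c must be between 1 and the number of data items")
    rng = random.Random(seed)
    centers = [list(data[i]) for i in rng.sample(range(len(data)), c)]
    m = len(data[0])
    # Assumption: every class starts with equal attribute weights 1/M.
    weights = [[1.0 / m] * m for _ in range(c)]
    return centers, weights
```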
S104: based on the filtering window of each data, each dimension of each data is used as a guide to filter the membership degree, and weighted summation of the membership degrees after multi-dimensional filtering is used as the membership degree after final filtering, as shown in fig. 2.
Specifically, the membership $u_{ij}$ of the $i$-th data item to the $j$-th cluster center is calculated; the memberships are then filtered with each dimension of each data item in the website behavior data set used in turn as a guide, the multi-dimensionally filtered memberships are summed with weights to obtain the final filtered membership, and the filtered membership is used in subsequent calculations.
As shown in FIG. 3, filtering the membership by guided filtering comprises the following steps:
(1) Split the obtained $N\times C$ membership matrix into $C$ membership matrices of size $N\times 1$, one per class;
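(2) Each dimension of the original data is used in turn as the guide data, and the membership of each class is filtered according to
$$\tilde{u}^{(m)}_{ij}=\frac{1}{|\omega_i|}\sum_{k:\,i\in\omega_k}\left(a^{(m)}_{kj}\,x_{im}+b^{(m)}_{kj}\right),$$
where $\tilde{u}^{(m)}_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class after filtering with the $m$-th dimension as the guide, $u_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class, $x_{im}$ denotes the value of the $m$-th dimension of the $i$-th guide data item, $\omega_k$ denotes the window centered on the $k$-th guide data item, $a^{(m)}_{kj}$ and $b^{(m)}_{kj}$ denote the linear coefficients of the $m$-th dimension in window $\omega_k$, and $\varepsilon$ is a guided filter parameter that prevents $a^{(m)}_{kj}$ from becoming too large, typically set to $10^{-4}$. The coefficients are obtained from
$$a^{(m)}_{kj}=\frac{\frac{1}{|\omega_k|}\sum_{i\in\omega_k}x_{im}u_{ij}-\mu^{(m)}_{k}\,\bar{u}_{kj}}{\left(\sigma^{(m)}_{k}\right)^{2}+\varepsilon},\qquad b^{(m)}_{kj}=\bar{u}_{kj}-a^{(m)}_{kj}\,\mu^{(m)}_{k},$$
where $\mu^{(m)}_{k}$ and $\left(\sigma^{(m)}_{k}\right)^{2}$ are the mean and variance of the $m$-th dimension in window $\omega_k$, $|\omega_k|$ is the number of data items in window $\omega_k$, and $\bar{u}_{kj}$ is the mean of the input membership $u_{ij}$ over window $\omega_k$.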
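A plain-Python sketch of the guided filtering in steps (1) and (2) follows, assuming the windows from S102 are symmetric and the data are numerically encoded. It computes $a^{(m)}_{kj}$ and $b^{(m)}_{kj}$ for every window and averages them over the windows containing each item; the function name `guided_filter_memberships` and the `filtered[m][i][j]` layout are illustrative choices, not taken from the patent.

```python
from typing import List, Sequence, Set


def guided_filter_memberships(data: Sequence[Sequence[float]],
                              u: Sequence[Sequence[float]],
                              windows: Sequence[Set[int]],
                              eps: float = 1e-4) -> List[List[List[float]]]:
    """Filter each class's membership column with each dimension as the guide.

    Returns filtered[m][i][j]: membership of item i to class j after filtering
    with dimension m as the guide. Windows are assumed symmetric, so the windows
    containing item i are exactly those centred on the members of windows[i].
    """
    n, n_dims, n_classes = len(data), len(data[0]), len(u[0])
    filtered = []
    for m in range(n_dims):
        guide = [row[m] for row in data]
        out = [[0.0] * n_classes for _ in range(n)]
        for j in range(n_classes):
            p = [row[j] for row in u]  # N x 1 membership column for class j
            a, b = [0.0] * n, [0.0] * n
            for k in range(n):
                w = windows[k]
                size = len(w)
                mu = sum(guide[i] for i in w) / size                 # window mean of guide
                var = sum((guide[i] - mu) ** 2 for i in w) / size    # window variance
                p_bar = sum(p[i] for i in w) / size                  # window mean of membership
                cov = sum(guide[i] * p[i] for i in w) / size - mu * p_bar
                a[k] = cov / (var + eps)
                b[k] = p_bar - a[k] * mu
            for i in range(n):
                w = windows[i]
                # Average the linear models of all windows that contain item i.
                out[i][j] = sum(a[k] * guide[i] + b[k] for k in w) / len(w)
        filtered.append(out)
    return filtered
```

Because the coefficients are linear in the input membership, memberships that sum to 1 across classes still sum to 1 after filtering, although individual values may fall slightly outside $[0,1]$.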
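The membership is calculated as
$$u_{ij}=\left[\sum_{l=1}^{C}\left(\frac{\sum_{m=1}^{M}w_{jm}\left(x_{im}-v_{jm}\right)^{2}}{\sum_{m=1}^{M}w_{lm}\left(x_{im}-v_{lm}\right)^{2}}\right)^{\frac{1}{\alpha-1}}\right]^{-1},$$
where $w_{jm}$ is the attribute weight of the $m$-th dimension of the $j$-th class, $\alpha$ is the fuzzy coefficient (generally set to 2 here), $x_{im}$ is the value of the $m$-th dimension of the $i$-th data item, and $v_{jm}$ is the value of the $m$-th dimension of the $j$-th cluster center.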
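The attribute-weighted membership can be sketched as below, assuming the standard fuzzy C-means form with a per-class weighted squared Euclidean distance. The function name `weighted_memberships` is hypothetical, and the handling of an item that coincides with a center (membership shared equally among the coinciding centers) is an assumption added to avoid division by zero.

```python
from typing import List, Sequence


def weighted_memberships(data: Sequence[Sequence[float]],
                         centers: Sequence[Sequence[float]],
                         weights: Sequence[Sequence[float]],
                         alpha: float = 2.0) -> List[List[float]]:
    """Attribute-weighted fuzzy C-means membership u[i][j] of item i to class j."""
    expo = 1.0 / (alpha - 1.0)
    u = []
    for x in data:
        # Weighted squared distance of x to each class center.
        d = [sum(w * (xm - vm) ** 2 for w, xm, vm in zip(weights[j], x, centers[j]))
             for j in range(len(centers))]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # x coincides with one or more centers: share membership among them.
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            row = [1.0 / sum((d[j] / d[l]) ** expo for l in range(len(d)))
                   for j in range(len(d))]
        u.append(row)
    return u
```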
S105: and updating the class center data of each class by utilizing the final filtered membership, and further updating the attribute weight of each dimension of each class.
The cluster center $v_j$ of the $j$-th class is updated using the filtered membership obtained above, and the resulting cluster center is used in subsequent calculations; the attribute weight $w_{jm}$ of the $m$-th dimension of the $j$-th class is then updated using the membership and cluster centers obtained above.
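The cluster center is calculated from the final filtered membership $\hat{u}_{ij}$ as $v_{jm}=\dfrac{\sum_{i=1}^{N}\hat{u}_{ij}^{\alpha}\,x_{im}}{\sum_{i=1}^{N}\hat{u}_{ij}^{\alpha}}$.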
The multi-dimensionally filtered results are summed with weights according to $\hat{u}_{ij}=\sum_{m=1}^{M}w_{jm}\,\tilde{u}^{(m)}_{ij}$ to obtain the final filtered membership, where $w_{jm}$ denotes the weight of the $m$-th dimension of the $j$-th class. Two weighting schemes are used: mean weighting, in which each dimension is weighted $w_{jm}=1/M$; and weights determined by the EFWFCM weight update formula, in which $\gamma$ is a regularization scalar.
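The fusion, center update and weight update of S105 can be sketched as follows. `fuse` and `update_centers` follow the weighted sum and the standard fuzzy C-means center formula; `entropy_weights` is a hypothetical stand-in for the EFWFCM weight update, assuming an entropy-regularized form $w_{jm}\propto\exp(-D_{jm}/\gamma)$ with $D_{jm}=\sum_i \hat{u}_{ij}^{\alpha}(x_{im}-v_{jm})^2$. Clamping negative filtered memberships to 0 is likewise an assumption.

```python
import math
from typing import List, Sequence


def fuse(filtered: Sequence[Sequence[Sequence[float]]],
         weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Final membership u_hat[i][j] = sum_m w[j][m] * filtered[m][i][j]."""
    n_dims, n, n_classes = len(filtered), len(filtered[0]), len(filtered[0][0])
    return [[sum(weights[j][m] * filtered[m][i][j] for m in range(n_dims))
             for j in range(n_classes)] for i in range(n)]


def update_centers(data: Sequence[Sequence[float]], u_hat: Sequence[Sequence[float]],
                   alpha: float = 2.0) -> List[List[float]]:
    """Fuzzy C-means center update v[j][m] = sum_i u^alpha x_im / sum_i u^alpha."""
    n_dims, n_classes = len(data[0]), len(u_hat[0])
    centers = []
    for j in range(n_classes):
        # Assumption: clamp slightly negative filtered memberships to 0.
        p = [max(row[j], 0.0) ** alpha for row in u_hat]
        total = sum(p)
        centers.append([sum(pi * x[m] for pi, x in zip(p, data)) / total
                        for m in range(n_dims)])
    return centers


def entropy_weights(data: Sequence[Sequence[float]], u_hat: Sequence[Sequence[float]],
                    centers: Sequence[Sequence[float]], gamma: float = 1.0,
                    alpha: float = 2.0) -> List[List[float]]:
    """HYPOTHETICAL EFWFCM stand-in: w[j][m] proportional to exp(-D[j][m] / gamma)."""
    n_dims = len(data[0])
    weights = []
    for j, v in enumerate(centers):
        p = [max(row[j], 0.0) ** alpha for row in u_hat]
        d = [sum(pi * (x[m] - v[m]) ** 2 for pi, x in zip(p, data)) for m in range(n_dims)]
        d_min = min(d)  # shift for numerical stability; the normalized weights are unchanged
        e = [math.exp(-(dm - d_min) / gamma) for dm in d]
        s = sum(e)
        weights.append([em / s for em in e])
    return weights
```

Dimensions along which a class is compact (small $D_{jm}$) receive larger weights under this assumed form, and the weights of each class sum to 1.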
S106: iterating the step of updating the class center data until the termination condition is met, and finally outputting the website behavior classification result.
The termination condition for the step of updating the class center data is: the difference between the objective function values of two consecutive iterations is smaller than a set value, or the number of iterations exceeds a set threshold.
Calculate the objective function value $J^{(t)}$ obtained at the $t$-th iteration;
calculate the difference between $J^{(t)}$ and the objective function value $J^{(t-1)}$ of the $(t-1)$-th iteration. If $\left|J^{(t)}-J^{(t-1)}\right|<\delta$ or $t>T_{\max}$, the iteration ends and the clustering result is output; otherwise, steps S103 to S106 are repeated until the termination condition is met, and the clustering result is then output.
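The termination test of S106 can be sketched as a small driver loop. The callable `step` is a hypothetical placeholder for one full update pass (membership, filtering, center and weight update) that returns the objective value $J^{(t)}$; treating the iteration limit as "at most `max_iter` passes" is an assumption.

```python
from typing import Callable, Optional, Tuple


def iterate_until_converged(step: Callable[[int], float],
                            max_iter: int = 150,
                            tol: float = 1e-6) -> Tuple[int, float]:
    """Run step(t) for t = 1, 2, ... until |J(t) - J(t-1)| < tol or max_iter passes.

    Returns the number of passes run and the last objective value.
    """
    prev: Optional[float] = None
    t = 0
    while True:
        t += 1
        j_t = step(t)
        # Stop when two consecutive objective values are close enough.
        if prev is not None and abs(j_t - prev) < tol:
            return t, j_t
        # Stop when the iteration budget is exhausted.
        if t >= max_iter:
            return t, j_t
        prev = j_t
```

For example, an objective that halves on every pass, `lambda t: 2.0 ** -t`, stops after 20 passes, the first pass at which the change drops below $10^{-6}$.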
The objective function value $J^{(t)}$ of the $t$-th iteration is calculated with the objective function formula.
Example two
This embodiment provides a website behavior classification system, which specifically comprises the following modules:
the website behavior data acquisition module is used for acquiring a website behavior data set, wherein each attribute of a data item in the set is treated as one dimension;
a filtering window determining module, which is used for screening the neighbor of each data to determine the filtering window of the corresponding data;
the class center data initializing module is used for randomly selecting a preset number of pieces of data from the website behavior data set to serve as class center data respectively, and calculating membership degrees of all data in the website behavior data set to all class center data;
the membership calculation module is used for, based on the filtering window of each data item, filtering the membership with each dimension of each data item used in turn as a guide, and taking the weighted sum of the multi-dimensionally filtered memberships as the final filtered membership;
the attribute weight updating module is used for updating the class center data of each class by utilizing the final filtered membership degree so as to update the attribute weight of each dimension;
and the classification result output module is used for iteratively calculating and judging the termination condition of the step of updating the class center data of each class, and finally outputting the website behavior classification result.
Here, each module of this embodiment corresponds to a step in Embodiment 1, and the implementation process is the same, so it is not repeated here.
Example III
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the website behavior classification method as described in the above embodiment one.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Example IV
The present embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps in the website behavior classification method according to the above embodiment when the processor executes the program.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A method for classifying website behaviors, comprising:
acquiring a website behavior data set, wherein each attribute of a data item in the set is treated as one dimension and each data item contains at least two attributes; reading in the website behavior data to be clustered, $X=\{x_1,x_2,\dots,x_N\}$, where $x_i=(x_{i1},\dots,x_{iM})$, $N$ is the number of website behavior samples to be clustered, and $M$ is the number of attributes contained in each piece of website behavior data, referred to as the dimension; the attributes comprise the user ID, device type, gender, age, and the time, location, duration, specific time and specific operation of the event;
screening the neighbors of each data item to determine its filtering window, the process being: screening the neighbors of each data point by subtractive or additive filtering so that each data point and its neighbors are mutual neighbors, each data point taking its remaining, mutually symmetric neighbors as its filtering window; the $K$-nearest neighbor method is used to find the $K$ nearest data items for each data item in the website behavior data set, these $K$ data items being the neighbors of the corresponding data item, where $K$ is a positive integer greater than or equal to 1;
randomly selecting a preset number of data items from the website behavior data set as class center data, and calculating the membership of every data item in the data set to every class center; presetting the number of clusters $C$ and randomly initializing $C$ cluster centers by selecting $C$ data items from the website behavior data to be clustered as the class center data, each data item having $M$ attributes; setting the iteration counter $t$ to 0 and the maximum number of iterations to $T_{\max}$, initializing the weight $w_{jm}$ of each dimension, and setting the stop threshold of the fuzzy clustering algorithm to $\delta$;
based on the filtering window of each data item, filtering the memberships with each dimension of each data item used in turn as a guide, and taking the weighted sum of the multi-dimensionally filtered memberships as the final filtered membership; wherein filtering the membership by guided filtering comprises the following steps:
(1) splitting the obtained $N\times C$ membership matrix into $C$ membership matrices of size $N\times 1$, one per class;
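(2) using each dimension of the original data in turn as the guide data, and filtering the membership of each class according to
$$\tilde{u}^{(m)}_{ij}=\frac{1}{|\omega_i|}\sum_{k:\,i\in\omega_k}\left(a^{(m)}_{kj}\,x_{im}+b^{(m)}_{kj}\right),$$
where $\tilde{u}^{(m)}_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class after filtering with the $m$-th dimension as the guide, $u_{ij}$ denotes the membership of the $i$-th data item to the $j$-th class, $x_{im}$ denotes the value of the $m$-th dimension of the $i$-th guide data item, and $\omega_k$ denotes the window centered on the $k$-th guide data item; the coefficients are obtained from
$$a^{(m)}_{kj}=\frac{\frac{1}{|\omega_k|}\sum_{i\in\omega_k}x_{im}u_{ij}-\mu^{(m)}_{k}\,\bar{u}_{kj}}{\left(\sigma^{(m)}_{k}\right)^{2}+\varepsilon},\qquad b^{(m)}_{kj}=\bar{u}_{kj}-a^{(m)}_{kj}\,\mu^{(m)}_{k},$$
where $a^{(m)}_{kj}$ and $b^{(m)}_{kj}$ are the linear coefficients of the $m$-th dimension in window $\omega_k$, $\varepsilon$ is a guided filter parameter that prevents $a^{(m)}_{kj}$ from becoming too large, $\mu^{(m)}_{k}$ and $\left(\sigma^{(m)}_{k}\right)^{2}$ are the mean and variance of the $m$-th dimension in window $\omega_k$, $|\omega_k|$ is the number of data items in window $\omega_k$, and $\bar{u}_{kj}$ is the mean of the input membership $u_{ij}$ over window $\omega_k$;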
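wherein the membership is calculated as
$$u_{ij}=\left[\sum_{l=1}^{C}\left(\frac{\sum_{m=1}^{M}w_{jm}\left(x_{im}-v_{jm}\right)^{2}}{\sum_{m=1}^{M}w_{lm}\left(x_{im}-v_{lm}\right)^{2}}\right)^{\frac{1}{\alpha-1}}\right]^{-1},$$
where $w_{jm}$ is the attribute weight of the $m$-th dimension of the $j$-th class, $\alpha$ is the fuzzy coefficient, $x_{im}$ is the value of the $m$-th dimension of the $i$-th data item, and $v_{jm}$ is the value of the $m$-th dimension of the $j$-th cluster center;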
updating the class center data of each class by utilizing the final filtered membership degree, and further updating the attribute weight of each dimension of each class;
updating the cluster center $v_j$ of the $j$-th class using the filtered membership obtained above, the resulting cluster center being used in subsequent calculations; and updating the attribute weight $w_{jm}$ of the $m$-th dimension of the $j$-th class using the membership and cluster centers obtained above;
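the cluster center is calculated from the final filtered membership $\hat{u}_{ij}$ as $v_{jm}=\dfrac{\sum_{i=1}^{N}\hat{u}_{ij}^{\alpha}\,x_{im}}{\sum_{i=1}^{N}\hat{u}_{ij}^{\alpha}}$;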
the multi-dimensionally filtered results are summed with weights according to $\hat{u}_{ij}=\sum_{m=1}^{M}w_{jm}\,\tilde{u}^{(m)}_{ij}$ to obtain the final filtered membership; the weights are calculated by mean weighting or by the EFWFCM weight update formula; with mean weighting, each dimension is weighted $w_{jm}=1/M$; in the EFWFCM weight update formula, $\gamma$ is a regularization scalar;
iterating the step of updating the class center data until the termination condition is met, and finally outputting the website behavior classification result.
2. The website behavior classification method of claim 1, wherein the process of finding the $K$ nearest data items for each data item in the website behavior data set is as follows:
calculating a distance matrix of the data by using Euclidean distance;
for each data item, finding the $K$ nearest data items, including the item itself, as its neighbors.
3. The website behavior classification method of claim 1, wherein the termination condition of the step of updating the class center data is: the difference between the objective function values of two consecutive iterations is smaller than a set value, or the number of iterations exceeds a set threshold.
4. A website behavior classification system, comprising:
a website behavior data acquisition module, used to acquire a website behavior data set, in which each attribute of a data item is one dimension and each data item contains at least two attributes. The website behavior data to be clustered are read in as $X = \{x_1, x_2, \dots, x_N\}$ with $x_j = (x_{j1}, \dots, x_{jD})$, where $N$ is the number of website behavior samples to be clustered and $D$, the number of attributes in each website behavior record, is called the dimension; the attributes include the user ID, device type, gender, age, time, place and duration, as well as the specific time and specific operation of the event;
a filtering window determination module, used to screen the neighbors of each data item to determine its filtering window. The filtering window is determined as follows: the $K$ nearest neighbor method is used to find the nearest $K$ pieces of data for each data item in the website behavior data set, and these $K$ pieces of data are the neighbors of that item, $K$ being a positive integer greater than or equal to 1; the neighbors of each data point are then screened by removal or addition so that the neighbor relation is symmetric, i.e. each data point and each of its neighbors are neighbors of each other; each data point takes its retained symmetric neighbors as its filtering window;
a class center data initialization module, used to randomly select a preset number of data items from the website behavior data set as class center data and to calculate the membership of each data item to each class center. The number of clusters $C$ is preset, and $C$ cluster centers are randomly initialized by selecting $C$ data items, each with $D$ attributes, from the website behavior data to be clustered as class center data; the iteration counter $t$ is set to 0, the maximum number of iterations is set to $T$, the weight of each dimension is initialized, and the stop threshold of the fuzzy clustering algorithm is preset;
a membership calculation module, used to filter the membership over the filtering window of each data item, using each dimension of the data in turn as the guide, and to take the weighted sum of the memberships filtered along all dimensions as the final filtered membership; filtering the membership by guided filtering comprises the following steps:
(1) the obtained $C \times N$ membership matrix is split into $C$ membership matrices of size $1 \times N$, one per class;
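(2) each dimension of the original data is used in turn as the guidance data, and the membership of each class is filtered according to
$$\tilde{u}_{ij}^{(l)} = a_{kl}\,x_{jl} + b_{kl}, \quad \forall j \in \omega_k,$$
where $\tilde{u}_{ij}^{(l)}$ is the membership of the $j$-th data item to the $i$-th class after filtering with the $l$-th dimension, $u_{ij}$ is the membership of the $j$-th data item to the $i$-th class, $x_{jl}$ is the value of the $l$-th dimension of the $j$-th guidance data item, and $\omega_k$ is the window centered on the $k$-th data item. The coefficients are obtained from
$$a_{kl} = \frac{\frac{1}{|\omega_k|}\sum_{j \in \omega_k} x_{jl}\,u_{ij} - \mu_{kl}\,\bar{u}_{ik}}{\sigma_{kl}^{2} + \epsilon}, \qquad b_{kl} = \bar{u}_{ik} - a_{kl}\,\mu_{kl},$$
where $a_{kl}$ and $b_{kl}$ are the linear coefficients of the $l$-th dimension in window $\omega_k$, $\epsilon$ is the guided filter parameter that prevents $a_{kl}$ from becoming too large, $\mu_{kl}$ and $\sigma_{kl}^{2}$ are the mean and variance of the $l$-th dimension in window $\omega_k$, $|\omega_k|$ is the number of data items in window $\omega_k$, and $\bar{u}_{ik}$ is the mean of the input membership $u_{i\cdot}$ in window $\omega_k$;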
wherein the membership is calculated with an attribute-weighted membership formula, in which $w_{il}$ is the attribute weight of the $l$-th dimension of the $i$-th class, $m$ is the fuzzy coefficient, and $v_{il}$ is the value of the $l$-th dimension of the $i$-th cluster center;
an attribute weight updating module, used to update the class center data of each class using the final filtered membership, and then to update the attribute weight of each dimension;
the $i$-th cluster center $v_i$ is updated from the filtered membership obtained above using the cluster center calculation formula, and the resulting cluster centers are used in subsequent calculations; the attribute weight $w_{il}$ of the $l$-th dimension of the $i$-th class is then updated from the membership and cluster centers obtained above;
The results of filtering along the individual dimensions are combined by the weighted sum $\tilde{u}_{ij} = \sum_{l=1}^{D} \lambda_l\,\tilde{u}_{ij}^{(l)}$ to obtain the final filtered membership. The weights $\lambda_l$ are either mean weights or are calculated with the EFWFCM weight update formula: with mean weighting, each dimension is weighted as $\lambda_l = 1/D$; the EFWFCM weight update formula contains a regularization scalar;
and a classification result output module, used to iterate the calculation, check the termination condition of the step of updating the class center data of each class, and finally output the website behavior classification result.
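The filtering-window and guided-filtering steps of claim 4 can be sketched in Python as follows. This is a hedged illustration under stated assumptions: "removal or addition" is read as either keeping only mutual neighbors (`mode="mutual"`) or adding the missing reverse links (`mode="union"`); each output membership averages the linear coefficients of all windows that contain it, as in the usual guided filter of He et al., a step the claim text does not spell out; and the function names, the example neighbor lists and the `eps` default are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def symmetric_windows(neighbors: Sequence[Sequence[int]], mode: str = "mutual") -> List[List[int]]:
    """Make the neighbor relation symmetric and return one filtering window per item.

    mode="mutual": drop j from i's neighbors unless i is also among j's (removal).
    mode="union":  add i to j's neighbors whenever j is among i's (addition).
    """
    sets = [set(nb) for nb in neighbors]
    n = len(sets)
    if mode == "mutual":
        win = [{j for j in sets[i] if i in sets[j]} for i in range(n)]
    elif mode == "union":
        win = [set(s) for s in sets]
        for i in range(n):
            for j in sets[i]:
                win[j].add(i)
    else:
        raise ValueError("mode must be 'mutual' or 'union'")
    return [sorted(w | {i}) for i, w in enumerate(win)]  # every item lies in its own window


def guided_filter(p: List[float], guide: List[float], windows: List[List[int]],
                  eps: float = 1e-2) -> List[float]:
    """Guided filtering of one membership row p, with one data dimension as the guide."""
    n = len(p)
    a, b = [0.0] * n, [0.0] * n
    for k, w in enumerate(windows):
        cnt = len(w)
        mu = sum(guide[j] for j in w) / cnt                # window mean of the guide
        var = sum((guide[j] - mu) ** 2 for j in w) / cnt   # window variance of the guide
        p_bar = sum(p[j] for j in w) / cnt                 # window mean of the membership
        cov = sum(guide[j] * p[j] for j in w) / cnt - mu * p_bar
        a[k] = cov / (var + eps)
        b[k] = p_bar - a[k] * mu
    # Average the linear models of every window containing j; with symmetric windows
    # these are exactly the windows centred on the members of j's own window.
    out = []
    for j, w in enumerate(windows):
        a_bar = sum(a[k] for k in w) / len(w)
        b_bar = sum(b[k] for k in w) / len(w)
        out.append(a_bar * guide[j] + b_bar)
    return out


def filter_memberships(U: Matrix, X: Matrix, windows: List[List[int]],
                       eps: float = 1e-2) -> List[Matrix]:
    """One filtered C x N membership matrix per guiding dimension l (rows filtered separately)."""
    D = len(X[0])
    return [[guided_filter(row, [x[l] for x in X], windows, eps) for row in U]
            for l in range(D)]


neighbors = [[0, 1], [1, 0], [2, 3], [3, 1]]   # illustrative K = 2 lists, self included
print(symmetric_windows(neighbors, "mutual"))   # [[0, 1], [0, 1], [2], [3]]
print(symmetric_windows(neighbors, "union"))    # [[0, 1], [0, 1, 3], [2, 3], [1, 2, 3]]
```

Because this filter is linear in its input and maps a constant input to itself, the filtered memberships of each data item still sum to 1 across classes when every class is filtered with the same guide and windows; individual values can, however, fall slightly outside [0, 1].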
5. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the website behavior classification method of any of claims 1-3.
6. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the website behavior classification method of any one of claims 1-3 when the program is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111014054.XA CN113688926B (en) | 2021-08-31 | 2021-08-31 | Website behavior classification method, system, storage medium and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111014054.XA CN113688926B (en) | 2021-08-31 | 2021-08-31 | Website behavior classification method, system, storage medium and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113688926A CN113688926A (en) | 2021-11-23 |
CN113688926B CN113688926B (en) | 2024-03-08 |
Family
ID=78584470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111014054.XA Active CN113688926B (en) | 2021-08-31 | 2021-08-31 | Website behavior classification method, system, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113688926B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103605794A (en) * | 2013-12-05 | 2014-02-26 | 国家计算机网络与信息安全管理中心 | Website classifying method |
CN106651838A (en) * | 2016-11-15 | 2017-05-10 | 山东师范大学 | Gel protein partitioning method based on fuzzy clustering |
CN107643925A (en) * | 2017-09-30 | 2018-01-30 | 广东欧珀移动通信有限公司 | Background application cleaning method and device, storage medium and electronic equipment |
CN108197650A (en) * | 2017-12-30 | 2018-06-22 | 南京理工大学 | Hyperspectral image extreme learning machine clustering method with local similarity preservation |
CN109285175A (en) * | 2018-08-15 | 2019-01-29 | 中国科学院苏州生物医学工程技术研究所 | Fuzzy clustering image segmentation method based on morphological reconstruction and membership filtering |
CN109685820A (en) * | 2018-11-29 | 2019-04-26 | 济南大学 | Image segmentation method based on morphological reconstruction and FCM clustering with guided filtering |
CN109726738A (en) * | 2018-11-30 | 2019-05-07 | 济南大学 | Data classification method based on transfer learning and attribute-entropy-weighted fuzzy clustering |
CN109741330A (en) * | 2018-12-21 | 2019-05-10 | 东华大学 | Medical image segmentation method based on a hybrid filtering strategy and fuzzy C-means |
CN110569915A (en) * | 2019-09-12 | 2019-12-13 | 齐鲁工业大学 | Automobile data clustering method and system based on intuitionistic fuzzy C-means |
CN110659930A (en) * | 2019-08-27 | 2020-01-07 | 深圳大学 | Consumption upgrading method and device based on user behaviors, storage medium and equipment |
CN111062394A (en) * | 2019-11-18 | 2020-04-24 | 济南大学 | Fuzzy clustering color image segmentation method based on multi-channel weighted guided filtering |
CN111932472A (en) * | 2020-07-27 | 2020-11-13 | 江苏大学 | Image edge-preserving filtering method based on soft clustering |
CN113222924A (en) * | 2021-04-30 | 2021-08-06 | 西安电子科技大学 | Hyperspectral image anomaly detection system based on FPGA |
2021
- 2021-08-31 CN CN202111014054.XA, CN113688926B (Active)
Non-Patent Citations (3)
Title |
---|
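Morphological Reconstruction-Based Image-Guided Fuzzy Clustering with a Novel Impact Factor; Qingxue Qin et al.; Journal of Healthcare Engineering; 2021-09-14; Vol. 2021; pp. 1-13 * |
Research on fuzzy clustering algorithms based on guided filtering and their application to image segmentation; 徐广梅; CNKI Outstanding Master's Theses Full-text Database; 2021-01-15; full text * |
Application of fuzzy evaluation and fuzzy cluster analysis to optimized site layout; 潘玉奇; 微电子学与计算机 (Microelectronics & Computer); 2007-08-31; Vol. 24, No. 8; pp. 169-172 * |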
Also Published As
Publication number | Publication date |
---|---|
CN113688926A (en) | 2021-11-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Vijay et al. | An efficient brain tumor detection methodology using K-means clustering algorithm | |
KR101434170B1 (en) | Method for study using extracted characteristic of data and apparatus thereof | |
Shao et al. | Using the maximum between-class variance for automatic gridding of cDNA microarray images | |
CN111753987A (en) | Method and device for generating machine learning model | |
Boss et al. | Mammogram image segmentation using fuzzy clustering | |
WO2022033015A1 (en) | Method and apparatus for processing abnormal region in image, and image segmentation method and apparatus | |
CN113688926B (en) | Website behavior classification method, system, storage medium and equipment | |
CN116504314B (en) | Gene regulation network construction method based on cell dynamic differentiation | |
Dehariya et al. | Brain image segmentation to diagnose tumor by applying wiener filter and intelligent water drop algorithm | |
Aswathy et al. | MRI brain tumor segmentation using genetic algorithm with SVM classifier | |
Biju et al. | A genetic algorithm based fuzzy C mean clustering model for segmenting microarray images | |
Corso et al. | Segmentation of sub-cortical structures by the graph-shifts algorithm | |
Vadaparthi et al. | Segmentation of brain mr images based on finite skew gaussian mixture model with fuzzy c-means clustering and em algorithm | |
CN115240843A (en) | Fairness prediction system based on structure causal model | |
JP2016520220A (en) | Hidden attribute model estimation device, method and program | |
JP7429514B2 (en) | machine learning device | |
CN113656707A (en) | Financing product recommendation method, system, storage medium and equipment | |
Ramathilaga et al. | Two novel fuzzy clustering methods for solving data clustering problems | |
CN112419047A (en) | Method and system for predicting overdue of bank personal loan by utilizing characteristic trend analysis | |
Ackerman et al. | Density-based interpretable hypercube region partitioning for mixed numeric and categorical data | |
Mandal et al. | Adaptive median filtering based on unsupervised classification of pixels | |
Xu et al. | An improved fuzzy c-means clustering algorithm with guided filter for image segmentation | |
CN110825707A (en) | Data compression method | |
CN114821206B (en) | Multi-modal image fusion classification method and system based on confrontation complementary features | |
Paulraj et al. | An effective approach of CT Lung segmentation using possibilistic fuzzy c-means algorithm and classification of lung cancer cells with the aid of soft computing techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant |