CN110532485B - User behavior detection method and device based on multi-source data fusion - Google Patents
User behavior detection method and device based on multi-source data fusion
- Publication number
- CN110532485B (CN201910624299.0A)
- Authority
- CN
- China
- Prior art keywords
- user
- target user
- behavior
- class
- users
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Telephonic Communication Services (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiment of the invention provides a user behavior detection method and device based on multi-source data fusion, wherein the method comprises the following steps: acquiring a current behavior feature vector of a target user and a current behavior average feature vector of other users in a target user class based on a campus wireless network user log; acquiring a first historical behavior average characteristic vector of a target user and second historical behavior average characteristic vectors of other users in the target user class; calculating a first difference degree between the current behavior feature vector and the first historical behavior average feature vector; calculating a first difference between the current behavior feature vector and the current behavior average feature vector, calculating a second difference between the first and second historical behavior average feature vectors, and calculating a second degree of difference based on the first difference and the second difference; and obtaining a behavior detection result of the target user according to the first difference degree and the second difference degree. The embodiment of the invention can effectively acquire the user behavior and detect the abnormal user behavior.
Description
Technical Field
The invention relates to the technical field of network communication, in particular to a user behavior detection method and device based on multi-source data fusion.
Background
In recent years, with rapid development of mobile internet and wide popularization of intelligent terminals, the campus wireless network full coverage is realized in many colleges and universities in China. As an important component of the internet, campus wireless networks face problems in terms of network management and the like in high-speed development like the internet.
Departments such as the school network center, the educational administration department and the student affairs department hold a large amount of information about students and staff, such as gender, age, grade, course schedule, scores, internet access time, internet access location and network service type. A method or system that extracts useful information from the multi-source data generated by users in campus life, accurately analyzes user behavior and detects abnormal behavior would allow campus users to be managed more effectively and student mental health problems to be prevented and addressed promptly; this has become a focus of attention across society.
Network user behavior refers to the behavior patterns that network users exhibit in their online activity; these patterns can be expressed quantitatively or qualitatively through the statistical characteristics of, or the relationships among, relevant feature quantities in network data. However, the analysis methods and points of emphasis differ for users of different types of networks, and at present there is no method or system for analyzing and detecting the behavior of campus network users.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for detecting user behavior based on multi-source data fusion, which overcome the above problems or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a user behavior detection method based on multi-source data fusion, including:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In a second aspect, an embodiment of the present invention provides a user behavior detection apparatus based on multi-source data fusion, including:
the user behavior analysis module is used for analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period respectively based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
a historical behavior feature obtaining module, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module is used for judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the multi-source data fusion-based user behavior detection method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the multi-source data fusion-based user behavior detection method as provided in the first aspect.
According to the user behavior detection method and device based on multi-source data fusion, provided by the embodiment of the invention, the user behavior is obtained by performing feature extraction on data on the basis of the weblog data, and the abnormal user behavior is detected, so that the method and device are beneficial to a management department to perform early intervention on users with abnormal behaviors, the safety risk caused by the abnormal behaviors is reduced, the method is simple to operate, and the method and device have higher practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a user behavior detection method based on multi-source data fusion according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a user behavior detection apparatus based on multi-source data fusion according to an embodiment of the present invention;
fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention mainly aims at studying student groups in campus network users, analyzes and describes the behaviors of the campus network users by introducing a data mining algorithm to obtain the behavior mode of the campus network users, detects abnormal user behaviors and provides decision support for a network manager.
As shown in fig. 1, a schematic flow chart of a user behavior detection method based on multi-source data fusion provided in an embodiment of the present invention includes:
100, respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vectors of the other users in the class of the target user, averaging the current behavior feature vectors of the other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
Specifically, in order to detect abnormal user behavior accurately, the embodiment of the present invention divides abnormal behavior detection into two types: self-anomaly detection and analogy anomaly detection. Self-anomaly detection compares the user's current behavior features with the user's own historical behavior features, and determines from this self-comparison whether the current behavior is abnormal. Analogy anomaly detection compares the difference between the target user's current behavior features and those of other users with the difference between the target user's historical behavior features and those of other users; abnormal behavior is detected from how the target user's gap to other users has changed, and the anomaly level is determined by the degree of that change.
The embodiment of the invention firstly obtains the campus wireless network user log information in the current time period from the network management system of the campus wireless network.
The campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address. The user ID can be any information that identifies the user, such as student name, student number or identity card number; the user online and offline time refers to the time when the user logs in to the campus wireless network and the time when the user logs out of it; the target URL may be used to determine what content the user accesses on the internet; the terminal MAC address refers to the MAC address of the user terminal; the network access point MAC address may reflect the location of the user when surfing the internet.
And analyzing the internet surfing behavior of the target user in the current time period based on the campus wireless network user log information in the current time period to obtain the current behavior feature vector of the target user.
And analyzing the internet surfing behavior of each other user in the class of the target user in the current time period based on the campus wireless network user log information in the current time period to obtain the current behavior feature vector of each other user.
The above analysis process is a process of extracting user behavior characteristics.
In order to comprehensively represent user behaviors, the embodiment of the invention provides the following user behavior characteristic representation method:
BC=BC(T,L,I,G)
wherein, T represents the characteristic of measuring the distribution of the user internet time period, L represents the characteristic of measuring the distribution of the user internet position, I represents the characteristic of measuring the user internet preference, and G represents the characteristic of measuring the user group combining degree.
It is understood that, in the embodiment of the present invention, the behavior feature vector includes the following information: a feature measuring the distribution of the user's internet time periods, a feature measuring the distribution of the user's internet locations, a feature measuring the user's internet preferences, and a feature measuring the user's group combining degree.
After obtaining the current behavior feature vectors of other users, averaging the current behavior feature vectors of the other users in the class where the target user is located, and obtaining the current behavior average feature vectors corresponding to all the other users in the class where the target user is located.
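The averaging step can be sketched as follows, assuming each user's features have already been encoded as fixed-length numeric lists. The encoding and the function name `mean_vector` are illustrative assumptions and are not taken from the patent text.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Hypothetical online-time vectors of three classmates, truncated to
# three time slots for brevity (the patent uses 24 hourly slots).
classmates_T = [[0.5, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5]]
class_mean_T = mean_vector(classmates_T)
```

The same helper could be applied to the historical vectors to obtain the second historical behavior average feature vector.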
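101, based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire a second historical behavior average feature vector corresponding to all other users in the class of the target user;
Specifically, analyzing a user's internet surfing behavior based on the campus wireless network user log information yields that user's behavior feature vector; behavior feature vectors of the campus wireless network users in different historical time periods can therefore be obtained by the same method as in step 100, and a historical behavior feature database is constructed from them.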
Then, based on a pre-constructed historical behavior feature database, obtaining a first historical behavior average feature vector of the target user in a certain historical time period, obtaining historical behavior feature vectors of other users in the class of the target user in the historical time period according to class information of the target user, averaging the historical behavior feature vectors of all other users in the historical time period, and obtaining a second historical behavior average feature vector corresponding to all other users in the class of the target user.
102, calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
Specifically, a first difference degree ΔBC between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period is calculated by adopting the following formula:
wherein 0 < p1, p2, p3, p4 < 1 and p1 + p2 + p3 + p4 = 1; (T_A, L_A, I_A, G_A) represents the current behavior feature vector of the target user A; and the other vector in the formula represents the first historical behavior average feature vector of the target user A over the historical time period.
When ΔBC > φ, that is, when the first difference degree is greater than a first preset threshold, the target user's current internet surfing behavior is determined to exhibit self-anomaly, and the larger the value of ΔBC, the higher the degree of self-anomaly. The values of p1, p2, p3, p4 and φ can be set as desired.
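The formula itself is not reproduced in this text, so the following is only a sketch under stated assumptions: each of the four components T, L, I, G is encoded as a numeric list, the per-component difference is a Euclidean distance, and the first difference degree is the weighted sum of those distances with weights p1..p4. The distance metric and all function names are assumptions, not the patent's definition.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed metric, not confirmed by the source)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_difference_degree(current: Sequence[Vector],
                            history_mean: Sequence[Vector],
                            weights: Sequence[float]) -> float:
    """Weighted sum of per-component distances over (T, L, I, G)."""
    if not (len(current) == len(history_mean) == len(weights) == 4):
        raise ValueError("expected four components: T, L, I, G")
    if not all(0 < p < 1 for p in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must lie in (0, 1) and sum to 1")
    return sum(p * distance(c, h)
               for p, c, h in zip(weights, current, history_mean))


# Hypothetical, heavily shortened vectors for target user A.
current_A = ([1.0, 0.0], [2.0], [0.0, 3.0], [0.25])
history_A = ([0.0, 0.0], [2.0], [0.0, 0.0], [0.25])
delta_bc = first_difference_degree(current_A, history_A, (0.25, 0.25, 0.25, 0.25))
phi = 0.5  # first preset threshold, chosen arbitrarily here
self_anomaly = delta_bc > phi
```

With these made-up numbers only the T and I components differ from history, and the weighted sum exceeds the illustrative threshold.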
103, calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
Specifically, the second difference degree is calculated by using the following formula:
wherein 0 < q1, q2, q3, q4 < 1 and q1 + q2 + q3 + q4 = 1; (T_A, L_A, I_A, G_A) represents the current behavior feature vector of the target user A; (T_{Θ/{A}}, L_{Θ/{A}}, I_{Θ/{A}}, G_{Θ/{A}}) represents the current behavior average feature vector corresponding to all other users in the class of the target user A; and the remaining two vectors in the formula represent, respectively, the first historical behavior average feature vector of the target user A over the historical time period and the second historical behavior average feature vector of all other users in the class of the target user A.
The second difference degree characterizes the degree of analogy anomaly of the user behavior. When the second difference degree is greater than a second preset threshold, the user's current behavior features are judged to exhibit analogy anomaly, and the larger the value, the higher the degree of analogy anomaly. The value of the second preset threshold can be set as desired.
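As with the first difference degree, the formula is missing from this text. The sketch below is one possible reading, assuming the first difference value is the per-component distance between user A and the class average now, the second difference value is the same distance in the historical period, and the second difference degree is the q-weighted sum of the absolute change between the two. The Euclidean metric, the use of an absolute change, and all names are assumptions.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance (an assumed metric, not confirmed by the source)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def second_difference_degree(cur_a: Sequence[Vector], cur_class: Sequence[Vector],
                             hist_a: Sequence[Vector], hist_class: Sequence[Vector],
                             weights: Sequence[float]) -> float:
    """Weighted change in the gap between user A and the rest of the class."""
    if not math.isclose(sum(weights), 1.0) or not all(0 < q < 1 for q in weights):
        raise ValueError("weights must lie in (0, 1) and sum to 1")
    total = 0.0
    for q, ca, cc, ha, hc in zip(weights, cur_a, cur_class, hist_a, hist_class):
        first_value = distance(ca, cc)    # gap to classmates in the current period
        second_value = distance(ha, hc)   # gap to classmates in the historical period
        total += q * abs(first_value - second_value)
    return total


# Hypothetical one-dimensional components (T, L, I, G) for brevity.
cur_A = ([3.0], [0.0], [0.0], [0.0])
cur_class = ([0.0], [0.0], [0.0], [0.0])
hist_A = ([1.0], [0.0], [0.0], [0.0])
hist_class = ([0.0], [0.0], [0.0], [0.0])
degree = second_difference_degree(cur_A, cur_class, hist_A, hist_class,
                                  (0.4, 0.2, 0.2, 0.2))
```

In this made-up case user A's gap to the class in the T component grows from 1 to 3, so only that component contributes to the result.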
104, judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
Specifically, the embodiment of the present invention combines self-anomaly and analogy anomaly to calculate the user's comprehensive anomaly degree J(A): the first difference degree and the second difference degree are weighted and summed to obtain the abnormal behavior detection result of the target user:
J(A) = α · (first difference degree) + β · (second difference degree)
wherein 0 < α, β < 1, and α + β = 1.
When J(A) > γ, the user's current behavior features are judged to be abnormal, and the larger the value of J(A), the higher the degree of abnormality of user A.
The values of α, β, γ can be flexibly set as desired.
The user behavior detection method based on multi-source data fusion provided by the embodiment of the invention is based on the weblog data, obtains the user behavior by performing feature extraction on the data, detects the abnormal user behavior, is beneficial to the management department to perform early intervention on the abnormal behavior user, reduces the safety risk caused by the abnormal behavior, is simple to operate, and has higher practicability.
Based on the content of the above embodiment, the step of analyzing the internet access behaviors of the target user and each other user in the class where the target user is located in the current time period based on the campus wireless network user log information in the current time period, and acquiring the current behavior feature vector of the target user and the current behavior feature vector of each other user in the class where the target user is located specifically includes:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 time periods by taking hours as a unit, determining the internet surfing time of a target user and each other user in the class of the target user in each time period based on the user on-off line time in the campus wireless network user log information in the current time period, and obtaining the characteristics for measuring the internet surfing time period distribution of the target user and each other user in the class of the target user;
Specifically, each day is divided into 24 periods {t1 (0:00-1:00), t2 (1:00-2:00), …, t24 (23:00-24:00)} in units of hours, and the internet surfing time of the target user and of each other user in the class where the target user is located in each period is determined based on the user online and offline times in the campus wireless network user log information in the current time period.
For example, T_A = {0.5, 0, …, 1} indicates that the internet surfing time of user A in the periods {t1 (0:00-1:00), t2 (1:00-2:00), …, t24 (23:00-24:00)} is 0.5 hours, 0 hours, … and 1 hour respectively.
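A minimal sketch of how the time-period feature might be computed from login and logout timestamps follows. It assumes the log has already been parsed into `(login, logout)` pairs of `datetime` objects; the function name `hourly_online_time` and the sample dates are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def hourly_online_time(sessions: Sequence[Tuple[datetime, datetime]]) -> List[float]:
    """Return 24 values: hours spent online in each one-hour slot of the day."""
    slots = [0.0] * 24
    for login, logout in sessions:
        cursor = login
        while cursor < logout:
            # End of the one-hour slot that contains `cursor`.
            slot_end = cursor.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
            end = min(slot_end, logout)
            slots[cursor.hour] += (end - cursor).total_seconds() / 3600.0
            cursor = end
    return slots


# Hypothetical sessions: 0:15-0:45 and 23:00-24:00 on the same day.
sessions_A = [
    (datetime(2019, 7, 1, 0, 15), datetime(2019, 7, 1, 0, 45)),
    (datetime(2019, 7, 1, 23, 0), datetime(2019, 7, 2, 0, 0)),
]
T_A = hourly_online_time(sessions_A)
```

Sessions that cross an hour boundary are split at the boundary, so each slot only receives the time that actually falls inside it.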
Identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
Specifically, in order to identify the user location more accurately, the embodiment of the present invention identifies the user location from the MAC address of the AP recorded in the weblog, and counts the duration of internet access through each AP per unit time (for example, per day), thereby obtaining the user's internet time at each location. For example, L_A = {1-101 (MAC1, 0.5 hours), 2-203 (MAC2, 2 hours), …} indicates that user A was online for 0.5 hours at location 1-101 through the AP with address MAC1, and for 2 hours at location 2-203 through the AP with address MAC2.
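The location feature can be sketched as a per-location sum of session durations. The AP-to-location lookup table `AP_LOCATIONS`, the record layout `(ap_mac, login, logout)` and the function name are assumptions made for illustration; the patent text does not specify how AP addresses are mapped to rooms.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Sequence, Tuple

# Hypothetical mapping from AP MAC address to a room label.
AP_LOCATIONS = {"MAC1": "1-101", "MAC2": "2-203"}


def location_online_time(records: Sequence[Tuple[str, datetime, datetime]],
                         ap_locations: Dict[str, str]) -> Dict[str, float]:
    """Sum online hours per location, identified by the AP's MAC address."""
    hours: Dict[str, float] = defaultdict(float)
    for ap_mac, login, logout in records:
        location = ap_locations.get(ap_mac, "unknown")
        hours[location] += (logout - login).total_seconds() / 3600.0
    return dict(hours)


records_A = [
    ("MAC1", datetime(2019, 7, 1, 8, 0), datetime(2019, 7, 1, 8, 30)),
    ("MAC2", datetime(2019, 7, 1, 9, 0), datetime(2019, 7, 1, 11, 0)),
]
L_A = location_online_time(records_A, AP_LOCATIONS)
```

APs that are missing from the lookup table are grouped under `"unknown"` rather than dropped, so the total online time is preserved.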
Dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
Specifically, because network content today is so varied, the target URLs in the weblog are first classified in order to reduce the dimension of the preference feature, for example into the following categories: office/study, live video, video on demand, instant messaging, games, e-commerce, illegal services, and so on; the time the user spends on each kind of network service per unit time (for example, per day) is then counted. For example, I_A = {office/study (1 hour), live video (1 hour), video on demand (2 hours), …} indicates that user A spent 1 hour, 1 hour, 2 hours, … on the office/study, live video, video on demand, … categories respectively.
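The preference feature can be sketched as a host-to-category lookup followed by a per-category sum. The host names, the table `CATEGORY_BY_HOST`, and the assumption that each log record already carries a duration in hours are all hypothetical; the patent does not say how URLs are classified.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple
from urllib.parse import urlparse

# Hypothetical host-to-category table; a real deployment would need a much
# larger list or a dedicated URL classification service.
CATEGORY_BY_HOST = {
    "study.example.edu": "office/study",
    "live.example.com": "live video",
    "vod.example.com": "video on demand",
}


def preference_time(records: Sequence[Tuple[str, float]],
                    category_by_host: Dict[str, str]) -> Dict[str, float]:
    """Sum the hours spent per network service category."""
    hours: Dict[str, float] = defaultdict(float)
    for url, duration_hours in records:
        host = urlparse(url).hostname or ""
        hours[category_by_host.get(host, "other")] += duration_hours
    return dict(hours)


records_A = [
    ("https://study.example.edu/course", 1.0),
    ("https://live.example.com/room/1", 1.0),
    ("https://vod.example.com/a", 1.5),
    ("https://vod.example.com/b", 0.5),
]
I_A = preference_time(records_A, CATEGORY_BY_HOST)
```

Grouping by category rather than by individual URL is what keeps the feature dimension small, as the paragraph above describes.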
Calculating the relation degree mean value of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combining degree of the target user and each other user in the class where the target user is located;
Specifically, in the embodiment of the present invention, the proportion of a unit time (for example, one day) during which two users are simultaneously connected to the same AP is defined as the degree of relationship between the two users. For example, if user A and user B are simultaneously connected to the same AP for 3 hours per day, the degree of relationship between user A and user B is R_AB = 3 ÷ 24 = 0.125; if user B and user C are simultaneously connected to the same AP for 6 hours per day, the degree of relationship between user B and user C is R_BC = 6 ÷ 24 = 0.25. That is, the relationship between user B and user A is not as close as that between user B and user C. Further, the group combining degree is the mean of the degrees of relationship between a user and his or her classmates.
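The mean degree of relationship between a user and his or her classmates is calculated by adopting the following formula:
G_A = (1/M) · Σ_{i=1}^{M} R_Ai
wherein R_Ai represents the degree of relationship between user A and the i-th classmate, and M represents the number of classmates of user A.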
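The degree of relationship and its mean can be sketched as an interval-overlap computation. The record layout `(ap_mac, start, end)`, the assumption that a single user's own sessions do not overlap each other, and all function names are illustrative and not taken from the patent.

```python
from datetime import datetime
from typing import Sequence, Tuple

Session = Tuple[str, datetime, datetime]  # (ap_mac, start, end)


def overlap_hours(sessions_a: Sequence[Session], sessions_b: Sequence[Session]) -> float:
    """Hours during which both users are connected to the same AP at once."""
    total = 0.0
    for ap_a, start_a, end_a in sessions_a:
        for ap_b, start_b, end_b in sessions_b:
            if ap_a != ap_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                total += (end - start).total_seconds() / 3600.0
    return total


def relationship_degree(sessions_a, sessions_b, unit_hours: float = 24.0) -> float:
    """Share of the unit time (one day by default) spent on the same AP."""
    return overlap_hours(sessions_a, sessions_b) / unit_hours


def group_degree(target: Sequence[Session],
                 classmates: Sequence[Sequence[Session]]) -> float:
    """Mean degree of relationship between the target user and each classmate."""
    if not classmates:
        return 0.0
    degrees = [relationship_degree(target, other) for other in classmates]
    return sum(degrees) / len(degrees)


day = lambda h: datetime(2019, 7, 1, h, 0)  # helper for readable sample data
user_a = [("MAC1", day(9), day(12)), ("MAC2", day(14), day(20))]
user_b = [("MAC1", day(9), day(12))]   # shares 3 hours with A
user_c = [("MAC2", day(14), day(20))]  # shares 6 hours with A
G_A = group_degree(user_a, [user_b, user_c])
```

With this sample data the two pairwise degrees are 3 ÷ 24 and 6 ÷ 24, and the group combining degree is their mean.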
As shown in fig. 2, a schematic structural diagram of a user behavior detection apparatus based on multi-source data fusion provided in an embodiment of the present invention includes: a user behavior analysis module 201, a historical behavior feature acquisition module 202, a first difference degree calculation module 203, a second difference degree calculation module 204 and a detection module 205, wherein,
the user behavior analysis module 201 is configured to analyze internet surfing behaviors of a target user and other users in a class where the target user is located in the current time period based on campus wireless network user log information in the current time period, obtain a current behavior feature vector of the target user and current behavior feature vectors of other users in the class where the target user is located, average the current behavior feature vectors of other users in the class where the target user is located, and obtain current behavior average feature vectors corresponding to all other users in the class where the target user is located;
a historical behavior feature obtaining module 202, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module 203, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module 204, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module 205 is configured to determine the abnormal behavior of the target user according to the first difference degree and the second difference degree, and obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
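The computations performed by modules 203 to 205 can be sketched in Python as follows. The difference-degree formulas are not reproduced in this text, so the sketch makes three assumptions that should not be read as the patent's definitions: Euclidean distance is used to compare feature vectors, the second difference degree is taken as the absolute gap between the first and second difference values, and the final score uses equal weights. All function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def first_difference_degree(current, own_history_mean):
    """Self-abnormality: the user now versus the user's own history.

    Euclidean distance is an assumed metric.
    """
    return math.dist(current, own_history_mean)

def second_difference_degree(current, peers_current,
                             own_history_mean, peers_history):
    """Peer-comparison abnormality.

    first_value:  user now versus classmates now.
    second_value: user's history versus classmates' history.
    Combining them by absolute difference is an assumption.
    """
    first_value = math.dist(current, mean_vector(peers_current))
    second_value = math.dist(own_history_mean, mean_vector(peers_history))
    return abs(first_value - second_value)

def anomaly_score(d1, d2, w1=0.5, w2=0.5):
    """Weighted sum of the two difference degrees (weights assumed)."""
    return w1 * d1 + w2 * d2

# Toy two-dimensional feature vectors, for illustration only.
current = [3.0, 4.0]
own_history_mean = [0.0, 0.0]
peers_current = [[3.0, 0.0], [3.0, 0.0]]
peers_history = [[0.0, 3.0], [0.0, 3.0]]

d1 = first_difference_degree(current, own_history_mean)
d2 = second_difference_degree(current, peers_current,
                              own_history_mean, peers_history)
score = anomaly_score(d1, d2)
```

Comparing the user's current gap to classmates against the historical gap means that a user who has always differed from the class is not flagged merely for that standing difference.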
The user behavior detection device based on multi-source data fusion is used for executing the user behavior detection method based on multi-source data fusion in the method embodiment. Therefore, the description and definition in the foregoing embodiment of the user behavior detection method based on multi-source data fusion may be used for understanding the user behavior detection apparatus based on multi-source data fusion in the embodiment of the present invention, and are not described herein again.
The user behavior detection device based on multi-source data fusion provided by the embodiment of the invention is based on weblog data, obtains the user behavior by performing feature extraction on the data, and detects the abnormal user behavior, so that the device is beneficial to a management department to perform early intervention on users with abnormal behaviors, reduces the safety risk caused by the abnormal behaviors, is simple to operate, and has higher practicability.
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored on the memory 330 and executable on the processor 310 to perform the multi-source data fusion-based user behavior detection method provided by the above-described method embodiments, for example, including:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the multi-source data fusion-based user behavior detection method provided in the foregoing method embodiments, for example, the method includes:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (8)
1. A user behavior detection method based on multi-source data fusion is characterized by comprising the following steps:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and second historical behavior average feature vectors of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior;
wherein the behavior feature vector comprises: measuring the characteristics of the distribution of the user internet time periods, the characteristics of the distribution of the user internet positions, the characteristics of the user internet favorite and the characteristics of the user group combining degree;
the method comprises the following steps of respectively analyzing the internet surfing behaviors of a target user and other users in a class where the target user is located in the current time period based on campus wireless network user log information in the current time period, and acquiring a current behavior feature vector of the target user and current behavior feature vectors of other users in the class where the target user is located, specifically:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 time periods by taking hours as a unit, determining the internet surfing time of a target user and each other user in the class of the target user in each time period based on the user on-off line time in the campus wireless network user log information in the current time period, and obtaining the characteristics for measuring the internet surfing time period distribution of the target user and each other user in the class of the target user;
identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
calculating the mean value of the relationship degree of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combination degree of the target user and other users in the class of the target user;
the relationship degree is specifically a time ratio of two users accessing the same network access point at the same time.
2. The multi-source data fusion-based user behavior detection method according to claim 1, wherein a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period is calculated by using the following formula:
3. The multi-source data fusion-based user behavior detection method according to claim 1, wherein the second difference degree is calculated by using the following formula:
wherein $0 < q_1, q_2, q_3, q_4 < 1$ and $q_1 + q_2 + q_3 + q_4 = 1$; $(T_A, L_A, I_A, G_A)$ represents the current behavior feature vector of the target user A; $(T_{\Theta/\{A\}}, L_{\Theta/\{A\}}, I_{\Theta/\{A\}}, G_{\Theta/\{A\}})$ represents the current behavior average feature vector corresponding to all other users in the class of the target user A; and the two remaining symbols represent, respectively, the first historical behavior average feature vector of the target user A over the historical time period and the second historical behavior average feature vector of all other users within the class of the target user A.
4. The multi-source data fusion-based user behavior detection method according to claim 1, wherein the step of determining the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain the behavior detection result of the target user specifically comprises:
and carrying out weighted summation on the first difference degree and the second difference degree to obtain an abnormal behavior detection result of the target user.
6. A user behavior detection device based on multi-source data fusion is characterized by comprising:
the user behavior analysis module is used for analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period respectively based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
a historical behavior feature obtaining module, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module is used for judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior;
wherein the behavior feature vector comprises: measuring the characteristics of the distribution of the user internet time periods, the characteristics of the distribution of the user internet positions, the characteristics of the user internet favorite and the characteristics of the user group combining degree;
the method comprises the steps of respectively analyzing the internet surfing behaviors of a target user and other users in a class where the target user is located in the current time period based on campus wireless network user log information in the current time period, and acquiring a current behavior feature vector of the target user and current behavior feature vectors of other users in the class where the target user is located, wherein the steps are specifically as follows:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 time periods by taking hours as a unit, determining the internet surfing time of a target user and each other user in the class of the target user in each time period based on the user on-off line time in the campus wireless network user log information in the current time period, and obtaining the characteristics for measuring the internet surfing time period distribution of the target user and each other user in the class of the target user;
identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
calculating the mean value of the relationship degree of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combination degree of the target user and other users in the class of the target user;
the relationship degree is specifically a time ratio of two users accessing the same network access point at the same time.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the multi-source data fusion-based user behavior detection method according to any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multi-source data fusion-based user behavior detection method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910624299.0A CN110532485B (en) | 2019-07-11 | 2019-07-11 | User behavior detection method and device based on multi-source data fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910624299.0A CN110532485B (en) | 2019-07-11 | 2019-07-11 | User behavior detection method and device based on multi-source data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110532485A (en) | 2019-12-03 |
CN110532485B (en) | 2022-06-03 |
Family
ID=68659689
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910624299.0A Expired - Fee Related CN110532485B (en) | 2019-07-11 | 2019-07-11 | User behavior detection method and device based on multi-source data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532485B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114513432A (en) * | 2020-10-29 | 2022-05-17 | 南京中兴新软件有限责任公司 | Method, device, medium and equipment for detecting internet access abnormity and offline |
CN112291622B (en) * | 2020-10-30 | 2022-05-27 | 中国建设银行股份有限公司 | Method and device for determining favorite internet surfing time period of user |
CN112633395B (en) * | 2020-12-29 | 2024-07-19 | 平安科技(深圳)有限公司 | Abnormal data detection method, device, computer equipment and storage medium |
CN116980239B (en) * | 2023-09-25 | 2023-11-24 | 江苏天创科技有限公司 | SASE-based network security monitoring and early warning method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107846389A (en) * | 2016-09-21 | 2018-03-27 | 中国科学院信息工程研究所 | Inside threat detection method and system based on the subjective and objective data fusion of user |
CN108763319A (en) * | 2018-04-28 | 2018-11-06 | 中国科学院自动化研究所 | Merge the social robot detection method and system of user behavior and text message |
CN106101116B (en) * | 2016-06-29 | 2019-01-08 | 东北大学 | A kind of user behavior abnormality detection system and method based on principal component analysis |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10735445B2 (en) * | 2016-09-21 | 2020-08-04 | Cognizant Technology Solutions U.S. Corporation | Detecting behavioral anomaly in machine learned rule sets |
US10721239B2 (en) * | 2017-03-31 | 2020-07-21 | Oracle International Corporation | Mechanisms for anomaly detection and access management |
Non-Patent Citations (2)
Title |
---|
A Cache Privacy Protection Strategy Based on Content Privacy and User Security Classification in CCN; Jie Liang, Yinlong Liu; 《2019 IEEE Wireless Communications and Networking Conference》; 20190418; full text * |
Research on fine-grained data extraction methods for Web pages; Wang Xuren; 《Computer Engineering and Design》; 20140220; full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220603 |