CN110532485B - User behavior detection method and device based on multi-source data fusion - Google Patents

Info

Publication number
CN110532485B
CN110532485B CN201910624299.0A CN201910624299A CN110532485B CN 110532485 B CN110532485 B CN 110532485B CN 201910624299 A CN201910624299 A CN 201910624299A CN 110532485 B CN110532485 B CN 110532485B
Authority
CN
China
Prior art keywords
user
target user
behavior
class
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910624299.0A
Other languages
Chinese (zh)
Other versions
CN110532485A (en
Inventor
刘银龙
耿立茹
王旭仁
付佳
田野
谢菲
冯祥虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Institute of Information Engineering of CAS
Original Assignee
Capital Normal University
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University, Institute of Information Engineering of CAS filed Critical Capital Normal University
Priority to CN201910624299.0A priority Critical patent/CN110532485B/en
Publication of CN110532485A publication Critical patent/CN110532485A/en
Application granted granted Critical
Publication of CN110532485B publication Critical patent/CN110532485B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566URL specific, e.g. using aliases, detecting broken or misspelled links
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides a user behavior detection method and device based on multi-source data fusion, wherein the method comprises the following steps: acquiring a current behavior feature vector of a target user and a current behavior average feature vector of other users in a target user class based on a campus wireless network user log; acquiring a first historical behavior average characteristic vector of a target user and second historical behavior average characteristic vectors of other users in the target user class; calculating a first difference degree between the current behavior feature vector and the first historical behavior average feature vector; calculating a first difference between the current behavior feature vector and the current behavior average feature vector, calculating a second difference between the first and second historical behavior average feature vectors, and calculating a second degree of difference based on the first difference and the second difference; and obtaining a behavior detection result of the target user according to the first difference degree and the second difference degree. The embodiment of the invention can effectively acquire the user behavior and detect the abnormal user behavior.

Description

User behavior detection method and device based on multi-source data fusion
Technical Field
The invention relates to the technical field of network communication, in particular to a user behavior detection method and device based on multi-source data fusion.
Background
In recent years, with the rapid development of the mobile internet and the wide adoption of smart terminals, many colleges and universities in China have achieved full campus wireless network coverage. As an important part of the internet, campus wireless networks face the same problems as the wider internet during rapid growth, including network management.
Departments such as the school network center, the academic affairs office, and the student affairs office hold a large amount of information about students and staff, such as sex, age, grade, course schedule, scores, internet access time, internet access location, and network service type. How to obtain useful information from the multi-source data that users generate in campus life, accurately analyze user behavior, and detect abnormal behavior, so that campus users can be managed more effectively and student mental health problems can be prevented and addressed promptly, has become a focus of attention across society.
Network user behavior refers to the behavioral patterns that network users exhibit in their network activity; these patterns can be expressed quantitatively or qualitatively through the statistical characteristics of, or the relationships among, relevant feature quantities in network data. However, the analysis methods and points of emphasis differ for users of different types of networks, and at present there is no method or system for analyzing and detecting the behavior of campus network users.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for detecting user behavior based on multi-source data fusion, which overcome the above problems or at least partially solve the above problems.
In a first aspect, an embodiment of the present invention provides a user behavior detection method based on multi-source data fusion, including:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In a second aspect, an embodiment of the present invention provides a user behavior detection apparatus based on multi-source data fusion, including:
the user behavior analysis module is used for analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period respectively based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
a historical behavior feature obtaining module, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module is used for judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the multi-source data fusion-based user behavior detection method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the multi-source data fusion-based user behavior detection method as provided in the first aspect.
The user behavior detection method and device based on multi-source data fusion provided by the embodiments of the invention extract features from weblog data to obtain user behavior and to detect abnormal user behavior. This helps management departments intervene early with users whose behavior is abnormal and reduces the safety risks that abnormal behavior causes. The method is simple to operate and highly practical.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a user behavior detection method based on multi-source data fusion according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a user behavior detection apparatus based on multi-source data fusion according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the invention mainly study the student population among campus network users. A data mining approach is used to analyze and characterize the behavior of campus network users, obtain their behavior patterns, detect abnormal user behavior, and provide decision support for network administrators.
As shown in fig. 1, a schematic flow chart of a user behavior detection method based on multi-source data fusion provided in an embodiment of the present invention includes:
Step 100, respectively analyzing the internet surfing behaviors of a target user and other users in a class where the target user is located in the current time period based on campus wireless network user log information in the current time period, obtaining a current behavior feature vector of the target user and current behavior feature vectors of other users in the class where the target user is located, averaging the current behavior feature vectors of other users in the class where the target user is located, and obtaining current behavior average feature vectors corresponding to all other users in the class where the target user is located;
specifically, in order to detect abnormal user behavior accurately, the embodiment of the present invention divides abnormal behavior detection into two types: self-anomaly detection and analogy anomaly detection. Self-anomaly detection compares the user's current behavior features with the user's own historical behavior features to determine whether the current behavior is abnormal. Analogy anomaly detection compares two gaps: the difference between the target user's current behavior features and those of other users, and the difference between the target user's historical behavior features and those of other users. Abnormal behavior is detected from how much this gap has changed, and the anomaly level is determined by the degree of change.
The embodiment of the invention firstly obtains the campus wireless network user log information in the current time period from the network management system of the campus wireless network.
The campus wireless network user log information comprises: user ID, user online and offline times, target URL, terminal MAC address, and network access point MAC address. The user ID can be any information that identifies the user, such as student name, student number, or identity card number; the user online and offline times are the times at which the user logs in to and out of the campus wireless network; the target URL can be used to determine what content the user accesses; the terminal MAC address is the MAC address of the user terminal; and the network access point MAC address reflects the user's location while online.
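A log record with the five fields listed above might be modeled as follows. This is only a minimal sketch: the class name `LogRecord`, the field names, the single-URL-per-session simplification, and the sample values are all illustrative assumptions, since the source does not specify a storage format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """One campus Wi-Fi session entry (hypothetical field names)."""
    user_id: str          # e.g. a student number
    online_at: datetime   # time the user logged in to the network
    offline_at: datetime  # time the user logged out of the network
    target_url: str       # URL visited, used to infer the service type
    terminal_mac: str     # MAC address of the user's terminal
    ap_mac: str           # MAC address of the access point (proxy for location)

    def duration_hours(self) -> float:
        """Length of the session in hours."""
        return (self.offline_at - self.online_at).total_seconds() / 3600.0


# Made-up sample values for illustration only.
record = LogRecord(
    user_id="S0001",
    online_at=datetime(2019, 7, 1, 8, 30),
    offline_at=datetime(2019, 7, 1, 10, 0),
    target_url="http://example.edu/course",
    terminal_mac="AA:BB:CC:DD:EE:01",
    ap_mac="11:22:33:44:55:01",
)
print(record.duration_hours())
```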
And analyzing the internet surfing behavior of the target user in the current time period based on the campus wireless network user log information in the current time period to obtain the current behavior feature vector of the target user.
And analyzing the internet surfing behavior of each other user in the class of the target user in the current time period based on the campus wireless network user log information in the current time period to obtain the current behavior feature vector of each other user.
The above analysis process is a process of extracting user behavior characteristics.
In order to comprehensively represent user behaviors, the embodiment of the invention provides the following user behavior characteristic representation method:
BC=BC(T,L,I,G)
where T is the feature measuring the distribution of the user's internet access time periods, L is the feature measuring the distribution of the user's internet access locations, I is the feature measuring the user's internet preferences, and G is the feature measuring the user's group combining degree.
That is, in the embodiment of the present invention, the behavior feature vector consists of these four features: time period distribution, location distribution, internet preference, and group combining degree.
After obtaining the current behavior feature vectors of other users, averaging the current behavior feature vectors of the other users in the class where the target user is located, and obtaining the current behavior average feature vectors corresponding to all the other users in the class where the target user is located.
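The averaging step could be sketched as below. The source does not say how the four components are stored or how components with different keys are averaged, so the representation here is an assumption: T is a list of 24 hourly values, L and I are dictionaries of hours keyed by location or service category, G is a single number, and a key missing from one user's dictionary counts as 0 hours. The names `BehaviorVector` and `average_vectors` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class BehaviorVector(NamedTuple):
    T: List[float]       # hours online in each of the 24 hourly periods
    L: Dict[str, float]  # hours online per location
    I: Dict[str, float]  # hours spent per network service category
    G: float             # group combining degree


def _mean_dicts(dicts: List[Dict[str, float]]) -> Dict[str, float]:
    """Average over the union of keys; a missing key counts as 0 (assumption)."""
    keys = sorted({k for d in dicts for k in d})
    return {k: sum(d.get(k, 0.0) for d in dicts) / len(dicts) for k in keys}


def average_vectors(vectors: List[BehaviorVector]) -> BehaviorVector:
    """Component-wise mean of the classmates' behavior feature vectors."""
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one vector to average")
    periods = len(vectors[0].T)
    return BehaviorVector(
        T=[sum(v.T[i] for v in vectors) / n for i in range(periods)],
        L=_mean_dicts([v.L for v in vectors]),
        I=_mean_dicts([v.I for v in vectors]),
        G=sum(v.G for v in vectors) / n,
    )


# Two classmates of the target user, with made-up values.
b = BehaviorVector([1.0] + [0.0] * 23, {"1-101": 2.0}, {"game": 1.0}, 0.125)
c = BehaviorVector([0.0] * 24, {"2-203": 4.0}, {"game": 3.0}, 0.25)
mean = average_vectors([b, c])
print(mean.L, mean.I, mean.G)
```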
Step 101, based on a pre-constructed historical behavior feature database, obtaining a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to obtain a second historical behavior average feature vector corresponding to all other users in the class of the target user;
specifically, the internet surfing behavior of the user is analyzed based on the campus wireless network user log information to obtain the behavior feature vector of the user, that is, the behavior feature vectors of the campus wireless network users in different historical time periods can be obtained by the same method as the step 100, so that a historical behavior feature database is constructed.
Then, based on a pre-constructed historical behavior feature database, obtaining a first historical behavior average feature vector of the target user in a certain historical time period, obtaining historical behavior feature vectors of other users in the class of the target user in the historical time period according to class information of the target user, averaging the historical behavior feature vectors of all other users in the historical time period, and obtaining a second historical behavior average feature vector corresponding to all other users in the class of the target user.
Step 102, calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
specifically, a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period is calculated by adopting the following formula:
[Formula image not reproduced in this text: the expression for the first difference degree ΔBC, which uses the weights p1, p2, p3, p4]
where 0 < p1, p2, p3, p4 < 1 and p1 + p2 + p3 + p4 = 1; (T_A, L_A, I_A, G_A) denotes the current behavior feature vector of the target user A; and the vector shown in a second formula image (also not reproduced) denotes the first historical behavior average feature vector of the target user A over the historical time period.
When ΔBC > φ, that is, when the first difference degree is greater than a first preset threshold, the current internet behavior of the target user is judged to exhibit self-anomaly; the larger the value of ΔBC, the higher the degree of self-anomaly. The values of p1, p2, p3, p4 and φ can be set as needed.
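The source's formula for ΔBC is an image that is not reproduced in this text, so the exact distance function is unknown. The sketch below shows one plausible form under stated assumptions: each of the four components is a list of numbers aligned between the two vectors, the per-component distance is the mean absolute difference, and ΔBC is the weighted sum of the four distances. The function names and all sample values are hypothetical.

```python
from typing import Dict, Sequence

Vector = Dict[str, Sequence[float]]  # keys "T", "L", "I", "G"


def component_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference of two aligned components (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("components must be aligned to the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def first_difference_degree(current: Vector, history_mean: Vector,
                            weights: Dict[str, float]) -> float:
    """Weighted sum of per-component distances; weights play the role of p1..p4."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * component_distance(current[k], history_mean[k])
               for k in ("T", "L", "I", "G"))


# Components are shortened to one or two entries for brevity.
current = {"T": [2.0, 0.0], "L": [1.0], "I": [3.0], "G": [0.5]}
history_mean = {"T": [1.0, 0.0], "L": [1.0], "I": [1.0], "G": [0.25]}
weights = {"T": 0.25, "L": 0.25, "I": 0.25, "G": 0.25}

delta_bc = first_difference_degree(current, history_mean, weights)
phi = 0.5  # first preset threshold, chosen arbitrarily here
is_self_abnormal = delta_bc > phi
print(delta_bc, is_self_abnormal)
```

In practice the components have different scales (hours versus a ratio), so some normalization before weighting would likely be needed; the source does not describe one.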
Step 103, calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
specifically, the second difference degree is calculated by using the following formula:
[Formula image not reproduced in this text: the expression for the second difference degree, which uses the weights q1, q2, q3, q4]
where 0 < q1, q2, q3, q4 < 1 and q1 + q2 + q3 + q4 = 1; (T_A, L_A, I_A, G_A) denotes the current behavior feature vector of the target user A; (T_{Θ/{A}}, L_{Θ/{A}}, I_{Θ/{A}}, G_{Θ/{A}}) denotes the current behavior average feature vector of all other users in the class of the target user A; and two further vectors, shown only in formula images, denote the first historical behavior average feature vector of the target user A over the historical time period and the second historical behavior average feature vector of all other users in the class of the target user A.
The second difference degree characterizes the analogy anomaly degree of the user behavior. When the second difference degree is greater than a second preset threshold, the current behavior of the user is judged to exhibit analogy anomaly; the larger the second difference degree, the higher the degree of analogy anomaly. The weights q1, q2, q3, q4 and the second preset threshold can be set as needed.
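Because the formula for the second difference degree is also an image missing from this text, the following is only a sketch of one way the described comparison could be computed. It assumes aligned numeric components, uses the mean absolute difference as the per-component gap, and takes the weighted absolute change between the current gap (first difference value) and the historical gap (second difference value). All names and values are hypothetical.

```python
from typing import Dict, Sequence

Vector = Dict[str, Sequence[float]]  # keys "T", "L", "I", "G"


def component_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference of two aligned components (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def second_difference_degree(cur_target: Vector, cur_class_mean: Vector,
                             hist_target: Vector, hist_class_mean: Vector,
                             weights: Dict[str, float]) -> float:
    """Weighted change in the target-vs-class gap; weights play the role of q1..q4."""
    total = 0.0
    for k in ("T", "L", "I", "G"):
        first_diff = component_distance(cur_target[k], cur_class_mean[k])
        second_diff = component_distance(hist_target[k], hist_class_mean[k])
        total += weights[k] * abs(first_diff - second_diff)
    return total


# Historically the target matched the class average; now the target diverges.
cur_target = {"T": [3.0], "L": [1.0], "I": [2.0], "G": [0.125]}
cur_class_mean = {"T": [1.0], "L": [1.0], "I": [1.0], "G": [0.25]}
hist_target = {"T": [1.0], "L": [1.0], "I": [1.0], "G": [0.25]}
hist_class_mean = {"T": [1.0], "L": [1.0], "I": [1.0], "G": [0.25]}
weights = {"T": 0.25, "L": 0.25, "I": 0.25, "G": 0.25}

degree = second_difference_degree(cur_target, cur_class_mean,
                                  hist_target, hist_class_mean, weights)
print(degree)
```

Comparing gaps rather than raw vectors means that a change shared by the whole class (for example, an exam week) does not by itself raise the analogy anomaly degree.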
Step 104, judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
specifically, the embodiment of the present invention combines self-anomaly and analogy anomaly to compute the overall anomaly degree of the user: the first difference degree and the second difference degree are weighted and summed to obtain the abnormal behavior detection result J(A) of the target user:
J(A) = α · (first difference degree) + β · (second difference degree)
where 0 < α, β < 1 and α + β = 1.
When J (A) > gamma, the current behavior characteristics of the user can be judged to be abnormal, and the larger the value of J (A), the higher the abnormal degree of the user A.
The values of α, β, γ can be flexibly set as desired.
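The weighted combination and threshold test described above can be sketched in a few lines. The function names and the particular values of α, β, γ and the two input degrees are illustrative assumptions; the source leaves all of these parameters to the implementer.

```python
def composite_anomaly_score(first_degree: float, second_degree: float,
                            alpha: float, beta: float) -> float:
    """J(A): weighted sum of the self-anomaly and analogy anomaly degrees."""
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    return alpha * first_degree + beta * second_degree


def is_abnormal(score: float, gamma: float) -> bool:
    """The user is flagged when J(A) exceeds the threshold gamma."""
    return score > gamma


score = composite_anomaly_score(0.6875, 0.78125, alpha=0.5, beta=0.5)
print(score, is_abnormal(score, gamma=0.7))
```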
The user behavior detection method based on multi-source data fusion provided by the embodiment of the invention extracts features from weblog data to obtain user behavior and to detect abnormal user behavior. This helps management departments intervene early with users whose behavior is abnormal and reduces the safety risks that abnormal behavior causes. The method is simple to operate and highly practical.
Based on the content of the above embodiment, the step of analyzing the internet access behaviors of the target user and each other user in the class where the target user is located in the current time period based on the campus wireless network user log information in the current time period, and acquiring the current behavior feature vector of the target user and the current behavior feature vector of each other user in the class where the target user is located specifically includes:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 one-hour time periods, determining the internet access time of the target user and of each other user in the class of the target user in each time period based on the user online and offline times in the campus wireless network user log information in the current time period, and obtaining the feature measuring the internet access time period distribution of the target user and of each other user in the class of the target user;
specifically, each day is divided into 24 one-hour periods {t1 (0:00-1:00), t2 (1:00-2:00), …, t24 (23:00-24:00)}, and the internet access time of the target user and of each other user in the class of the target user in each period is determined based on the user online and offline times in the campus wireless network user log information in the current time period.
For example, T_A = {0.5, 0, …, 1} indicates that the internet access time of user A in the periods {t1 (0:00-1:00), t2 (1:00-2:00), …, t24 (23:00-24:00)} is 0.5 hour, 0 hours, …, and 1 hour, respectively.
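One way to turn online and offline times into the 24-element time period feature is to split each session at hour boundaries and add each piece to the bucket for its hour of day. The sketch below assumes sessions are given as (online, offline) `datetime` pairs; the function name and the sample sessions are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def hourly_distribution(sessions: List[Tuple[datetime, datetime]]) -> List[float]:
    """Hours online in each of the 24 hourly periods t1..t24."""
    hours = [0.0] * 24
    for start, end in sessions:
        cursor = start
        while cursor < end:
            # End of the hourly period that contains the cursor.
            next_hour = (cursor.replace(minute=0, second=0, microsecond=0)
                         + timedelta(hours=1))
            segment_end = min(next_hour, end)
            hours[cursor.hour] += (segment_end - cursor).total_seconds() / 3600.0
            cursor = segment_end
    return hours


sessions = [
    (datetime(2019, 7, 1, 0, 30), datetime(2019, 7, 1, 1, 0)),  # 0.5 h in t1
    (datetime(2019, 7, 1, 23, 0), datetime(2019, 7, 2, 0, 0)),  # 1 h in t24
]
T_A = hourly_distribution(sessions)
print(T_A[0], T_A[1], T_A[23])
```

With these two sessions the result matches the shape of the example above: 0.5 hour in the first period, 0 in the second, and 1 hour in the last.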
Identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
specifically, in order to identify the user location more accurately, the embodiment of the present invention identifies the user location from the MAC address of the AP recorded in the weblog, and counts the time spent online through each AP per unit time (for example, per day) to obtain the user's internet access time at each location. For example, L_A = {1-101 (MAC1, 0.5 hour), 2-203 (MAC2, 2 hours), …} indicates that user A was online for 0.5 hour at location 1-101 through the AP with address MAC1 and for 2 hours at location 2-203 through the AP with address MAC2.
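The location feature can be sketched as a per-AP sum of session durations, with a lookup table that maps each AP MAC address to a room. The table `AP_LOCATIONS`, the function name, and the session values are assumptions made for illustration; the source does not say how the AP-to-location mapping is maintained.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from AP MAC address to a room label.
AP_LOCATIONS = {"MAC1": "1-101", "MAC2": "2-203"}


def location_distribution(
    sessions: List[Tuple[str, float]],
    ap_locations: Dict[str, str],
) -> Dict[Tuple[str, str], float]:
    """Total hours online per (location, AP MAC) for one user in one day."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for ap_mac, hours in sessions:
        location = ap_locations.get(ap_mac, "unknown")
        totals[(location, ap_mac)] += hours
    return dict(totals)


# (AP MAC, session length in hours) pairs for user A on one day.
sessions_a = [("MAC1", 0.5), ("MAC2", 1.5), ("MAC2", 0.5)]
L_A = location_distribution(sessions_a, AP_LOCATIONS)
print(L_A)
```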
Dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
specifically, because network content is so varied, the target URLs in the weblog are first classified in order to reduce the dimension of the preference feature, for example into the following categories: office/study, live video, video on demand, instant messaging, games, e-commerce, illegal services, and so on. The time the user spends on each category of network service per unit time (for example, per day) is then counted. For example, I_A = {office/study (1 hour), live video (1 hour), video on demand (2 hours), …} indicates that user A spent 1 hour, 1 hour, 2 hours, … on office/study, live video, video on demand, … respectively.
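A simple way to illustrate the classification step is a host-to-category lookup followed by a per-category sum. The source does not say how URLs are classified, so the lookup table `CATEGORY_BY_HOST`, the example hosts, and the fallback category "other" are all assumptions; a real system would need a much larger mapping or a trained classifier.

```python
from collections import defaultdict
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical host-to-category table.
CATEGORY_BY_HOST = {
    "learn.example.edu": "office/study",
    "live.example.com": "live video",
    "vod.example.com": "video on demand",
}


def classify_url(url: str) -> str:
    """Map a target URL to a network service category by its host name."""
    host = urlparse(url).hostname or ""
    return CATEGORY_BY_HOST.get(host, "other")


def preference_distribution(visits: List[Tuple[str, float]]) -> Dict[str, float]:
    """Total hours per service category for one user in one day."""
    totals: Dict[str, float] = defaultdict(float)
    for url, hours in visits:
        totals[classify_url(url)] += hours
    return dict(totals)


# (target URL, hours spent) pairs for user A on one day.
visits_a = [
    ("http://learn.example.edu/course/1", 1.0),
    ("https://live.example.com/room/7", 1.0),
    ("https://vod.example.com/movie/3", 2.0),
]
I_A = preference_distribution(visits_a)
print(I_A)
```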
Calculating the relation degree mean value of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combining degree of the target user and each other user in the class where the target user is located;
specifically, in the embodiment of the present invention, a ratio of time during which two users access the same AP simultaneously within a unit time (e.g., every day) is defined as a degree of relationship between the two users. For example, if the time for simultaneously accessing the same AP by the user a and the user B every day is 3 hours, the relationship degree R between the user a and the user B is consideredAB3 ÷ 24 ═ 0.125; when the time for simultaneously accessing the same AP every day by the user B and the user C is 6 hours, the degree of relationship RBC between the user B and the user C is 6 ÷ 24 ═ 0.25, that is, the relationship between the user B and the user a is not as close as that between the user B and the user C. Further, the degree of group represents the mean of the degree of relationship between the user and his classmates.
The mean degree of relationship between user A and his or her classmates is calculated using the following formula:

$$G_A = \frac{1}{M} \sum_{i=1}^{M} R_{Ai}$$

where $R_{Ai}$ denotes the degree of relationship between user A and classmate $i$, and $M$ denotes the number of classmates of user A.
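The mean described above is straightforward to compute once the pairwise degrees of relationship are known. In the sketch below, the classmate labels and the third value are made up for illustration, and returning 0.0 for a user with no classmates is an assumption, since the text does not cover that case.

```python
def group_degree(relationship_degrees):
    """Mean of the degrees of relationship R_Ai over the M classmates of user A.

    Returns 0.0 when there are no classmates (an assumption of this sketch).
    """
    m = len(relationship_degrees)
    return sum(relationship_degrees) / m if m else 0.0


# Hypothetical degrees of relationship between user A and three classmates.
R_A = {"B": 0.125, "C": 0.25, "D": 0.0}
G_A = group_degree(list(R_A.values()))
```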
Fig. 2 is a schematic structural diagram of a user behavior detection device based on multi-source data fusion provided in an embodiment of the present invention. As shown in Fig. 2, the device includes: a user behavior analysis module 201, a historical behavior feature acquisition module 202, a first difference degree calculation module 203, a second difference degree calculation module 204 and a detection module 205, wherein:
the user behavior analysis module 201 is configured to analyze internet surfing behaviors of a target user and other users in a class where the target user is located in the current time period based on campus wireless network user log information in the current time period, obtain a current behavior feature vector of the target user and current behavior feature vectors of other users in the class where the target user is located, average the current behavior feature vectors of other users in the class where the target user is located, and obtain current behavior average feature vectors corresponding to all other users in the class where the target user is located;
a historical behavior feature obtaining module 202, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module 203, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module 204, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module 205 is configured to determine the abnormal behavior of the target user according to the first difference degree and the second difference degree, and obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
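The data flow between modules 203, 204 and 205 can be sketched as follows. The exact formulas for the two difference degrees are given in figures that are not reproduced in this text, so everything numeric here is an assumption: the Euclidean per-feature distance, the use of an absolute difference to combine the first and second difference values, the equal weights, and the shortened feature components are all hypothetical. Only the overall structure (four features T, L, I, G; weights that sum to 1; a final weighted sum of the two degrees) follows the text.

```python
import math

# Time-period, location, preference, and group-combining-degree features.
FEATURES = ("T", "L", "I", "G")


def distance(u, v):
    """Euclidean distance between two equal-length components (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def first_difference_degree(current, own_history, p):
    """Weighted gap between a user's current and own historical average features."""
    return sum(p[k] * distance(current[k], own_history[k]) for k in FEATURES)


def second_difference_degree(current, class_current, own_history, class_history, q):
    """Weighted change in the user's gap to the class average, now versus history."""
    total = 0.0
    for k in FEATURES:
        first_value = distance(current[k], class_current[k])
        second_value = distance(own_history[k], class_history[k])
        total += q[k] * abs(first_value - second_value)  # assumed combination
    return total


def anomaly_score(d1, d2, w1=0.5, w2=0.5):
    """Weighted sum of the two difference degrees; the weights are hypothetical."""
    return w1 * d1 + w2 * d2


weights = {k: 0.25 for k in FEATURES}  # each in (0, 1), summing to 1
# Shortened, made-up feature components (a real T would have 24 entries).
current = {"T": [3.0, 4.0], "L": [0.0], "I": [1.0], "G": [0.5]}
own_history = {"T": [0.0, 0.0], "L": [0.0], "I": [1.0], "G": [0.5]}
class_current = {"T": [3.0, 0.0], "L": [0.0], "I": [1.0], "G": [0.5]}
class_history = {"T": [0.0, 0.0], "L": [0.0], "I": [1.0], "G": [0.5]}

d1 = first_difference_degree(current, own_history, weights)
d2 = second_difference_degree(current, class_current, own_history, class_history, weights)
score = anomaly_score(d1, d2)
```

Keeping the two degrees separate until the final step mirrors the module split: one compares the user with his or her own past, the other compares the user with classmates.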
The user behavior detection device based on multi-source data fusion is used for executing the user behavior detection method based on multi-source data fusion in the method embodiment. Therefore, the description and definition in the foregoing embodiment of the user behavior detection method based on multi-source data fusion may be used for understanding the user behavior detection apparatus based on multi-source data fusion in the embodiment of the present invention, and are not described herein again.
The user behavior detection device based on multi-source data fusion provided by the embodiment of the present invention extracts features from weblog data to characterize user behavior and detects abnormal user behavior. This helps the management department intervene early with users whose behavior is abnormal and reduces the safety risk caused by abnormal behavior. The device is also simple to operate and highly practical.
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored on the memory 330 and executable on the processor 310 to perform the multi-source data fusion-based user behavior detection method provided by the above-described method embodiments, for example including:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vectors of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire a second historical behavior average feature vector corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and the first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and the current behavior average feature vector corresponding to all other users in the class where the target user is located, calculating a second difference value between the first historical behavior average feature vector of the target user in the historical time period and the second historical behavior average feature vector of all other users in the class where the target user is located, and calculating a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the multi-source data fusion-based user behavior detection method provided in the foregoing method embodiments, for example including:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vectors of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire a second historical behavior average feature vector corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and the first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and the current behavior average feature vector corresponding to all other users in the class where the target user is located, calculating a second difference value between the first historical behavior average feature vector of the target user in the historical time period and the second historical behavior average feature vector of all other users in the class where the target user is located, and calculating a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A user behavior detection method based on multi-source data fusion is characterized by comprising the following steps:
respectively analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vectors corresponding to other users in the class of the target user;
based on a pre-constructed historical behavior feature database, acquiring a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class of the target user in the historical time period, and averaging the historical behavior feature vectors of the other users in the class of the target user in the historical time period to acquire second historical behavior average feature vectors corresponding to all other users in the class of the target user;
calculating a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
calculating a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculating a second difference value between a first historical behavior average feature vector of the target user in the historical time period and second historical behavior average feature vectors of all other users in the class where the target user is located, and calculating to obtain a second difference degree based on the first difference value and the second difference value;
judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior;
wherein the behavior feature vector comprises: a feature measuring the distribution of the user's internet surfing time periods, a feature measuring the distribution of the user's internet surfing positions, a feature measuring the user's internet surfing preferences, and a feature measuring the user's group-combining degree;
wherein respectively analyzing the internet surfing behaviors of a target user and other users in the class where the target user is located in the current time period based on the campus wireless network user log information in the current time period, and acquiring the current behavior feature vector of the target user and the current behavior feature vectors of other users in the class where the target user is located, specifically comprises:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 time periods by taking hours as a unit, determining the internet surfing time of a target user and each other user in the class of the target user in each time period based on the user on-off line time in the campus wireless network user log information in the current time period, and obtaining the characteristics for measuring the internet surfing time period distribution of the target user and each other user in the class of the target user;
identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
calculating the mean value of the relationship degree of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combination degree of the target user and other users in the class of the target user;
the relationship degree is specifically a time ratio of two users accessing the same network access point at the same time.
2. The multi-source data fusion-based user behavior detection method according to claim 1, wherein a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period is calculated by using the following formula:
Figure FDA0003476294210000031
wherein $0 < p_1, p_2, p_3, p_4 < 1$ and $p_1 + p_2 + p_3 + p_4 = 1$; $(T_A, L_A, I_A, G_A)$ represents the current behavior feature vector of the target user A; and
Figure FDA0003476294210000032
represents the first historical behavior average feature vector of the target user A in the historical time period.
3. The multi-source data fusion-based user behavior detection method according to claim 1, wherein the second difference degree is calculated by using the following formula:
Figure FDA0003476294210000033
wherein $0 < q_1, q_2, q_3, q_4 < 1$ and $q_1 + q_2 + q_3 + q_4 = 1$; $(T_A, L_A, I_A, G_A)$ represents the current behavior feature vector of the target user A; $(T_{\Theta/\{A\}}, L_{\Theta/\{A\}}, I_{\Theta/\{A\}}, G_{\Theta/\{A\}})$ represents the current behavior average feature vector corresponding to all other users in the class of the target user A;
Figure FDA0003476294210000034
represents the first historical behavior average feature vector of the target user A in the historical time period; and
Figure FDA0003476294210000035
represents the second historical behavior average feature vector of all other users in the class of the target user A.
4. The multi-source data fusion-based user behavior detection method according to claim 1, wherein the step of determining the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain the behavior detection result of the target user specifically comprises:
and carrying out weighted summation on the first difference degree and the second difference degree to obtain an abnormal behavior detection result of the target user.
5. The multi-source data fusion-based user behavior detection method according to claim 1, wherein the relationship mean of the user and his classmates is calculated by using the following formula:
$$G_A = \frac{1}{M} \sum_{i=1}^{M} R_{Ai}$$
wherein $R_{Ai}$ denotes the degree of relationship between user A and classmate $i$, and $M$ denotes the number of classmates of user A.
6. A user behavior detection device based on multi-source data fusion is characterized by comprising:
the user behavior analysis module is used for analyzing the internet surfing behaviors of a target user and other users in the class of the target user in the current time period respectively based on the campus wireless network user log information in the current time period, acquiring the current behavior feature vector of the target user and the current behavior feature vector of other users in the class of the target user, averaging the current behavior feature vectors of other users in the class of the target user, and acquiring the current behavior average feature vector corresponding to all other users in the class of the target user;
a historical behavior feature obtaining module, configured to obtain, based on a pre-constructed historical behavior feature database, a first historical behavior average feature vector of the target user in a certain historical time period and historical behavior feature vectors of other users in the class where the target user is located in the historical time period, and average the historical behavior feature vectors of the other users in the class where the target user is located in the historical time period to obtain second historical behavior average feature vectors corresponding to all other users in the class where the target user is located;
a first difference degree calculation module, configured to calculate a first difference degree between the current behavior feature vector of the target user and a first historical behavior average feature vector of the target user in the historical time period;
a second difference degree calculation module, configured to calculate a first difference value between the current behavior feature vector of the target user and current behavior average feature vectors corresponding to all other users in the class where the target user is located, calculate a second difference value between a first historical behavior average feature vector of the target user in the historical time period and a second historical behavior average feature vector of all other users in the class where the target user is located, and calculate a second difference degree based on the first difference value and the second difference value;
the detection module is used for judging the abnormal behavior of the target user according to the first difference degree and the second difference degree to obtain a behavior detection result of the target user;
the first difference degree is used for representing the self-abnormal degree of the user behavior, and the second difference degree is used for representing the analogy abnormal degree of the user behavior;
wherein the behavior feature vector comprises: a feature measuring the distribution of the user's internet surfing time periods, a feature measuring the distribution of the user's internet surfing positions, a feature measuring the user's internet surfing preferences, and a feature measuring the user's group-combining degree;
wherein respectively analyzing the internet surfing behaviors of a target user and other users in the class where the target user is located in the current time period based on the campus wireless network user log information in the current time period, and acquiring the current behavior feature vector of the target user and the current behavior feature vectors of other users in the class where the target user is located, specifically comprises:
acquiring campus wireless network user log information in a current time period, wherein the campus wireless network user log information comprises: user ID, user online and offline time, target URL, terminal MAC address and network access point MAC address;
dividing each day into 24 time periods by taking hours as a unit, determining the internet surfing time of a target user and each other user in the class of the target user in each time period based on the user on-off line time in the campus wireless network user log information in the current time period, and obtaining the characteristics for measuring the internet surfing time period distribution of the target user and each other user in the class of the target user;
identifying positions of a target user and other users in the class where the target user is located based on the network access point MAC address in the campus wireless network user log information in the current time period, counting the internet surfing time of each access point in unit time, determining the internet surfing time of the target user and other users in the class where the target user is located in each position, and obtaining the characteristic of measuring the internet surfing position distribution of the target user and other users in the class where the target user is located;
dividing target URLs in the weblogs into a plurality of network service categories, determining consumed time of each network service of a target user and each other user in a class where the target user is located based on the target URLs in the campus wireless network user log information in the current time period, and obtaining characteristics for measuring internet surfing preferences of the target user and each other user in the class where the target user is located;
calculating the mean value of the relationship degree of the users and the class classmates thereof based on the terminal MAC address and the network access point MAC address in the campus wireless network user log information in the current time period, and obtaining the characteristic of measuring the group combination degree of the target user and other users in the class of the target user;
the relationship degree is specifically a time ratio of two users accessing the same network access point at the same time.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the multi-source data fusion-based user behavior detection method according to any one of claims 1 to 5 when executing the program.
8. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the multi-source data fusion-based user behavior detection method according to any one of claims 1 to 5.
CN201910624299.0A 2019-07-11 2019-07-11 User behavior detection method and device based on multi-source data fusion Expired - Fee Related CN110532485B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910624299.0A CN110532485B (en) 2019-07-11 2019-07-11 User behavior detection method and device based on multi-source data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910624299.0A CN110532485B (en) 2019-07-11 2019-07-11 User behavior detection method and device based on multi-source data fusion

Publications (2)

Publication Number Publication Date
CN110532485A CN110532485A (en) 2019-12-03
CN110532485B true CN110532485B (en) 2022-06-03

Family

ID=68659689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910624299.0A Expired - Fee Related CN110532485B (en) 2019-07-11 2019-07-11 User behavior detection method and device based on multi-source data fusion

Country Status (1)

Country Link
CN (1) CN110532485B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220603