CN114201655B - Account classification method, device, equipment and storage medium - Google Patents

Info

Publication number
CN114201655B
Authority
CN
China
Prior art keywords
account
accounts
equipment
graph structure
system hardware
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010909515.9A
Other languages
Chinese (zh)
Other versions
CN114201655A (en)
Inventor
温蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010909515.9A
Publication of CN114201655A
Application granted
Publication of CN114201655B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the application discloses an account classification method, an account classification device, account classification equipment and a storage medium, and belongs to the technical field of computers. The method comprises the following steps: obtaining user characteristics corresponding to a plurality of first account numbers based on system hardware logs and service content logs corresponding to the plurality of first account numbers, wherein the user characteristics are used for representing the behavior of the account numbers on the logged-in equipment and the corresponding service content; acquiring equipment characteristics corresponding to a plurality of equipment identifiers based on the system hardware logs corresponding to the plurality of first account numbers; generating a graph structure based on the user characteristics corresponding to the plurality of first account numbers and the equipment characteristics corresponding to the plurality of equipment identifiers; and classifying based on the graph structure to obtain the categories corresponding to the plurality of first account numbers. Because richer information is considered, the classification result obtained based on artificial intelligence is more accurate.

Description

Account classification method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for classifying accounts.
Background
With the rapid development of computer technology and the widespread adoption of the internet, the number of internet users keeps growing. However, in addition to normal users, many abnormal users have appeared on the internet. These abnormal users may spread false information through the internet, which seriously harms the network environment, so how to distinguish normal users from abnormal users has become a problem to be solved.
In the related art, various kinds of violations are defined in advance, and whether a user is an abnormal user is determined by detecting whether the user has committed any of these violations. However, this approach often produces misjudgments and cannot distinguish normal users from abnormal users well.
Disclosure of Invention
The embodiment of the application provides an account classification method, an account classification device, account classification equipment and a storage medium, which can improve the accuracy of account classification. The technical scheme is as follows:
in one aspect, an account classification method is provided, and the method includes:
based on system hardware logs and service content logs corresponding to a plurality of first account numbers, obtaining user characteristics corresponding to the plurality of first account numbers, wherein the user characteristics are used for representing actions of the account numbers on login equipment and corresponding service contents;
Acquiring equipment characteristics corresponding to a plurality of equipment identifiers based on system hardware logs corresponding to the plurality of first account numbers, wherein the plurality of equipment identifiers are equipment identifiers in the system hardware logs corresponding to the plurality of first account numbers, and the equipment characteristics are used for representing actions occurring on equipment;
generating a graph structure based on user characteristics corresponding to the plurality of first account numbers and equipment characteristics corresponding to the plurality of equipment identifiers, wherein one node in the graph structure corresponds to one account number or one equipment, and an edge between two nodes is used for indicating that the account numbers and the equipment corresponding to the two nodes correspond to the same log;
and classifying based on the graph structure to obtain the category corresponding to the plurality of first account numbers.
In one possible implementation manner, the invoking the at least one neighbor aggregation layer to perform aggregation processing on each node in the graph structure to obtain a processed graph structure includes:
invoking any neighbor aggregation layer to perform aggregation processing on each node in the input graph structure to obtain a processed graph structure, and inputting the processed graph structure to the next layer.
In another aspect, an account classifying device is provided, the device includes:
The first acquisition module is used for acquiring user characteristics corresponding to a plurality of first accounts based on system hardware logs and service content logs corresponding to the plurality of first accounts, wherein the user characteristics are used for representing the actions of the accounts on the login equipment and corresponding service content;
the second acquisition module is used for acquiring equipment characteristics corresponding to a plurality of equipment identifiers based on the system hardware logs corresponding to the plurality of first account numbers, wherein the plurality of equipment identifiers are equipment identifiers in the system hardware logs corresponding to the plurality of first account numbers, and the equipment characteristics are used for representing actions occurring on equipment;
the generating module is used for generating a graph structure based on user characteristics corresponding to the plurality of first account numbers and equipment characteristics corresponding to the plurality of equipment identifiers, wherein one node in the graph structure corresponds to one account number or one equipment, and an edge between two nodes is used for representing that the account numbers and the equipment corresponding to the two nodes correspond to the same log;
and the classification module is used for classifying based on the graph structure to obtain the category corresponding to the plurality of first account numbers.
In one possible implementation, the generating module includes:
The node generating unit is used for generating nodes in the graph structure based on the plurality of first account numbers and the plurality of equipment identifiers;
an edge adding unit, configured to add edges between nodes in the graph structure based on a relationship between the account number and the device indicated by the system hardware log;
the association unit is used for associating the user characteristics corresponding to the plurality of first account numbers to corresponding nodes in the graph structure;
the association unit is further configured to associate device features corresponding to the plurality of device identifiers to corresponding nodes in the graph structure.
In one possible implementation manner, the first obtaining module includes:
the first feature acquisition unit is used for acquiring equipment behavior features corresponding to the plurality of first accounts based on system hardware logs corresponding to the plurality of first accounts, wherein the equipment behavior features are used for representing the behavior of the accounts on the logged-in equipment;
a second feature obtaining unit, configured to obtain, based on service content logs corresponding to the plurality of first account numbers, service content features corresponding to the plurality of first account numbers, where the service content features are used to represent service content executed by an account number;
And the splicing unit is used for splicing the equipment behavior characteristics and the business content characteristics corresponding to the plurality of first account numbers to obtain the user characteristics of the plurality of first account numbers.
In one possible implementation manner, the first feature obtaining unit is configured to obtain, from a system hardware log of any first account, a target system hardware log including the same device identifier; calling a memory model to perform feature extraction on the target system hardware log to obtain behavior features of any one first account on the equipment; and fusing the behavior characteristics of any one of the first account numbers on a plurality of devices to obtain the device behavior characteristics of any one of the first account numbers.
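The grouping, per-device extraction, fusion, and splicing described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the log field names (`device_id`, `service`), the second IP address, the `count_services` function standing in for the memory model, and the use of an element-wise mean as the fusion step are all hypothetical and are not specified by the application.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Log = Dict[str, str]
Vector = List[float]


def count_services(logs: Sequence[Log]) -> Vector:
    """Hypothetical stand-in for the memory model: counts per service type."""
    services = ["login", "comment", "view"]
    return [float(sum(1 for log in logs if log["service"] == s)) for s in services]


def device_behavior_feature(
    account_logs: Sequence[Log],
    extract: Callable[[Sequence[Log]], Vector] = count_services,
) -> Vector:
    # 1. Group one account's system hardware logs by device identifier.
    by_device: Dict[str, List[Log]] = defaultdict(list)
    for log in account_logs:
        by_device[log["device_id"]].append(log)
    # 2. Extract one behavior feature per device.
    per_device = [extract(logs) for logs in by_device.values()]
    # 3. Fuse the per-device features (assumed here: element-wise mean).
    return [sum(col) / len(per_device) for col in zip(*per_device)]


def user_feature(device_behavior: Vector, service_content: Vector) -> Vector:
    # Splicing is modeled as plain concatenation of the two feature vectors.
    return device_behavior + service_content


logs = [
    {"device_id": "125.11.250.250", "service": "login"},
    {"device_id": "125.11.250.250", "service": "comment"},
    {"device_id": "10.0.0.7", "service": "login"},  # hypothetical second device
]
behavior = device_behavior_feature(logs)
print(behavior)
print(user_feature(behavior, [0.5]))  # [0.5] is a placeholder content feature
```

In practice the extractor would be a trained sequence model rather than a counter; the sketch only shows how the three stages fit together.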
In one possible implementation manner, the classification module is configured to invoke a first account classification model to classify the graph structure, so as to obtain the category to which the plurality of first accounts correspond.
In one possible implementation manner, the first account classification model includes at least one neighbor aggregation layer and a prediction layer, and the classification module includes:
the aggregation unit is used for calling the at least one neighbor aggregation layer, and carrying out aggregation treatment on each node in the graph structure to obtain a treated graph structure;
And the prediction unit is used for calling the prediction layer and classifying the processed graph structure to obtain the category corresponding to the plurality of first account numbers.
In one possible implementation manner, the aggregation unit is configured to invoke any neighboring aggregation layer, perform aggregation processing on each node in the input graph structure, obtain a processed graph structure, and input the processed graph structure to a next layer.
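To make the layer chaining concrete, the sketch below stacks neighbor aggregation layers and then applies a prediction step. It rests on several assumptions that are not taken from the application: every node carries a feature vector of the same length, aggregation is an unweighted mean over a node and its neighbors (a trained model would use learned weights), and the prediction layer is replaced by a hypothetical threshold on the first feature dimension.

```python
from typing import Dict, List

Vector = List[float]
Features = Dict[str, Vector]
Adjacency = Dict[str, List[str]]


def aggregate_once(features: Features, neighbors: Adjacency) -> Features:
    """One neighbor aggregation layer: average a node with its neighbors."""
    out: Features = {}
    for node, own in features.items():
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


def classify(
    features: Features, neighbors: Adjacency, num_layers: int, threshold: float
) -> Dict[str, str]:
    # The processed graph features produced by one layer feed the next layer.
    for _ in range(num_layers):
        features = aggregate_once(features, neighbors)
    # Hypothetical prediction layer: threshold the first feature dimension,
    # and report categories for account nodes only.
    return {
        node: ("abnormal" if vec[0] > threshold else "normal")
        for node, vec in features.items()
        if node.startswith("account:")
    }


features = {"account:A": [0.0], "account:B": [1.0], "device:D": [1.0]}
neighbors = {
    "account:A": ["device:D"],
    "account:B": ["device:D"],
    "device:D": ["account:A", "account:B"],
}
print(aggregate_once(features, neighbors)["account:A"])
print(classify(features, neighbors, num_layers=2, threshold=0.6))
```

With two layers, each account node receives information from the other account through the shared device node, which is the effect the stacked aggregation layers are meant to achieve.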
In one possible implementation, the first account classification model includes at least one first neighbor aggregation layer and at least one second neighbor aggregation layer, and the apparatus further includes:
a sample acquisition module, configured to acquire sample data, wherein the sample data includes system hardware logs, service content logs, first sample categories, and second sample categories corresponding to a plurality of second accounts; the first sample category is obtained by classifying based on the system hardware logs corresponding to the plurality of second accounts and is either an abnormal account or a normal account; and the second sample category is obtained by classifying based on the service content logs corresponding to the plurality of second accounts and is either a normal account or a suspicious account;
The classification module is used for classifying through the first account classification model based on the sample data to obtain prediction categories corresponding to the plurality of second accounts;
the first training module is used for training the at least one first neighbor aggregation layer according to the prediction categories and the second sample categories corresponding to the plurality of second account numbers;
and the second training module is used for training the at least one second neighbor aggregation layer and the prediction layer according to the prediction categories and the first sample categories corresponding to the plurality of second account numbers.
In another aspect, a computer device is provided, the computer device including a processor and a memory, the memory storing at least one program code, the at least one program code being loaded and executed by the processor to perform the operations performed in the account classification method of the above aspect.
In another aspect, a computer readable storage medium is provided, the computer readable storage medium having stored therein at least one program code, the at least one program code being loaded and executed by a processor to perform the operations performed in the account classification method of the above aspect.
In yet another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer program code, the computer program code being stored in a computer readable storage medium. The computer program code is read from a computer readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform operations performed in the account classification method as described in the above aspect.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
according to the account classification method, the device, the equipment and the storage medium, the user characteristics corresponding to the accounts are extracted from the system hardware logs and the service content logs, so that the user characteristics can represent the actions of the accounts on the logged-in equipment and the service content executed by the accounts, the information covered by the user characteristics is enriched, the graph structure is generated based on the user characteristics and the equipment characteristics, the information in the graph structure is enriched, and when classification is carried out based on the graph structure, the actions of the user on the terminal equipment and the corresponding service content can be combined, classification is carried out from multiple aspects, and the obtained category is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of an account classification method according to an embodiment of the present application;
FIG. 3 is a flowchart of an account classification method according to an embodiment of the present application;
FIG. 4 is a flowchart of feature extraction of a memory model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a structure of the present application;
FIG. 6 is a flowchart of a first account classification model training method according to an embodiment of the present application;
FIG. 7 is a flowchart of a first account classification model training method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an account classifying device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another account classifying device according to an embodiment of the present application;
FIG. 10 is a block diagram of a terminal according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first account number can be referred to as a second account number, and similarly, a second account number can be referred to as a first account number, without departing from the scope of the present disclosure.
As used herein, the terms "at least one", "a plurality", "each", and "any" have the following meanings: "at least one" includes one, two, or more; "a plurality" includes two or more; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if a plurality of accounts includes 3 accounts, "each" refers to every one of the 3 accounts, and "any" refers to any one of the 3 accounts, which may be the first, the second, or the third.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals involved in the present application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the behavior of an account on the logged-in equipment and the corresponding service content, as represented by the user features in the present application, are obtained with sufficient authorization.
With the research and advancement of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, robots, smart healthcare, and smart customer service. It is believed that, as the technology develops, artificial intelligence will be applied in more fields and deliver increasingly important value.
Artificial intelligence (AI) is the theory, methods, technologies, and application systems that use a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field therefore involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, and knowledge graphs.
The scheme provided by the embodiment of the application relates to natural language processing technology in artificial intelligence, which is used to process system hardware logs and service content logs. The account classification method of the application is described in detail through the following embodiments:
The account classification method provided by the embodiment of the application is applied to a computer device. In one possible implementation manner, the computer device is a terminal device such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. In another possible implementation manner, the computer device is a server. For example, the server is a stand-alone physical server; or the server is a server cluster or a distributed system formed by a plurality of physical servers; or the server is a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In another possible implementation, the computer device includes a terminal and a server. FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to FIG. 1, the implementation environment includes a plurality of terminals 101 and a server 102. The plurality of terminals 101 and the server 102 can be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
Optionally, an application client is installed on the terminal 101, and the application client is an instant messaging client, an information sharing client, a shopping client, or the like. The server 102 is a server that provides services for the application client. A user logs in to the application client on the terminal 101 with an account and executes services in the application client. The terminal 101 generates a system hardware log and a service content log corresponding to the account according to the behavior of the user and uploads them to the server 102. The server 102 performs classification based on the system hardware logs and service content logs of the plurality of accounts uploaded by the plurality of terminals 101 to obtain the category of each account.
The method provided by the embodiment of the application can be applied to scenes for classifying the account numbers.
For example, in an abnormal account identification scenario, an administrator wants to determine whether each of a plurality of accounts is a normal account or an abnormal account. With the account classification method provided by the embodiment of the application, the category of each of the plurality of accounts can be obtained based on the system hardware logs and service content logs corresponding to the accounts, where the category is either a normal account or an abnormal account.
Because the account classification method provided by the embodiment of the application is based on classification processing of the system hardware log and the service content log of the account, the hardware level information and the service level information of the account are considered, and the considered information is more comprehensive, so that the obtained result is more accurate.
The method provided by the embodiment of the application can also be applied to any other scene for classifying accounts, and the embodiment of the application is not limited herein.
Fig. 2 is a flowchart of an account classification method according to an embodiment of the present application, where an execution subject is a server, and referring to fig. 2, the method includes:
201. Acquire user characteristics corresponding to a plurality of first accounts based on the system hardware logs and the service content logs corresponding to the plurality of first accounts.
The system hardware log is used for recording equipment information of equipment logged in by the account, the service content log is used for recording service content executed by the account, namely, the system hardware log is used for recording information of a hardware layer, and the service content log is used for recording information of a service content layer.
The system hardware logs and the service content logs corresponding to the plurality of first account numbers refer to the system hardware logs corresponding to the plurality of first account numbers and the service content logs corresponding to the plurality of first account numbers.
The system hardware logs corresponding to the plurality of first account numbers refer to the system hardware log of each of the plurality of first account numbers, and the service content logs corresponding to the plurality of first account numbers refer to the service content log of each of the plurality of first account numbers.
The user characteristics are used for representing the actions of the account number on the login equipment and corresponding business content, namely, the user characteristics can represent the characteristics of a hardware level and the characteristics of a business content level.
202. Acquire equipment characteristics corresponding to a plurality of equipment identifiers based on the system hardware logs corresponding to the plurality of first account numbers, wherein the plurality of equipment identifiers are the equipment identifiers in the system hardware logs corresponding to the plurality of first account numbers.
Wherein the device characteristics are used to represent the behavior that occurs on the device.
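As one possible illustration of step 202, the sketch below derives a small feature vector per device identifier from system hardware logs. The application does not state which statistics make up the device characteristics, so the two used here (number of distinct accounts seen on the device, and number of log entries) and the log field names `account` and `device_id` are assumptions chosen for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

Log = Dict[str, str]


def device_features(system_hardware_logs: Sequence[Log]) -> Dict[str, List[float]]:
    """Map each device identifier to [distinct account count, log entry count]."""
    accounts: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for log in system_hardware_logs:
        accounts[log["device_id"]].add(log["account"])
        counts[log["device_id"]] += 1
    return {
        dev: [float(len(accounts[dev])), float(counts[dev])] for dev in counts
    }


logs = [
    {"account": "A", "device_id": "D1", "service": "login"},
    {"account": "A", "device_id": "D1", "service": "comment"},
    {"account": "B", "device_id": "D1", "service": "login"},
    {"account": "B", "device_id": "D2", "service": "login"},
]
print(device_features(logs))
```

A device shared by many accounts, such as `D1` here, ends up with a larger first component, which is the kind of device-side behavior the characteristics are meant to capture.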
203. Generate a graph structure based on the user characteristics corresponding to the plurality of first account numbers and the equipment characteristics corresponding to the plurality of equipment identifiers, wherein one node in the graph structure corresponds to one account number or one equipment, and an edge between two nodes indicates that the account number and the equipment corresponding to the two nodes correspond to the same log.
Generating a graph structure based on the user characteristics corresponding to the plurality of first accounts and the equipment characteristics corresponding to the plurality of equipment identifiers means: generating a node for each first account and each equipment identifier; connecting two nodes if the account and the equipment corresponding to the two nodes appear in the same log; and adding the user characteristics of the plurality of first accounts and the equipment characteristics of the plurality of equipment identifiers to the corresponding nodes, so that the graph structure includes more comprehensive information.
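The graph generation in step 203 can be sketched roughly as below. The node key prefixes (`account:`, `device:`), the log field names, and the feature values are hypothetical; the sketch only shows nodes being created per account and per device identifier, an edge being added when both appear in the same log, and features being attached to the nodes.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
Log = Dict[str, str]


def build_graph(
    system_hardware_logs: Sequence[Log],
    user_features: Dict[str, Vector],
    device_features: Dict[str, Vector],
) -> Tuple[Dict[str, Vector], Set[Tuple[str, str]]]:
    """Return (nodes, edges); nodes map a node key to its feature vector."""
    nodes: Dict[str, Vector] = {}
    edges: Set[Tuple[str, str]] = set()
    for log in system_hardware_logs:
        account_node = "account:" + log["account"]
        device_node = "device:" + log["device_id"]
        # Attach the features to the corresponding nodes.
        nodes[account_node] = user_features[log["account"]]
        nodes[device_node] = device_features[log["device_id"]]
        # An account and a device in the same log are connected by an edge.
        edges.add((account_node, device_node))
    return nodes, edges


logs = [
    {"account": "A", "device_id": "125.11.250.250"},
    {"account": "B", "device_id": "125.11.250.250"},
    {"account": "A", "device_id": "125.11.250.250"},  # repeated pair, one edge
]
nodes, edges = build_graph(
    logs,
    user_features={"A": [1.0, 0.0], "B": [0.0, 1.0]},
    device_features={"125.11.250.250": [2.0, 3.0]},
)
print(sorted(nodes))
print(sorted(edges))
```

Because edges are stored in a set, repeated logs for the same account and device do not create duplicate edges; whether repeated logs should instead carry an edge weight is left open by the text.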
204. Classify based on the graph structure to obtain the categories corresponding to the plurality of first account numbers.
The categories corresponding to the plurality of first accounts include a normal account, an abnormal account, a suspicious account, or the like, which is not limited in the embodiment of the present application.
According to the account classification method provided by the embodiment of the application, the user characteristics corresponding to a plurality of accounts are extracted from the system hardware logs and the service content logs, so that the user characteristics can represent the actions of the accounts on the logged-in equipment and the service content executed by the accounts, the information covered by the user characteristics is enriched, the graph structure is generated based on the user characteristics and the equipment characteristics, the information in the graph structure is enriched, and when the classification is carried out based on the graph structure, the actions of the user on the terminal equipment and the corresponding service content can be combined, the classification is carried out from multiple aspects, and the obtained category is more accurate.
Fig. 3 is a flowchart of an account classification method according to an embodiment of the present application, where an execution subject is a server, and referring to fig. 3, the method includes:
301. Acquire system hardware logs and service content logs corresponding to a plurality of first account numbers.
The terminal is provided with an application client, and a user can log in the application client based on an account number, and execute a service in the application client to generate a system hardware log and a service content log of the account number.
The account is an account registered in the application client, and the account is used for determining a unique corresponding user and can represent the identity of the user, and optionally, the account is a nickname of the user, an identity card number of the user, a mobile phone number of the user and the like.
The system hardware log is used for recording equipment information of the equipment on which an account logs in. For example, when a user logs in on terminal B with account A, terminal B generates the following system hardware log: (account: A, time: August 1, 2020, 12:00, IP address: 125.11.250.250, service: login), where 125.11.250.250 is the IP address of terminal B.
The system hardware log of the first account includes a plurality of logs recorded according to the service executed by the first account, optionally, the system hardware log includes an account and a device identifier, where the account is used to determine a unique corresponding user, for example, the account is QIMEI (a mobile user identifier). The device identity is used to determine a uniquely corresponding device, e.g. the device identity is an IP (Internet Protocol, international protocol) address, or a MAC (Media Access Control, medium access control) address, or an IMSI (International Mobile Subscriber Identification Number, international mobile subscriber identity), or an IMEI (International Mobile Equipment Identity ), etc.
The service content log is used for recording service content of a service executed by the account. For example, the user executes a comment service on the terminal B based on the account a, the comment content is "the skirt is really good", and the terminal B generates a corresponding service content log as follows: (Account: A, time: month 8 of 2020, day 1: 13:00, business: comment, business content: this skirt is really good); optionally, the terminal B may also generate a system hardware log according to the comment service, where the system hardware log is: (Account: A, time: 8/2020/1/13/00, IP address: 125.11.250.250, service: comment), that is, the terminal can generate a corresponding system hardware log and service content log according to one operation of the user.
The service content log of the first account includes a plurality of logs recorded according to the services executed by the first account. Optionally, the service content log includes a service type and service content, where the service type is any service that can be executed on the terminal, such as login, like, comment, or view, and the service content includes the object that is liked, the object that is commented on, the comment information, the video identifier of a viewed video, the duration for which the video is viewed, and the like.
Optionally, the system hardware log and the service content log are generated by the terminal according to the service operation of the user, and the terminal uploads the system hardware log and the service content log generated in the time period to the server at regular intervals, or uploads the generated system hardware log and service content log to the server after the terminal generates a certain number of system hardware logs or service content logs.
Optionally, the system hardware log and the service content log are generated by a server, when a user executes a service on a terminal, a service request is sent to the server, the service request carries an account number, a device identifier and a service type, and the server generates the corresponding system hardware log and service content log according to the service request, wherein the system hardware log comprises the account number, the device identifier and the service type; the service content log includes service types and service content. For example, a user logs in an application client on a terminal, the terminal sends a login request to a server, the login request carries a device identifier and an account number of the terminal, and after receiving the login request, the server generates a system hardware log and a service content log according to the login request, wherein the system hardware log comprises a login, the account number and the device identifier; the business content log comprises login and login time.
Optionally, when the terminal or the server generates a log, the log is generated according to the format corresponding to that log. For example, the format of the system hardware log is: user + time + device identifier + service type; the format of the service content log is: user + time + service type + service content. In some cases, however, a generated log does not fully conform to its format; for example, a system hardware log lacks at least one of the user, the time, the device identifier, or the service type. Because user features cannot be reliably extracted from logs with incomplete information, optionally, acquiring the system hardware logs and service content logs corresponding to the plurality of first accounts includes: screening the original system hardware logs and the original service content logs corresponding to the plurality of first accounts according to the format corresponding to each log, to obtain the system hardware logs and the service content logs of the plurality of first accounts.
Screening the original system hardware logs and the original service content logs corresponding to the plurality of first accounts according to the format corresponding to each log means: filtering out the incomplete logs from the original system hardware logs and the original service content logs according to the format corresponding to each log, thereby completing data cleaning.
Because a user may execute services on the terminal many times per day, a large number of logs are generated from those services. Considering that logs generated long before the current time contribute little to classification, in one possible implementation, acquiring the system hardware logs and service content logs corresponding to the plurality of first accounts includes: acquiring the system hardware logs and service content logs that are generated within a target time period and correspond to the plurality of first accounts.
Optionally, the target time period is a time period of a preset length counted back from the current time. By acquiring only the logs within the target time period, the number of logs to be processed is reduced while the accuracy of the classification result is maintained; that is, the amount of data processing is reduced and the classification efficiency is improved.
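As a rough illustration of this screening step, the following Python sketch drops logs that lack any required field. The dictionary representation and the field names (`account`, `time`, `device_id`, `service_type`, `service_content`) are assumptions made for the example; the embodiment only specifies the logical formats described above.

```python
# Hypothetical field names: the embodiment only gives the logical formats
# "user + time + device identifier + service type" (system hardware log) and
# "user + time + service type + service content" (service content log).
HARDWARE_FIELDS = ("account", "time", "device_id", "service_type")
CONTENT_FIELDS = ("account", "time", "service_type", "service_content")


def is_complete(log, required_fields):
    """Return True if every required field is present and non-empty."""
    return all(log.get(field) not in (None, "") for field in required_fields)


def screen_logs(raw_logs, required_fields):
    """Keep only the logs that conform to the expected format."""
    return [log for log in raw_logs if is_complete(log, required_fields)]


raw_hardware_logs = [
    {"account": "A", "time": "2020-08-01 12:00",
     "device_id": "125.11.250.250", "service_type": "login"},
    # Incomplete: empty device identifier.
    {"account": "A", "time": "2020-08-01 13:00",
     "device_id": "", "service_type": "comment"},
    # Incomplete: device identifier missing altogether.
    {"account": "B", "time": "2020-08-01 13:05", "service_type": "login"},
]

hardware_logs = screen_logs(raw_hardware_logs, HARDWARE_FIELDS)
print(len(hardware_logs))
```

The same `screen_logs` helper can be applied to service content logs by passing `CONTENT_FIELDS` instead.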
The system hardware log and the service content log comprise time, the time represents the time of executing the service by the user, if the time included in the system hardware log belongs to a target time period, the system hardware log is reserved, and if the time included in the system hardware log does not belong to the target time period, the system hardware log is discarded; if the time included in the service content log belongs to the target time period, the service content log is reserved, and if the time included in the service content log does not belong to the target time period, the service content log is discarded.
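The retain-or-discard rule above can be sketched as a simple time-window filter. This is a minimal sketch under assumptions: the timestamp string format, the `time` field name, and the one-day window are chosen only for illustration.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def within_target_period(log, now, period):
    """Return True if the log time falls in the window [now - period, now]."""
    timestamp = datetime.strptime(log["time"], TIME_FORMAT)
    return now - period <= timestamp <= now


def filter_by_period(logs, now, period):
    """Retain logs inside the target time period and discard the rest."""
    return [log for log in logs if within_target_period(log, now, period)]


now = datetime(2020, 8, 2, 0, 0)
period = timedelta(days=1)  # hypothetical preset length
logs = [
    {"account": "A", "time": "2020-08-01 12:00"},  # inside the window
    {"account": "A", "time": "2020-07-20 09:30"},  # too old, discarded
]

recent_logs = filter_by_period(logs, now, period)
print(recent_logs)
```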
302. Acquiring the device behavior features corresponding to the plurality of first accounts based on the system hardware logs corresponding to the plurality of first accounts.
The device behavior feature is used for representing the behavior of the account on the logged-in device. If some user organizations perform abnormal operations in the application client, the organization leader usually leads the organization members in performing those operations, so that the organization members perform consistent operations. For example, if the organization leader issues, at 15:33 on July 1, 2020, a task to comment on a certain video provided by the application client in order to raise the popularity of the video, a plurality of organization members will comment on that video at around 15:35 on July 1, 2020; if the organization leader issues, at 15:43 on July 1, 2020, a task to view another video provided by the application client in order to raise the popularity of that video, a plurality of organization members will view that video at around 15:45 on July 1, 2020.
Therefore, the behaviors of the organization members on the registered devices are almost consistent, and the user organization performing abnormal operation can be discovered according to the same device behavior characteristics by acquiring the device behavior characteristics of the user.
Optionally, since the system hardware log of the first account includes at least one type of device identifier, the device behavior feature is used to represent the behavior of the account occurring on the logged-in at least one type of device. Alternatively, the device behavior features are represented in a vector fashion.
Because the behavior of the account number on the login device occurs according to the time sequence, optionally, when the behavior characteristics of the device are acquired, a memory model is adopted to process the system hardware logs arranged according to the time sequence, so as to obtain the behavior of the account number on the login device according to the time sequence.
In one possible implementation manner, based on the system hardware logs corresponding to the plurality of first account numbers, obtaining the device behavior characteristics corresponding to the plurality of first account numbers includes the following steps (a) to (c).
(a) Acquiring, from the system hardware log of any first account, a target system hardware log containing the same device identifier.
(b) Calling a memory model to perform feature extraction on the target system hardware log to obtain the behavior feature of the any first account on the device.
The memory model is a model capable of extracting time-series features when extracting behavior features, for example, an RNN (Recurrent Neural Network) model, an LSTM (Long Short-Term Memory network) model, a GRU RNN (Gated Recurrent Unit Recurrent Neural Network) model, a CW RNN (Clockwork Recurrent Neural Network) model, or the like.
As the memory model can extract time sequence characteristics when extracting behavior characteristics, optionally, when calling the memory model to extract characteristics of the target system hardware logs, the target system hardware logs are arranged in time sequence.
For example, the device identifier type is an IP address, and user A has the following three IP address logs on October 15, 2019:
(1) User: A, time: 12:00 on October 15, 2019, IP address: 125.11.250.250
(2) User: A, time: 13:00 on October 15, 2019, IP address: 10.11.250.51
(3) User: A, time: 15:00 on October 15, 2019, IP address: 115.11.25.25
The IP behavior sequence of the user a is: (user: A, IP behavior sequence: 125.11.250.250,10.11.250.51,115.11.25.25), invoking a memory model to perform feature extraction on the IP behavior sequence of the user A so as to obtain the IP behavior feature of the user A.
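The construction of such a behavior sequence can be sketched as follows: the logs of one account are sorted by time and the device identifiers are collected in that order. The field names and the timestamp format are assumptions for the example, and the logs are deliberately given out of order to show the sorting.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def build_behavior_sequences(hardware_logs):
    """Group IP addresses by account, ordered by log time."""
    ordered = sorted(
        hardware_logs,
        key=lambda log: datetime.strptime(log["time"], TIME_FORMAT),
    )
    sequences = defaultdict(list)
    for log in ordered:
        sequences[log["account"]].append(log["ip"])
    return dict(sequences)


logs = [
    {"account": "A", "time": "2019-10-15 15:00", "ip": "115.11.25.25"},
    {"account": "A", "time": "2019-10-15 12:00", "ip": "125.11.250.250"},
    {"account": "A", "time": "2019-10-15 13:00", "ip": "10.11.250.51"},
]

sequences = build_behavior_sequences(logs)
print(sequences["A"])
# prints ['125.11.250.250', '10.11.250.51', '115.11.25.25']
```

Before such a sequence is fed to a memory model, each entry would still have to be encoded as a numeric vector; the embodiment does not specify that encoding.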
Optionally, calling the memory model to perform feature extraction on the target system hardware log to obtain the behavior feature of the any first account on the device includes: inputting the target system hardware log into the memory model, where the target system hardware log includes t logs arranged in time order and t is any integer greater than or equal to 1; calling the memory model and performing the following operations on each log in the target system hardware log in turn: determining a forget gate $f_t$, an input gate $i_t$, an output gate $o_t$ and a candidate state $\tilde{c}_t$ of the memory cell according to the first feature $h_{t-1}$ output by the memory model at the previous moment and the currently input log $x_t$; updating the state $c_t$ of the memory cell according to the first feature, the forget gate $f_t$, the input gate $i_t$ and the candidate state $\tilde{c}_t$; and determining the feature $h_t$ corresponding to the log based on the updated state of the memory cell and the output gate $o_t$; and taking the last feature output by the memory model as the behavior feature of the any first account on the device.
The forget gate $f_t$ is used for determining how much of the cell state $c_{t-1}$ of the memory cell at the previous moment is retained in the cell state $c_t$ of the memory cell at the current moment; the input gate $i_t$ is used for determining how much of the feature of the log input to the network at the current moment is saved into the cell state $c_t$ of the memory cell; the output gate $o_t$ is used for determining how much of the cell state $c_t$ of the memory cell is output to the feature $h_t$. Through the memory model, a longer time-series dependency can be established according to the behavior sequence of the user, and the finally output feature is a behavior feature expressing time-series behavior.
Optionally, as shown in fig. 4, the memory model includes a recurrent unit 401, where the recurrent unit 401 is configured to determine the forget gate $f_t$, the input gate $i_t$, the output gate $o_t$ and the candidate state $\tilde{c}_t$ of the memory cell according to the first feature $h_{t-1}$ output by the memory model at the previous moment and the currently input log $x_t$; to update the state $c_t$ of the memory cell according to the first feature, the forget gate $f_t$, the input gate $i_t$ and the candidate state $\tilde{c}_t$; and to determine the feature $h_t$ corresponding to the log based on the updated state of the memory cell and the output gate $o_t$.
Wherein:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$;
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$;
$W_f$ is the weight matrix of the forget gate $f_t$, $b_f$ is the bias term of the forget gate $f_t$, $W_i$ is the weight matrix of the input gate $i_t$, $b_i$ is the bias term of the input gate $i_t$, $W_o$ is the weight matrix of the output gate $o_t$, $b_o$ is the bias term of the output gate $o_t$, $[h_{t-1}, x_t]$ denotes the concatenation of the feature $h_{t-1}$ and the feature $x_t$, $\sigma$ is the sigmoid (activation) function, $W_c$ is the weight matrix of the candidate state $\tilde{c}_t$, $b_c$ is the bias term of the candidate state $\tilde{c}_t$, and tanh is an activation function.
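To make the recurrent unit concrete, the sketch below implements one step of it in plain Python and runs it over a short sequence. The three gate equations follow the text. The candidate state, cell update and output are not written out in the text, so the sketch assumes their standard LSTM forms, $\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$, $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ and $h_t = o_t \odot \tanh(c_t)$. The parameter values and the one-hot encoding of the logs are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Compute W . vector + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(params, h_prev, c_prev, x_t):
    """One step of the recurrent unit (standard LSTM form assumed)."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(z) for z in affine(params["W_i"], params["b_i"], concat)]
    o_t = [sigmoid(z) for z in affine(params["W_o"], params["b_o"], concat)]
    c_tilde = [math.tanh(z)
               for z in affine(params["W_c"], params["b_c"], concat)]
    # Assumed standard cell update: keep part of c_{t-1}, add part of c_tilde.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def extract_behavior_feature(params, sequence, hidden_size):
    """Run the unit over the sequence and return the last output h_t."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x_t in sequence:
        h, c = lstm_step(params, h, c, x_t)
    return h


hidden_size, input_size = 2, 3


def constant_matrix(value):
    """Hypothetical weights: every entry set to the same value."""
    return [[value] * (hidden_size + input_size) for _ in range(hidden_size)]


params = {
    "W_f": constant_matrix(0.1), "b_f": [0.0] * hidden_size,
    "W_i": constant_matrix(0.2), "b_i": [0.0] * hidden_size,
    "W_o": constant_matrix(0.3), "b_o": [0.0] * hidden_size,
    "W_c": constant_matrix(0.4), "b_c": [0.0] * hidden_size,
}

# Hypothetical one-hot encoding of three logs arranged in time order.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
feature = extract_behavior_feature(params, sequence, hidden_size)
print(feature)
```

In practice the weights would be learned during training rather than fixed by hand, and a deep learning framework would normally replace the list arithmetic.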
(c) Fusing the behavior features of the any first account on a plurality of devices to obtain the device behavior feature of the any first account.
Optionally, fusing the behavior features of the any first account on the plurality of devices to obtain the device behavior feature of the any first account includes: superposing the behavior features of the any first account on the plurality of devices to obtain the device behavior feature of the any first account; or performing feature extraction on the behavior features of the any first account on the plurality of devices to obtain the device behavior feature of the any first account.
303. Acquiring the service content features corresponding to the plurality of first accounts based on the service content logs corresponding to the plurality of first accounts.
The service content characteristics are used for representing the service content of the account execution service, and optionally, the service content characteristics are represented in a vector mode. According to the service content in the service content log, whether the account has abnormal behavior can be known, so that the service content characteristics obtained according to the service content log can also indicate whether the account has abnormal behavior.
For example, the service content log of the first account includes the playing durations of a plurality of videos watched by the first account, each playing duration being the duration for which the corresponding video was played. If the playing durations of a large number of videos are smaller than a preset threshold, the first account is suspected of artificially inflating video view counts.
The service content log of any first account may include a plurality of logs, and optionally, based on the service content logs corresponding to the plurality of first accounts, acquiring service content features corresponding to the plurality of first accounts includes: and for any first account, extracting features of a plurality of business content logs of the any first account to obtain business content features corresponding to the business content logs, and fusing the business content features corresponding to the business content logs to obtain the business content features of the any first account. And obtaining the service content characteristics corresponding to the plurality of first account numbers by executing the operation of obtaining the service content characteristics on the plurality of first account numbers.
Or, acquiring the service content features corresponding to the plurality of first accounts based on the service content logs corresponding to the plurality of first accounts includes: for any first account, aggregating a plurality of service content logs of the any first account to obtain target service content, and performing feature extraction on the target service content to obtain the service content feature of the any first account. By aggregating the service content logs first, the processing amount of the feature extraction step is reduced and the processing speed is improved.
The method for aggregating the multiple business content logs of any first account number to obtain target business content comprises the following steps: selecting a target service log comprising the same service type, carrying out aggregation processing on at least one log in the target service log to obtain first service content, and carrying out fusion processing on the first service content corresponding to multiple service types to obtain target service content.
For example, user A has 5 logs of playing videos, and the 5 service content logs are respectively:
(1) User A; 11:00 on January 8, 2020; watching a video; video identifier A; playing duration 6 minutes; total duration of video A 6 minutes;
(2) User A; 11:06 on January 8, 2020; watching a video; video identifier B; playing duration 3 minutes; total duration of video B 8 minutes;
(3) User A; 11:09 on January 8, 2020; watching a video; video identifier C; playing duration 3 minutes; total duration of video C 3 minutes;
(4) User A; 11:12 on January 8, 2020; watching a video; video identifier D; playing duration 10 seconds; total duration of video D 5 minutes;
(5) User A; 11:12 on January 8, 2020; watching a video; video identifier E; playing duration 6 minutes; total duration of video E 6 minutes.
Aggregating the 5 service content logs shows that user A watched 5 videos in the time period from 11:00 on January 8, 2020 to 11:18 on January 8, 2020, the 5 videos being video A, video B, video C, video D and video E, with an average playing duration of 3 minutes and 38 seconds.
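The arithmetic of this aggregation can be checked with a short Python sketch. The dictionary layout and field names are assumptions for the example; the playing durations are those of the 5 logs above, expressed in seconds.

```python
# Playing durations from the 5 example logs, in seconds.
play_logs = [
    {"video": "A", "played_seconds": 6 * 60},
    {"video": "B", "played_seconds": 3 * 60},
    {"video": "C", "played_seconds": 3 * 60},
    {"video": "D", "played_seconds": 10},
    {"video": "E", "played_seconds": 6 * 60},
]


def aggregate_play_logs(logs):
    """Aggregate the viewing logs of one account into target service content."""
    total = sum(log["played_seconds"] for log in logs)
    return {
        "video_count": len(logs),
        "videos": [log["video"] for log in logs],
        "average_play_seconds": total / len(logs),
    }


summary = aggregate_play_logs(play_logs)
minutes, seconds = divmod(int(summary["average_play_seconds"]), 60)
print(f"{summary['video_count']} videos, average {minutes} min {seconds} s")
# prints "5 videos, average 3 min 38 s"
```

The total playing time is 1090 seconds, which gives the average of 218 seconds stated in the example.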
Optionally, the target service content includes a multimedia data playing duration, or comment quantity, or comment content, etc., and the embodiment of the present application does not limit the target service content.
304. Splicing the device behavior features and the service content features corresponding to the plurality of first accounts to obtain the user features corresponding to the plurality of first accounts.
The user characteristics are used for representing the actions of the account number on the login device and corresponding business content.
For example, the device behavior feature of the first account is (1,2,5,0,4), the service content feature of the first account is (3,7,6,1,0), and the device behavior feature and the service content feature are spliced to obtain the user feature (1,2,5,0,4,3,7,6,1,0).
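The splicing in this example is a plain concatenation of the two vectors, as the following short sketch shows; the function name `splice` is chosen here for illustration only.

```python
def splice(device_behavior_feature, service_content_feature):
    """Concatenate the two feature vectors into one user feature."""
    return tuple(device_behavior_feature) + tuple(service_content_feature)


device_behavior_feature = (1, 2, 5, 0, 4)
service_content_feature = (3, 7, 6, 1, 0)

user_feature = splice(device_behavior_feature, service_content_feature)
print(user_feature)
# prints (1, 2, 5, 0, 4, 3, 7, 6, 1, 0)
```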
305. Acquiring device features corresponding to a plurality of device identifiers based on the system hardware logs corresponding to the plurality of first accounts, where the plurality of device identifiers are the device identifiers in the system hardware logs corresponding to the plurality of first accounts, and the device features are used for representing the behaviors occurring on the devices.
Optionally, at least one type of device identifier, such as an IP address, or a MAC address, or an IMSI, or an IMEI, etc., is included in the system hardware log, so that a user node may be connected to multiple device nodes, as shown in fig. 5.
Optionally, acquiring the device feature of any device identifier includes: and acquiring a first system hardware log containing any equipment identifier from the system hardware logs corresponding to the plurality of first account numbers, and extracting the characteristics of the first system hardware log to obtain the equipment characteristics of any equipment identifier.
Optionally, feature extraction is performed on the first system hardware log to obtain device features of the any device identifier, including: and acquiring the service type of the service executed by any device corresponding to the device identifier according to the first system hardware log, and extracting the characteristics according to the service type of the service executed by the device to obtain the device characteristics of the device. For example, the device is a terminal such as a computer, the service is a call making service, if the device performs the call making service, some illegal software may be installed on the device, and the user performs the illegal operation based on the illegal software.
Optionally, performing feature extraction on the first system hardware log to obtain the device feature of the any device identifier includes: acquiring, according to the first system hardware log, the number of times the device corresponding to the any device identifier executes a target type service, and performing feature extraction according to that number of times to obtain the device feature of the any device identifier. Optionally, the target type service is a service that the device corresponding to the device identifier cannot normally execute. For example, the device is a terminal such as a computer and the service is making a call; if the device executes the call-making service multiple times, violation software may be installed on the device, and the user performs violation operations based on that software. Performing feature extraction based on the number of violation operations avoids identifying a user as an abnormal user or a cheating user merely because of a log generation error, or because the user only tried the operation without malicious intent.
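One way to picture the count-based device feature is the sketch below, which counts how often each device executes each service type and turns the counts for chosen target types into a vector. The field names, the device identifier `pc-1`, and the use of raw counts as the feature are assumptions for the example; the embodiment does not fix the form of the feature extraction.

```python
from collections import Counter


def count_services_by_device(hardware_logs):
    """Count, per device identifier, how often each service type occurs."""
    counts = {}
    for log in hardware_logs:
        device_counts = counts.setdefault(log["device_id"], Counter())
        device_counts[log["service_type"]] += 1
    return counts


def device_feature(device_counts, target_types):
    """Build a feature vector from the counts of the target type services."""
    # A Counter returns 0 for service types the device never executed.
    return [device_counts[service_type] for service_type in target_types]


hardware_logs = [
    {"account": "A", "device_id": "pc-1", "service_type": "login"},
    {"account": "A", "device_id": "pc-1", "service_type": "call"},
    {"account": "B", "device_id": "pc-1", "service_type": "call"},
]

counts = count_services_by_device(hardware_logs)
feature = device_feature(counts["pc-1"], ["call", "login"])
print(feature)
# prints [2, 1]
```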
306. Based on the user characteristics corresponding to the plurality of first accounts and the equipment characteristics of the plurality of equipment identifiers, generating a graph structure, wherein one node in the graph structure corresponds to one account or one equipment, and an edge between two nodes is used for representing that the accounts and the equipment corresponding to the two nodes correspond to the same log.
Wherein the graph structure is a structure composed of a plurality of nodes and edges between two nodes of the plurality of nodes.
Optionally, generating a graph structure based on the user features corresponding to the plurality of first account numbers and the device features of the plurality of device identifications includes: generating nodes in the graph structure based on the plurality of first account numbers and the plurality of device identifications; adding edges between nodes in the graph structure based on the relationship between the account number and the device indicated by the system hardware log; associating user characteristics corresponding to the plurality of first account numbers to corresponding nodes in the graph structure; the device characteristics corresponding to the plurality of device identifications are associated to corresponding nodes in the graph structure.
Therefore, the characteristics of the plurality of first account numbers at the hardware level and the characteristics of the service level can be fused into the graph structure, and more comprehensive information can be considered to obtain more accurate results when the first account numbers are classified according to the graph structure.
As shown in fig. 5, the graph structure 501 includes a plurality of user nodes and a plurality of device nodes, and each user node is connected to the plurality of device nodes.
Optionally, adding edges between nodes in the graph structure based on the relationship between the account number and the device indicated by the system hardware log includes: if the account number and the equipment identifier of the equipment exist in the same system hardware log, adding an edge between two nodes corresponding to the account number and the equipment identifier.
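The graph construction described above can be sketched as follows: one node per account and per device identifier, an edge whenever the two appear in the same system hardware log, and the corresponding feature attached to each node. The tuple-based node keys, the field names and the feature values are assumptions for the example.

```python
def build_graph(hardware_logs, user_features, device_features):
    """Return (nodes, edges): node key -> feature, and account-device edges."""
    nodes = {}
    edges = set()
    for log in hardware_logs:
        account_node = ("account", log["account"])
        device_node = ("device", log["device_id"])
        # Associate each node with its user feature or device feature.
        nodes.setdefault(account_node, user_features[log["account"]])
        nodes.setdefault(device_node, device_features[log["device_id"]])
        # Same log => edge between the account node and the device node.
        edges.add((account_node, device_node))
    return nodes, edges


hardware_logs = [
    {"account": "A", "device_id": "125.11.250.250"},
    {"account": "B", "device_id": "125.11.250.250"},
    {"account": "A", "device_id": "00:1a:2b:3c:4d:5e"},  # hypothetical MAC
]
user_features = {"A": [1, 2, 5], "B": [0, 3, 1]}          # hypothetical
device_features = {"125.11.250.250": [2, 1, 0],           # hypothetical
                   "00:1a:2b:3c:4d:5e": [0, 0, 1]}

nodes, edges = build_graph(hardware_logs, user_features, device_features)
print(len(nodes), len(edges))
# prints "4 3"
```

Accounts A and B end up connected through the shared IP address node, which is the kind of relationship the later aggregation step can exploit.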
After obtaining the graph structure, the server may classify the graph structure to obtain the category corresponding to the plurality of first account numbers. In one possible implementation manner, the classifying, by the server, based on the graph structure, to obtain the category to which the plurality of first account numbers correspond, including: and calling a first account classification model to classify the graph structure to obtain the category corresponding to the plurality of first accounts. Optionally, the first account classification model is called, and the graph structure is classified, so that the obtaining of the belonging categories corresponding to the plurality of first accounts can be achieved through the following steps 307 to 309.
307. Calling at least one neighbor aggregation layer of the first account classification model to aggregate each node in the graph structure, obtaining a processed graph structure, and inputting the processed graph structure into a prediction layer.
The first account classification model includes at least one neighbor aggregation layer and a prediction layer. The neighbor aggregation layer is used for aggregating each node in the graph structure with its corresponding neighbor nodes so that each node carries richer features. Therefore, the more neighbor aggregation layers there are, the more features are fused into each node, the more relationships between users and devices and between users can be detected, and the more accurate the obtained classification result is. The prediction layer is used for classifying each user node according to the features of each user node in the graph structure to obtain the category to which each user belongs.
Calling at least one neighbor aggregation layer, and performing aggregation treatment on each node in the graph structure to obtain a treated graph structure, wherein the method comprises the following steps: and calling any neighbor aggregation layer, carrying out aggregation treatment on each node in the input graph structure to obtain a treated graph structure, and inputting the treated graph structure to the next layer.
Optionally, any neighbor aggregation layer aggregates each node in the input graph structure to obtain a processed graph structure, which can be realized by the following steps: calling any neighbor aggregation layer, and executing the following operations on each node in the input graph structure:
Determining at least one neighbor node of the node, acquiring node characteristics of the at least one neighbor node, and carrying out fusion processing on the node characteristics of the node and the node characteristics of the at least one neighbor node to obtain a first characteristic of the node; and updating the node characteristics of each node into corresponding first characteristics to obtain a processed graph structure. And by carrying out aggregation processing on each node and the neighbor nodes, the characteristics included by each node are richer.
If the node is a user node, the node characteristic of the node is the user characteristic of the user node; if a node is a device node, the node characteristics of that node are the device characteristics of the device node.
For example, a user node has 3 neighbor nodes, all of which are device nodes. If the feature of one of the 3 device nodes indicates that violation operations were executed on that device, and the user corresponding to the user node also executed operations on that device, the suspicion that the user corresponding to the user node has committed a violation should rise.
Similarly, when the neighbor nodes of a device node are user nodes and the node feature of the device node is fused with those of its neighbor nodes, the violation suspicion of the device node is also affected by its neighbor nodes.
Optionally, determining at least one neighbor node of the node includes: determining all neighbor nodes of the node; or sampling from the neighbor nodes of the node to obtain at least one neighbor node of the node. When sampling is carried out from the neighbor nodes of the node, the server randomly selects the neighbor nodes with target quantity as the neighbor nodes of subsequent aggregation processing; or the server selects the neighbor nodes of the nodes according to the types of the neighbor nodes.
In one possible implementation, determining at least one neighbor node of the node includes: from each type of neighbor node of the node, a target number of nodes is selected. Wherein the target number is any one of 1, 2, 3, etc.; the target number is a default value for the system or a value set by a supervisor.
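Selecting a target number of nodes from each type of neighbor node might look like the following sketch. Random sampling within each type, the `(node_type, node_id)` representation, and the fixed seed are assumptions for the example; the embodiment does not prescribe how the nodes of a given type are chosen.

```python
import random


def sample_neighbors_by_type(neighbors, target_number, rng):
    """Pick up to target_number neighbors from each neighbor type."""
    by_type = {}
    for node_type, node_id in neighbors:
        by_type.setdefault(node_type, []).append((node_type, node_id))
    sampled = []
    for node_type in sorted(by_type):
        candidates = by_type[node_type]
        # Take all candidates when fewer than target_number are available.
        count = min(target_number, len(candidates))
        sampled.extend(rng.sample(candidates, count))
    return sampled


neighbors = [
    ("ip", "125.11.250.250"),
    ("ip", "10.11.250.51"),
    ("ip", "115.11.25.25"),
    ("mac", "00:1a:2b:3c:4d:5e"),  # hypothetical MAC address
]

rng = random.Random(0)  # fixed seed only to make the example repeatable
sampled = sample_neighbors_by_type(neighbors, target_number=2, rng=rng)
print(sampled)
```

With a target number of 2, the result contains two of the three IP address neighbors and the single MAC address neighbor.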
Optionally, the system hardware log includes at least one device identifier, where the at least one device identifier includes at least one of an IP address, a MAC address, an IMSI, or an IMEI. When generating the graph structure, the server may perform at least one of the following operations: adding a first type identifier to the node corresponding to the IP address; adding a second type identifier to the node corresponding to the MAC address; adding a third type identifier to the node corresponding to the IMSI; and adding a fourth type identifier to the node corresponding to the IMEI. Optionally, the server adds a fifth type identifier to the node corresponding to the account.
Optionally, fusing the node feature of the node with the node feature of the at least one neighbor node to obtain the first feature of the node includes: performing feature extraction on the node feature of the node and the node feature of the at least one neighbor node to obtain the first feature of the node. Because the first feature is extracted from the node feature of the node and the node feature of the at least one neighbor node, the first feature incorporates both.
Or, fusing the node characteristics of the node with the node characteristics of the at least one neighboring node to obtain a first characteristic of the node, including: and superposing the node characteristics of the node and the node characteristics of the at least one neighbor node, and extracting the characteristics after superposition to obtain a first characteristic of the node.
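A single neighbor aggregation pass can be sketched as below. The embodiment leaves the fusion operation open, so this example substitutes a simple element-wise mean over the node and its neighbors, and it assumes that user features and device features have the same dimension; both choices are assumptions made only for illustration.

```python
def aggregate_once(features, adjacency):
    """Replace each node feature by the mean of itself and its neighbors."""
    updated = {}
    for node, own_feature in features.items():
        group = [own_feature] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Two user nodes that share one device node (hypothetical feature values).
features = {
    "user_1": [0.0, 2.0],
    "user_2": [2.0, 4.0],
    "device_1": [4.0, 0.0],
}
adjacency = {
    "user_1": ["device_1"],
    "user_2": ["device_1"],
    "device_1": ["user_1", "user_2"],
}

processed = aggregate_once(features, adjacency)
print(processed)
# prints {'user_1': [2.0, 1.0], 'user_2': [3.0, 2.0], 'device_1': [2.0, 2.0]}
```

All updated features are computed from the input graph before any node is overwritten, so stacking several layers amounts to calling `aggregate_once` repeatedly on its own output; after a second pass, each user node would also carry information from the other user node via the shared device.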
308. Calling the prediction layer to classify the processed graph structure to obtain the categories to which the plurality of first accounts belong.
The processed graph structure is input into a prediction layer, and the prediction layer can classify the processed graph structure to obtain the category corresponding to the plurality of first account numbers.
Optionally, invoking a prediction layer to classify the processed graph structure to obtain the category corresponding to the plurality of first account numbers, including: and calling a prediction layer, and classifying the characteristics of each user node in the processed graph structure to obtain the category of each first account.
The first account classification model is used for classifying the accounts into two categories, and optionally, the two categories are a normal account and an abnormal account or a normal account and a suspicious account.
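As a rough illustration of a prediction layer that classifies only the user nodes into two categories, the following sketch applies a logistic score to each user node's feature. The logistic form, the threshold, the weights, and every name used here are assumptions; the source does not describe the internals of the prediction layer.

```python
import math
from typing import Dict, List


def predict_user_categories(node_features: Dict[str, List[float]],
                            node_kinds: Dict[str, str],
                            weights: List[float],
                            bias: float,
                            threshold: float = 0.5) -> Dict[str, str]:
    """Assign "abnormal" or "normal" to each user node; device nodes are skipped."""
    categories = {}
    for node, feat in node_features.items():
        if node_kinds[node] != "user":
            continue
        logit = sum(w * x for w, x in zip(weights, feat)) + bias
        score = 1.0 / (1.0 + math.exp(-logit))
        categories[node] = "abnormal" if score >= threshold else "normal"
    return categories


node_features = {"user_1": [2.0], "user_2": [-2.0], "device_1": [5.0]}
node_kinds = {"user_1": "user", "user_2": "user", "device_1": "device"}
result = predict_user_categories(node_features, node_kinds, weights=[1.0], bias=0.0)
```

Only user nodes receive a category, which mirrors the statement that the features of each user node are classified to obtain the category of each first account.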
It should be noted that the embodiment of the present application describes the classification of the graph structure only by taking the first account classification model as an example. In other embodiments, the graph structure is processed directly without calling the first account classification model. In one possible implementation, classifying based on the graph structure to obtain the categories corresponding to the plurality of first accounts includes: performing at least one round of aggregation processing on each node in the graph structure to obtain a processed graph structure, and classifying the features of each user node in the processed graph structure to obtain the categories corresponding to the plurality of first accounts.
It should be noted that the account classification method provided by the embodiment of the present application can be applied to an auditing anti-cheating service. A product using the service can either integrate the lighthouse SDK (Software Development Kit) or provide logs containing the necessary fields. The service extracts the corresponding device behavior features and service content features from the logs reported by the SDK or the logs provided by the terminal, constructs a graph structure, and classifies based on the graph structure to obtain the categories corresponding to a plurality of accounts.
According to the account classification method provided by the embodiment of the present application, the user features of a plurality of accounts are extracted from the system hardware logs and the service content logs, so that the user features represent both the behaviors of the accounts on the logged-in devices and the service content executed by the accounts, which enriches the information covered by the user features. The graph structure is generated based on the user features and the device features, which enriches the information in the graph structure. When classification is performed based on the graph structure, the behaviors of the users on the terminal devices and the corresponding service content are combined, so classification draws on multiple aspects and the obtained categories corresponding to the plurality of first accounts are more accurate.
In addition, extracting the behavior features of any account on a device through the memory model captures the features of a plurality of behaviors executed by the user in time order. Because a cheating group contains a plurality of cheating members who perform the same operations in the same time periods, these behavior features help to mine cheating groups, so accounts can be classified more accurately.
In addition, by performing aggregation processing on the graph structure twice, the application aggregates not only the device features of device nodes that have a connection relationship with a user node, but also the user features of user nodes that have an indirect connection relationship with it. That is, both the relationships between users and devices and the relationships between users are considered, so accounts can be classified more accurately.
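The effect of aggregating twice can be seen in a small sketch in which two user nodes share one device. The plain sum used as the aggregation and the node names are assumptions for illustration only; the point is that information from an indirectly connected user node arrives only after the second round.

```python
from typing import Dict, List, Tuple


def aggregate(features: Dict[str, List[float]],
              edges: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """One round: each node's new feature is its own plus the sum of its neighbors'."""
    neighbors = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return {
        node: [value + sum(features[m][i] for m in neighbors[node])
               for i, value in enumerate(feat)]
        for node, feat in features.items()
    }


# One-hot features make it visible whose information has reached a node.
features = {
    "user_a": [1.0, 0.0, 0.0],
    "device_1": [0.0, 1.0, 0.0],
    "user_b": [0.0, 0.0, 1.0],
}
edges = [("user_a", "device_1"), ("user_b", "device_1")]

once = aggregate(features, edges)    # user_a sees device_1 only
twice = aggregate(once, edges)       # user_a now also sees user_b via device_1
```

After one round the third component of `user_a` (the slot owned by `user_b`) is still zero; after the second round it is non-zero, because `device_1` has relayed it.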
It should be noted that the foregoing steps 305 to 307 describe classification by taking the trained first account classification model as an example. Before these steps, the first account classification model is trained so that it outputs accurate classification results. The training process of the first account classification model is described below:
Fig. 6 is a flowchart of a first account classification model training method provided by an embodiment of the present application, where an execution body of the embodiment of the present application is a server, and referring to fig. 6, the method includes:
601. Acquire system hardware logs and service content logs corresponding to the plurality of second accounts.
The method for obtaining the system hardware logs and the service content logs corresponding to the plurality of second accounts is similar to the method for obtaining the system hardware logs and the service content logs corresponding to the plurality of first accounts in step 301, and is not described in detail herein.
The second account is a sample account, optionally, the second account and the first account are different accounts, or the second account and the first account are the same account.
602. Classify the system hardware logs corresponding to the plurality of second accounts to obtain a first sample category of each second account, wherein the first sample category includes an abnormal account or a normal account.
In step 602, the plurality of second accounts are classified into the abnormal accounts or the normal accounts only according to the system hardware logs corresponding to the plurality of second accounts.
Optionally, step 602 is performed by invoking a second account classification model. For example, classifying the system hardware logs corresponding to the plurality of second accounts to obtain a first sample category of each second account includes: calling the second account classification model to classify the system hardware logs corresponding to the plurality of second accounts, obtaining the first sample category of each second account. The second account classification model is used for classifying an account as a normal account or an abnormal account; for example, a normal account is labeled 1 and an abnormal account is labeled 0, or a normal account is labeled 0 and an abnormal account is labeled 1. The second account classification model is a trained model whose output is relatively accurate.
Optionally, the second account classification model is an audit model. The embodiment of the application does not limit the second account classification model.
603. Classify the service content logs corresponding to the plurality of second accounts to obtain a second sample category of each second account, wherein the second sample category includes a normal account or a suspicious account.
In step 603, the plurality of second accounts are classified into normal accounts or suspicious accounts according to the service content logs corresponding to the plurality of second accounts.
Optionally, step 603 is performed by invoking a third account classification model. For example, classifying the service content logs corresponding to the plurality of second accounts to obtain a second sample category of each second account includes: calling the third account classification model to classify the service content logs corresponding to the plurality of second accounts, obtaining the second sample category of each second account. The third account classification model is used for classifying an account as a normal account or a suspicious account; for example, a normal account is labeled 1 and a suspicious account is labeled 0, or a normal account is labeled 0 and a suspicious account is labeled 1. The third account classification model is a trained model whose output is relatively accurate.
604. Based on the system hardware logs, the service content logs, the first sample categories and the second sample categories corresponding to the plurality of second accounts, the first account classification model is used for classifying, and the prediction categories corresponding to the plurality of second accounts are obtained.
Optionally, performing classification through the first account classification model based on the system hardware logs, the service content logs, the first sample categories and the second sample categories corresponding to the plurality of second accounts to obtain the prediction categories corresponding to the plurality of second accounts includes: acquiring user features corresponding to the plurality of second accounts based on the system hardware logs and the service content logs corresponding to the plurality of second accounts; acquiring device features of a plurality of device identifiers based on the system hardware logs corresponding to the plurality of second accounts, wherein the device identifiers are the device identifiers in the system hardware logs corresponding to the plurality of second accounts; generating a sample graph structure based on the system hardware logs and user features corresponding to the plurality of second accounts, the device features of the plurality of device identifiers, the corresponding first sample categories and the corresponding second sample categories; and calling the first account classification model to classify based on the sample graph structure, obtaining the prediction categories corresponding to the plurality of second accounts.
The obtaining of the user features corresponding to the plurality of second accounts based on the system hardware logs and the service content logs corresponding to the plurality of second accounts is similar to the obtaining of the user features corresponding to the plurality of first accounts based on the system hardware logs and the service content logs corresponding to the plurality of first accounts in steps 302 to 304, and is not described herein in detail.
The obtaining of the device features corresponding to the plurality of device identifiers based on the system hardware logs corresponding to the plurality of second account numbers is similar to the obtaining of the device features corresponding to the plurality of device identifiers based on the system hardware logs corresponding to the plurality of first account numbers in step 305, and will not be described in detail herein.
In addition, the generating a sample graph structure based on the system hardware logs corresponding to the plurality of second account numbers, the user features, the corresponding first sample categories, the corresponding second sample categories, and the device features of the plurality of device identifications is similar to the step 306, and is not described in detail herein, except that in the step 604, the first sample categories and the second sample categories corresponding to the plurality of second account numbers are associated to the corresponding user nodes in the sample graph structure.
Calling the first account classification model to classify based on the sample graph structure to obtain the prediction categories of the plurality of second accounts is similar to step 307, and is not described in detail herein.
605. Train at least one first neighbor aggregation layer according to the prediction categories and the second sample categories corresponding to the plurality of second accounts.
The first neighbor aggregation layer is used for performing aggregation processing on each node in the sample graph structure and updating the features of each node. If the first neighbor aggregation layer is trained according to the prediction categories and the second sample categories corresponding to the plurality of second accounts, it pays more attention to the features indicating whether a user is a suspicious user, that is, to the features that the third account classification model focuses on.
In one possible implementation, training at least one first neighbor aggregation layer according to a prediction category and a second sample category corresponding to a plurality of second account numbers includes: and adjusting model parameters of the at least one first neighbor aggregation layer according to errors between the prediction categories corresponding to the plurality of second account numbers and the corresponding second sample categories so as to enable the errors between the prediction categories and the second sample categories output based on the trained first account number classification model to be converged.
In addition, the first sample categories of the plurality of second accounts can also be considered when training the first neighbor aggregation layer. Optionally, training at least one first neighbor aggregation layer according to the prediction categories and the second sample categories corresponding to the plurality of second accounts includes: training the at least one first neighbor aggregation layer according to the prediction categories, the second sample categories and the first sample categories corresponding to the plurality of second accounts. In this way the first neighbor aggregation layer attends both to the features indicating whether a user is a suspicious user and to the features indicating whether a user is an abnormal user, that is, to the features that the second account classification model and the third account classification model focus on.
In addition, when training is performed, different weights can be allocated to the first sample category and the second sample category, so that the first neighbor aggregation layer can pay more attention to the feature focused by the third account classification model or pay more attention to the feature focused by the second account classification model.
Optionally, in the embodiment of the present application, the classification task of the second account classification model is the primary task and the classification task of the third account classification model is the auxiliary task, so a higher weight is assigned to the first sample category and a lower weight is assigned to the second sample category.
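One way to picture the weighting of the two sample categories is a weighted sum of two losses, one against each label set. The source does not name a loss function or concrete weights, so the binary cross-entropy, the 0.7 and 0.3 weights, and the function names below are assumptions used only as a sketch.

```python
import math
from typing import List


def binary_cross_entropy(pred: float, label: int) -> float:
    """Cross-entropy for one predicted probability, clamped for numerical safety."""
    eps = 1e-12
    p = min(max(pred, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def weighted_loss(preds: List[float],
                  first_labels: List[int],
                  second_labels: List[int],
                  w_first: float = 0.7,
                  w_second: float = 0.3) -> float:
    """Combine the loss against the first and second sample categories.

    w_first > w_second reflects the primary (abnormal/normal) task being
    weighted above the auxiliary (suspicious/normal) task; values are assumed.
    """
    n = len(preds)
    loss_first = sum(binary_cross_entropy(p, y) for p, y in zip(preds, first_labels)) / n
    loss_second = sum(binary_cross_entropy(p, y) for p, y in zip(preds, second_labels)) / n
    return w_first * loss_first + w_second * loss_second


# A prediction that agrees with the first label but not the second.
primary_heavy = weighted_loss([0.9], [1], [0], w_first=0.7, w_second=0.3)
auxiliary_heavy = weighted_loss([0.9], [1], [0], w_first=0.3, w_second=0.7)
```

In this example the prediction matches the first sample category, so the loss is smaller when the first sample category carries the higher weight.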
606. Train at least one second neighbor aggregation layer and a prediction layer according to the prediction categories and the first sample categories corresponding to the plurality of second accounts.
By training the at least one second neighbor aggregation layer and the prediction layer according to the prediction categories and the first sample categories corresponding to the plurality of second accounts, the at least one second neighbor aggregation layer and the prediction layer pay more attention to the features indicating whether a user is an abnormal user, so classification can be performed more accurately.
In one possible implementation, training at least one second neighbor aggregation layer and the prediction layer according to the prediction categories and the first sample categories corresponding to the plurality of second accounts includes: adjusting model parameters of the at least one second neighbor aggregation layer and the prediction layer according to the errors between the prediction categories corresponding to the plurality of second accounts and the first sample categories, so that the errors between the prediction categories output by the trained first account classification model and the first sample categories converge.
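Steps 605 and 606 adjust different parameter groups against different labels. The toy sketch below makes that split concrete with a scalar stand-in model and finite-difference gradients; the scalar model, the parameter names (`agg1`, `agg2`, `pred_w`, `pred_b`), the learning rate, and the sample data are all assumptions and are not the model of the application.

```python
import math
from typing import Dict, List


def forward(params: Dict[str, float], x: float) -> float:
    """Scalar stand-in: first aggregation, second aggregation, prediction."""
    h1 = params["agg1"] * x
    h2 = params["agg2"] * h1
    return 1.0 / (1.0 + math.exp(-(params["pred_w"] * h2 + params["pred_b"])))


def loss(params: Dict[str, float], xs: List[float], ys: List[int]) -> float:
    """Mean binary cross-entropy between predictions and labels."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(forward(params, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train_step(params: Dict[str, float], trainable: List[str],
               xs: List[float], ys: List[int],
               lr: float = 0.1, h: float = 1e-6) -> Dict[str, float]:
    """One gradient step that updates only the parameters named in `trainable`."""
    updated = dict(params)
    for name in trainable:
        up = dict(params, **{name: params[name] + h})
        down = dict(params, **{name: params[name] - h})
        grad = (loss(up, xs, ys) - loss(down, xs, ys)) / (2 * h)
        updated[name] = params[name] - lr * grad
    return updated


params = {"agg1": 0.5, "agg2": 0.5, "pred_w": 1.0, "pred_b": 0.0}
xs = [1.0, -1.0]
second_sample = [1, 0]  # hypothetical suspicious/normal labels
first_sample = [1, 0]   # hypothetical abnormal/normal labels

# Step 605: only the first neighbor aggregation layer, against the second sample category.
after_605 = train_step(params, ["agg1"], xs, second_sample)
# Step 606: second neighbor aggregation layer and prediction layer, against the first sample category.
after_606 = train_step(after_605, ["agg2", "pred_w", "pred_b"], xs, first_sample)
```

Each step leaves the other group's parameters untouched, which is the property the two training steps rely on.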
As shown in fig. 7, the input of the first account classification model includes a first graph structure 701, where the first graph structure 701 includes a plurality of user nodes and device nodes, connection relationships between the plurality of user nodes and the plurality of device nodes, user characteristics of the plurality of user nodes, and device characteristics of the plurality of device nodes. The first neighbor aggregation layer 702 of the first account classification model performs neighbor aggregation processing on the first graph structure 701 to obtain a second graph structure 703, where the first neighbor aggregation layer 702 of fig. 7 shows neighbor aggregation processing on one of the user nodes, and features of the node 5, the node 2, and the node 4 are aggregated into the node 1 as indicated by an arrow in the first neighbor aggregation layer 702. The second graph structure 703 is input to the second neighbor aggregation layer 704 to perform neighbor aggregation processing to obtain a third graph structure, and the third graph structure is input to the prediction layer 705 to obtain the category of the multiple accounts, where the category includes a or B.
In the related art, abnormality detection by the second account classification model is based only on the system hardware log, which limits its coverage to a certain extent.
It should be noted that, in the embodiment of the present application, only taking the first sample category obtained by the second account classification model and the second sample category obtained by the third account classification model as examples, the process of obtaining the first sample category and the second sample category is illustrated, and in other embodiments, the labeling personnel labels the plurality of second accounts based on the system hardware logs corresponding to the plurality of second accounts, so as to obtain the first sample category corresponding to the plurality of second accounts, where the first sample category is a normal account or an abnormal account; in other embodiments, the labeling personnel performs labeling processing on the plurality of second accounts based on the service content logs corresponding to the plurality of second accounts to obtain a second sample category corresponding to the plurality of second accounts, where the second sample category includes a normal account or a suspicious account.
Fig. 8 is a schematic structural diagram of an account classifying device according to an embodiment of the present application, referring to fig. 8, the device includes: a first acquisition module 801, a second acquisition module 802, a generation module 803, and a classification module 804.
A first obtaining module 801, configured to obtain user characteristics corresponding to a plurality of first accounts based on system hardware logs and service content logs corresponding to the plurality of first accounts, where the user characteristics are used to represent actions of the accounts on the logged-in device and corresponding service content;
A second obtaining module 802, configured to obtain, based on the system hardware logs corresponding to the plurality of first account numbers, device features corresponding to a plurality of device identifiers, where the plurality of device identifiers are device identifiers in the system hardware logs corresponding to the plurality of first account numbers, and the device features are used to represent behaviors occurring on a device;
a generating module 803, configured to generate a graph structure based on user features corresponding to the plurality of first accounts and device features corresponding to the plurality of device identifiers, where one node in the graph structure corresponds to one account or one device, and an edge between two nodes is used to indicate that the accounts and the devices corresponding to the two nodes correspond to the same log;
the classification module 804 is configured to classify based on the graph structure, and obtain the category to which the plurality of first account numbers correspond.
As shown in fig. 9, the generating module 803 optionally includes:
a node generating unit 8031, configured to generate a node in the graph structure based on the plurality of first account numbers and the plurality of device identifiers;
an edge adding unit 8032, configured to add edges between nodes in the graph structure based on the relationship between the account number and the device indicated by the system hardware log;
An associating unit 8033, configured to associate user features corresponding to the plurality of first account numbers to corresponding nodes in the graph structure;
the associating unit 8033 is further configured to associate device features corresponding to the plurality of device identifiers to corresponding nodes in the graph structure.
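The three units of the generating module (node generation, edge adding, feature association) can be sketched as follows. The `Graph` container, the `(kind, identifier)` node key, the log format of `(account, device_id)` pairs, and the sample values are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, str]  # (kind, identifier); kind is "user" or "device"


@dataclass
class Graph:
    nodes: Set[Node] = field(default_factory=set)
    edges: Set[Tuple[Node, Node]] = field(default_factory=set)
    features: Dict[Node, List[float]] = field(default_factory=dict)


def build_graph(hardware_logs: Iterable[Tuple[str, str]],
                user_features: Dict[str, List[float]],
                device_features: Dict[str, List[float]]) -> Graph:
    graph = Graph()
    # Node generation: one node per account and one node per device identifier.
    for account in user_features:
        graph.nodes.add(("user", account))
    for device in device_features:
        graph.nodes.add(("device", device))
    # Edge adding: an account and a device that appear in the same log entry.
    for account, device in hardware_logs:
        user_node, device_node = ("user", account), ("device", device)
        if user_node in graph.nodes and device_node in graph.nodes:
            graph.edges.add((user_node, device_node))
    # Association: attach user features and device features to their nodes.
    for account, feat in user_features.items():
        graph.features[("user", account)] = feat
    for device, feat in device_features.items():
        graph.features[("device", device)] = feat
    return graph


logs = [("acct_a", "dev_1"), ("acct_b", "dev_1"), ("acct_a", "dev_1")]
graph = build_graph(
    logs,
    user_features={"acct_a": [0.1], "acct_b": [0.2]},
    device_features={"dev_1": [0.9]},
)
```

Storing edges in a set means a repeated log entry for the same account and device yields a single edge, so the two accounts here end up linked only through their shared device node.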
Optionally, the first obtaining module 801 includes:
a first feature obtaining unit 8011, configured to obtain, based on system hardware logs corresponding to the plurality of first account numbers, device behavior features corresponding to the plurality of first account numbers, where the device behavior features are used to represent behaviors of the account numbers that occur on the logged-in device;
a second feature obtaining unit 8012, configured to obtain, based on service content logs corresponding to the plurality of first account numbers, service content features corresponding to the plurality of first account numbers, where the service content features are used to represent service content executed by the account numbers;
and the splicing unit 8013 is configured to splice the device behavior features and the service content features corresponding to the plurality of first account numbers, so as to obtain user features of the plurality of first account numbers.
Optionally, the first feature obtaining unit 8011 is configured to obtain, from a system hardware log of any first account, a target system hardware log including the same device identifier; calling a memory model to perform feature extraction on the target system hardware log to obtain behavior features of any one first account on the equipment; and fusing the behavior characteristics of any one of the first account numbers on a plurality of devices to obtain the device behavior characteristics of any one of the first account numbers.
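The flow of the first obtaining module (group logs by device identifier, extract a per-device behavior feature, fuse across devices, then splice with the service content feature) can be sketched as below. The source does not specify the memory model or the fusion operator, so the decayed running state that stands in for the memory model, the mean used for fusion, the log record fields, and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Vector = List[float]


def group_by_device(logs: List[dict]) -> Dict[str, List[Vector]]:
    """Collect time-ordered event vectors that share a device identifier."""
    grouped = defaultdict(list)
    for entry in sorted(logs, key=lambda e: e["time"]):
        grouped[entry["device_id"]].append(entry["event"])
    return dict(grouped)


def toy_memory_model(events: List[Vector], decay: float = 0.5) -> Vector:
    """Stand-in for the memory model: a decayed running state over the sequence."""
    state = [0.0] * len(events[0])
    for event in events:
        state = [decay * s + (1.0 - decay) * x for s, x in zip(state, event)]
    return state


def device_behavior_feature(logs: List[dict]) -> Vector:
    """Fuse per-device behavior features (assumed here: element-wise mean)."""
    per_device = [toy_memory_model(events) for events in group_by_device(logs).values()]
    dim = len(per_device[0])
    return [sum(feat[i] for feat in per_device) / len(per_device) for i in range(dim)]


def splice(device_behavior: Vector, service_content: Vector) -> Vector:
    """Concatenate the two feature vectors into the user feature."""
    return device_behavior + service_content


account_logs = [
    {"time": 1, "device_id": "dev_1", "event": [1.0, 0.0]},
    {"time": 2, "device_id": "dev_1", "event": [1.0, 0.0]},
    {"time": 1, "device_id": "dev_2", "event": [0.0, 1.0]},
]
behavior = device_behavior_feature(account_logs)
user_feature = splice(behavior, [9.0])  # [9.0] is a made-up service content feature
```

A sequence model such as an LSTM could take the place of `toy_memory_model`; the grouping and splicing around it would stay the same.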
Optionally, the classification module 804 is configured to invoke a first account classification model to classify the graph structure, so as to obtain the category to which the plurality of first accounts correspond.
Optionally, the first account classification model includes at least one neighbor aggregation layer and a prediction layer, and the classification module 804 includes:
an aggregation unit 8041, configured to invoke the at least one neighbor aggregation layer, and perform aggregation processing on each node in the graph structure, so as to obtain a processed graph structure;
the prediction unit 8042 is configured to call the prediction layer, classify the processed graph structure, and obtain the category corresponding to the plurality of first account numbers.
Optionally, the aggregation unit 8041 is configured to invoke any neighboring aggregation layer, perform aggregation processing on each node in the input graph structure, obtain a processed graph structure, and input the processed graph structure to a next layer.
Optionally, the first account classification model includes at least one first neighbor aggregation layer and at least one second neighbor aggregation layer, and the apparatus further includes:
the sample obtaining module 805 is configured to obtain sample data, where the sample data includes a system hardware log corresponding to a plurality of second accounts, a service content log, a first sample class and a second sample class, the first sample class is obtained by classifying the system hardware log corresponding to the plurality of second accounts, the first sample class includes an abnormal account or a normal account, the second sample class is obtained by classifying the service content log corresponding to the plurality of second accounts, and the second sample class includes a normal account or a suspicious account;
The classification module 804 is configured to classify, based on the sample data, by using the first account classification model to obtain prediction categories corresponding to the plurality of second accounts;
a first training module 806, configured to train the at least one first neighbor aggregation layer according to the prediction categories and the second sample categories corresponding to the plurality of second account numbers;
a second training module 807 configured to train the at least one second neighbor aggregation layer and the prediction layer according to the prediction category and the first sample category corresponding to the plurality of second account numbers.
It should be noted that the account classification device provided in the above embodiment is illustrated only by the division of the above functional modules when classifying accounts. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. In addition, the account classification device and the account classification method provided in the foregoing embodiments belong to the same concept; the specific implementation process of the device is detailed in the method embodiments and is not described herein again.
Fig. 10 is a block diagram of a terminal according to an embodiment of the present application. The terminal 1000 is configured to perform the steps performed by the terminal in the above embodiments, and may be a portable mobile terminal, for example: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 1000 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
The processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1001 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. Memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1002 is used to store at least one program code for execution by processor 1001 to implement the account classification method provided by the method embodiments of the present application.
In some embodiments, terminal 1000 can optionally further include: a peripheral interface 1003, and at least one peripheral. The processor 1001, the memory 1002, and the peripheral interface 1003 may be connected by a bus or signal line. The various peripheral devices may be connected to the peripheral device interface 1003 via a bus, signal wire, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1004, a display 1005, a camera assembly 1006, audio circuitry 1007, and a power supply 1009.
Peripheral interface 1003 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 1001 and the memory 1002. In some embodiments, the processor 1001, the memory 1002, and the peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on a separate chip or circuit board, which is not limited in the embodiments of the application.
Radio frequency circuit 1004 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. Radio frequency circuit 1004 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1004 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. Radio frequency circuit 1004 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may also include NFC (Near Field Communication) related circuitry, which is not limited in the application.
The display screen 1005 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1005 is a touch screen, it is also able to capture touch signals at or above its surface. The touch signal may be input to the processor 1001 as a control signal for processing. In this case, the display screen 1005 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there is one display screen 1005, disposed on the front panel of terminal 1000; in other embodiments, there are at least two display screens 1005, disposed on different surfaces of terminal 1000 or in a folded design; in still other embodiments, the display screen 1005 is a flexible display disposed on a curved surface or a folded surface of terminal 1000. The display screen 1005 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display screen 1005 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 1006 is used to capture images or video. Optionally, camera assembly 1006 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a tele camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 1006 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 1007 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input the electrical signals to the processor 1001 for processing, or to the radio frequency circuit 1004 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, each located at a different part of terminal 1000. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1001 or the radio frequency circuit 1004 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert an electrical signal not only into sound waves audible to humans, but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, audio circuit 1007 may also include a headphone jack.
Power supply 1009 is used to power the various components in terminal 1000. The power supply 1009 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1009 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charging technology.
In some embodiments, terminal 1000 can further include one or more sensors 1010. The one or more sensors 1010 include, but are not limited to: acceleration sensor 1011, gyro sensor 1012, pressure sensor 1013, optical sensor 1015, and proximity sensor 1016.
The acceleration sensor 1011 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal 1000. For example, the acceleration sensor 1011 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1001 may control the display screen 1005 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1011. The acceleration sensor 1011 may also be used for the acquisition of motion data of a game or a user.
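As an illustration of the landscape/portrait decision described above, the following is a minimal sketch, not the patent's implementation. It assumes a hypothetical axis convention (x across the short edge of the terminal, y along the long edge) and a hypothetical function name `choose_orientation`; neither appears in the original text.

```python
def choose_orientation(gravity_x: float, gravity_y: float) -> str:
    """Pick a UI orientation from gravity components (m/s^2).

    Assumed convention (hypothetical): x runs across the short edge of the
    terminal and y runs along its long edge.
    """
    # When gravity acts mainly along the long edge, the terminal is held
    # upright, so a portrait view is chosen; otherwise a landscape view.
    if abs(gravity_y) >= abs(gravity_x):
        return "portrait"
    return "landscape"
```

For example, a reading dominated by the y component, such as `choose_orientation(0.3, 9.7)`, selects the portrait view.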
The gyro sensor 1012 may detect the body direction and the rotation angle of the terminal 1000, and may cooperate with the acceleration sensor 1011 to capture the user's 3D motion on the terminal 1000. The processor 1001 may implement the following functions according to the data collected by the gyro sensor 1012: motion sensing (e.g., changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
Pressure sensor 1013 may be disposed on a side frame of terminal 1000 and/or on an underlying layer of display screen 1005. When the pressure sensor 1013 is disposed on a side frame of the terminal 1000, it can detect the user's grip signal on the terminal 1000, and the processor 1001 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1013. When the pressure sensor 1013 is disposed on the underlying layer of the display screen 1005, the processor 1001 controls the operable controls on the UI according to the user's pressure operation on the display screen 1005. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1015 is used to collect ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1005 based on the ambient light intensity collected by the optical sensor 1015. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1005 is increased; when the ambient light intensity is low, the display brightness of the display screen 1005 is decreased. In another embodiment, the processor 1001 may dynamically adjust the shooting parameters of the camera assembly 1006 according to the ambient light intensity collected by the optical sensor 1015.
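The brightness adjustment described above can be sketched as a clamped linear mapping. This is only one possible mapping, chosen for illustration; the function name `display_brightness`, the saturation level `max_lux`, and the minimum brightness `floor` are all assumptions that do not come from the original text.

```python
def display_brightness(ambient_lux: float,
                       max_lux: float = 1000.0,
                       floor: float = 0.1) -> float:
    """Map ambient light intensity to a brightness level in [floor, 1.0].

    max_lux and floor are hypothetical tuning values for this sketch.
    """
    # Clamp the reading so that negative or very bright values stay in range.
    clamped = min(max(ambient_lux, 0.0), max_lux)
    # Brighter surroundings give a brighter screen; darkness gives the floor.
    return floor + (1.0 - floor) * (clamped / max_lux)
```

A real terminal would likely smooth successive readings before applying the result, so that the screen does not flicker when the light level fluctuates.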
Proximity sensor 1016, also referred to as a distance sensor, is typically located on the front panel of terminal 1000. Proximity sensor 1016 is used to collect the distance between the user and the front face of terminal 1000. In one embodiment, when proximity sensor 1016 detects a gradual decrease in the distance between the user and the front face of terminal 1000, processor 1001 controls display screen 1005 to switch from the screen-on state to the screen-off state; when proximity sensor 1016 detects a gradual increase in the distance between the user and the front face of terminal 1000, processor 1001 controls display screen 1005 to switch from the screen-off state to the screen-on state.
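The screen switching described above amounts to a small state update driven by two consecutive distance readings. The sketch below is a simplified illustration under that assumption; the function name `next_screen_state` and the state labels "on" and "off" are hypothetical.

```python
def next_screen_state(current: str, previous_cm: float, latest_cm: float) -> str:
    """Return the screen state ("on" or "off") after a new distance reading."""
    if latest_cm < previous_cm:
        # The user is approaching the front face: switch the screen off.
        return "off"
    if latest_cm > previous_cm:
        # The user is moving away from the front face: switch the screen on.
        return "on"
    # Distance unchanged: keep the current state.
    return current
```

A practical implementation would probably compare against a threshold or use several readings rather than a single pair, to avoid toggling on sensor noise.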
Those skilled in the art will appreciate that the structure shown in fig. 10 is not limiting and that terminal 1000 can include more or fewer components than shown, or certain components can be combined, or a different arrangement of components can be employed.
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 1101 and one or more memories 1102, where at least one program code is stored in the memories 1102, and the at least one program code is loaded and executed by the processors 1101 to implement the methods provided in the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described in detail herein.
The server 1100 may be configured to perform the steps performed by the server in the account classification method described above.
The embodiment of the application also provides a computer device, which comprises a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement the operations performed in the account classification method of the above embodiment.
The embodiment of the application also provides a computer readable storage medium, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the operations performed in the account classification method of the above embodiment.
Embodiments of the present application also provide a computer program product or computer program comprising computer program code stored in a computer readable storage medium. The processor of the computer device reads the computer program code from the computer readable storage medium and executes it, so that the computer device performs the operations performed in the account classification method of the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or by program code instructing related hardware, where the program code may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application, but rather, the application is to be construed as limited to the appended claims.

Claims (16)

1. An account classification method, the method comprising:
based on system hardware logs and service content logs corresponding to a plurality of first account numbers, obtaining user characteristics corresponding to the plurality of first account numbers, wherein the user characteristics are used for representing actions of the account numbers on login equipment and corresponding service contents;
acquiring equipment characteristics corresponding to a plurality of equipment identifiers based on system hardware logs corresponding to the plurality of first account numbers, wherein the plurality of equipment identifiers are equipment identifiers in the system hardware logs corresponding to the plurality of first account numbers, and the equipment characteristics are used for representing actions occurring on equipment;
generating a graph structure based on user characteristics corresponding to the plurality of first account numbers and equipment characteristics corresponding to the plurality of equipment identifiers, wherein one node in the graph structure corresponds to one account number or one equipment, and an edge between two nodes is used for indicating that the account numbers and the equipment corresponding to the two nodes correspond to the same log;
and calling a first account classification model to classify the graph structure to obtain the category corresponding to the plurality of first accounts.
2. The method of claim 1, wherein the generating a graph structure based on the user characteristics corresponding to the plurality of first account numbers, the device characteristics corresponding to the plurality of device identifications, comprises:
generating nodes in the graph structure based on the plurality of first account numbers and the plurality of device identifications;
adding edges between nodes in the graph structure based on the relationship between the account number and the device indicated by the system hardware log;
associating user characteristics corresponding to the plurality of first account numbers to corresponding nodes in the graph structure;
and associating the device features corresponding to the plurality of device identifications to corresponding nodes in the graph structure.
3. The method of claim 1, wherein the obtaining user features corresponding to the plurality of first accounts based on the system hardware logs and the business content logs corresponding to the plurality of first accounts comprises:
acquiring equipment behavior characteristics corresponding to the plurality of first accounts based on system hardware logs corresponding to the plurality of first accounts, wherein the equipment behavior characteristics are used for representing the behavior of the accounts on the logged-in equipment;
acquiring service content characteristics corresponding to the plurality of first accounts based on service content logs corresponding to the plurality of first accounts, wherein the service content characteristics are used for representing service content executed by the accounts;
and splicing the equipment behavior characteristics and the business content characteristics corresponding to the plurality of first accounts to obtain the user characteristics of the plurality of first accounts.
4. The method of claim 3, wherein the obtaining device behavior features corresponding to the plurality of first accounts based on system hardware logs corresponding to the plurality of first accounts comprises:
acquiring a target system hardware log containing the same equipment identifier from the system hardware log of any first account;
calling a memory model to perform feature extraction on the target system hardware log to obtain behavior features of any one first account on the equipment;
and fusing the behavior characteristics of any one of the first account numbers on a plurality of devices to obtain the device behavior characteristics of any one of the first account numbers.
5. The method of claim 1, wherein the first account classification model includes at least one neighbor aggregation layer and a prediction layer, and the calling the first account classification model classifies the graph structure to obtain the category corresponding to the plurality of first accounts, including:
invoking the at least one neighbor aggregation layer, and performing aggregation treatment on each node in the graph structure to obtain a treated graph structure;
and calling the prediction layer to classify the processed graph structure to obtain the category corresponding to the plurality of first account numbers.
6. The method of claim 5, wherein invoking the at least one neighbor aggregation layer aggregates each node in the graph structure to obtain a processed graph structure, comprising:
and calling any neighbor aggregation layer, carrying out aggregation treatment on each node in the input graph structure to obtain a treated graph structure, and inputting the treated graph structure to the next layer.
7. The method of claim 1, wherein the first account classification model includes at least one first neighbor aggregation layer and at least one second neighbor aggregation layer, and wherein before invoking the first account classification model to classify the graph structure to obtain the categories to which the plurality of first accounts correspond, the method further comprises:
acquiring sample data, wherein the sample data comprises system hardware logs corresponding to a plurality of second accounts, service content logs, a first sample class and a second sample class, the first sample class is obtained by classifying the system hardware logs corresponding to the plurality of second accounts, the first sample class comprises abnormal accounts or normal accounts, the second sample class is obtained by classifying the service content logs corresponding to the plurality of second accounts, and the second sample class comprises normal accounts or suspicious accounts;
classifying by the first account classification model based on the sample data to obtain prediction categories corresponding to the plurality of second accounts;
training the at least one first neighbor aggregation layer according to the prediction category and the second sample category corresponding to the plurality of second account numbers;
and training the at least one second neighbor aggregation layer and the prediction layer according to the prediction category and the first sample category corresponding to the plurality of second account numbers.
8. An account classification apparatus, the apparatus comprising:
the first acquisition module is used for acquiring user characteristics corresponding to a plurality of first accounts based on system hardware logs and service content logs corresponding to the plurality of first accounts, wherein the user characteristics are used for representing the actions of the accounts on the login equipment and corresponding service content;
the second acquisition module is used for acquiring equipment characteristics corresponding to a plurality of equipment identifiers based on the system hardware logs corresponding to the plurality of first account numbers, wherein the plurality of equipment identifiers are equipment identifiers in the system hardware logs corresponding to the plurality of first account numbers, and the equipment characteristics are used for representing actions occurring on equipment;
the generating module is used for generating a graph structure based on user characteristics corresponding to the plurality of first account numbers and equipment characteristics corresponding to the plurality of equipment identifiers, wherein one node in the graph structure corresponds to one account number or one equipment, and an edge between two nodes is used for representing that the account numbers and the equipment corresponding to the two nodes correspond to the same business content;
and the classification module is used for calling a first account classification model, classifying the graph structure and obtaining the category corresponding to the plurality of first accounts.
9. The apparatus of claim 8, wherein the generating module comprises:
the node generating unit is used for generating nodes in the graph structure based on the plurality of first account numbers and the plurality of equipment identifiers;
an edge adding unit, configured to add edges between nodes in the graph structure based on a relationship between the account number and the device indicated by the system hardware log;
the association unit is used for associating the user characteristics corresponding to the plurality of first account numbers to corresponding nodes in the graph structure;
the association unit is further configured to associate device features corresponding to the plurality of device identifiers to corresponding nodes in the graph structure.
10. The apparatus of claim 8, wherein the first acquisition module comprises:
the first feature acquisition unit is used for acquiring equipment behavior features corresponding to the plurality of first accounts based on system hardware logs corresponding to the plurality of first accounts, wherein the equipment behavior features are used for representing the behavior of the accounts on the logged-in equipment;
a second feature obtaining unit, configured to obtain, based on service content logs corresponding to the plurality of first account numbers, service content features corresponding to the plurality of first account numbers, where the service content features are used to represent service content executed by an account number;
and the splicing unit is used for splicing the equipment behavior characteristics and the business content characteristics corresponding to the plurality of first account numbers to obtain the user characteristics of the plurality of first account numbers.
11. The apparatus according to claim 10, wherein the first feature acquisition unit is configured to:
acquiring a target system hardware log containing the same equipment identifier from the system hardware log of any first account;
calling a memory model to perform feature extraction on the target system hardware log to obtain behavior features of any one first account on the equipment;
and fusing the behavior characteristics of any one of the first account numbers on a plurality of devices to obtain the device behavior characteristics of any one of the first account numbers.
12. The apparatus of claim 8, wherein the first account classification model comprises at least one neighbor aggregation layer and a prediction layer, the classification module comprising:
the aggregation unit is used for calling the at least one neighbor aggregation layer, and carrying out aggregation treatment on each node in the graph structure to obtain a treated graph structure;
and the prediction unit is used for calling the prediction layer and classifying the processed graph structure to obtain the category corresponding to the plurality of first account numbers.
13. The apparatus of claim 12, wherein the aggregation unit is configured to invoke any neighboring aggregation layer, perform aggregation processing on each node in the input graph structure, obtain a processed graph structure, and input the processed graph structure to a next layer.
14. The apparatus of claim 8, wherein the first account classification model comprises at least one first neighbor aggregation layer and at least one second neighbor aggregation layer, the apparatus further comprising:
the sample acquisition module is used for acquiring sample data, wherein the sample data comprises system hardware logs corresponding to a plurality of second accounts, service content logs, a first sample class and a second sample class, the first sample class is obtained by classifying the system hardware logs corresponding to the plurality of second accounts, the first sample class comprises an abnormal account or a normal account, the second sample class is obtained by classifying the service content logs corresponding to the plurality of second accounts, and the second sample class comprises a normal account or a suspicious account;
the classification module is used for classifying through the first account classification model based on the sample data to obtain prediction categories corresponding to the plurality of second accounts;
the first training module is used for training the at least one first neighbor aggregation layer according to the prediction categories and the second sample categories corresponding to the plurality of second account numbers;
and the second training module is used for training the at least one second neighbor aggregation layer and the prediction layer according to the prediction categories and the first sample categories corresponding to the plurality of second account numbers.
15. A computer device comprising a processor and a memory having stored therein at least one program code that is loaded and executed by the processor to perform the operations performed in the account classification method of any of claims 1 to 7.
16. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to perform the operations performed in the account classification method of any of claims 1 to 7.
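The graph generation recited in claims 1 and 2 and the neighbor aggregation recited in claims 5 and 6 can be illustrated with the following minimal sketch. It is not the claimed implementation. It assumes that each system hardware log has already been reduced to an (account, device identifier) pair, that user features and device features are vectors of the same length, and that aggregation is a plain mean over neighbors; the claims do not specify the aggregation function, and the names `build_graph` and `aggregate_once` are hypothetical.

```python
from collections import defaultdict


def build_graph(hardware_logs, user_features, device_features):
    """Build an account/device graph from (account, device_id) log pairs.

    Returns (features, neighbors): a feature vector per node and an
    adjacency map. The pair-based log format is an assumption of this sketch.
    """
    features = {}
    neighbors = defaultdict(set)
    for account, device_id in hardware_logs:
        account_node = ("account", account)
        device_node = ("device", device_id)
        # Associate user and device features with their nodes.
        features[account_node] = list(user_features[account])
        features[device_node] = list(device_features[device_id])
        # An edge links an account and a device that share a log.
        neighbors[account_node].add(device_node)
        neighbors[device_node].add(account_node)
    return features, neighbors


def aggregate_once(features, neighbors):
    """One aggregation round: average each node with the mean of its neighbors.

    Mean aggregation is an illustrative choice, not taken from the claims.
    """
    updated = {}
    for node, own in features.items():
        neighbor_vectors = [features[n] for n in neighbors[node]]
        if not neighbor_vectors:
            updated[node] = list(own)
            continue
        count = len(neighbor_vectors)
        mean = [sum(column) / count for column in zip(*neighbor_vectors)]
        updated[node] = [(x + m) / 2 for x, m in zip(own, mean)]
    return updated
```

Calling `aggregate_once` repeatedly, feeding each output into the next call, corresponds to stacking several neighbor aggregation layers; a prediction layer would then classify the account nodes from their final vectors.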
CN202010909515.9A 2020-09-02 2020-09-02 Account classification method, device, equipment and storage medium Active CN114201655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010909515.9A CN114201655B (en) 2020-09-02 2020-09-02 Account classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010909515.9A CN114201655B (en) 2020-09-02 2020-09-02 Account classification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114201655A CN114201655A (en) 2022-03-18
CN114201655B true CN114201655B (en) 2023-08-25

Family

ID=80644312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010909515.9A Active CN114201655B (en) 2020-09-02 2020-09-02 Account classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114201655B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209820A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 User identifier detection method, device and storage medium
CN110278175A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 Graph structure model training, the recognition methods of rubbish account, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11238368B2 (en) * 2018-07-02 2022-02-01 Paypal, Inc. Machine learning and security classification of user accounts

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110278175A (en) * 2018-03-14 2019-09-24 阿里巴巴集团控股有限公司 Graph structure model training, the recognition methods of rubbish account, device and equipment
CN110209820A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 User identifier detection method, device and storage medium

Also Published As

Publication number Publication date
CN114201655A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN110209952B (en) Information recommendation method, device, equipment and storage medium
WO2021155687A1 (en) Target account inspection method and apparatus, electronic device, and storage medium
CN110585726A (en) User recall method, device, server and computer readable storage medium
CN111104980B (en) Method, device, equipment and storage medium for determining classification result
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
CN109784351B (en) Behavior data classification method and device and classification model training method and device
CN111291200B (en) Multimedia resource display method and device, computer equipment and storage medium
CN112131473B (en) Information recommendation method, device, equipment and storage medium
CN112749956A (en) Information processing method, device and equipment
CN111708944A (en) Multimedia resource identification method, device, equipment and storage medium
CN114154068A (en) Media content recommendation method and device, electronic equipment and storage medium
CN112423011B (en) Message reply method, device, equipment and storage medium
CN114281936A (en) Classification method and device, computer equipment and storage medium
CN112231768B (en) Data processing method and device, computer equipment and storage medium
CN111931075B (en) Content recommendation method and device, computer equipment and storage medium
CN113570510A (en) Image processing method, device, equipment and storage medium
CN114201655B (en) Account classification method, device, equipment and storage medium
CN113377976B (en) Resource searching method and device, computer equipment and storage medium
CN113763932B (en) Speech processing method, device, computer equipment and storage medium
CN111414496B (en) Artificial intelligence-based multimedia file detection method and device
CN111897709A (en) Method, device, electronic equipment and medium for monitoring user
CN114764480A (en) Group type identification method and device, computer equipment and medium
CN110928913A (en) User display method, device, computer equipment and computer readable storage medium
CN114489559B (en) Audio playing method, audio playing processing method and device
CN114157906B (en) Video detection method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant