CN116992017A - Abnormal body detection method, device, equipment and storage medium - Google Patents

Publication number
CN116992017A
Authority
CN
China
Prior art keywords
word
main body
flow
feature
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210434514.2A
Other languages
Chinese (zh)
Inventor
田言飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210434514.2A
Publication of CN116992017A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0248 Avoiding fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an abnormal body detection method, device, equipment and storage medium, which relate to the technical field of artificial intelligence and can be applied to scenarios such as cloud technology, artificial intelligence, intelligent traffic, and assisted driving. The method comprises: acquiring body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein each body operation sequence is composed of body words of the associated flow bodies corresponding to a target flow object; constructing a body feature set based on the body operation sequences corresponding to the plurality of target flow objects, wherein the body feature set comprises flow body features of the associated flow bodies corresponding to the plurality of target flow objects; and performing anomaly detection on the associated flow bodies according to the flow body features to obtain an abnormal body detection result. The application can significantly improve the accuracy and reliability of abnormal body detection.

Description

Abnormal body detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting an abnormal body.
Background
With the development of internet technology, online information promotion has become a mainstream form of promotion. A promoter pushes information through flow bodies that are responsible for placing and publishing it, and each flow body earns a revenue share from the promoter by driving traffic to that information, for example with the click count on pushed information used as the billing metric. To gain improper benefits or to compete maliciously, some flow bodies artificially inflate exposure, click, or conversion counts through cheating such as click brushing. Effective detection means are therefore needed to identify abnormal flow bodies, so as to ensure normal information promotion and protect the interests of promoters.
In the prior art, abnormal body detection is generally based on the profile information or behavior data of flow bodies. For example, flow bodies whose submitted or bound profile information is similar are treated as belonging to the same category and then checked for anomalies, or flow bodies are judged to belong to the same category according to the overlap between the objects that were exposed to them or clicked on them. However, the former approach relies on information supplied by the flow body itself and therefore cannot rule out interference from false data, while the latter has very high time cost and computational complexity and poor accuracy.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for detecting an abnormal body, which can significantly improve the accuracy and reliability of abnormal body detection.
In one aspect, the present application provides a method for detecting an abnormal subject, the method comprising:
acquiring body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein each body operation sequence is composed of body words of the associated flow bodies corresponding to a target flow object, and an associated flow body is a flow body that has had a preset interactive operation with the target flow object in the preset period;
constructing a body feature set based on the body operation sequences corresponding to the plurality of target flow objects, wherein the body feature set comprises flow body features of the associated flow bodies corresponding to the plurality of target flow objects;
and performing anomaly detection on the associated flow bodies according to the flow body features to obtain an abnormal body detection result.
Another aspect provides an abnormal subject detection apparatus, the apparatus comprising:
an operation sequence acquisition module, configured to acquire body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein each body operation sequence is composed of body words of the associated flow bodies corresponding to a target flow object, and an associated flow body is a flow body that has had a preset interactive operation with the target flow object in the preset period;
a body feature set construction module, configured to construct a body feature set based on the body operation sequences corresponding to the plurality of target flow objects, wherein the body feature set comprises flow body features of the associated flow bodies corresponding to the plurality of target flow objects;
an abnormality detection module, configured to perform anomaly detection on the associated flow bodies according to the flow body features to obtain an abnormal body detection result.
In another aspect, a computer device is provided, the device including a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program loaded and executed by the processor to implement an abnormal subject detection method as described above.
Another aspect provides a computer readable storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the abnormal subject detection method as described above.
In another aspect, a server is provided, where the server includes a processor and a memory, where at least one instruction or at least one program is stored, where the at least one instruction or the at least one program is loaded and executed by the processor to implement an abnormal body detection method as described above.
In another aspect, a terminal is provided, where the terminal includes a processor and a memory, where at least one instruction or at least one program is stored, where the at least one instruction or the at least one program is loaded and executed by the processor to implement an abnormal body detection method as described above.
Another aspect provides a computer program product or computer program comprising computer instructions which, when executed by a processor, implement a method of abnormal body detection as described above.
The abnormal body detection method, the device, the equipment, the storage medium, the server, the terminal, the computer program and the computer program product provided by the application have the following technical effects:
According to the technical solution of the present application, body operation sequences corresponding to a plurality of target flow objects in a preset period are acquired, wherein each body operation sequence is composed of body words of the associated flow bodies corresponding to a target flow object, and an associated flow body is a flow body that has had a preset interactive operation with the target flow object in the preset period; a body feature set is constructed based on the body operation sequences corresponding to the plurality of target flow objects, wherein the body feature set comprises flow body features of the associated flow bodies corresponding to the plurality of target flow objects; and anomaly detection is then performed on the associated flow bodies according to the flow body features to obtain an abnormal body detection result. In this way, the body operation sequences are derived from the operation data of the target flow objects, and flow body features that characterize the flow bodies are constructed from them. Because the detection does not depend on information supplied by the flow bodies themselves, the reliability and accuracy of the detection result are improved; because the detection operates on flow body features, computational complexity and resource occupation are reduced and detection efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of another application environment provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of a method for detecting an abnormal body according to an embodiment of the present application;
FIG. 4 is a schematic illustration of a body interface for a set of flow bodies provided in accordance with an embodiment of the present application;
FIG. 5 is a flowchart of another method for detecting an abnormal body according to an embodiment of the present application;
FIG. 6 is a flowchart of another method for detecting an abnormal body according to an embodiment of the present application;
FIG. 7 is a flowchart of another method for detecting an abnormal body according to an embodiment of the present application;
FIG. 8 is a visual vector distribution diagram provided by an embodiment of the present application;
FIG. 9 is a visual vector distribution diagram of two-dimensional eigenvectors of a normal flow body provided by an embodiment of the present application;
FIG. 10 is a flowchart of another method for detecting an abnormal body according to an embodiment of the present application;
FIG. 11 is a diagram of a model structure of an initial word vector generation model according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a frame of an abnormal body detection apparatus according to an embodiment of the present application;
FIG. 13 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or sub-modules is not necessarily limited to those steps or sub-modules that are expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or sub-modules that are not expressly listed.
Before the embodiments of the present application are described in further detail, the terms involved in the embodiments are explained as follows.
Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by robots, knowledge graphs, and the like.
In recent years, with research and progress of artificial intelligence technology, the artificial intelligence technology is widely applied in a plurality of fields, and the scheme provided by the embodiment of the application relates to the technology of artificial intelligence such as machine learning/deep learning, natural language processing and the like, and is specifically described by the following embodiments.
Referring to FIG. 1, which is a schematic diagram of an application environment provided in an embodiment of the present application, the application environment may include at least a terminal 01 and a server 02. In practical applications, the terminal 01 and the server 02 may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
The server 02 in the embodiment of the present application may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
Specifically, cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks in a wide area network or a local area network to implement the computation, storage, processing, and sharing of data. Cloud technology can be applied in fields such as medical cloud, cloud internet of things, cloud security, cloud education, cloud conferencing, artificial intelligence cloud services, cloud applications, cloud calling, and cloud social networking. It is based on the cloud computing business model: computing tasks are distributed across a resource pool formed by a large number of computers, so that application systems can obtain computing power, storage space, and information services on demand. The network that provides the resources is called the "cloud". From the user's point of view, the resources in the cloud are infinitely expandable and can be acquired at any time, used on demand, expanded at any time, and paid for according to use. A basic capability provider of cloud computing establishes a cloud computing resource pool, generally called an IaaS (Infrastructure as a Service) platform, and deploys multiple types of virtual resources in the pool for external clients to select and use. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices, and network devices.
According to logical function, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer can be deployed on the PaaS layer; SaaS can also be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or a web container. SaaS is business software of various kinds, such as web portals and bulk SMS senders. Generally, SaaS and PaaS are upper layers relative to IaaS.
Specifically, the server 02 may include a physical device, which may in turn include a network communication sub-module, a processor, a memory, and the like, and may also include software running on the physical device, such as an application program.
Specifically, the terminal 01 may include physical devices such as a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, an augmented reality (AR)/virtual reality (VR) device, an intelligent voice interaction device, an intelligent home appliance, an intelligent wearable device, and a vehicle-mounted terminal device, and may also include software running on the physical devices, such as an application program.
In the embodiment of the present application, the terminal 01 may be configured to receive a preset interactive operation between a flow object and a flow body, and to send object operation data of the preset interactive operation to the server 02, where the object operation data may include operation action information, a flow object identifier, operation time information, and the like. The server 02 is configured to store the object operation data, generate the body operation sequence corresponding to each target flow object according to the object operation data, and perform anomaly detection on flow bodies based on the body operation sequences. Specifically, referring to FIG. 2, the server 02 may include an access layer server 021, a data server 022, and an anomaly detection server 023. The access layer server 021 is configured to receive the object operation data reported by the terminal and store it in a promotion database in the data server 022, and the anomaly detection server 023 extracts body operation sequences from the stored object operation data and performs anomaly detection. The body operation sequence may be a click operation sequence of the target flow object.
In particular, the server 02 may further be configured to provide a model training service for the initial word vector generation model to obtain the trained model, and may further be configured to store a central word feature matrix, a background word feature matrix, a subject phrase, a body feature set, and the like.
Further, it should be understood that fig. 1 and 2 illustrate only an application environment of an abnormal body detection method, and the application environment may include more or fewer nodes, and the present application is not limited herein.
The application environment according to the embodiments of the present application, or the terminal 01 and the server 02 in that environment, may be a distributed system formed by a client and a plurality of nodes (any form of computing device in the access network, such as a server or a user terminal) connected through network communication. The distributed system may be a blockchain system, which may provide the abnormal body detection service, the data storage service, and the like described above.
The following describes an abnormal body detection method based on the above application environment. The embodiments of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, and assisted driving. Referring to FIG. 3, which is a schematic flowchart of a method for detecting an abnormal body according to an embodiment of the present application, the present specification provides the method operation steps as described in the embodiments or the flowchart, but more or fewer operation steps may be included based on conventional or non-creative labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only one. When implemented in an actual system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded environment). Specifically, as shown in FIG. 3, the method may include the following steps S201-S205.
S201: Acquire body operation sequences corresponding to a plurality of target flow objects in a preset period.
Each body operation sequence is composed of body words of the associated flow bodies corresponding to a target flow object, where an associated flow body is a flow body that has had a preset interactive operation with the target flow object in the preset period. Specifically, the body operation sequence may characterize the operation trajectory of the target flow object across its associated flow bodies within the preset period. Each associated flow body corresponds to one body word, and the body words of different associated flow bodies differ from each other. A body word may be the flow body identifier of the associated flow body, such as its registered name or registered ID, or may be a text word obtained by applying text processing to the flow body identifier.
The preset period may be a historical duration ending at a preset time. The historical duration may be, for example, 1 day, 7 days, 15 days, or 1 month; the preset period may be, for example, the week before 00:00 on a given day of each week, or the month before 1:00 on a given day of each month.
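As a minimal sketch of how such a preset period could be represented, the following Python snippet computes a window of a given historical duration ending at a preset time. The function names `preset_period` and `in_period`, the half-open interval convention, and the example date are assumptions made for illustration and are not specified by the application.

```python
from datetime import datetime, timedelta
from typing import Tuple

Period = Tuple[datetime, datetime]

def preset_period(preset_time: datetime, history_days: int) -> Period:
    """Return the half-open window [start, end) that covers
    `history_days` days of history ending at `preset_time`."""
    if history_days <= 0:
        raise ValueError("history_days must be positive")
    return preset_time - timedelta(days=history_days), preset_time

def in_period(operation_time: datetime, period: Period) -> bool:
    """Check whether an operation time falls inside the preset period."""
    start, end = period
    return start <= operation_time < end

# Hypothetical example: a 7-day window ending at 00:00 on 2022-04-25.
period = preset_period(datetime(2022, 4, 25, 0, 0), history_days=7)
```

A half-open interval is used here so that consecutive windows do not count the same operation twice; a closed interval would work equally well if the windows never adjoin.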
Specifically, a flow body is a carrier that provides object traffic, such as a media outlet, a website, or a program. On a promotion platform, for example, the flow body may be a media account with a certain number of followers. In an information promotion scenario, the flow body earns a revenue share from the promoter for promoting information; at the same exposure, a higher click-through rate on the promotion information yields a larger share. A promoter is an entity or service provider that pays to place promotion information. The information promotion operations a promoter pays for should be valid operations rather than cheating operations; for example, a paid click on promotion information should be a valid click by a real flow object rather than a fraudulent click. A cheating operation is a malicious operation without genuine intent, such as brushing exposures, clicks, or conversions, performed by a flow object for some malicious purpose in the exposure, click, or conversion stages of promotion information. Cheating operations not only damage the promoter's interests but also seriously degrade the promotion effect, which can lead the promoter to suspend or abandon promotion and can even create public relations risk.
In an actual scenario, the terminal can display a body interface of the flow body, and promotion positions can be set in the body interface. A promotion position is the identifier of a media position where promotion information is placed. For example, a media article may have several promotion positions, such as a top promotion position, an in-text promotion position, and a bottom promotion position, and an applet may have promotion positions that differ by placement, such as a banner promotion position, a rewarded promotion position, and an interstitial promotion position. Referring to FIG. 4, the figure shows, from left to right, the bottom promotion position of a media article, and the banner promotion position and the interstitial promotion position in an applet. Specifically, a promotion control can be set on a promotion position. After a preset operation on the promotion control is received, the corresponding target promotion page is displayed. The preset operation may include, but is not limited to, clicking, and the target promotion page may display target promotion data such as promotion information and a promotion conversion control.
Specifically, a flow object may be an object that is exposed to the promotion information of a flow body and performs a preset operation on the flow body, that is, an object that contributes traffic. The preset operation may include, but is not limited to, a click operation on a promotion position or a target promotion page of the flow body. Accordingly, the preset interactive operation may include, but is not limited to, the flow object submitting a preset operation to the flow body and the flow body pushing target promotion data in response to the preset operation. For example, if the preset period is the 7 days of history before the current time, an associated flow body is a flow body that received a click operation from the target flow object within those 7 days and pushed target promotion information to the target flow object in response to the click operation.
In practical applications, a promotion database may be set up. When a flow object performs a preset operation on a flow body, the terminal sends the corresponding object operation data to the promotion database. The object operation data may include, but is not limited to, operation action information, the body word of the associated flow body, a flow object identifier, and operation time information. The operation action information includes the operation action type, such as a click operation or a conversion operation, and the body word of the associated flow body may be its flow body identifier. Correspondingly, the object operation data of a plurality of target flow objects in the preset period is obtained from the promotion database, and the body operation sequences are generated from this data. The plurality of target flow objects may be all flow objects in the promotion database that submitted a preset operation to a flow body in the preset period.
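The record layout and selection described above can be sketched as follows. This is an illustrative sketch only: the class name `ObjectOperation`, its field names, the action labels `"click"` and `"conversion"`, and the in-memory list standing in for the promotion database are all assumptions, not part of the application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set

@dataclass(frozen=True)
class ObjectOperation:
    """One row of object operation data (field names are hypothetical)."""
    action: str           # operation action type, e.g. "click" or "conversion"
    body_word: str        # body word of the associated flow body, e.g. its identifier
    object_id: str        # flow object identifier
    timestamp: datetime   # operation time information

def records_in_period(records: Iterable[ObjectOperation],
                      start: datetime, end: datetime,
                      action: str = "click") -> List[ObjectOperation]:
    """Keep records of the chosen preset operation inside [start, end)."""
    return [r for r in records
            if r.action == action and start <= r.timestamp < end]

def target_flow_objects(records: Iterable[ObjectOperation]) -> Set[str]:
    """All flow objects that submitted at least one of the given operations."""
    return {r.object_id for r in records}

# Hypothetical rows standing in for the promotion database.
rows = [
    ObjectOperation("click", "appid_1", "x", datetime(2022, 4, 20, 10, 0)),
    ObjectOperation("click", "appid_2", "y", datetime(2022, 4, 21, 9, 0)),
    ObjectOperation("conversion", "appid_2", "y", datetime(2022, 4, 21, 9, 5)),
    ObjectOperation("click", "appid_3", "z", datetime(2022, 4, 10, 8, 0)),
]
selected = records_in_period(rows, datetime(2022, 4, 18), datetime(2022, 4, 25))
targets = target_flow_objects(selected)
```

In this sketch only click operations inside the window are kept, so object `z`, whose click precedes the window, is not treated as a target flow object.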
In practical applications, please refer to fig. 5, S201 may specifically include the following steps S2011-S2012.
S2011: object operation data of each of a plurality of target flow objects in a preset period are acquired, wherein the object operation data comprises a flow object identifier, a body word associated with a flow body and operation time information of preset interactive operation.
S2012: and combining the subject words according to the flow object identification and the operation time information to obtain subject operation sequences corresponding to the plurality of target flow objects.
In a specific embodiment, object operation data of object granularity in a preset period is extracted from the promotion database, and a main body operation sequence of object granularity is obtained based on the object operation data. Based on the flow object identification and the operation time information, the subject words corresponding to each of the plurality of target flow objects are sorted and combined to obtain an initial sequence for each target flow object. Specifically, according to the operation time information, the subject words of the associated flow bodies on which the target flow object performed an operation (such as clicking) in the preset period are ordered chronologically to obtain the initial sequence. For example, if the flow body identifications of the associated flow bodies clicked by the target flow object x within 7 days are app id_1, app id_2, app id_3 … …, app id_n, respectively, the initial sequence may be (app id_1/app id_2/app id_3 … …/app id_n), where each identification such as app id_n is a subject word. Consecutive identical subject words in the initial sequence are then de-duplicated to obtain the main body operation sequences corresponding to the plurality of target flow objects. The sequence obtained by de-duplicating the above sequence is (app id_1/app id_2/app id_3 … …/app id_n).
Therefore, the main body operation sequence contains both the time sequence information and the main body information of the operations, can reflect the operation track of the flow object, and helps to increase the information content of the main body features. In addition, consecutive operations on the same flow main body express the same meaning, so removing consecutive identical subject words reduces the amount of data to be processed, facilitates effective extraction of the subject phrases, and prevents a run of identical subject words from being cut off by the window during sliding extraction, which would otherwise cause semantic loss.
Specifically, the de-duplication processing yields a main body operation sequence for each target flow object. In some cases, the obtained main body operation sequences may be further screened, and sequences whose length is smaller than a preset length are removed, so as to exclude the randomness caused by flow objects with sparse operations, thereby obtaining the main body operation sequences corresponding to the plurality of target flow objects. The preset length may be, for example, 3, i.e., 3 subject words.
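The sequence construction of S2011-S2012 can be illustrated with a minimal sketch. The record layout `(object_id, subject_word, timestamp)`, the function name `build_subject_sequences` and the sample data are assumptions made for illustration and do not appear in the application.

```python
from collections import defaultdict
from itertools import groupby


def build_subject_sequences(records, min_length=3):
    """Build one subject operation sequence per flow object.

    records: iterable of (object_id, subject_word, timestamp) tuples
             (hypothetical layout of the object operation data).
    min_length: preset length; shorter sequences are discarded.
    """
    per_object = defaultdict(list)
    for object_id, subject_word, timestamp in records:
        per_object[object_id].append((timestamp, subject_word))

    sequences = {}
    for object_id, events in per_object.items():
        events.sort(key=lambda event: event[0])  # chronological order
        words = [word for _, word in events]
        # Collapse each run of consecutive identical subject words into one.
        deduped = [word for word, _ in groupby(words)]
        if len(deduped) >= min_length:
            sequences[object_id] = deduped
    return sequences


records = [
    ("x", "appid_3", 4), ("x", "appid_1", 1), ("x", "appid_1", 2),
    ("x", "appid_2", 3), ("x", "appid_1", 5),
    ("y", "appid_2", 1), ("y", "appid_2", 2),
]
sequences = build_subject_sequences(records)
```

In this example only flow object x is kept: its de-duplicated sequence has four subject words, whereas the sequence of y collapses to a single subject word and falls below the preset length.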
In promotion cheating scenarios, the same batch of flow objects is often used for the cheating operations, or the same operation mode is used for cheating, where the same operation mode means that the operation tracks of the flow objects on different flow bodies are similar, such as similar click tracks. For example, suppose the flow bodies accessed in sequence by flow object x are A -> B -> C, and the flow bodies accessed in sequence by flow object y are A -> D -> C (where flow body A may be used for issuing a cheating task). Although the flow object groups of flow bodies B and D have a low overlap ratio, the operation modes of flow objects x and y are similar: both first access flow body A and finally access flow body C. Flow bodies B and D may therefore still be flow bodies of the same type, for example belonging to the same abnormal group. Because the main body operation sequence contains both object information and operation mode information, constructing the vector representation of the flow body based on the main body operation sequence can improve the semantic accuracy and information content of the flow body features, and thus the reliability of abnormal body detection.
S203: and constructing a main body characteristic set based on the main body operation sequences corresponding to the plurality of target flow objects.
The main body feature set comprises flow body features of associated flow bodies corresponding to the plurality of target flow objects, and the flow body features represent drainage object information and operation mode information of the associated flow bodies. Specifically, the flow body feature may be a dense vector with a preset dimension M, which may be, for example, 64-dimensional or 200-dimensional, etc. And carrying out feature embedding processing on each subject word in the subject operation sequence, and optimizing and updating the obtained subject word embedding features based on the subject operation sequence and the initial word vector generation model to obtain the flow subject features capable of representing the drainage object information and the operation mode information of the associated flow subject.
S205: and carrying out abnormality detection on the related flow main body according to the flow main body characteristics to obtain an abnormal main body detection result.
Specifically, after the flow body characteristics of each associated flow body are obtained, abnormality detection may be performed based on the flow body characteristics. In some embodiments, the anomaly detection may be implemented by clustering, and accordingly, referring to fig. 6, S205 may specifically include steps S2051-S2052 described below.
S2051: and clustering the flow main body features in the main body feature set to obtain a main body feature cluster.
S2052: and when any associated traffic body corresponding to the body feature cluster has an abnormal label, determining the body feature cluster as an abnormal body group.
Specifically, an abnormal subject group may refer to a group of abnormal subjects that employ a similar cheating pattern or that use the same batch of flow objects for promotion cheating. The preset cluster analysis algorithm used in the clustering process may include, but is not limited to, K-means clustering, mean shift clustering, density-based clustering (DBSCAN) or expectation-maximization (EM) clustering. The flow main body features in the main body feature set are divided into a plurality of main body feature clusters through clustering, abnormal clusters are located based on the associated flow main bodies that carry abnormal labels, and the located abnormal clusters are determined to be abnormal main body groups. The abnormal label can be marked manually, or can be determined based on attribute data and historical operation data of the associated flow body, for example by performing feature extraction and classification on the attribute data and the historical operation data to obtain a body label of the associated flow body, where the body label is either a normal label or an abnormal label. In this way, flow main body group identification and abnormality detection are carried out through clustering, so that abnormal main bodies are accurately located and, at the same time, divided into groups. Multidimensional positioning information about the abnormal main bodies is thus obtained, which helps to optimize the effect of downstream tasks.
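The labelling rule of S2052 can be sketched as follows. The sketch assumes that cluster assignments have already been produced by some clustering algorithm; the mapping `cluster_of`, the function name `find_abnormal_groups` and the sample data are hypothetical.

```python
from collections import defaultdict


def find_abnormal_groups(cluster_of, abnormal_subjects):
    """Return the clusters that contain at least one abnormally labelled body.

    cluster_of: dict mapping subject word -> cluster id (assumed to come
                from a prior clustering step such as K-means or DBSCAN).
    abnormal_subjects: set of subject words that carry an abnormal label.
    """
    members = defaultdict(set)
    for subject, cluster_id in cluster_of.items():
        members[cluster_id].add(subject)
    # A cluster is treated as an abnormal group if any member is labelled.
    return {
        cluster_id: subjects
        for cluster_id, subjects in members.items()
        if subjects & abnormal_subjects
    }


cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 2}
abnormal_groups = find_abnormal_groups(cluster_of, {"B"})
```

Here the cluster containing A and B is reported as an abnormal group because B carries an abnormal label, while the other two clusters are not.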
In other embodiments, the anomaly detection may be performed based on feature similarity, and accordingly, referring to fig. 7, S205 may specifically include steps S2053-S2056 described below.
S2053: and acquiring seed abnormal bodies in the associated flow bodies corresponding to the plurality of target flow objects.
Specifically, the seed abnormal body may be obtained by manually marking the associated flow bodies. Alternatively, a body blacklist may be stored in advance, the associated flow bodies corresponding to the plurality of target flow objects are matched against the body blacklist, and each matched associated flow body is determined to be a seed abnormal body. The number of seed abnormal bodies may be set based on actual requirements, and may be, for example, 5.
S2054: and determining the seed main body characteristics of the seed abnormal main body in the main body characteristic set.
It can be understood that the main feature set is constructed with the association relationship between the flow main feature and the main words, and the main word matching is performed in the main feature set based on the main words of the seed abnormal main body, so as to obtain the seed main feature.
S2055: and screening target subject features matched with the seed subject features from the subject feature set.
In a specific embodiment, similarity calculation is performed between the seed main body feature and the flow main body features in the main body feature set to obtain a feature similarity result. The similarity calculation may use, but is not limited to, a similarity algorithm based on distance, cosine of the included angle, or correlation coefficient, such as the Euclidean distance algorithm, the cosine similarity algorithm or the Pearson correlation coefficient algorithm. A target subject feature is then determined from the subject feature set based on the feature similarity result. Specifically, the feature similarity result includes the feature similarity between the seed body feature and each of the other flow body features in the body feature set. In some cases, the flow body features are sorted by feature similarity, and the top K flow body features with the highest feature similarity are taken as target body features, where K may be, for example, 5. In other cases, a similarity threshold may be set, and each flow subject feature whose feature similarity is higher than the similarity threshold is taken as a target subject feature.
In one embodiment, the cosine similarity algorithm is used to calculate the feature similarity between two flow body features, as follows:

$$\cos(A, B) = \frac{\sum_{i=1}^{M} A_i B_i}{\sqrt{\sum_{i=1}^{M} A_i^2}\,\sqrt{\sum_{i=1}^{M} B_i^2}}$$

where A and B represent two different flow body features, $A_i$ and $B_i$ are the i-th components of flow body feature A and flow body feature B, respectively, and M is the preset dimension of the flow body features.
S2056: and determining the associated flow main body corresponding to the target main body characteristic as a target abnormal main body.
The flow main body features can effectively express the flow object information and the operation mode information of the associated flow main bodies. The higher the feature similarity, the more similar the flow object information and operation mode information expressed by the flow main body features, and the more strongly this indicates that the corresponding flow main bodies belong to the same type of main body. Locating abnormal main bodies by feature from a small number of seed abnormal main bodies enables rapid diffusion and mining of abnormal main bodies. It can be appreciated that the more similar the contexts of two associated flow bodies in the body operation sequences, the more similar their flow object information and operation mode information, and the higher the likelihood that the two associated flow bodies belong to the same class of bodies.
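The seed-based screening of S2053-S2056 can be sketched in a few lines using cosine similarity. The function names, the dictionary `features` and the two-dimensional sample vectors are assumptions for illustration; real flow body features would have the preset dimension M.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(features, seed, top_k=5, threshold=None):
    """Rank the other bodies by similarity to the seed abnormal body.

    features: dict mapping subject word -> flow body feature (list of floats).
    Returns (subject, similarity) pairs: those above `threshold` when a
    threshold is given, otherwise the `top_k` most similar ones.
    """
    seed_vec = features[seed]
    scored = [
        (subject, cosine_similarity(seed_vec, vec))
        for subject, vec in features.items()
        if subject != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold is not None:
        return [pair for pair in scored if pair[1] > threshold]
    return scored[:top_k]


features = {
    "seed": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [-1.0, 0.0],
}
targets = most_similar(features, "seed", top_k=1)
```

Both selection modes described above are covered: a top K cut-off and a similarity threshold. Which of the two is appropriate depends on the deployment and is not prescribed here.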
It can be understood that the above feature similarity calculation can be repeated based on the determined target main body features: the feature similarities between the target main body features and the other flow main body features in the main body feature set are obtained, and the flow main body features matching the target main body features are determined in turn. By iterating in this manner, other similar target abnormal main bodies are located by diffusion from one or several seed abnormal main bodies, so that deep diffusion and mining of abnormal main bodies is realized, and the coverage and efficiency of the abnormality detection are effectively improved.
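The iterative diffusion can be sketched as a bounded breadth-first expansion. The threshold-based matching, the bound `max_rounds` and all names below are assumptions for illustration; the application does not fix how many rounds are performed.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def diffuse_abnormal(features, seeds, threshold=0.9, max_rounds=3):
    """Expand from the seed abnormal bodies round by round.

    In each round, every body found in the previous round is used as a new
    query, and bodies whose similarity exceeds `threshold` are added.
    Returns the newly located bodies (seeds excluded).
    """
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        next_frontier = set()
        for source in frontier:
            for subject, vec in features.items():
                if subject not in found and cosine(features[source], vec) > threshold:
                    next_frontier.add(subject)
        if not next_frontier:
            break  # nothing new was located, so diffusion stops
        found |= next_frontier
        frontier = next_frontier
    return found - set(seeds)


def unit(degrees):
    """Two-dimensional unit vector at the given angle (sample data only)."""
    return [math.cos(math.radians(degrees)), math.sin(math.radians(degrees))]


features = {"seed": unit(0), "p": unit(20), "q": unit(40), "r": unit(90)}
located = diffuse_abnormal(features, ["seed"])
```

In the sample data, q is not similar enough to the seed to be found directly, but it is reached in the second round through p. Bounding the number of rounds limits how far the diffusion can drift from the original seeds.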
Specifically, the abnormal body group can be mined based on the feature similarity, for example, the feature similarity between the flow body features of the abnormal body a and other flow body features is calculated, a certain number of flow body features with the highest feature similarity are screened out, and the associated flow body corresponding to the screened flow body features and the abnormal body a are determined to be the same abnormal body group. After the incremental main body characteristics of the incremental flow main body are obtained, the characteristic similarity between the incremental main body characteristics and the main body characteristics of the known abnormal main body can be calculated, and the abnormal main body group to which the known abnormal main body with the highest characteristic similarity belongs is determined as the abnormal main body group to which the incremental flow main body belongs.
In some cases, after the target abnormal body is determined, the abnormal detection result may be displayed visually. For example, the flow body identifier (APPID), feature similarity and attribute information of the top K abnormal bodies are displayed in a table, see Table 1 below. The attribute information may include, but is not limited to, a body nickname, a body registration name, a settlement body, a body type, a registration time and the like, and the entry numbered 0 in the table is the seed abnormal body.
Table 1
In other cases, the anomaly detection results may be generated and validated based on feature dimensionality reduction. Specifically, the dimension reduction process may be performed on each flow main feature in the main feature set, for example, reducing the dimension of the flow main feature in 64 dimensions to two-dimensional feature vectors, further generating a visualized vector distribution map based on the two-dimensional feature vectors corresponding to the main feature set, and generating an anomaly detection result based on the vector distribution result in the visualized vector distribution map, so as to determine the target anomaly main body. In particular, feature dimension reduction methods may include, but are not limited to, t-sne (t-distributed stochastic neighbor embedding, t-distributed random neighborhood embedding), and the like.
Referring to fig. 8, which shows a visualized vector distribution diagram in one embodiment, the two-dimensional feature vector of a seed abnormal body lies in the vector point group on the right side of the figure, so the associated flow bodies in that vector point group are determined to be target abnormal bodies. A known normal flow body lies in the vector point group on the left side of the figure, so the associated flow bodies in the left vector point group are determined to be normal flow bodies. It can be seen that the two-dimensional feature vectors of the abnormal bodies are clearly separated from those of the normal flow bodies. In addition, the target abnormal bodies determined by the similarity calculation based on the seed abnormal body coincide to a high degree with the associated flow bodies in the right vector point group, which indirectly verifies the accuracy of the abnormal body detection method. Therefore, based on the method, abnormally aggregated flow bodies can be rapidly located and determined, so that abnormalities are accurately perceived.
In addition, the normal flow main body can be further divided into groups based on the two-dimensional feature vector and the feature similarity result. Specifically, a visualized vector distribution diagram is generated based on two-dimensional feature vectors of the normal flow main body, the normal flow main body is divided into groups based on vector distribution and feature similarity results in the distribution diagram, and a plurality of normal main body groups are obtained, wherein the operation modes of the associated flow main bodies in each normal main body group are similar. Referring to fig. 9, a visual vector distribution of two-dimensional eigenvectors of normal flow principals provided by one embodiment is shown, with the arrows in fig. 9 marking identified groups of normal principals.
In summary, the technical scheme of the application constructs vector representation of the flow body through the body operation sequence of the flow object to express the object information and the operation mode information of the flow body; and locating the abnormal body based on the vector representation, and mining the abnormal body group and the diffusion of the abnormal body group, thereby realizing popularization and anti-cheating. According to the scheme, information dependence on the flow main body can be eliminated, the flow main group with inconsistent data information but high flow object similarity or similar cheating operation mode is identified, abnormal group judgment and abnormal main body diffusion are realized, and the accuracy, coverage rate and positioning efficiency of abnormal detection are effectively improved.
Based on some or all of the above embodiments, in the embodiment of the present application, please refer to fig. 10, the main feature set may be obtained based on the following steps S301 to S309.
S301: and taking each subject word in the subject operation sequence as a central word, and extracting the context of the subject operation sequence to obtain target subject phrases corresponding to a plurality of target flow objects, wherein the target subject phrases comprise the central subject word and background subject words adjacent to the central subject word.
Specifically, the main body operation sequence is treated as a sentence, each subject word is taken in turn as the central word, and the background words of its context are extracted from the main body operation sequence based on a preset extraction window. If the preset extraction window is 2, the 2 subject words before and the 2 subject words after each central word in the main body operation sequence are extracted as background words, and each extracted background word is combined with the central word to obtain a plurality of target subject phrases. For example, if the subject operation sequence is "A/C/D/E/P/K", then with A as the central word the following background words C and D are extracted, giving the target subject phrases [A, C]/[A, D]; with D as the central word, the preceding background words A and C and the following background words E and P are extracted, giving the target subject phrases [D, A]/[D, C]/[D, E]/[D, P].
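The context extraction of S301 can be sketched as follows, using the example sequence above. The function name `extract_subject_phrases` and the representation of a phrase as a (central word, background word) tuple are assumptions for illustration.

```python
def extract_subject_phrases(sequence, window=2):
    """Pair every central subject word with its neighbours inside the window.

    sequence: one subject operation sequence (list of subject words).
    window: preset extraction window, i.e. how many words are taken on
            each side of the central word.
    """
    phrases = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        end = min(len(sequence), i + window + 1)
        for j in range(start, end):
            if j != i:  # the central word is not its own background word
                phrases.append((center, sequence[j]))
    return phrases


phrases = extract_subject_phrases(["A", "C", "D", "E", "P", "K"], window=2)
phrases_for_a = [p for p in phrases if p[0] == "A"]
phrases_for_d = [p for p in phrases if p[0] == "D"]
```

Words near either end of the sequence simply receive fewer background words, which matches the example in which A only has following background words.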
S303: and constructing a subject word set according to the subject operation sequences corresponding to the plurality of target flow objects.
Specifically, the subject word set includes subject words of the associated flow subjects corresponding to the plurality of target flow objects. In some embodiments, the subject vocabulary may be stored in the form of a subject vocabulary.
S305: and constructing a first word feature matrix and a second word feature matrix of the initial word vector generation model based on the subject word set.
Specifically, the initial word vector generation model may include an input layer, a hidden layer and an output layer. The first word feature matrix is the weight matrix between the input layer and the hidden layer, and the second word feature matrix is the weight matrix between the hidden layer and the output layer. The dimension of the first word feature matrix is N×M, and the dimension of the second word feature matrix is M×N, where M is the dimension of the flow main body feature and N is the total number of subject words in the subject word set.
In a specific embodiment, S305 may specifically include steps S3051-S3054 described below.
S3051: and respectively performing first feature coding processing and second feature coding processing on the main words in the main word set to obtain a first coding feature and a second coding feature.
Specifically, the first feature encoding process and the second feature encoding process may be performing random feature assignment on the subject word, and encode the subject word into a random feature value, where the random feature value may be a floating point value, and the random feature values of different subject words are different from each other. After the first feature coding processing and the second feature coding processing are performed on the main words in the main word set, each main word obtains two coding features.
S3052: and respectively carrying out feature mapping processing on the first coding feature and the second coding feature to obtain an initial central word feature corresponding to the first coding feature and an initial background word feature corresponding to the second coding feature, wherein the initial central word feature and the initial background word feature are dense vectors with preset dimensions.
Specifically, the feature mapping process may be to map the random feature value of the subject word to a dense vector of a preset dimension M.
S3053: a first word feature matrix is constructed based on the initial center word feature.
S3054: and constructing a second word feature matrix based on the initial background word features.
The number of rows of the first word feature matrix is the total number of subject words in the subject word set, and its number of columns is the preset dimension; the number of rows of the second word feature matrix is the preset dimension, and its number of columns is the total number of subject words in the subject word set.
Specifically, the first word feature matrix and the second word feature matrix are the update targets during training of the initial word vector generation model.
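The shapes of the two matrices can be made concrete with a small initialization sketch. It is a simplification under stated assumptions: an integer index per subject word stands in for the random coding feature described above, and the uniform initialization range is an arbitrary choice. All names are hypothetical.

```python
import random


def init_word_feature_matrices(subject_words, dim=64, seed=0):
    """Create randomly initialized first (N x M) and second (M x N) matrices.

    subject_words: iterable of subject words (duplicates are ignored).
    dim: preset dimension M of the flow body features.
    Returns (index, first, second), where index maps each subject word to
    its row in `first` and its column in `second`.
    """
    rng = random.Random(seed)
    vocab = sorted(set(subject_words))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    scale = 0.5 / dim  # small initial values; an assumption, not from the source
    first = [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(n)]
    second = [[rng.uniform(-scale, scale) for _ in range(n)] for _ in range(dim)]
    return index, first, second


index, first, second = init_word_feature_matrices(["A", "C", "D", "A"], dim=4)
```

Row i of the first matrix is the initial central word feature of subject word i, and column i of the second matrix is its initial background word feature.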
S307: and carrying out context prediction training on the initial word vector generation model according to the central subject word and the background subject word so as to update the first word feature matrix and the second word feature matrix, and obtaining a target word vector generation model comprising the central word feature matrix and the background word feature matrix.
Specifically, the central word feature matrix includes a target central word feature of each subject word in the subject word set, and the background word feature matrix includes a target background word feature of each subject word in the subject word set. Taking a first coding feature of a central subject word in the subject word group as input, taking a second coding feature of a background word extracted by a preset extraction window as expected output, and performing unsupervised training on an initial word vector generation model.
In practical applications, S307 may specifically include steps S3071-S3074 described below.
S3071: and taking the first coding feature of the central subject word as the input of the initial word vector generation model, and searching initial central word features corresponding to the first coding feature in the first word feature matrix.
S3072: and performing feature cross processing on the initial central word feature and the second word feature matrix corresponding to the first coding feature to obtain a cross feature value set.
S3073: model loss is determined based on the set of intersection eigenvalues.
S3074: training an initial word vector generation model based on model loss to update initial central word characteristics in the first word characteristic matrix and initial background word characteristics in the second word characteristic matrix until training ending conditions are met, and obtaining a target word vector generation model.
Specifically, the first coding feature of the central subject word is multiplied by the first word feature matrix in the input layer, the multiplication result is input into the hidden layer, and the hidden layer outputs the initial central word feature of the central subject word. Alternatively, the first word feature matrix uses the first coding feature as the index of the initial central word feature and the second word feature matrix uses the second coding feature as the index of the initial background word feature, so that the initial central word feature can be looked up by the first coding feature and input into the hidden layer. Feature cross processing is then performed on the initial central word feature output by the hidden layer and the second word feature matrix. The feature cross processing may be taking the inner product of the initial central word feature with the initial background word feature of each subject word in the second word feature matrix, so that the cross feature value set includes N feature values, one between the initial central word feature and each initial background word feature. The N feature values are mapped through a softmax layer to obtain the prediction probabilities of the N subject words, where each prediction probability represents the probability that the corresponding subject word is a target background subject word in the subject phrase to which the input central subject word belongs, and the output layer outputs the predicted background subject word based on the prediction probabilities.
Further, determining model loss according to the prediction probability and a preset loss function, and updating the first word feature matrix and the second word feature matrix according to the model loss and a gradient descent method; the updating process here is specifically to update each initial center word feature in the first word feature matrix and each initial background word feature in the second word feature matrix. Specifically, the model training learning aims to increase the prediction probability of the target background subject word and reduce the prediction probability of the non-relevant subject word.
Further, steps S3071-S3073 are repeated. If the obtained model loss is lower than a preset loss or the number of iterations reaches a preset number of iterations, it is determined that the training end condition is met: the initial word vector generation model obtained in the current iteration is determined to be the target word vector generation model, the updated first word feature matrix obtained in the current iteration is determined to be the central word feature matrix, and the updated second word feature matrix is determined to be the background word feature matrix. If the model loss is not lower than the preset loss and the number of iterations has not reached the preset number of iterations, S3071-S3073 are repeated until the training end condition is met.
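The training loop of S3071-S3074 can be illustrated with a minimal pure-Python sketch. Several choices here are assumptions rather than statements of the application: the loss is taken to be the cross-entropy of the softmax output, plain stochastic gradient descent is used, the background matrix is stored row-wise (the transpose of the M×N layout) for convenience, and the hyperparameters and sample phrases are arbitrary.

```python
import math
import random


def train_skip_gram(phrases, vocab, dim=8, lr=0.1, epochs=100,
                    target_loss=None, seed=0):
    """Train central (W) and background (U) word features by context prediction.

    phrases: list of (central word, background word) pairs.
    Returns (index, W, U, losses), where losses holds the mean loss per epoch.
    """
    rng = random.Random(seed)
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n)]
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, background in phrases:
            c, o = index[center], index[background]
            v = W[c]  # look up the central word feature
            # Feature crossing: inner product with every background feature.
            scores = [sum(u * x for u, x in zip(U[w], v)) for w in range(n)]
            peak = max(scores)
            exps = [math.exp(s - peak) for s in scores]  # stable softmax
            z = sum(exps)
            probs = [e / z for e in exps]
            total += -math.log(probs[o])  # cross-entropy for this pair
            grad_v = [0.0] * dim
            for w in range(n):
                g = probs[w] - (1.0 if w == o else 0.0)  # d loss / d score
                for k in range(dim):
                    grad_v[k] += g * U[w][k]   # uses U before its update
                    U[w][k] -= lr * g * v[k]
            for k in range(dim):
                v[k] -= lr * grad_v[k]
        mean_loss = total / len(phrases)
        losses.append(mean_loss)
        if target_loss is not None and mean_loss < target_loss:
            break  # preset loss reached before the iteration limit
    return index, W, U, losses


# Phrases from the sequences A/B/C and A/D/C with an extraction window of 1.
phrases = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"),
           ("A", "D"), ("D", "A"), ("D", "C"), ("C", "D")]
index, W, U, losses = train_skip_gram(phrases, ["A", "B", "C", "D"])
```

The full softmax over all N subject words is affordable only for small vocabularies; for large ones, approximations such as negative sampling are commonly used, although the application does not state which variant is intended.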
In one embodiment, the initial word vector generation model may be a skip-gram model. Referring to fig. 11, which shows a model structure diagram of the initial word vector generation model, W is the central word feature matrix, W' is the background word feature matrix, and $c_{ki}$ is the second coding feature, predicted by the model, of a background subject word that belongs to the same phrase as the central subject word i.
S309: and constructing a main body feature set according to the central word feature matrix of the target word vector generation model.
Specifically, after the target word vector generation model is obtained, the target central word feature of each main word in the central word feature matrix is determined to be the flow main feature of the associated flow main body corresponding to the main word, and then the main feature set is obtained. It can be understood that the initial central word feature and the initial background word feature obtained by initializing through the feature mapping process do not contain effective semantic information, the initial word vector generation model is iteratively trained based on the main body operation sequence of the object granularity containing the operation mode information, and the word vectors of the main body words are optimized so that the word vectors of the main body words contain the operation mode and the semantic information of the object information, so that the diffusion and the group mining of abnormal main bodies are realized.
In some embodiments, the target word vector generation model may be retrained periodically, with a certain period of time as the update granularity, so as to obtain an updated main body feature set. The period may be, for example, 1 day: the main body operation sequences of the previous 7 days are acquired every day, and the main body feature set corresponding to the most recent 7 days is constructed based on the above method.
In other embodiments, the training of updating the target word vector generation model may be performed based on incremental updating, and accordingly, S203 may further include the following steps.
S401: and acquiring an increment main body operation sequence and an increment main body phrase corresponding to the increment main body operation sequence.
S403: and determining a new subject word in the increment subject operation sequence, wherein the new subject word is a subject word which does not belong to the subject word set.
S405: and screening out reference flow bodies from the associated flow bodies corresponding to the plurality of target flow objects, wherein the reference flow bodies are flow bodies whose attributes are similar to those of the incremental flow body corresponding to the newly added subject word.
Specifically, the incremental subject operation sequence may be acquired periodically, for example once a week, and the corresponding incremental subject phrases are obtained from it. Each incremental subject word in the incremental subject operation sequence is matched against the subject words in the subject word set. If the incremental subject word exists in the subject word set, the target central word feature of that subject word in the current central word feature matrix continues to be used. If the incremental subject word does not exist in the subject word set, it is determined to be a newly added subject word, attribute data of the associated flow body corresponding to the newly added subject word, such as the subject category and the subject operation data, is acquired, and a certain number of reference flow bodies are screened out based on the attribute data.
S407: an initial incremental body feature of the incremental flow body is generated from the flow body features of the reference flow body.
Specifically, the flow body features of a certain number of reference flow bodies may be subjected to addition and averaging processing to obtain initial incremental body features, where the addition and averaging processing may be simple addition and averaging or weighted addition and averaging, etc.
S409: and updating the central word feature matrix and the background word feature matrix of the target word vector generation model according to the initial increment main body features.
In some cases, the initial incremental subject feature of the newly added subject word is added to the center word feature matrix and the background word feature matrix. In other cases, initial background word features of the incremental flow body are generated according to the background word features of the reference flow body, so that initial incremental body features of the newly added body words are added in the central word feature matrix, and initial background word features of the newly added body words are added in the background word feature matrix.
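The addition and averaging of S407 can be sketched as follows. The function name `init_incremental_feature` and the sample vectors are hypothetical; how the weights are chosen is not specified in the application.

```python
def init_incremental_feature(reference_features, weights=None):
    """Average the features of the reference flow bodies.

    reference_features: non-empty list of equal-length feature vectors.
    weights: optional per-reference weights; a simple average is used
             when no weights are given.
    """
    if not reference_features:
        raise ValueError("at least one reference flow body is required")
    if weights is None:
        weights = [1.0] * len(reference_features)
    total = sum(weights)
    dim = len(reference_features[0])
    return [
        sum(w * vec[k] for w, vec in zip(weights, reference_features)) / total
        for k in range(dim)
    ]


simple = init_incremental_feature([[1.0, 2.0], [3.0, 4.0]])
weighted = init_incremental_feature([[1.0, 2.0], [3.0, 4.0]], weights=[3.0, 1.0])
```

The resulting vector would then be appended as a new row of the central word feature matrix (and, in the first case described above, as a new column of the background word feature matrix) before incremental training starts.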
S411: performing iterative training of context prediction on the target word vector generation model based on the incremental subject phrases and the target subject phrases to obtain an updated word vector generation model.
S413: constructing an updated main body feature set based on the central word feature matrix of the updated word vector generation model.
It is to be understood that S411 and S413 are implemented similarly to S307 and S309 and are not described again here. The center word feature matrix of the updated word vector generation model includes the target center word features of the incremental subject words; that is, the updated subject feature set includes the flow subject features of the associated flow subjects corresponding to the incremental subject words. Periodically updating the model and the subject feature set in this incremental manner improves the confidence of the flow subject features and of the anomaly detection results. Because the initial incremental subject feature of an incremental flow subject is generated from existing flow subject features before update training, the initial word vector already carries some valid semantic information, which speeds up model convergence and improves the confidence of the flow subject features of the incremental flow subject.
In summary, the method constructs subject phrases from the subject operation sequences, trains subject vector representations with a preset word vector generation model, and characterizes the feature similarity among subjects. Based on the subject vector representations and the feature similarity, it then mines abnormal groups and diffuses from known abnormal subjects. This improves the coverage of cheating subjects, makes abnormal subjects perceptible at the group level, reduces potential cheating risk, and protects the interests of subjects and the promotion ecosystem.
The embodiment of the application also provides an abnormal body detection device 600. Fig. 12 shows a schematic structural diagram of the abnormal body detection device provided by the embodiment of the application; as shown in fig. 12, the device may include the following modules.
The operation sequence acquisition module 10: configured to acquire main body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein each main body operation sequence consists of main body words of associated flow bodies corresponding to the target flow objects, and the associated flow bodies are flow bodies that have preset interactive operations with the target flow objects in the preset period;
the main body feature set construction module 20: configured to construct a main body feature set based on the main body operation sequences corresponding to the plurality of target flow objects, wherein the main body feature set comprises flow main body features of the associated flow main bodies corresponding to the plurality of target flow objects;
the abnormality detection module 30: configured to perform abnormality detection on the associated flow main bodies according to the flow main body features to obtain an abnormal main body detection result.
In some embodiments, the subject feature set construction module 20 may include:
a context extraction sub-module: configured to take each subject word in the subject operation sequence as a central word and extract the context of the subject operation sequence to obtain target subject phrases corresponding to the plurality of target flow objects, wherein each target subject phrase comprises a central subject word and background subject words adjacent to the central subject word;
a main word set construction sub-module: configured to construct a main body word set according to the main body operation sequences corresponding to the plurality of target flow objects;
a feature matrix construction sub-module: configured to construct a first word feature matrix and a second word feature matrix of an initial word vector generation model based on the main body word set;
a prediction training sub-module: configured to perform context prediction training on the initial word vector generation model according to the central subject words and the background subject words, so as to update the first word feature matrix and the second word feature matrix and obtain a target word vector generation model comprising a central word feature matrix and a background word feature matrix;
a main body feature set generation sub-module: configured to construct the main body feature set according to the central word feature matrix of the target word vector generation model.
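The context extraction performed by the first sub-module can be illustrated with a short sketch. The window size is an assumption (this passage does not fix how many adjacent words count as background words), and `extract_subject_phrases` is a hypothetical name.

```python
from typing import List, Tuple


def extract_subject_phrases(sequence: List[str],
                            window: int = 2) -> List[Tuple[str, List[str]]]:
    """Pair every subject word (as central word) with its adjacent words.

    `window` is the assumed number of neighbors taken on each side.
    """
    phrases = []
    for i, center_word in enumerate(sequence):
        left = sequence[max(0, i - window):i]
        right = sequence[i + 1:i + 1 + window]
        phrases.append((center_word, left + right))
    return phrases


# Hypothetical subject operation sequence of one target flow object.
phrases = extract_subject_phrases(["m1", "m2", "m3", "m4"], window=1)
```

With `window=1`, the word `m2` is paired with the background subject words `m1` and `m3`, while the first and last words of the sequence each have a single neighbor.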
In some embodiments, the feature matrix construction sub-module may include:
a feature coding unit: configured to perform first feature coding processing and second feature coding processing respectively on the main words in the main word set to obtain first coding features and second coding features;
a feature mapping unit: configured to perform feature mapping processing on the first coding features and the second coding features respectively to obtain initial central word features corresponding to the first coding features and initial background word features corresponding to the second coding features, wherein the initial central word features and the initial background word features are dense vectors of a preset dimension;
a first word feature matrix construction unit: configured to construct the first word feature matrix based on the initial central word features;
a second word feature matrix construction unit: configured to construct the second word feature matrix based on the initial background word features;
wherein the number of rows of the first word feature matrix is the total number of main words in the main word set, the number of columns of the first word feature matrix is the preset dimension, the number of rows of the second word feature matrix is the preset dimension, and the number of columns of the second word feature matrix is the total number of main words.
In some embodiments, the predictive training sub-module may include:
a central word feature lookup unit: configured to take the first coding feature of the central subject word as the input of the initial word vector generation model and look up, in the first word feature matrix, the initial central word feature corresponding to the first coding feature;
a feature crossing unit: configured to perform feature cross processing on the initial central word feature corresponding to the first coding feature and the second word feature matrix to obtain a cross feature value set;
a model loss determination unit: configured to determine a model loss based on the cross feature value set;
a model training unit: configured to train the initial word vector generation model based on the model loss, so as to update the initial central word features in the first word feature matrix and the initial background word features in the second word feature matrix until a training end condition is met, thereby obtaining the target word vector generation model.
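The lookup, feature cross, loss and update units above can be sketched as one training step. This is an illustration under stated assumptions rather than the application's method: the loss is assumed to be a full-softmax cross-entropy over the cross feature values, the update is plain stochastic gradient descent, and `init_matrix`, `train_step` and the learning rate are hypothetical. The first matrix has one row per main word and `dim` columns; the second has `dim` rows and one column per main word.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random dense matrix, used for both word feature matrices."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def train_step(w1: Matrix, w2: Matrix, center_idx: int,
               background_idx: int, lr: float = 0.1) -> float:
    """One SGD step; returns the loss measured before the update."""
    # Lookup: the one-hot coding feature selects one row of the first matrix.
    h = list(w1[center_idx])
    dim, vocab = len(w2), len(w2[0])
    # Feature cross: dot product of h with every column of the second matrix.
    scores = [sum(h[k] * w2[k][j] for k in range(dim)) for j in range(vocab)]
    # Assumed loss: softmax cross-entropy against the observed background word.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    probs = [e / norm for e in exps]
    loss = -math.log(probs[background_idx])
    # Gradient of the loss with respect to the cross feature values.
    grad = list(probs)
    grad[background_idx] -= 1.0
    # Gradient for the central word feature, computed before w2 changes.
    grad_h = [sum(grad[j] * w2[k][j] for j in range(vocab)) for k in range(dim)]
    for k in range(dim):
        for j in range(vocab):
            w2[k][j] -= lr * grad[j] * h[k]
        w1[center_idx][k] -= lr * grad_h[k]
    return loss


rng = random.Random(0)
vocab_size, dim = 4, 3
w1 = init_matrix(vocab_size, dim, rng)   # first word feature matrix
w2 = init_matrix(dim, vocab_size, rng)   # second word feature matrix
losses = [train_step(w1, w2, center_idx=0, background_idx=2) for _ in range(50)]
```

After training, row `i` of `w1` would serve as the flow main body feature of the main word with index `i`. A production system would likely replace the full softmax with a cheaper approximation for large main word sets.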
In some embodiments, the subject feature set construction module 20 may further include:
an incremental sequence acquisition sub-module: configured to acquire an incremental subject operation sequence and the incremental subject phrases corresponding to the incremental subject operation sequence;
a newly added subject word determination sub-module: configured to determine a newly added subject word in the incremental subject operation sequence, wherein the newly added subject word is a subject word that does not belong to the subject word set;
a reference flow body screening sub-module: configured to screen reference flow bodies from the associated flow bodies corresponding to the plurality of target flow objects, wherein a reference flow body is a flow body whose attributes are similar to those of the incremental flow body corresponding to the newly added subject word;
an incremental word feature generation sub-module: configured to generate an initial incremental body feature of the incremental flow body from the flow body features of the reference flow bodies;
a feature matrix updating sub-module: configured to update the central word feature matrix and the background word feature matrix of the target word vector generation model according to the initial incremental body feature;
an update training sub-module: configured to perform iterative training of context prediction on the target word vector generation model based on the incremental subject phrases and the target subject phrases to obtain an updated word vector generation model;
a subject feature set updating sub-module: configured to construct an updated subject feature set based on the central word feature matrix of the updated word vector generation model.
In some embodiments, the operation sequence acquisition module 10 may include:
an operation data acquisition sub-module: configured to acquire object operation data of each of the plurality of target flow objects in the preset period, wherein the object operation data comprises a flow object identifier, the main word of the associated flow main body, and operation time information of the preset interactive operation;
a main body phrase combining sub-module: configured to combine the main words according to the flow object identifiers and the operation time information to obtain the main body operation sequences corresponding to the plurality of target flow objects.
In some embodiments, the main body phrase combining sub-module may include:
a sorting and combining unit: configured to sort and combine, based on the flow object identifiers and the operation time information, the main words corresponding to each of the plurality of target flow objects to obtain initial sequences corresponding to the plurality of target flow objects;
a deduplication processing unit: configured to de-duplicate consecutive identical main words in the initial sequences to obtain the main body operation sequences corresponding to the plurality of target flow objects.
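The sorting, combining and de-duplication described above can be sketched as follows. The record layout `(flow_object_id, main_word, timestamp)` and the function name `build_operation_sequences` are assumptions made for illustration.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Assumed record layout: (flow_object_id, main_word, operation_timestamp).
Record = Tuple[str, str, int]


def build_operation_sequences(records: List[Record]) -> Dict[str, List[str]]:
    """Group records by flow object, sort by time, collapse consecutive repeats."""
    by_object: Dict[str, List[Tuple[int, str]]] = {}
    for object_id, word, timestamp in records:
        by_object.setdefault(object_id, []).append((timestamp, word))

    sequences: Dict[str, List[str]] = {}
    for object_id, events in by_object.items():
        events.sort(key=lambda event: event[0])      # order by operation time
        initial = [word for _, word in events]        # initial sequence
        # groupby merges runs of consecutive identical main words only.
        sequences[object_id] = [word for word, _ in groupby(initial)]
    return sequences


records = [
    ("u1", "m2", 3), ("u1", "m1", 1), ("u1", "m1", 2),
    ("u2", "m3", 1), ("u1", "m1", 4),
]
sequences = build_operation_sequences(records)
```

For `u1`, the initial sequence `m1, m1, m2, m1` becomes `m1, m2, m1`: only the consecutive repeat is removed, so the later return to `m1` is preserved as part of the behavior pattern.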
In some embodiments, anomaly detection module 30 may include:
a clustering processing sub-module: configured to cluster the flow main body features in the main body feature set to obtain main body feature clusters;
a first body determination sub-module: configured to determine a main body feature cluster to be an abnormal main body group when any associated flow main body corresponding to the main body feature cluster has an abnormal label.
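The group-level decision made by the first body determination sub-module can be sketched as below. The clustering itself is taken as given: the sketch assumes some clustering step has already assigned each associated flow main body a cluster id, and `find_abnormal_groups` is a hypothetical name.

```python
from typing import Dict, List, Set


def find_abnormal_groups(cluster_of: Dict[str, int],
                         labeled_abnormal: Set[str]) -> Dict[int, List[str]]:
    """Return the clusters that contain at least one body with an abnormal label."""
    clusters: Dict[int, List[str]] = {}
    for body, cluster_id in cluster_of.items():
        clusters.setdefault(cluster_id, []).append(body)
    return {
        cluster_id: members
        for cluster_id, members in clusters.items()
        if any(member in labeled_abnormal for member in members)
    }


# Hypothetical cluster assignment and a single known abnormal label.
cluster_of = {"m1": 0, "m2": 0, "m3": 1, "m4": 1, "m5": 2}
abnormal_groups = find_abnormal_groups(cluster_of, {"m2"})
```

Here cluster 0 is reported as an abnormal main body group because `m2` carries an abnormal label, which also brings the unlabeled `m1` into view.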
In some embodiments, anomaly detection module 30 may include:
a seed main body acquisition sub-module: configured to acquire a seed abnormal main body among the associated flow main bodies corresponding to the plurality of target flow objects;
a seed feature determination sub-module: configured to determine the seed main body feature of the seed abnormal main body in the main body feature set;
a target feature screening sub-module: configured to screen, from the main body feature set, target main body features that match the seed main body feature;
a second body determination sub-module: configured to determine the associated flow main body corresponding to a target main body feature to be a target abnormal main body.
In some embodiments, the target feature screening sub-module may include:
a similarity calculation unit: configured to perform similarity calculation between the seed main body feature and the flow main body features in the main body feature set to obtain a feature similarity result;
a target feature determination unit: configured to determine the target main body features from the main body feature set based on the feature similarity result.
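Diffusion from a seed abnormal main body can be sketched as a similarity search. Cosine similarity and the threshold of 0.9 are assumptions for illustration (this passage does not fix a similarity measure or a cutoff), and `diffuse_from_seed` is a hypothetical name.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diffuse_from_seed(seed: str, features: Dict[str, List[float]],
                      threshold: float = 0.9) -> List[str]:
    """Main bodies whose feature is at least `threshold`-similar to the seed's."""
    seed_feature = features[seed]
    return sorted(
        body for body, feature in features.items()
        if body != seed and cosine(seed_feature, feature) >= threshold
    )


# Hypothetical two-dimensional flow main body features.
features = {
    "seed": [1.0, 0.0],
    "m1": [0.9, 0.1],
    "m2": [0.0, 1.0],
    "m3": [-1.0, 0.0],
}
targets = diffuse_from_seed("seed", features)
```

Only `m1` points in nearly the same direction as the seed, so it is the only target abnormal main body returned. A top-k selection over the similarity results would be an equally plausible reading of the screening step.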
It should be noted that the above apparatus embodiments and method embodiments are based on the same implementation manner.
The embodiment of the application provides an abnormal body detection device, which can be a terminal or a server, and comprises a processor and a memory, wherein at least one instruction or at least one section of program is stored in the memory, and the at least one instruction or the at least one section of program is loaded and executed by the processor to realize the abnormal body detection method provided by the embodiment of the method.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and abnormal body detection by running the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the application may be executed in an electronic device such as a mobile terminal, a computer terminal, a server, or a similar computing device. Fig. 13 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 13, the electronic device 900 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 910 (the processor 910 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 930 for storing data, and one or more storage media 920 (e.g., one or more mass storage devices) for storing applications 923 or data 922. The memory 930 and the storage medium 920 may be transitory or persistent storage. The program stored on the storage medium 920 may include one or more modules, each of which may include a series of instruction operations for the electronic device. Further, the central processor 910 may be configured to communicate with the storage medium 920 and to execute, on the electronic device 900, the series of instruction operations in the storage medium 920. The electronic device 900 may also include one or more power supplies 960, one or more wired or wireless network interfaces 950, one or more input/output interfaces 940, and/or one or more operating systems 921, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The input-output interface 940 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communications provider of the electronic device 900. In one example, the input-output interface 940 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices through a base station to communicate with the internet. In one example, the input/output interface 940 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 13 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, electronic device 900 may also include more or fewer components than shown in FIG. 13, or have a different configuration than shown in FIG. 13.
Embodiments of the present application also provide a computer readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program related to implementing an abnormal body detection method in a method embodiment, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the abnormal body detection method provided in the method embodiment.
Optionally, in this embodiment, the storage medium may be located in at least one of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing program code.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods provided in the various alternative implementations described above.
As can be seen from the embodiments of the abnormal body detection method, apparatus, device, server, terminal, storage medium and program product provided by the present application, the technical solution of the present application is to obtain body operation sequences corresponding to a plurality of target flow objects in a preset period, where the body operation sequences are composed of body words of associated flow bodies corresponding to the target flow objects, and the associated flow bodies are flow bodies having preset interaction operations with the target flow objects in the preset period; constructing a main body feature set based on main body operation sequences corresponding to the plurality of target flow objects, wherein the main body feature set comprises flow main body features of the associated flow main bodies corresponding to the plurality of target flow objects; and then carrying out abnormality detection on the related flow main body according to the flow main body characteristics to obtain an abnormal main body detection result. Thus, a main body operation sequence is obtained based on the operation data of the target flow object, the flow main body characteristics capable of representing the flow main body are further constructed, the abnormal detection is carried out independently of the information provided by the flow main body, the reliability and the accuracy of an abnormal detection result are improved, the abnormal detection is carried out based on the flow main body characteristics, the calculation complexity is reduced, the detection efficiency is improved, and the resource occupation is reduced.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments of the present application are described in a progressive manner; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the method embodiments.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk, an optical disk, etc.
The foregoing is only illustrative of the present application and is not to be construed as limiting it; any modifications, equivalent arrangements, improvements, etc. made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (14)

1. A method of detecting an abnormal subject, the method comprising:
acquiring main body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein the main body operation sequences are formed by main body words of associated flow bodies corresponding to the target flow objects, and the associated flow bodies are flow bodies with preset interactive operations between the associated flow bodies and the target flow objects in the preset period;
constructing a main body feature set based on main body operation sequences corresponding to the plurality of target flow objects, wherein the main body feature set comprises flow main body features of associated flow main bodies corresponding to the plurality of target flow objects;
and carrying out anomaly detection on the associated flow main bodies according to the flow main body characteristics to obtain an anomaly main body detection result.
2. The method of claim 1, wherein constructing a subject feature set based on the subject sequence of operations corresponding to the plurality of target traffic objects comprises:
Taking each subject word in the subject operation sequence as a central word, and extracting the context of the subject operation sequence to obtain target subject phrases corresponding to the target flow objects, wherein the target subject phrases comprise a central subject word and background subject words adjacent to the central subject word;
constructing a main body word set according to main body operation sequences corresponding to the plurality of target flow objects;
constructing a first word feature matrix and a second word feature matrix of an initial word vector generation model based on the main word set;
performing context prediction training on the initial word vector generation model according to the central subject word and the background subject word to update the first word feature matrix and the second word feature matrix to obtain a target word vector generation model comprising a central word feature matrix and a background word feature matrix;
and constructing the main body feature set according to the central word feature matrix of the target word vector generation model.
3. The method of claim 2, wherein constructing a first word feature matrix and a second word feature matrix of an initial word vector generation model based on the main word set comprises:
performing first feature coding processing and second feature coding processing on the main words in the main word set respectively to obtain first coding features and second coding features;
Performing feature mapping processing on the first coding feature and the second coding feature to obtain an initial central word feature corresponding to the first coding feature and an initial background word feature corresponding to the second coding feature, wherein the initial central word feature and the initial background word feature are dense vectors with preset dimensions;
constructing the first word feature matrix based on the initial center word feature;
constructing the second word feature matrix based on the initial background word features;
the number of rows of the first word feature matrix is the total number of main words in the main word set, the number of columns of the first word feature matrix is the preset dimension, the number of rows of the second word feature matrix is the preset dimension, and the number of columns of the second word feature matrix is the total number of main words.
4. The method of claim 3, wherein the performing context prediction training on the initial word vector generation model based on the center subject word and the background subject word to update the first word feature matrix and the second word feature matrix, and obtaining the target word vector generation model including the center word feature matrix and the background word feature matrix comprises:
Taking the first coding feature of the central subject word as the input of the initial word vector generation model, and searching initial central word features corresponding to the first coding feature in the first word feature matrix;
performing feature cross processing on the initial central word feature corresponding to the first coding feature and the second word feature matrix to obtain a cross feature value set;
determining a model loss based on the set of intersection eigenvalues;
training the initial word vector generation model based on the model loss to update the initial central word characteristics in the first word characteristic matrix and the initial background word characteristics in the second word characteristic matrix until the training ending condition is met, and obtaining the target word vector generation model.
5. The method according to claim 2, wherein the method further comprises:
acquiring an increment main body operation sequence and an increment main body phrase corresponding to the increment main body operation sequence;
determining a new subject word in the increment subject operation sequence, wherein the new subject word is a subject word not belonging to the subject word set;
screening reference flow bodies from the associated flow bodies corresponding to the target flow objects, wherein a reference flow body is a flow body whose attributes are similar to those of the incremental flow body corresponding to the newly added body word;
Generating initial incremental body features of the incremental flow body from the flow body features of the reference flow body;
updating a central word feature matrix and a background word feature matrix of the target word vector generation model according to the initial increment main body features;
performing iterative training of context prediction on the target word vector generation model based on the increment subject phrase and the target subject phrase to obtain an updated word vector generation model;
and constructing an updated main body feature set based on the central word feature matrix of the updated word vector generation model.
6. The method according to any one of claims 1 to 5, wherein the acquiring the main body operation sequences corresponding to the plurality of target flow objects within the preset period includes:
acquiring object operation data of each of the plurality of target flow objects in the preset period, wherein the object operation data comprises a flow object identifier, a subject word associated with a flow main body and operation time information of the preset interactive operation;
and carrying out combination processing on the subject words according to the flow object identifiers and the operation time information to obtain subject operation sequences corresponding to the plurality of target flow objects.
7. The method of claim 6, wherein the combining the subject words according to the flow object identifiers and the operation time information to obtain the subject operation sequences corresponding to the plurality of target flow objects comprises:
based on the flow object identification and the operation time information, respectively sequencing and combining the subject words corresponding to each of the plurality of target flow objects to obtain initial sequences corresponding to the plurality of target flow objects;
and performing de-duplication treatment on the continuous identical subject words in the initial sequence to obtain subject operation sequences corresponding to the plurality of target flow objects.
8. The method according to any one of claims 1-5, wherein the performing anomaly detection on the associated traffic body according to the traffic body characteristics, and obtaining an anomaly body detection result includes:
clustering the flow main body features in the main body feature set to obtain a main body feature cluster;
and determining the main body feature cluster as an abnormal main body group when any associated flow main body corresponding to the main body feature cluster has an abnormal label.
9. The method according to any one of claims 1-5, wherein the performing anomaly detection on the associated traffic body according to the traffic body characteristics, and obtaining an anomaly body detection result includes:
Acquiring seed abnormal bodies in the associated flow bodies corresponding to the plurality of target flow objects;
determining seed main body characteristics of the seed abnormal main body in the main body characteristic set;
screening target subject features matched with the seed subject features from the subject feature set;
and determining the associated flow main body corresponding to the target main body characteristic as a target abnormal main body.
10. The method of claim 9, wherein the screening out target subject features from the set of subject features that match the seed subject features comprises:
performing similarity calculation on the seed main body characteristics and the flow main body characteristics in the main body characteristic set to obtain a characteristic similarity result;
and determining the target subject feature from the subject feature set based on the feature similarity result.
11. An abnormal subject detection apparatus, the apparatus comprising:
an operation sequence acquisition module: configured to acquire main body operation sequences corresponding to a plurality of target flow objects in a preset period, wherein each main body operation sequence consists of main body words of associated flow bodies corresponding to the target flow objects, and the associated flow bodies are flow bodies that have preset interactive operations with the target flow objects in the preset period;
a main body feature set construction module: configured to construct a main body feature set based on the main body operation sequences corresponding to the plurality of target flow objects, wherein the main body feature set comprises flow main body features of the associated flow main bodies corresponding to the plurality of target flow objects;
an abnormality detection module: configured to perform abnormality detection on the associated flow main bodies according to the flow main body features to obtain an abnormal main body detection result.
12. A computer-readable storage medium, wherein at least one instruction or at least one program is stored in the storage medium, the at least one instruction or the at least one program being loaded and executed by a processor to implement the abnormal body detection method of any one of claims 1-10.
13. A computer device, characterized in that it comprises a processor and a memory in which at least one instruction or at least one program is stored, which is loaded and executed by the processor to implement the abnormal body detection method according to any one of claims 1-10.
14. A computer program product or computer program comprising computer instructions which, when executed by a processor, implement the abnormal body detection method according to any one of claims 1-10.
CN202210434514.2A 2022-04-24 2022-04-24 Abnormal body detection method, device, equipment and storage medium Pending CN116992017A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210434514.2A CN116992017A (en) 2022-04-24 2022-04-24 Abnormal body detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116992017A (en) 2023-11-03

Family

ID=88532596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210434514.2A Pending CN116992017A (en) 2022-04-24 2022-04-24 Abnormal body detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116992017A (en)

Similar Documents

Publication Publication Date Title
CN107463704B (en) Search method and device based on artificial intelligence
CN110147551B (en) Multi-category entity recognition model training, entity recognition method, server and terminal
CN110032632A (en) Intelligent customer service answering method, device and storage medium based on text similarity
CN111932386B (en) User account determining method and device, information pushing method and device, and electronic equipment
CN113553412B (en) Question-answering processing method, question-answering processing device, electronic equipment and storage medium
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN111371767A (en) Malicious account identification method, malicious account identification device, medium and electronic device
CN113821592B (en) Data processing method, device, equipment and storage medium
CN113392209A (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111324724A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN115759748A (en) Risk detection model generation method and device and risk individual identification method and device
CN113723115A (en) Open domain question-answer prediction method based on pre-training model and related equipment
CN111325578B (en) Sample determination method and device of prediction model, medium and equipment
CN110807097A (en) Method and device for analyzing data
CN111368552A (en) Network user group division method and device for specific field
CN117009631A (en) Method, device, equipment and storage medium for screening put objects
CN116992017A (en) Abnormal body detection method, device, equipment and storage medium
CN111615178B (en) Method and device for identifying wireless network type and model training and electronic equipment
CN113392289B (en) Search recommendation method and device and electronic equipment
CN115131058A (en) Account identification method, device, equipment and storage medium
CN110442767B (en) Method and device for determining content interaction platform label and readable storage medium
CN114925681A (en) Knowledge map question-answer entity linking method, device, equipment and medium
CN112417260B (en) Localized recommendation method, device and storage medium
CN114398482A (en) Dictionary construction method and device, electronic equipment and storage medium
CN114638308A (en) Method and device for acquiring object relationship, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination