Disclosure of Invention
The embodiments of this specification aim to provide a scheme for device feature extraction in risk control management.
To solve the above technical problem, the embodiments of this specification are implemented as follows:
In one aspect, an embodiment of the present disclosure provides a feature extraction method for a device, including:
acquiring device information of a device and environment information of the environment in which the device is located;
performing numerical conversion on the device information to generate corresponding device information values;
aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
generating a feature vector containing the device identifier and the environment information, and determining the service processing count corresponding to the feature vector within the same time window; and
determining the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train a device risk identification model.
In another aspect, an embodiment of this specification further provides a device risk identification method based on the device risk identification model, including:
acquiring device information of a device to be detected and environment information of the environment in which the device is located;
performing numerical conversion on the device information to generate corresponding device information values;
aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
generating a feature vector containing the device identifier and the environment information, acquiring the service processing count corresponding to the feature vector within the same time window, and determining the service processing count as the feature value of the feature vector of the device to be detected; and
evaluating the risk degree of the device to be detected using the device risk identification model, based on the feature value of the feature vector of the device to be detected.
Correspondingly, an embodiment of this specification further provides a feature extraction apparatus for a device, including:
an acquisition module, which acquires device information of a device and environment information of the environment in which the device is located;
a conversion module, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
a generation module, which generates a feature vector containing the device identifier and the environment information, and determines the service processing count corresponding to the feature vector within the same time window; and
a feature value determining module, which determines the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train a device risk identification model.
Corresponding to the other aspect, an embodiment of the present disclosure further provides a device risk identification apparatus based on the device risk identification model, including:
an acquisition module, which acquires device information of a device to be detected and environment information of the environment in which the device is located;
a conversion module, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
a generation module, which generates a feature vector containing the device identifier and the environment information, acquires the service processing count corresponding to the feature vector within the same time window, and determines the service processing count as the feature value of the feature vector of the device to be detected; and
a risk identification module, which evaluates the risk degree of the device to be detected using the device risk identification model, based on the feature value of the feature vector of the device to be detected.
In the scheme provided by the embodiments of this specification, device information and environment information that are generally difficult to tamper with are collected and combined into a feature vector of the device, and the service processing count is attached to that feature vector as its feature value. The feature values of training samples can then be used for model training and risk identification, which improves the accuracy with which the risk identification model identifies devices, prevents device-dimension information from being defeated at a single point, and improves the overall stability and accuracy of device risk identification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the disclosure.
Further, not all of the effects described above need be achieved in any of the embodiments of the present specification.
Detailed Description
In order for those skilled in the art to better understand the technical solutions in the embodiments of the present specification, the technical solutions in the embodiments of the present specification will be described in detail below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification shall fall within the scope of protection.
When merchants run business promotions today, they often launch a large number of promotional campaigns, such as handing out red envelopes, cash, vouchers, and the like. In this process, a common way for a risk control system to identify devices is to use device fingerprints. For example, a device is identified using its local MAC address, International Mobile Equipment Identity (IMEI), International Mobile Subscriber Identity (IMSI), baseband, version number, and so on as a device fingerprint.
With this approach, black-market groups (fraud rings) often modify key parameters of a mobile phone using tampering tools, so that the device fingerprint changes and its uniqueness is destroyed. By continuously tampering with a small number of devices, such a group can disguise them as a practically unlimited number of devices, bypass interception by the risk control system, and collect marketing rewards without limit, causing losses. The core problem is that the device features used in the risk identification model are too narrow and can easily be bypassed.
Based on this, the embodiments of this specification provide a feature extraction method for a device, for use in training a risk identification model. FIG. 1 is a schematic flow chart of a feature extraction method for a device according to an embodiment of the present disclosure. The flow specifically includes the following steps:
S101, acquiring device information of a device and environment information of the environment in which the device is located.
The device may be a user terminal such as a mobile phone, a tablet, or a personal computer.
In this embodiment of the present disclosure, the device information may include strong device information such as the aforementioned IMEI, IMSI, baseband, and version number, and may also include weak device information such as the device brand, device model, processor frequency, ring volume, call volume, alarm volume, remaining battery power, remaining device memory capacity, or remaining device memory card capacity. In other words, the acquired device information is information that the user does not easily or frequently modify.
The environment information of the device may include the Internet Protocol (IP) address of the device, the Media Access Control (MAC) address of the device, or the real physical address of the device (e.g., latitude and longitude coordinates obtained by a positioning module of the device).
S103, performing numerical conversion on the device information to generate corresponding device information values.
The numerical conversion can be performed in several ways. Specifically, for invariable device information (here, "invariable" means that the information does not change naturally while the user uses the device), such as the device brand, device model, and processor frequency, a one-to-one mapping may be performed using a preset mapping table. Table 1 is a mapping table of device brands and device information values provided in the embodiments of this specification.
Table 1

| Device brand | Device information value |
| --- | --- |
| Apple | 1 |
| Huawei | 2 |
| Honor | 3 |
| Vivo | 4 |
| …… | …… |
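The one-to-one mapping of Table 1 can be sketched as a simple lookup. This is a minimal illustration, not the embodiment's implementation: the names `BRAND_TO_VALUE`, `UNKNOWN_VALUE`, and `brand_to_value` are hypothetical, and the fallback value for brands missing from the table is an assumption, since the text does not say how unlisted brands are handled.

```python
# Hypothetical mapping table mirroring the four rows shown in Table 1.
# Further rows are elided in the source and are not filled in here.
BRAND_TO_VALUE = {"Apple": 1, "Huawei": 2, "Honor": 3, "Vivo": 4}

# Assumption: brands absent from the table map to a reserved value.
UNKNOWN_VALUE = 0


def brand_to_value(brand: str) -> int:
    """Convert an invariable device attribute to its preset numeric value."""
    return BRAND_TO_VALUE.get(brand, UNKNOWN_VALUE)


print(brand_to_value("Huawei"))
```

The same pattern would apply to other invariable attributes such as device model or processor frequency, each with its own preset table.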
For variable device information, such as ring volume, call volume, alarm volume, remaining battery power, remaining device memory capacity, or remaining device memory card capacity, the numerical conversion can be performed by applying a preset algorithm to the current value of the device information.
For example, for the aforementioned variable device information, a percentage coefficient is obtained from the device; this coefficient generally describes the remaining available proportion of the corresponding resource. The coefficient interval to which the percentage coefficient belongs is then determined, and the device information value corresponding to that interval is determined according to a preset correspondence between intervals and values.
For example, assume that the remaining battery proportion from 0 to 100% has been divided in advance into five equal intervals, which correspond in order to the values 1 to 5. If the remaining battery power of a device is 50%, the percentage coefficient 50% falls in the coefficient interval [0.4, 0.6], so the corresponding device information value is 3.
Mapping variable device information to interval values reduces the deviation caused by small fluctuations in the device information and improves the stability of the sample features.
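The interval mapping described above can be sketched as follows. The function name `percent_to_value` is hypothetical, and the treatment of interval boundaries (each interval closed on the left, with 100% assigned to the last interval) is an assumption; the text only states that 50% falls in the third of five equal intervals.

```python
def percent_to_value(fraction: float, num_intervals: int = 5) -> int:
    """Map a remaining-proportion coefficient in [0, 1] to an interval value.

    The range is split into `num_intervals` equal intervals numbered from 1.
    Assumption: intervals are closed on the left, and 1.0 belongs to the
    last interval.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    index = int(fraction * num_intervals)        # e.g. 0.5 * 5 = 2.5 -> 2
    return min(index, num_intervals - 1) + 1     # 1-based value, capped at the last interval


# 50% remaining battery falls in the third interval.
print(percent_to_value(0.5))
# A small fluctuation (50% -> 52%) leaves the value unchanged.
print(percent_to_value(0.52) == percent_to_value(0.5))
```

Because nearby readings land in the same interval, the resulting value stays stable between two observations of the same device, which is the stability property the passage refers to.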
S105, aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class.
The aggregation may consist of splicing the device information values in a specified order to generate a character string containing the device information values, which serves as the device identifier. For example, the spliced character string "112141336" represents, in order, device brand 1, device model 1, processor frequency 2, ring volume 1, call volume 4, alarm volume 1, remaining battery power 3, remaining device memory capacity 3, and remaining device memory card capacity 6.
Alternatively, other operations may be performed on the different device information values. For example, the device information values of the invariable device information may be encoded separately, and the device information values of the variable device information may be further generalized and then aggregated to obtain the device identifier.
It will be readily appreciated that although the resulting device identifier already characterizes the device to some degree, in practice the devices of other users may often have the same or similar device information and therefore the same device identifier. In other words, the device identifier identifies a class of devices that have the same or similar device information, as is the case for the aforementioned device identifier "112141336".
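A minimal sketch of the splicing variant of the aggregation, reproducing the example identifier "112141336". The field names in `FIELD_ORDER` and the function `build_device_identifier` are hypothetical; only the order of the nine attributes comes from the text.

```python
from typing import Mapping

# Hypothetical field names; their order follows the example in the text.
FIELD_ORDER = (
    "brand", "model", "cpu_frequency",
    "ring_volume", "call_volume", "alarm_volume",
    "battery", "memory", "memory_card",
)


def build_device_identifier(values: Mapping[str, int]) -> str:
    """Splice the device information values in the specified order."""
    return "".join(str(values[name]) for name in FIELD_ORDER)


example = {
    "brand": 1, "model": 1, "cpu_frequency": 2,
    "ring_volume": 1, "call_volume": 4, "alarm_volume": 1,
    "battery": 3, "memory": 3, "memory_card": 6,
}
print(build_device_identifier(example))
```

Plain concatenation is only unambiguous while every value is a single digit. If a mapping table grows past nine entries, a fixed-width encoding or a separator between values would be needed to keep identifiers distinct.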
S107, generating a feature vector containing the device identifier and the environment information, and determining the service processing count corresponding to the feature vector within the same time window.
Once the device identifier has been generated, it can be combined with the environment information to obtain a feature vector that characterizes the device. Specifically, the IP address of the device may be acquired to generate a first feature vector devicetag_ip_variable_category containing the device identifier and the IP address; or the Media Access Control (MAC) address of the device may be acquired to generate a second feature vector devicetag_Mac_variable_category containing the device identifier and the MAC address; or the real physical address of the device may be acquired to generate a third feature vector devicetag_lbs_variable_category containing the device identifier and the real physical address.
The three feature vectors may be used alone or together. That is, the training sample corresponding to a device may include one or more of the three feature vectors.
Further, for each determined feature vector, the service processing count corresponding to that feature vector within the same time window can be obtained. The service processing count may be the number of transactions, the number of rewards received, the number of accounts, and the like. The time window may be preset, for example, the 24 hours preceding the current time.
For example, if the first feature vector devicetag_ip_variable_category takes the form "(112141336, ip1)", then the number of rewards received N1, the number of transactions N2, or the number of accounts N3 for which the device identifier is "112141336" and the IP address is "ip1" needs to be obtained from the full sample data (typically, historical data over a certain period of time).
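The counting step for the first feature vector can be sketched as below. The `Event` record, its field names, and the event kinds "reward" and "transaction" are hypothetical; the text does not describe how historical data is stored. The sketch assumes the history fits in memory and that the number of accounts means the number of distinct accounts seen.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple


class Event(NamedTuple):
    """One historical service record (hypothetical schema)."""
    device_id: str       # device identifier from the aggregation step
    ip: str              # environment information
    kind: str            # assumed values: "reward" or "transaction"
    account: str
    timestamp: datetime


def count_in_window(
    events: Iterable[Event],
    device_id: str,
    ip: str,
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, int]:
    """Count service processing for the feature vector (device_id, ip)."""
    start = now - window
    matched = [
        e for e in events
        if e.device_id == device_id and e.ip == ip and start <= e.timestamp <= now
    ]
    return {
        "rewards": sum(e.kind == "reward" for e in matched),            # N1
        "transactions": sum(e.kind == "transaction" for e in matched),  # N2
        "accounts": len({e.account for e in matched}),                  # N3 (distinct accounts)
    }
```

In a production risk control system these counts would more likely come from a pre-aggregated store keyed by feature vector and time window; the linear scan here only illustrates what is being counted.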
S109, determining the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train the device risk identification model.
As described above, the corresponding feature value N1 may be determined, so that the feature value N1 corresponding to the first feature vector "(112141336, ip1)" is used as a first feature value of the training sample corresponding to the device. Similarly, the sample may further include another feature value of the first feature vector, such as N2, or a second feature value corresponding to the second feature vector, and so on.
As previously described, a device identifier identifies a class of devices, but once environment information is added, the resulting feature vector can be used to identify a specific device. In practice, under given environment information, the device information obtained by the embodiments of this specification is considered sufficient to distinguish different devices from one another. For example, at a single IP address or a single pair of latitude and longitude coordinates, two devices will rarely have identical device information. The feature vector can therefore be used as a sample feature in risk identification and take part in model training and scoring. It should be noted that including the foregoing feature vector in model training does not exclude other conventional feature variables from also serving as sample features in training and risk identification.
In the scheme provided by the embodiments of this specification, device information and environment information that are generally difficult to tamper with are collected and combined into a feature vector of the device, and the service processing count is attached to that feature vector as its feature value. The feature values of training samples can then be used for model training and risk identification, which improves the accuracy with which the risk identification model identifies devices, prevents device-dimension information from being defeated at a single point, and improves the overall stability and accuracy of device risk identification.
Further, in the embodiments of the present disclosure, model training may be performed based on the feature values of the training samples corresponding to the foregoing devices. In particular, the model may be trained by supervised training or by unsupervised clustering.
In other words, the embodiments of this specification may determine in advance whether a device is risky (that is, whether the device is a "black machine," a device used for fraud), and may extract feature values from black machine samples using the foregoing steps to serve as negative samples in training. In supervised learning, each training sample can therefore be given a corresponding label (whether or not the device is a black machine), and a first device risk identification model can be trained from the feature values of the training samples and used to evaluate whether a device is a black machine.
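As one possible illustration of the supervised case, the sketch below fits a one-feature logistic regression by gradient descent on labeled feature values. The choice of logistic regression, the log scaling of counts, the label encoding (1 for a black machine), the learning rate, and the toy data are all assumptions made for illustration; the text does not name a model type or training procedure.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(
    counts: Sequence[int],
    labels: Sequence[int],      # assumption: 1 = black machine, 0 = normal device
    lr: float = 0.2,
    steps: int = 5000,
) -> Tuple[float, float]:
    """Fit weight and bias on log-scaled counts with batch gradient descent."""
    xs = [math.log1p(c) for c in counts]
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def risk_score(count: int, w: float, b: float) -> float:
    """Estimated probability that a device with this feature value is a black machine."""
    return sigmoid(w * math.log1p(count) + b)


# Toy training set: reward counts per feature vector within the time window.
w, b = train([1, 2, 1, 3, 40, 55, 60, 48], [0, 0, 0, 0, 1, 1, 1, 1])
print(risk_score(2, w, b) < 0.5 < risk_score(50, w, b))
```

A real model would combine several feature values (for example N1, N2, and N3 across the three feature vectors) with other conventional features, as the text notes.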
Alternatively, when the sample devices are not labeled, the embodiments of this specification can train an unsupervised clustering model based on the feature values of the sample features. Clustering groups devices with similar features together, yielding a second device risk identification model for classification, which is likewise used to evaluate whether a device is a black machine.
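The unsupervised case can be illustrated with a very small clustering sketch. It uses two-cluster k-means on a single log-scaled feature value, with centers initialized at the minimum and maximum; the algorithm choice, the number of clusters, and the function name `two_means` are assumptions, since the text only refers to clustering in general.

```python
import math
from typing import List, Sequence


def two_means(counts: Sequence[int], iterations: int = 20) -> List[int]:
    """Assign each feature value to one of two clusters (0 = lower, 1 = higher)."""
    xs = [math.log1p(c) for c in counts]
    centers = [min(xs), max(xs)]          # deterministic initialization
    assignment = [0] * len(xs)
    for _ in range(iterations):
        # Assignment step: nearest center, ties go to the lower cluster.
        assignment = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            for x in xs
        ]
        # Update step: move each center to the mean of its members.
        for k in (0, 1):
            members = [x for x, a in zip(xs, assignment) if a == k]
            if members:                   # keep the old center if a cluster is empty
                centers[k] = sum(members) / len(members)
    return assignment


print(two_means([1, 2, 1, 3, 40, 55, 60, 48]))
```

Clustering alone does not say which group is risky. Deciding that the high-count cluster corresponds to black machines is a separate judgment, for example from a handful of known cases.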
In another aspect, once the foregoing device risk identification model has been trained, the embodiments of the present disclosure further provide a device risk identification method based on that model. FIG. 2 is a schematic flow chart of a device risk identification method provided in the embodiments of the present disclosure, including:
S201, acquiring device information of a device to be detected and environment information of the environment in which the device is located;
S203, performing numerical conversion on the device information to generate corresponding device information values;
S205, aggregating the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
S207, generating a feature vector containing the device identifier and the environment information, acquiring the service processing count corresponding to the feature vector within the same time window, and determining the service processing count as the feature value of the feature vector of the device to be detected;
S209, evaluating the risk degree of the device to be detected using the device risk identification model, based on the feature value of the feature vector of the device to be detected.
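The identification flow S205 to S209 can be sketched end to end as follows. The function `assess_risk`, its parameters, and the stand-in model are hypothetical. The sketch assumes that S203 has already produced numeric values, that the events passed in are already restricted to the time window, and that the trained model is exposed as a callable from feature value to risk score.

```python
from typing import Callable, Iterable, Mapping, Sequence, Tuple


def assess_risk(
    device_values: Mapping[str, int],            # output of S203
    field_order: Sequence[str],                  # specified splicing order
    ip: str,                                     # environment information
    recent_events: Iterable[Tuple[str, str]],    # (device identifier, IP) per service event in the window
    model: Callable[[int], float],               # trained model: feature value -> risk score
) -> float:
    """Score a device to be detected from its feature vector's event count."""
    identifier = "".join(str(device_values[name]) for name in field_order)   # S205
    feature_vector = (identifier, ip)                                         # S207
    feature_value = sum(1 for event in recent_events if tuple(event) == feature_vector)
    return model(feature_value)                                               # S209


# Stand-in for a trained model: risk grows with the count and saturates at 1.0.
def toy_model(count: int) -> float:
    return min(1.0, count / 10)


values = {"brand": 1, "model": 1, "battery": 3}
events = [("113", "ip1"), ("113", "ip1"), ("113", "ip2"), ("113", "ip1"), ("214", "ip1")]
print(assess_risk(values, ("brand", "model", "battery"), "ip1", events, toy_model))
```

Keeping the model behind a callable means the same flow works with either the supervised or the clustering-based model described in the text.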
Correspondingly, the embodiments of the present disclosure further provide a feature extraction apparatus for a device. FIG. 3 is a schematic structural diagram of the feature extraction apparatus provided in the embodiments of the present disclosure, including:
an acquisition module 301, which acquires device information of a device and environment information of the environment in which the device is located;
a conversion module 303, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module 305, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
a generating module 307, which generates a feature vector containing the device identifier and the environment information, and determines the service processing count corresponding to the feature vector within the same time window; and
a feature value determining module 309, which determines the service processing count corresponding to the feature vector as the feature value of the training sample corresponding to the device, so as to train the device risk identification model.
Further, the device information includes at least one of a device brand, a device model, a processor frequency, a ring volume, a call volume, an alarm volume, a remaining battery power, a remaining device memory capacity, or a remaining device memory card capacity. Correspondingly, the conversion module 303 obtains the percentage coefficient of the ring volume, call volume, alarm volume, remaining battery power, remaining device memory capacity, or remaining device memory card capacity; determines the coefficient interval to which the percentage coefficient belongs; and determines the device information value corresponding to that coefficient interval according to a preset correspondence between intervals and values.
Further, the aggregation module 305 splices the device information values in a specified order to generate a character string containing the device information values, and determines the character string as the device identifier.
Further, the generating module 307 acquires the IP address of the device and generates a first feature vector containing the device identifier and the IP address; or acquires the Media Access Control (MAC) address of the device and generates a second feature vector containing the device identifier and the MAC address; or acquires the real physical address of the device and generates a third feature vector containing the device identifier and the real physical address.
Further, the apparatus also includes a model training module 311, which performs supervised or unsupervised model training according to the feature values of the training samples corresponding to the devices, so as to generate a device risk identification model.
Corresponding to the other aspect, the embodiments of the present disclosure further provide a device risk identification apparatus based on the foregoing device risk identification model. FIG. 4 is a schematic structural diagram of the device risk identification apparatus provided in the embodiments of the present disclosure, including:
an acquisition module 401, which acquires device information of a device to be detected and environment information of the environment in which the device is located;
a conversion module 403, which performs numerical conversion on the device information to generate corresponding device information values;
an aggregation module 405, which aggregates the device information values to generate a device identifier, where the device identifier is used to identify devices of the same class;
a generating module 407, which generates a feature vector containing the device identifier and the environment information, acquires the service processing count corresponding to the feature vector within the same time window, and determines the service processing count as the feature value of the feature vector of the device to be detected; and
a risk identification module 409, which evaluates the risk degree of the device to be detected using the device risk identification model, based on the feature value of the feature vector of the device to be detected.
The embodiments of the present disclosure also provide a computer device at least including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the feature extraction method shown in fig. 1 when executing the program.
The embodiments of the present disclosure also provide a computer device, which at least includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor implements the device risk identification method shown in fig. 2 when executing the program.
FIG. 5 illustrates a more specific hardware architecture diagram of a computing device provided by embodiments of the present description, which may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 implement communication connections therebetween within the device via a bus 1050.
The processor 1010 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The memory 1020 may be implemented in the form of ROM (Read-Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, etc. The memory 1020 may store an operating system and other application programs, and when the embodiments of the present specification are implemented in software or firmware, the associated program code is stored in the memory 1020 and executed by the processor 1010.
The input/output interface 1030 is used to connect with an input/output module for inputting and outputting information. The input/output module may be configured as a component in a device (not shown) or may be external to the device to provide corresponding functionality. Wherein the input devices may include a keyboard, mouse, touch screen, microphone, various types of sensors, etc., and the output devices may include a display, speaker, vibrator, indicator lights, etc.
The communication interface 1040 is used to connect a communication module (not shown) to enable communication between the present device and other devices. The communication module may communicate in a wired manner (such as USB or a network cable) or in a wireless manner (such as a mobile network, Wi-Fi, or Bluetooth).
Bus 1050 includes a path for transferring information between components of the device (e.g., processor 1010, memory 1020, input/output interface 1030, and communication interface 1040).
It should be noted that although the above-described device only shows processor 1010, memory 1020, input/output interface 1030, communication interface 1040, and bus 1050, in an implementation, the device may include other components necessary to achieve proper operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the feature extraction method shown in fig. 1.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the device risk identification method shown in fig. 2.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that the embodiments of this specification may be implemented by software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of this specification.
The system, method, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments, and reference may be made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware when the embodiments of the present disclosure are implemented. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of a given embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The foregoing is merely a specific implementation of the embodiments of this disclosure. It should be noted that a person skilled in the art may make several improvements and modifications without departing from the principles of the embodiments of this disclosure, and these improvements and modifications should also be considered to fall within the scope of protection of the embodiments of this disclosure.