Detailed Description
The solution provided in this specification is described below with reference to the accompanying drawings.
Before introducing the method for identifying abnormal objects provided by one or more embodiments of this specification, the inventive concept of the method is described first.
First, identifying and controlling telephone fraud has always been a key point and a difficulty in the field of risk control. This solution therefore identifies abnormal telephone numbers. In addition, websites and apps, which share similar characteristics with telephone numbers and are also commonly used to commit fraud against users, can be identified as well.
Second, in conventional techniques, abnormal objects are usually identified on the basis of static data. Such static data can depict an abnormal object's historical behavioral preferences, or the statistical significance of its behaviors, and thereby reveal the potential risk of a behavior. However, static data is usually a statistic of a single behavior. A feature such as a standalone count of some behavior can cause considerable trouble, because users have different behavioral motives in different environments or scenarios. Identifying abnormal objects on the basis of static data alone is therefore often inaccurate.
To improve the accuracy of abnormal object identification, this solution attempts to combine all behaviors together in order to identify the behavioral motive and risk that correspond to the combination. A behavior sequence incorporates "context" better, because it starts from the series of behaviors in which the user is engaged. Therefore, when identifying abnormal objects, this solution uses the behavior sequence of an object as a relevant feature.
It should be noted that a behavior sequence is the sequence formed by arranging a user's historical operation behaviors in order of occurrence. It contains the behavior events themselves as well as information such as the order in which the behavior events occur within a given time window. For example, the behavior sequence of the past hour may be expressed as "A -> B -> C -> D", where A through D may denote the remark names that different users stored for a certain object. Note that although "B -> C -> A -> D" and "A -> B -> C -> D" contain the same behavior events, they are two entirely different behavior patterns because the order of occurrence differs.
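The point about ordering can be illustrated with a minimal Python sketch. The event labels and timestamps below are hypothetical, and `behavior_sequence` is an illustrative helper rather than anything defined in this specification.

```python
from datetime import datetime

def behavior_sequence(events):
    """Arrange (timestamp, label) events in order of occurrence and return the labels."""
    return [label for _, label in sorted(events, key=lambda e: e[0])]

# Two hypothetical one-hour windows containing the same four behavior events.
window_1 = [
    (datetime(2017, 9, 1, 10, 30), "C"),
    (datetime(2017, 9, 1, 10, 0), "A"),
    (datetime(2017, 9, 1, 10, 45), "D"),
    (datetime(2017, 9, 1, 10, 15), "B"),
]
window_2 = [
    (datetime(2017, 9, 2, 10, 0), "B"),
    (datetime(2017, 9, 2, 10, 15), "C"),
    (datetime(2017, 9, 2, 10, 30), "A"),
    (datetime(2017, 9, 2, 10, 45), "D"),
]

seq_1 = behavior_sequence(window_1)  # A, B, C, D
seq_2 = behavior_sequence(window_2)  # B, C, A, D
print(" -> ".join(seq_1))
print(" -> ".join(seq_2))
```

The two sequences contain the same set of events yet compare as unequal, which is the distinction a count-based static feature would lose.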
Finally, because this solution also uses the behavior sequence as an input feature, it considers classifying abnormal objects with a recurrent neural network (RNN) model, which is suited to input features that have sequential characteristics. Because a behavior sequence may include remark names of an object, it cannot be input into a computer model directly. The behavior sequence may therefore be encoded or vectorized. Specifically, a word vector algorithm (e.g., word2vec or cw2vec) or a text classification algorithm (fastText) may be used to encode or vectorize the behavior sequence.
The above is the inventive concept of the solution provided in this specification, and the solution can be obtained on the basis of this inventive concept. The solution is further elaborated below.
Fig. 1 is a schematic diagram of the method for identifying abnormal objects provided in this specification. In Fig. 1, the storage information generated when N users perform storage behaviors on the objects in their respective object sets is collected first. The objects here may include, but are not limited to, telephone numbers, websites, and apps. Taking a telephone number as the object, for example, the corresponding object set may be an address book. The content of an address book is a very important supplement to existing risk-control data. The storage information may include, but is not limited to, the remark name and storage time of each object. Next, based on the collected storage information, static statistical data and a dynamic storage behavior sequence are obtained for each object. The storage behavior sequence is encoded according to a preset encoding algorithm. Finally, the encoded storage behavior sequence is spliced with the statistical data and input into a classifier to identify whether each object is an abnormal object.
Fig. 2 is a flowchart of the method for identifying abnormal objects provided by one embodiment of this specification. The method may be executed by a device with processing capability, such as a server, a system, or an apparatus. As shown in Fig. 2, the method may specifically include:
Step 202: Obtain the storage information generated when multiple users perform storage behaviors on the objects in their respective object sets.
The objects here may include, but are not limited to, telephone numbers, websites, and apps. Taking a telephone number as the object, for example, the corresponding object set may be an address book. The storage information may include, but is not limited to, the remark name and storage time of each object. It can be understood that when the object is a telephone number, the storage information can be obtained from address books. That is, the address books of the multiple users are obtained first, and the storage information of each telephone number is then obtained from those address books. As another example, when the object is a website, the corresponding storage information can be obtained from a browser.
It should be noted that, in addition to the storage information, this solution may also obtain the corresponding device identifier (e.g., the International Mobile Equipment Identity (IMEI)). The correspondence among users, objects, and storage information may also be recorded.
Taking a telephone number as the object, for example, the correspondence may be as shown in Table 1.
Table 1
It should be understood that the content of Table 1 is illustrative, and the correspondence provided by the embodiments of this specification is not limited to it. For example, Table 1 may also include a device identifier and the like, which this specification does not limit.
It should be noted that step 202 may be executed periodically, so that the storage information of the same user in two consecutive periods can be compared to determine whether the user has performed a delete operation on a certain object.
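The period-to-period comparison might be sketched as follows. This is a minimal illustration that assumes each period's storage information for one user is available as a dictionary mapping an object to its remark name; the object keys, remark names, and the `deleted_objects` helper are all hypothetical.

```python
def deleted_objects(previous, current):
    """Return the objects present in the earlier snapshot but absent from the later one."""
    return set(previous) - set(current)

# Hypothetical snapshots of one user's address book in two consecutive periods.
period_1 = {"number-1": "cheat", "number-2": "Mom"}
period_2 = {"number-2": "Mom", "number-3": "Courier"}

deleted = deleted_objects(period_1, period_2)
print(deleted)
```

Summing such per-user results for an object over a time window would give the deletion count used among the static statistics.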
Step 204: Obtain, according to the storage information, the static statistical data and the dynamic storage behavior sequence of each object.
The static statistical data here is usually obtained through statistics and data mining, and may include one or more of the following: the number of users who stored the object in the past several days, the number of days on which the object was stored in the past several days, whether the object was stored under a target name (e.g., "cheat") in the past several days, and the number of times the object was deleted in the past several days.
Taking a telephone number as the object, for example, the static statistical data can be obtained from the content of Table 1. The number of times a telephone number was deleted in the past several days can be obtained by comparing each user's address books from two consecutive periods.
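As a rough sketch of how these statistics could be computed, the Python below assumes the correspondence records are available as dictionaries with hypothetical keys `user`, `object`, `remark`, and `stored_on`, and that they have already been restricted to the time window of interest. The `static_stats` helper and the `TARGET_NAMES` set are illustrative assumptions, not part of this specification.

```python
from datetime import date

# Hypothetical set of target remark names; the text gives "cheat" as one example.
TARGET_NAMES = {"cheat"}

def static_stats(records, obj, num_deleted=0):
    """Compute static statistics for one object.

    `records` is assumed to cover only the window of interest ("the past
    several days"); `num_deleted` is assumed to come from comparing the
    address books of consecutive periods.
    """
    rows = [r for r in records if r["object"] == obj]
    return {
        "num_users": len({r["user"] for r in rows}),
        "num_days": len({r["stored_on"] for r in rows}),
        "has_target_name": any(r["remark"] in TARGET_NAMES for r in rows),
        "num_deleted": num_deleted,
    }

records = [
    {"user": "A", "object": "number-1", "remark": "cheat", "stored_on": date(2017, 9, 1)},
    {"user": "B", "object": "number-1", "remark": "fraudster", "stored_on": date(2017, 9, 10)},
    {"user": "C", "object": "number-2", "remark": "Mom", "stored_on": date(2017, 9, 10)},
]
stats = static_stats(records, "number-1", num_deleted=1)
print(stats)
```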
Because static statistical data alone cannot adequately characterize abnormal objects, this solution additionally obtains the dynamic storage behavior sequence of each object, so as to improve the accuracy and coverage of abnormal object identification. A storage behavior sequence depicts the patterns formed by certain combinations of behaviors of an abnormal object, and does so more accurately than the statistical features of a single behavior.
In one implementation, obtaining the dynamic storage behavior sequence of each object according to the storage information may include:
For each of the objects, the remark names and storage times of that object are selected from the remark names and storage times of all objects. The remark names of the object are sorted by the storage times of the object. The storage behavior sequence of the object is generated from the sorted remark names.
As an example, assume that the telephone number 186 ×××× ×××× has been stored by three different users in their respective address books: user A stored the telephone number on September 1, 2017, with the remark name "cheat"; user B stored it on September 10, 2017, with the remark name "fraudster"; and user C stored it on September 22, 2017, with the remark name "tricker". According to these storage times, the corresponding remark names can be sorted as: cheat, fraudster, tricker. From this sorted result, the storage behavior sequence of the telephone number can be generated: cheat -> fraudster -> tricker.
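The select, sort, and generate steps can be sketched in Python using the three records from the example. The record layout (keys `user`, `object`, `remark`, `stored_on`), the extra unrelated record, and the `storage_behavior_sequence` helper are assumptions made for illustration.

```python
from datetime import date

def storage_behavior_sequence(records, obj):
    """Select the records of one object, sort them by storage time, and return the remark names."""
    rows = [r for r in records if r["object"] == obj]
    rows.sort(key=lambda r: r["stored_on"])
    return [r["remark"] for r in rows]

NUMBER = "186 ×××× ××××"  # masked telephone number from the example

# Records deliberately listed out of chronological order.
records = [
    {"user": "C", "object": NUMBER, "remark": "tricker", "stored_on": date(2017, 9, 22)},
    {"user": "A", "object": NUMBER, "remark": "cheat", "stored_on": date(2017, 9, 1)},
    {"user": "D", "object": "another-number", "remark": "Mom", "stored_on": date(2017, 9, 5)},
    {"user": "B", "object": NUMBER, "remark": "fraudster", "stored_on": date(2017, 9, 10)},
]

sequence = storage_behavior_sequence(records, NUMBER)
print(" -> ".join(sequence))
```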
It can be seen that the storage behavior sequence in this solution reflects both the order in which the object was stored by users and the content of what was stored, so it characterizes the object more accurately.
It can be understood that the above example generates the storage behavior sequence of an object from the user dimension. In practical applications, the storage behavior sequence may also be generated from other dimensions (e.g., the device dimension), which this specification does not limit. When the storage behavior sequence of an object is generated from the device dimension, the device identifier can be obtained together with the storage information; the sequence is generated in the same way as above, which is not repeated here.
Step 206: Encode the storage behavior sequence according to a preset encoding algorithm.
The preset encoding algorithm here may include, but is not limited to, a word vector algorithm (e.g., word2vec or cw2vec) and a text classification algorithm (fastText).
Taking the storage behavior sequence generated above, cheat -> fraudster -> tricker, as an example, each remark name in the sequence is essentially a character string, so each character string can be converted into a corresponding vector by the preset encoding algorithm. The vectors are then spliced together to obtain the encoded storage behavior sequence.
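The convert-then-splice step can be sketched as below. Note that `toy_embed` is only a hash-based stand-in that maps a string to a fixed-length vector; it is not word2vec, cw2vec, or fastText, and it captures no semantic similarity. A trained embedding model would take its place in practice. The dimension of 4 is likewise an arbitrary assumption.

```python
import hashlib

def toy_embed(token, dim=4):
    """Stand-in embedding: derive `dim` values in [0, 1] from a hash of the token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]

def encode_sequence(remarks, dim=4):
    """Convert each remark name to a vector and splice the vectors end to end."""
    encoded = []
    for remark in remarks:
        encoded.extend(toy_embed(remark, dim))
    return encoded

sequence = ["cheat", "fraudster", "tricker"]
encoded = encode_sequence(sequence)
print(len(encoded))
```

Because the vectors are spliced in sequence order, the position of each remark name is preserved in the encoded result.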
Step 208: Splice the encoded storage behavior sequence with the statistical data to obtain the spliced data of each object.
Because static statistical data usually consists of numbers, such as the number of times an object was stored and the number of times it was deleted, it can be input into the classifier directly. The encoded storage behavior sequence can therefore be spliced directly with the statistical data to obtain the spliced data. It can be understood that the spliced data here is a multi-dimensional vector.
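A minimal sketch of the splicing step follows. The encoded values are placeholders, and the statistic names and their ordering inside `splice` are assumptions chosen for illustration.

```python
def splice(encoded_sequence, stats):
    """Append the numeric static statistics to the encoded storage behavior sequence."""
    numeric = [
        float(stats["num_users"]),
        float(stats["num_days"]),
        float(stats["has_target_name"]),  # True -> 1.0, False -> 0.0
        float(stats["num_deleted"]),
    ]
    return list(encoded_sequence) + numeric

# Placeholder encoded sequence (e.g., three remark names at two dimensions each).
encoded = [0.1, 0.7, 0.3, 0.9, 0.2, 0.5]
stats = {"num_users": 3, "num_days": 3, "has_target_name": True, "num_deleted": 0}

features = splice(encoded, stats)
print(features)
```

The result is a single flat vector, which matches the description of the spliced data as a multi-dimensional vector.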
Step 210: Input the spliced data into the classifier to identify whether each object is an abnormal object.
In one implementation, the classifier may be an RNN model, a long short-term memory (LSTM) network model, or the like. Specifically, after the spliced data is input into the classifier, the classifier can output, for each object, the probability that it is an abnormal object and the probability that it is not. Based on these two probabilities, it can be identified whether each object is an abnormal object.
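The final decision step can be sketched as follows. The sketch does not implement an RNN or LSTM; it assumes the classifier produces two raw scores per object, converts them to probabilities with a softmax, and labels the object abnormal when the abnormal probability is the larger of the two. The score values and the comparison rule are assumptions, since the text does not fix how the two probabilities are combined.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def is_abnormal(p_abnormal, p_normal):
    """Assumed decision rule: abnormal when its probability exceeds the normal one."""
    return p_abnormal > p_normal

# Hypothetical raw classifier scores for one object: [abnormal, not abnormal].
scores = [2.0, 0.5]
p_abnormal, p_normal = softmax(scores)
print(round(p_abnormal, 3), round(p_normal, 3), is_abnormal(p_abnormal, p_normal))
```

A fixed threshold on the abnormal probability would be an equally plausible rule and would allow the accuracy and coverage trade-off to be tuned.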
In summary, this solution relates to a method that identifies abnormal objects by combining the storage behavior sequences of objects. A storage behavior sequence intuitively reflects a fraudster's modus operandi and can help strategy analysts analyze the fraud techniques in a case more easily, improving working efficiency. In addition, this solution takes the sequence of behaviors in which an object is stored as a whole (including information such as the order of the behaviors) as the object of study and uses it to characterize the behavior of abnormal objects. This enriches the anti-fraud features of the risk-control system and provides more useful information for feature characterization. Specifically, introducing the storage behavior sequence feature of telephone numbers into telephone fraud identification can significantly improve the accuracy and coverage of fraudulent call identification.
The method for identifying abnormal objects provided by one or more embodiments of this specification first obtains two kinds of data about an object, dynamic data and static data, and then fuses the static data with the dynamic data and performs feature processing. It thereby depicts the patterns formed by certain combinations of behaviors of abnormal objects more accurately than the statistical features of a single behavior.
The identification process is illustrated below, taking a telephone number as the object. It should be noted that, because an address book usually contains various kinds of information, the process of identifying fraudulent calls is described below on the basis of the content of address books.
Fig. 3 is a flowchart of the method for identifying fraudulent calls provided in this specification. As shown in Fig. 3, the method may include the following steps:
Step 302: Obtain the address books of multiple users.
An address book may include information such as a contact's telephone number, remark name, and storage time. The remark name and storage time may be collectively referred to as the storage information corresponding to the telephone number.
After the address books of the multiple users are obtained, the correspondence shown in Table 1 can be established. Table 1 may also include a device identifier and the like, which this specification does not limit.
It should be noted that step 302 may be executed periodically, so that the address books of the same user in two consecutive periods can be compared to determine whether the user has performed a delete operation on a certain telephone number.
Step 304: Obtain, according to the content of the address books, the static statistical data and the dynamic storage behavior sequence of each telephone number.
The static statistical data here is usually obtained through statistics and data mining, and may include one or more of the following: the number of users who stored the telephone number in the past several days, the number of days on which the telephone number was stored in the past several days, the number of times the telephone number was deleted in the past several days, and whether the telephone number was stored as "cheat" in the past several days. Specifically, the static statistical data can be obtained from the content of Table 1. The number of times a telephone number was deleted in the past several days can be obtained by comparing each user's address books from two consecutive periods.
Because obtaining only static statistical data does not make full use of the content of the address books, this embodiment also obtains the dynamic storage behavior sequence of each telephone number. A storage behavior sequence incorporates "context" better: it starts from the series of behaviors in which the user is engaged and considers the behavioral motive and risk that correspond to all the behaviors combined. Like the statistical data above, the storage behavior sequence can depict a fraudulent call's historical behavioral preferences, or the statistical significance of its behaviors, and reveal the potential risk of a behavior. The difference is that the storage behavior sequence depicts the patterns formed by certain combinations of behaviors of a fraudulent call more accurately than the statistical features of a single behavior.
In one implementation, obtaining the dynamic storage behavior sequence of one telephone number according to the content of the address books may include:
The remark names and storage times of the telephone number are selected from the address books of the multiple users. The remark names of the telephone number are sorted by the storage times of the telephone number. The storage behavior sequence of the telephone number is generated from the sorted remark names.
It can be understood that, with the above method, the dynamic storage behavior sequence of every telephone number in the address books of the multiple users can be obtained.
As an example, assume that the telephone number 186 ×××× ×××× has been stored by three different users in their respective address books: user A stored the telephone number on September 1, 2017, with the remark name "cheat"; user B stored it on September 10, 2017, with the remark name "fraudster"; and user C stored it on September 22, 2017, with the remark name "tricker". According to these storage times, the corresponding remark names can be sorted as: cheat, fraudster, tricker. From this sorted result, the storage behavior sequence of the telephone number can be generated: cheat -> fraudster -> tricker.
It can be understood that this solution generates the storage behavior sequence of a telephone number from the user dimension. In practical applications, the storage behavior sequence may also be generated from other dimensions (e.g., the device dimension), which this specification does not limit. When the storage behavior sequence of a telephone number is generated from the device dimension, the device identifier can be obtained together with the address book; the sequence is generated in the same way as above, which is not repeated here.
Step 306: Encode the storage behavior sequence according to a preset encoding algorithm.
The preset encoding algorithm here may include, but is not limited to, a word vector algorithm (e.g., word2vec or cw2vec) and a text classification algorithm (fastText).
Taking the storage behavior sequence generated above, cheat -> fraudster -> tricker, as an example, each remark name in the sequence is essentially a character string, so each character string can be converted into a corresponding vector by the preset encoding algorithm. The vectors are then spliced together to obtain the encoded storage behavior sequence.
Step 308: Splice the encoded storage behavior sequence with the statistical data to obtain the spliced data of each telephone number.
Because static statistical data usually consists of numbers, such as the number of times a telephone number was stored and the number of times it was deleted, it can be input into the classifier directly. The encoded storage behavior sequence can therefore be spliced directly with the statistical data to obtain the spliced data. It can be understood that the spliced data here is a multi-dimensional vector.
Step 310: Input the spliced data into the classifier to identify whether each telephone number is a fraudulent call.
In one implementation, the classifier may be an RNN model, a long short-term memory (LSTM) network model, or the like. Specifically, after the spliced data is input into the classifier, the classifier can output, for each telephone number, the probability that it is a fraudulent call and the probability that it is not. Based on these two probabilities, it can be identified whether each telephone number is a fraudulent call.
This embodiment of the specification relates to a method that identifies fraudulent calls by using, as a feature, the sequence of behaviors in which users store a telephone number. Its main characteristic is that it takes the sequence of behaviors in which a telephone number is stored as a whole (including information such as the order of the behaviors) as the object of study and uses it to characterize the behavior of fraudsters. In this way, the accuracy and coverage of fraudulent call identification can be improved significantly.
Corresponding to the above method for identifying abnormal objects, one embodiment of this specification further provides an apparatus for identifying abnormal objects. As shown in Fig. 4, the apparatus may include:
An acquiring unit 402, configured to obtain the storage information generated when multiple users perform storage behaviors on the objects in their respective object sets.
The objects here may include any of the following: telephone numbers, website addresses, and apps.
The acquiring unit 402 is further configured to obtain, according to the storage information, the static statistical data and the dynamic storage behavior sequence of each object.
The static statistical data here may include one or more of the following: the number of users who stored the object in the past several days, the number of days on which the object was stored in the past several days, whether the object was stored under a target name in the past several days, and the number of times the object was deleted in the past several days.
The storage information may include the remark name and storage time of each object.
The acquiring unit 402 may specifically be configured to:
For each of the objects, select the remark names and storage times of that object from the remark names and storage times of all objects.
Sort the remark names of the object by the storage times of the object.
Generate the storage behavior sequence of the object from the sorted remark names.
A coding unit 404, configured to encode, according to a preset encoding algorithm, the storage behavior sequence obtained by the acquiring unit 402.
The preset encoding algorithm may include any of the following: a word vector algorithm and a text classification algorithm.
A splicing unit 406, configured to splice the storage behavior sequence encoded by the coding unit 404 with the statistical data to obtain the spliced data of each object.
A recognition unit 408, configured to input the spliced data obtained by the splicing unit 406 into a classifier to identify whether each object is an abnormal object.
The functions of the functional modules of the apparatus in the above embodiment can be implemented through the steps of the above method embodiment. The specific working process of the apparatus provided by one embodiment of this specification is therefore not repeated here.
In the apparatus for identifying abnormal objects provided by one embodiment of this specification, the acquiring unit 402 obtains the storage information generated when multiple users perform storage behaviors on the objects in their respective object sets. The acquiring unit 402 obtains, according to the storage information, the static statistical data and the dynamic storage behavior sequence of each object. The coding unit 404 encodes the storage behavior sequence according to a preset encoding algorithm. The splicing unit 406 splices the encoded storage behavior sequence with the statistical data to obtain the spliced data of each object. The recognition unit 408 inputs the spliced data into a classifier to identify whether each object is an abnormal object. The accuracy of abnormal object identification can thereby be improved.
Corresponding to the above method for identifying abnormal objects, an embodiment of this specification further provides a device for identifying abnormal objects. As shown in Fig. 5, the device may include a memory 502, one or more processors 504, and one or more programs. The one or more programs are stored in the memory 502 and are configured to be executed by the one or more processors 504. When executed by the processors 504, the programs implement the following steps:
Obtaining the storage information generated when multiple users perform storage behaviors on the objects in their respective object sets.
Obtaining, according to the storage information, the static statistical data and the dynamic storage behavior sequence of each object.
Encoding the storage behavior sequence according to a preset encoding algorithm.
Splicing the encoded storage behavior sequence with the statistical data to obtain the spliced data of each object.
Inputting the spliced data into a classifier to identify whether each object is an abnormal object.
The device for identifying abnormal objects provided by one embodiment of this specification can improve the accuracy of abnormal object identification.
The embodiments in this specification are described in a progressive manner. For identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
The steps of the method or algorithm described in connection with the disclosure of this specification may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a server. The processor and the storage medium may also reside in a server as discrete components.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of this specification in detail. It should be understood that the foregoing is merely a description of specific embodiments of this specification and is not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of this specification shall fall within the scope of protection of this specification.