CN110135566A - Registered username detection method based on an LSTM binary classification neural network model - Google Patents
Registered username detection method based on an LSTM binary classification neural network model
- Publication number
- CN110135566A (application number CN201910425791.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- user name
- lstm
- bis
- model
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a method for detecting registered usernames based on an LSTM binary classification neural network model, comprising the following steps: preprocessing training data and test data, where the training data comprises normal username data and randomly generated username data; encoding the preprocessed username data and unifying the length of each item; serializing the encoded data at the character level; building an LSTM binary classification neural network model from the preprocessed, encoded, and serialized training data to form a detection model for registered usernames; and feeding the preprocessed, encoded, and serialized test data into the detection model, which outputs the probability P that a test item is an abnormal sample. When P is greater than or equal to the abnormal-probability threshold, the item is classified as an abnormal sample; otherwise it is classified as a normal sample. The invention is effective at detecting whether a username registered on a platform was randomly generated.
Description
Technical field
The present invention relates to the technical fields of web applications and deep learning, and in particular to a method for detecting registered usernames based on an LSTM binary classification neural network model.
Background art
In recent years web applications have become increasingly widespread. To serve and retain users better, many platforms offer user registration and open it to the public, which brings problems of its own. On the one hand, open registration lets users with ulterior motives register malicious accounts in bulk, which may cause network security problems. On the other hand, a platform accumulates a massive number of users of uneven quality, which inevitably reduces the efficiency of subsequent operational activities.
Summary of the invention
Randomly or arbitrarily generated usernames are a feature shared by bulk malicious registrations and low-quality registered users, and such usernames often do not follow the naming conventions of pinyin or English. To solve the problems in the prior art, the purpose of the invention is to detect, among a platform's registered users, malicious and low-quality users whose usernames appear to be randomly generated, and thereby to provide a reference for identifying bulk registration and low-quality users. To this end, a method for detecting registered usernames based on an LSTM binary classification neural network model is proposed.
To achieve the above object, the technical solution adopted by the invention is a method for detecting registered usernames based on an LSTM binary classification neural network model, comprising the following steps:
Step 1: preprocess the training data and test data, where the training data comprises normal username data and randomly generated username data;
Step 2: encode the preprocessed username data and unify the length of each item;
Step 3: serialize the encoded data at the character level;
Step 4: build an LSTM binary classification neural network model from the preprocessed, encoded, and serialized training data to form the detection model for registered usernames;
Step 5: feed the preprocessed, encoded, and serialized test data into the detection model, which outputs the probability P that a test item is an abnormal sample; when P is greater than or equal to the abnormal-probability threshold, the item is classified as an abnormal sample, otherwise it is classified as a normal sample.
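Step 1 calls for training data that mixes normal usernames with randomly generated ones, but the text does not say how the random names are produced. The sketch below is one possible way to assemble such a labeled set, assuming uniformly random lowercase letters and a label convention of 0 for normal and 1 for randomly generated; the function names, length bounds, and sample names are all hypothetical.

```python
import random
import string

def random_username(rng, min_len=6, max_len=12):
    """Hypothetical generator: uniformly random lowercase letters."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def build_training_set(normal_names, n_random, seed=0):
    """Return shuffled (username, label) pairs; 0 = normal, 1 = randomly generated."""
    rng = random.Random(seed)
    samples = [(name, 0) for name in normal_names]
    samples += [(random_username(rng), 1) for _ in range(n_random)]
    rng.shuffle(samples)
    return samples

# Illustrative normal names only; a real set would come from platform data.
training_set = build_training_set(["goodname", "zhangwei", "lilei"], n_random=3)
```

Real malicious registrations may not be uniformly random, so a generator like this is only a starting point for the abnormal class.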
In a preferred embodiment, preprocessing the training data and test data in step 1 specifically comprises: removing all non-English-letter characters from the training data and test data, converting uppercase English letters to lowercase, and, if a username in the data is an e-mail address, removing the e-mail suffix.
In another preferred embodiment, the encoding in step 2 is as follows: the preprocessed data contains only English letters, and the 26 English letters correspond to 26 codes; the coded sequence of each item is unified to a common length, with shorter sequences zero-padded and longer sequences truncated.
In another preferred embodiment, serializing the encoded data in step 3 specifically comprises: mapping each letter to features using word-vector techniques, so that each letter corresponds to a fixed-length vector. Specifically, an Embedding layer maps the data to an embedding matrix: with an output dimension of 32, the code of each letter is mapped to a 32-dimensional vector, and each username sample becomes a 1*20*32 matrix.
In another preferred embodiment, the LSTM binary classification neural network model built in step 4 is as follows:
The first layer is an embedding layer. Each input sample is a character sequence of length 20; after the embedding layer's mapping, each sample is output as a 20*32 matrix, and n samples are represented as n*20*32;
The second layer is an LSTM layer. Its input is a matrix of dimension n*20*32, where n is the number of username samples; its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function of the layer is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation. The loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
In another preferred embodiment, the detection model's false positives on test data are reduced by raising the abnormal-probability threshold.
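The output layer, loss, and decision rule described above can be written compactly. The notation below is introduced here for illustration and is not taken from the source: $z_0, z_1$ are the two output-layer pre-activations, $y$ is the one-hot label (index 1 meaning abnormal), and $\tau$ is the abnormal-probability threshold.

```latex
\begin{align}
p_k &= \frac{e^{z_k}}{e^{z_0} + e^{z_1}}, \quad k \in \{0, 1\}, \qquad P = p_1 \\
\mathcal{L} &= -\sum_{k=0}^{1} y_k \log p_k \\
\hat{y} &=
\begin{cases}
\text{abnormal}, & P \ge \tau \\
\text{normal}, & P < \tau
\end{cases}
\end{align}
```

Raising $\tau$ shrinks the set of samples with $P \ge \tau$, which is why fewer normal usernames are flagged.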
The beneficial effects of the invention are as follows. The model is trained on normal username data and randomly generated username data. On new data, the binary classifier achieves precision and recall of around 95% for each class, so it distinguishes normal usernames from randomly generated ones effectively. Practical scenarios require that the proportion of normal samples classified as abnormal (the false-positive rate) be as small as possible. Setting the probability threshold above which a sample is judged abnormal controls the false-positive rate; correspondingly, the probability of judging an abnormal sample as normal increases.
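For readers who want the quantities above pinned down, the following sketch computes precision, recall, and the false-positive rate for the abnormal class from confusion-matrix counts. The counts are hypothetical and were chosen only so the figures land near the 95% level mentioned in the text; they are not results from the patent.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall and false-positive rate for the 'abnormal' class.

    tp: abnormal samples flagged abnormal, fp: normal samples flagged abnormal,
    fn: abnormal samples passed as normal, tn: normal samples passed as normal.
    """
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }

# Hypothetical counts for 100 abnormal and 100 normal test usernames.
metrics = binary_metrics(tp=95, fp=5, fn=5, tn=95)
```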
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the embodiment of the invention;
Fig. 2 shows the structure of the LSTM binary classification neural network model in the embodiment of the invention.
Detailed description of the embodiments
The embodiment of the invention is described in detail below with reference to the accompanying drawings.
Embodiment:
As shown in Fig. 1, a method for detecting registered usernames based on an LSTM binary classification neural network model comprises the following steps:
1. Preprocess the usernames in the training data and test data: remove all non-English-letter characters, convert uppercase to lowercase, and, if the username is an e-mail address, remove the e-mail suffix. For example, [email protected] becomes goodname after preprocessing.
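A minimal sketch of this preprocessing step, assuming the e-mail suffix is everything from the first "@" onward. The function name and the sample address are hypothetical (the address in the source text is redacted).

```python
import re

def preprocess(username):
    """Drop an e-mail suffix, lowercase, and keep only the letters a-z."""
    local_part = username.split("@", 1)[0]      # strip "@domain" if present
    return re.sub(r"[^a-z]", "", local_part.lower())

cleaned = preprocess("Goodname123@example.com")  # hypothetical address
```

The suffix is removed before the non-letter filter; doing it the other way round would delete the "@" and merge the domain letters into the name.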
2. Encode the letters in the preprocessed username data and unify the length of each sample. A processed username contains only English letters, and the 26 letters correspond to 26 codes (a:1, b:2, c:3, and so on). Finally the coded sequence of every username is unified to length 15: shorter sequences are zero-padded and longer ones are truncated. For example, the username "Goodname123" becomes the sequence [0,0,0,0,0,0,0,7,15,15,4,14,1,13,5] after encoding and length unification.
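The encoding step can be sketched as follows. The worked example in the text shows the zeros placed in front, so this sketch pads on the left; which end is cut when a name is too long is not stated, so keeping the leading characters is an assumption. The function name is hypothetical.

```python
def encode(name, length=15):
    """Map a..z to 1..26, then left-pad with zeros or truncate to `length`."""
    codes = [ord(ch) - ord("a") + 1 for ch in name if "a" <= ch <= "z"]
    codes = codes[:length]                       # assumption: keep the leading characters
    return [0] * (length - len(codes)) + codes   # zeros go in front, as in the example

sequence = encode("goodname")
```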
3. Map each letter to features using word-vector techniques, so that each letter corresponds to a fixed-length vector. An Embedding layer maps the data to an embedding matrix: with an output dimension of 32, the code of each letter is mapped to a 32-dimensional vector, and each username sample becomes a 1*20*32 matrix.
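Conceptually, an embedding layer is a lookup table with one row per code. The sketch below imitates that lookup in plain Python with random numbers standing in for trained weights. It assumes a table of 27 rows (codes 1 to 26 plus 0 for padding) and a sequence padded to length 20 so the result matches the 20*32 per-sample shape stated in the text; all names are hypothetical.

```python
import random

VOCAB_SIZE = 27   # assumption: codes 1..26 plus 0 for padding
EMBED_DIM = 32

rng = random.Random(0)
# Stand-in for a trained embedding table: one 32-dimensional row per code.
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
                   for _ in range(VOCAB_SIZE)]

def embed(codes):
    """Replace each integer code with its embedding row."""
    return [embedding_table[c] for c in codes]

codes = [0] * 12 + [7, 15, 15, 4, 14, 1, 13, 5]   # "goodname" padded to length 20
matrix = embed(codes)                              # 20 rows of 32 values
```

Identical letters share a row, so the two "o" positions in the example receive the same vector; in a trained model the rows are learned together with the rest of the network.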
4. Build the LSTM binary classification neural network model shown in Fig. 2:
The first layer is an embedding layer. Each input sample is a character sequence of length 20; after the embedding layer's mapping, each sample is output as a 20*32 matrix, and n samples are represented as n*20*32;
The second layer is an LSTM layer. Its input is a matrix of dimension n*20*32 (n is the number of username samples); its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation. The loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
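The layer dimensions listed above can be checked with a few lines of arithmetic. The sketch below traces the output shape of each layer and, as a rough guide, its trainable parameter count. The parameter counts rest on assumptions not stated in the source: a vocabulary of 27 codes and a standard LSTM with four gates, each holding input weights, recurrent weights, and a bias. The function and layer names are hypothetical.

```python
def trace_model(n, seq_len=20, vocab_size=27, embed_dim=32,
                lstm_units=64, dense_units=64, n_classes=2):
    """Return (layer name, output shape, parameter count) for each layer."""
    flat = seq_len * lstm_units   # 20 * 64 = 1280 values per sample
    return [
        ("embedding", (n, seq_len, embed_dim), vocab_size * embed_dim),
        # 4 gates, each with input weights, recurrent weights and a bias
        ("lstm", (n, seq_len, lstm_units),
         4 * ((embed_dim + lstm_units) * lstm_units + lstm_units)),
        ("flatten", (n, flat), 0),
        ("dense_relu", (n, dense_units), flat * dense_units + dense_units),
        ("softmax_output", (n, n_classes), dense_units * n_classes + n_classes),
    ]

layers = trace_model(n=8)
for name, shape, params in layers:
    print(f"{name:15s} {str(shape):14s} {params}")
```

The flatten width of 1280 follows directly from 20 time steps of 64 values each, which is consistent with the LSTM returning its output at every time step.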
5. False-positive correction:
The detection model can err in two ways: classifying a normal sample as abnormal (a false positive) or classifying an abnormal sample as normal (a miss). In practical scenarios the cost of a false positive is much greater than the cost of a miss, so a probability-threshold mechanism is provided to control false positives. Raising the threshold reduces false positives; correspondingly, misses increase to some extent.
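The trade-off can be illustrated with a small sketch that applies the decision rule (abnormal when P is greater than or equal to the threshold) at several thresholds and counts both kinds of error. The scores and labels are invented for illustration only, and the function names are hypothetical.

```python
def classify(p_abnormal, threshold=0.5):
    """Apply the decision rule: abnormal when P >= threshold."""
    return "abnormal" if p_abnormal >= threshold else "normal"

def count_errors(scored, threshold):
    """Count false positives (normal flagged abnormal) and misses (abnormal passed)."""
    false_positives = sum(1 for p, truth in scored
                          if truth == "normal" and classify(p, threshold) == "abnormal")
    misses = sum(1 for p, truth in scored
                 if truth == "abnormal" and classify(p, threshold) == "normal")
    return false_positives, misses

# Hypothetical (P, true label) pairs, invented for illustration only.
scored = [(0.95, "abnormal"), (0.70, "abnormal"), (0.55, "abnormal"),
          (0.85, "normal"), (0.60, "normal"), (0.20, "normal")]

for threshold in (0.5, 0.8, 0.9):
    print(threshold, count_errors(scored, threshold))
```

On this toy data the false positives fall from 2 to 1 to 0 as the threshold rises, while the misses go from 0 to 2, which is the behavior the text describes.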
The embodiment is further described below.
When bulk registration occurs on a platform, the malicious registrants are likely to generate a large number of usernames at random; combined with some appropriate rules, the detection result can be used to judge how likely it is that these users were registered in bulk.
Some users only try the platform casually and may fill in an arbitrary login name when registering. Usernames of this kind can be detected and flagged, and in later user operations the priority of these users can be lowered, which reduces operating costs and improves operating efficiency.
Detecting suspected randomly generated usernames with this method can, to some extent, provide a reference for identifying malicious registration, and a portion of low-quality users can be filtered out based on the naming patterns of their usernames.
The method preprocesses usernames and serializes them at the character level. Exploiting the strength of LSTM in representing sequential data, it trains the model on normal usernames and randomly generated usernames so that it learns how characters combine in each kind of username, and thus the naming patterns of normal and randomly generated usernames. The resulting LSTM binary classification neural network model is effective at detecting whether a username registered on a platform was randomly generated.
The random-username detection model is a binary classifier. Existing classification algorithms all support binary classification, but an LSTM can also handle sequence structure: whether a word is Chinese pinyin or English, its spelling follows specific character-combination rules, and the order in which characters follow one another shapes the word that results. An LSTM captures this relationship well and so learns the difference between normal usernames and randomly generated abnormal ones.
The embodiment described above expresses only one specific implementation of the invention. Although the description is specific and detailed, it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention.
Claims (6)
1. A method for detecting registered usernames based on an LSTM binary classification neural network model, characterized by comprising the following steps:
Step 1: preprocessing training data and test data, where the training data comprises normal username data and randomly generated username data;
Step 2: encoding the preprocessed username data and unifying the length of each item;
Step 3: serializing the encoded data at the character level;
Step 4: building an LSTM binary classification neural network model from the preprocessed, encoded, and serialized training data to form a detection model for registered usernames;
Step 5: feeding the preprocessed, encoded, and serialized test data into the detection model, which outputs the probability P that a test item is an abnormal sample; when P is greater than or equal to the abnormal-probability threshold, the item is classified as an abnormal sample, otherwise it is classified as a normal sample.
2. The method for detecting registered usernames based on an LSTM binary classification neural network model according to claim 1, characterized in that, in step 1, preprocessing the training data and test data specifically comprises: removing all non-English-letter characters from the training data and test data, converting uppercase English letters to lowercase, and, if a username in the data is an e-mail address, removing the e-mail suffix.
3. The method for detecting registered usernames based on an LSTM binary classification neural network model according to claim 2, characterized in that the encoding in step 2 is as follows: the preprocessed data contains only English letters, and the 26 English letters correspond to 26 codes; the coded sequence of each item is unified to a common length, with shorter sequences zero-padded and longer sequences truncated.
4. The method for detecting registered usernames based on an LSTM binary classification neural network model according to claim 3, characterized in that serializing the encoded data in step 3 specifically comprises: mapping each letter to features using word-vector techniques, so that each letter corresponds to a fixed-length vector; specifically, an Embedding layer maps the data to an embedding matrix, and with an output dimension of 32 the code of each letter is mapped to a 32-dimensional vector, so that each username sample becomes a 1*20*32 matrix.
5. The method for detecting registered usernames based on an LSTM binary classification neural network model according to claim 4, characterized in that the LSTM binary classification neural network model built in step 4 is as follows:
The first layer is an embedding layer. Each input sample is a character sequence of length 20; after the embedding layer's mapping, each sample is output as a 20*32 matrix, and n samples are represented as n*20*32;
The second layer is an LSTM layer. Its input is a matrix of dimension n*20*32, where n is the number of username samples; its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function of the layer is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation; the loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
6. The method for detecting registered usernames based on an LSTM binary classification neural network model according to claim 1 or 5, characterized in that the detection model's false positives on test data are reduced by raising the abnormal-probability threshold.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910425791.5A CN110135566A (en) | 2019-05-21 | 2019-05-21 | Registered username detection method based on an LSTM binary classification neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910425791.5A CN110135566A (en) | 2019-05-21 | 2019-05-21 | Registration user name detection method based on bis- Classification Neural model of LSTM |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110135566A (en) | 2019-08-16 |
Family
ID=67572294
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910425791.5A Pending CN110135566A (en) | 2019-05-21 | 2019-05-21 | Registration user name detection method based on bis- Classification Neural model of LSTM |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135566A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105975504A (en) * | 2016-04-28 | 2016-09-28 | 中国科学院计算技术研究所 | Recurrent neural network-based social network message burst detection method and system |
CN107463703A (en) * | 2017-08-16 | 2017-12-12 | 电子科技大学 | English social media account number classification method based on information gain |
CN108197087A (en) * | 2018-01-18 | 2018-06-22 | 北京奇安信科技有限公司 | Character code recognition methods and device |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
CN108898015A (en) * | 2018-06-26 | 2018-11-27 | 暨南大学 | Application layer dynamic intruding detection system and detection method based on artificial intelligence |
CN109101552A (en) * | 2018-07-10 | 2018-12-28 | 东南大学 | A kind of fishing website URL detection method based on deep learning |
CN109308494A (en) * | 2018-09-27 | 2019-02-05 | 厦门服云信息科技有限公司 | LSTM Recognition with Recurrent Neural Network model and network attack identification method based on this model |
CN109522454A (en) * | 2018-11-20 | 2019-03-26 | 四川长虹电器股份有限公司 | The method for automatically generating web sample data |
KR20190051574A (en) * | 2017-11-07 | 2019-05-15 | 고려대학교 산학협력단 | Device and method for providing nationality information of user name using neural networks |
Non-Patent Citations (1)
Title |
---|
方明 等: "一种新型智能僵尸粉甄别方法", 《计算机工程》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112800053A (en) * | 2021-01-05 | 2021-05-14 | 深圳索信达数据技术有限公司 | Data model generation method, data model calling device, data model equipment and storage medium |
CN112800053B (en) * | 2021-01-05 | 2021-12-24 | 深圳索信达数据技术有限公司 | Data model generation method, data model calling device, data model equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083831B (en) | Chinese named entity identification method based on BERT-BiGRU-CRF | |
CN109450845B (en) | Detection method for generating malicious domain name based on deep neural network algorithm | |
CN109977416A (en) | A kind of multi-level natural language anti-spam text method and system | |
CN108573047A (en) | A kind of training method and device of Module of Automatic Chinese Documents Classification | |
CN110442707A (en) | A kind of multi-tag file classification method based on seq2seq | |
CN112836496B (en) | Text error correction method based on BERT and feedforward neural network | |
CN109670036B (en) | Automatic news comment generation method and device | |
CN109413028A (en) | SQL injection detection method based on convolutional neural networks algorithm | |
CN110533570A (en) | A kind of general steganography method based on deep learning | |
CN111866004B (en) | Security assessment method, apparatus, computer system, and medium | |
CN112149420A (en) | Entity recognition model training method, threat information entity extraction method and device | |
CN114490953B (en) | Method for training event extraction model, method, device and medium for extracting event | |
CN109993169A (en) | One kind is based on character type method for recognizing verification code end to end | |
CN110263164A (en) | A kind of Sentiment orientation analysis method based on Model Fusion | |
CN107992211A (en) | A kind of Chinese character spelling wrong word correcting method based on CNN-LSTM | |
CN109325125B (en) | Social network rumor detection method based on CNN optimization | |
CN114282527A (en) | Multi-language text detection and correction method, system, electronic device and storage medium | |
CN111753290A (en) | Software type detection method and related equipment | |
CN111104513A (en) | Short text classification method for game platform user question-answer service | |
CN110009025A (en) | A kind of semi-supervised additive noise self-encoding encoder for voice lie detection | |
CN106803091A (en) | A kind of recognition methods of note denomination and system | |
CN114266254A (en) | Text named entity recognition method and system | |
CN106600283A (en) | Method and system for identifying the name nationalities as well as method and system for determining transaction risk | |
CN112364837A (en) | Bill information identification method based on target detection and text identification | |
CN110135566A (en) | Registration user name detection method based on bis- Classification Neural model of LSTM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190816 |