CN110135566A - Registered user name detection method based on an LSTM binary classification neural network model - Google Patents

Registered user name detection method based on an LSTM binary classification neural network model

Info

Publication number
CN110135566A
Authority
CN
China
Prior art keywords
data
user name
lstm
bis
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910425791.5A
Other languages
Chinese (zh)
Inventor
普雪飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201910425791.5A
Publication of CN110135566A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a registered user name detection method based on an LSTM binary classification neural network model, comprising the following steps: preprocessing training data and test data, where the training data includes normal user names and randomly generated user names; encoding the preprocessed user names and unifying the length of each sample; serializing the encoded data at the character level; building an LSTM binary classification neural network model from the preprocessed, encoded and serialized training data to form a detection model for registered user names; and feeding the preprocessed, encoded and serialized test data into the detection model, which outputs the probability P that a test sample is abnormal. When P is greater than or equal to the abnormal probability threshold, the sample is classified as abnormal; otherwise it is classified as normal. The invention detects well whether a user name registered on a platform was randomly generated.

Description

Registered user name detection method based on an LSTM binary classification neural network model
Technical field
The present invention relates to the fields of web applications and deep learning, and in particular to a registered user name detection method based on an LSTM binary classification neural network model.
Background
In recent years web applications have become increasingly popular. To serve and retain users better, many platforms offer user registration that is open to the public, and this brings problems. On the one hand, open registration lets users with ulterior motives register malicious accounts in bulk, which may cause network security problems. On the other hand, a platform has a massive number of users of uneven quality, which inevitably affects the efficiency of subsequent operational activities.
Summary of the invention
Randomly or arbitrarily generated user names are a feature shared by bulk malicious registrations and low-quality registered users, and such names often do not follow the naming conventions of pinyin or English. To solve the problems of the prior art, the purpose of the invention is to detect, among a platform's registered users, malicious users and low-quality users whose user names appear to be randomly generated, and thereby to provide a reference for identifying bulk registration and low-quality users. To this end, a registered user name detection method based on an LSTM binary classification neural network model is proposed.
To achieve the above object, the technical solution adopted by the invention is a registered user name detection method based on an LSTM binary classification neural network model, comprising the following steps:
Step 1: preprocess the training data and test data, where the training data includes normal user names and randomly generated user names;
Step 2: encode the preprocessed user names and unify the length of each sample;
Step 3: serialize the encoded data at the character level;
Step 4: build an LSTM binary classification neural network model from the preprocessed, encoded and serialized training data to form a detection model for registered user names;
Step 5: feed the preprocessed, encoded and serialized test data into the detection model, which outputs the probability P that a test sample is abnormal; when P is greater than or equal to the abnormal probability threshold, the sample is classified as abnormal, otherwise it is classified as normal.
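The decision rule of Step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two logits, the class order (index 0 normal, index 1 abnormal) and the threshold values are hypothetical, and the function names are not taken from the patent.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(p_abnormal, threshold=0.5):
    """Step 5 rule: abnormal when P >= threshold, otherwise normal."""
    return "abnormal" if p_abnormal >= threshold else "normal"

# Hypothetical raw outputs of the 2-unit output layer: [normal, abnormal].
probs = softmax([0.2, 1.3])
p = probs[1]  # P, the probability that the sample is abnormal (about 0.75)
print(classify(p, 0.5), classify(p, 0.9))  # abnormal normal
```

The same sample flips from abnormal to normal when the threshold rises from 0.5 to 0.9, which is the lever the method uses to trade false positives against missed detections.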
As a preferred embodiment, preprocessing the training data and test data in Step 1 specifically includes: removing all non-English-letter characters from the training data and test data, converting uppercase English letters to lowercase, and, if a user name in the data is an email address, removing the email suffix.
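A minimal Python sketch of this preprocessing follows. The function name and the example address are hypothetical. The order of operations is also an assumption: the email suffix is dropped first, because stripping non-letters first would leave the letters of the domain in the result.

```python
import re

def preprocess_username(raw: str) -> str:
    """Drop an email suffix, lowercase, and keep only the letters a-z."""
    local_part = raw.split("@", 1)[0]   # everything before the first "@"
    lowered = local_part.lower()        # uppercase -> lowercase
    return re.sub(r"[^a-z]", "", lowered)  # remove digits, punctuation, etc.

print(preprocess_username("Goodname123"))               # goodname
print(preprocess_username("Good.Name_99@example.com"))  # goodname
```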
As another preferred embodiment, encoding the data in Step 2 is specifically as follows: the preprocessed data contains only English letters, and the 26 English letters correspond to 26 codes; the coded sequence of every sample is unified to the same length, with shorter sequences zero-padded and longer ones truncated.
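The encoding and length unification can be sketched as below. The mapping a:1 ... z:26 and left-side zero padding follow the worked example given in the embodiment; which end is cut when a name is too long is not stated in the source, so keeping the last `maxlen` codes is an assumption (it mirrors the defaults of Keras `pad_sequences`). The function name and the `maxlen` parameter are hypothetical.

```python
from typing import List

def encode_username(name: str, maxlen: int = 20) -> List[int]:
    """Map a-z to 1-26, then left-pad with zeros or truncate to maxlen."""
    codes = [ord(ch) - ord("a") + 1 for ch in name if "a" <= ch <= "z"]
    codes = codes[-maxlen:]  # assumption: truncate from the front
    return [0] * (maxlen - len(codes)) + codes

# Reproduces the embodiment's example sequence for "goodname" at length 15.
print(encode_username("goodname", maxlen=15))
# [0, 0, 0, 0, 0, 0, 0, 7, 15, 15, 4, 14, 1, 13, 5]
```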
As another preferred embodiment, serializing the encoded data in Step 3 specifically includes: performing feature mapping on each letter using word-vector technology, so that each letter corresponds to a fixed-length vector. Specifically, an Embedding layer maps the data to an embedding matrix; with an output dimension of 32, the code of each letter is mapped to a 32-dimensional vector, and each user name sample becomes a 1*20*32 matrix.
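An embedding layer is in effect a lookup table with one row per code. The sketch below shows only the lookup and the resulting shape; the random rows stand in for weights that would be learned during training, and the vocabulary size of 27 (26 letters plus 0 for padding) is an assumption not stated in the source.

```python
import random

VOCAB_SIZE = 27  # assumption: codes 1-26 for letters plus 0 for padding
EMBED_DIM = 32   # output dimension given in the description

def make_embedding_table(seed=0):
    """Random stand-in for the trainable embedding weights (27 x 32)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
            for _ in range(VOCAB_SIZE)]

def embed(codes, table):
    """Replace each integer code by its row of the table."""
    return [table[c] for c in codes]

table = make_embedding_table()
codes = [0] * 12 + [7, 15, 15, 4, 14, 1, 13, 5]  # "goodname" padded to 20
matrix = embed(codes, table)
print(len(matrix), len(matrix[0]))  # 20 32
```

Identical letters map to identical vectors (both "o" positions share one row), so the layer carries no positional information; the order of characters is left for the LSTM layer to model.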
As another preferred embodiment, the LSTM binary classification neural network model built in Step 4 is as follows:
The first layer is an embedding layer. Each input sample is a character sequence of length 20; after the embedding layer's mapping, the output for each sample is a 20*32 matrix, and n samples are represented as n*20*32;
The second layer is an LSTM layer. Its input is a matrix of dimension n*20*32, where n is the number of user name samples; its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function of this layer is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation. The loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
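The shape arithmetic of the five layers can be checked with a short Python sketch. It assumes the LSTM layer receives the 32-dimensional embedding output, a vocabulary of 27 codes, and a standard LSTM with one bias vector per gate; none of the parameter counts below appear in the source, they only follow from those assumptions.

```python
SEQ_LEN, EMBED_DIM, LSTM_UNITS, DENSE_UNITS, NUM_CLASSES = 20, 32, 64, 64, 2
VOCAB_SIZE = 27  # assumption: 26 letter codes plus 0 for padding

def layer_summary(n):
    """Return (layer name, output shape, trainable parameter count)."""
    return [
        ("embedding", (n, SEQ_LEN, EMBED_DIM), VOCAB_SIZE * EMBED_DIM),
        # Four gates, each with input weights, recurrent weights and a bias.
        ("lstm", (n, SEQ_LEN, LSTM_UNITS),
         4 * ((EMBED_DIM + LSTM_UNITS) * LSTM_UNITS + LSTM_UNITS)),
        ("flatten", (n, SEQ_LEN * LSTM_UNITS), 0),
        ("dense_relu", (n, DENSE_UNITS),
         SEQ_LEN * LSTM_UNITS * DENSE_UNITS + DENSE_UNITS),
        ("output_softmax", (n, NUM_CLASSES),
         DENSE_UNITS * NUM_CLASSES + NUM_CLASSES),
    ]

summary = layer_summary(n=8)
for name, shape, params in summary:
    print(name, shape, params)
total_params = sum(params for _, _, params in summary)
print("total", total_params)  # total 107810
```

The flatten width 1280 is 20 time steps times 64 LSTM units, which is why the LSTM layer must return its output at every time step rather than only the last one.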
As another preferred embodiment, false positives of the detection model on the test data are reduced by raising the abnormal probability threshold.
The beneficial effects of the invention are as follows: the model is trained on normal user names and randomly generated user names, and in predictions on new data the binary classification model achieves precision and recall of about 95% for each class, so it distinguishes normal user names from randomly generated ones effectively. Because practical scenarios require that the proportion of normal samples classified as abnormal (the false positive rate) be as small as possible, the false positive rate can be controlled by setting the probability threshold at which a sample is judged abnormal; correspondingly, the probability of an abnormal sample being judged normal increases.
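The trade-off described here can be made concrete with a small Python sketch. The scores and labels are invented purely for illustration (label 1 means abnormal, label 0 means normal) and are not results from the patent; the function name is hypothetical.

```python
def error_rates(scores, labels, threshold):
    """Return (false positive rate, false negative rate) at a threshold."""
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return fp / labels.count(0), fn / labels.count(1)

# Hypothetical abnormal-class probabilities for 4 normal and 4 abnormal names.
scores = [0.05, 0.20, 0.55, 0.70, 0.60, 0.85, 0.95, 0.99]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(error_rates(scores, labels, 0.5))  # (0.5, 0.0)
print(error_rates(scores, labels, 0.8))  # (0.0, 0.25)
```

Raising the threshold from 0.5 to 0.8 removes both false positives in this toy data at the cost of one missed abnormal sample.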
Brief description of the drawings
Fig. 1 is a flow chart of the method of an embodiment of the invention;
Fig. 2 is a structural diagram of the LSTM binary classification neural network model in an embodiment of the invention.
Detailed description of embodiments
An embodiment of the invention is described in detail below with reference to the accompanying drawings.
Embodiment:
As shown in Fig. 1, a registered user name detection method based on an LSTM binary classification neural network model includes the following steps:
1. Preprocess the user names in the training data and test data: remove all non-English-letter characters, convert uppercase to lowercase, and, if the user name is an email address, remove the email suffix. For example, the preprocessed output of [email protected] is goodname.
2. Encode the letters in the preprocessed user names and unify the length of each sample. A processed user name contains only English letters, and the 26 letters correspond to 26 codes, e.g. a:1, b:2, c:3 and so on. Finally the coded sequence of every user name is unified to length 15, with shorter sequences zero-padded and longer ones truncated. For example, after encoding and length unification the user name "Goodname123" becomes the sequence [0,0,0,0,0,0,0,7,15,15,4,14,1,13,5].
3. Perform feature mapping on each letter using word-vector technology, so that each letter corresponds to a fixed-length vector;
An Embedding layer maps the data to an embedding matrix; with an output dimension of 32, the code of each letter is mapped to a 32-dimensional vector, and each user name sample becomes a 1*20*32 matrix.
4. Build the LSTM binary classification neural network model shown in Fig. 2:
The first layer is an embedding layer. Each input sample is a character sequence of length 20; after the embedding layer's mapping, the output for each sample is a 20*32 matrix, and n samples are represented as n*20*32;
The second layer is an LSTM layer. Its input is a matrix of dimension n*20*32 (n is the number of user name samples); its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation. The loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
5. Model false-positive correction:
When the detection model classifies a normal sample as abnormal, this is called a false positive; when it classifies an abnormal sample as normal, this is called a false negative (a missed detection). In practical scenarios the loss caused by false positives is much greater than that caused by missed detections, so a probability-threshold mechanism is set to control false positives. Raising the threshold reduces false positives; correspondingly, missed detections increase to some extent.
The embodiment is further described below:
When bulk registration occurs on a platform, malicious registrants are likely to generate large numbers of user names at random; combined with some appropriate rules, the likelihood that these users registered in bulk can be judged.
Some users only try a platform casually and may fill in an arbitrary login name at registration. Once such user names are detected, they can be flagged, and later user operations can lower the priority of these users, reducing operating costs and improving operating efficiency.
Detecting suspected randomly generated user names with the random user name detection method can, to some extent, provide a reference for identifying malicious registration, and a portion of low-quality users can be filtered out based on user naming conventions.
The method preprocesses user names and serializes them at the character level. It exploits the strength of LSTM in representing sequential data: the model is trained on normal and randomly generated user names to learn the character collocation rules of the two kinds of user name, and thus the naming patterns of normal and randomly generated user names. The resulting LSTM binary classification neural network model detects well whether a user name registered on a platform was randomly generated.
The random user name detection model is a binary classifier. Existing classification algorithms all support binary classification, but LSTM can handle sequence problems. Whether a name is the pinyin of Chinese characters or an English word, its spelling follows specific character combination rules, and the order in which characters follow one another affects the meaning of the word that is finally formed. LSTM captures this relationship well and thus learns the differences between normal user names and abnormal, randomly generated ones.
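For reference, the dependence on character order comes from the recurrence inside the LSTM cell. The patent does not write out the cell equations; the block below is the commonly used formulation, given here as an assumption about the layer in question. Here x_t is the 32-dimensional embedding of the t-th character, h_t is the 64-dimensional hidden state, c_t is the cell state, sigma is the logistic sigmoid, and the W, U and b terms are learned parameters.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t), \qquad t = 1, \dots, 20
\end{aligned}
```

Because h_t depends on h_{t-1}, the state at each position summarizes the characters that came before it, so the same letters in a different order produce a different sequence of hidden states.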
The embodiment above expresses only a specific implementation of the invention. Although its description is relatively specific and detailed, it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.

Claims (6)

1. A registered user name detection method based on an LSTM binary classification neural network model, characterized by comprising the following steps:
Step 1: preprocess the training data and test data, where the training data includes normal user names and randomly generated user names;
Step 2: encode the preprocessed user names and unify the length of each sample;
Step 3: serialize the encoded data at the character level;
Step 4: build an LSTM binary classification neural network model from the preprocessed, encoded and serialized training data to form a detection model for registered user names;
Step 5: feed the preprocessed, encoded and serialized test data into the detection model, which outputs the probability P that a test sample is abnormal; when P is greater than or equal to the abnormal probability threshold, the sample is classified as abnormal, otherwise it is classified as normal.
2. The registered user name detection method based on an LSTM binary classification neural network model according to claim 1, characterized in that preprocessing the training data and test data in Step 1 specifically includes: removing all non-English-letter characters from the training data and test data, converting uppercase English letters to lowercase, and, if a user name in the data is an email address, removing the email suffix.
3. The registered user name detection method based on an LSTM binary classification neural network model according to claim 2, characterized in that encoding the data in Step 2 is specifically as follows: the preprocessed data contains only English letters, and the 26 English letters correspond to 26 codes; the coded sequence of every sample is unified to the same length, with shorter sequences zero-padded and longer ones truncated.
4. The registered user name detection method based on an LSTM binary classification neural network model according to claim 3, characterized in that serializing the encoded data in Step 3 specifically includes: performing feature mapping on each letter using word-vector technology, so that each letter corresponds to a fixed-length vector; specifically, an Embedding layer maps the data to an embedding matrix, and with an output dimension of 32 the code of each letter is mapped to a 32-dimensional vector, so that each user name sample becomes a 1*20*32 matrix.
5. The registered user name detection method based on an LSTM binary classification neural network model according to claim 4, characterized in that the LSTM binary classification neural network model built in Step 4 is as follows:
The first layer is an embedding layer; each input sample is a character sequence of length 20, and after the embedding layer's mapping the output for each sample is a 20*32 matrix, with n samples represented as n*20*32;
The second layer is an LSTM layer; its input is a matrix of dimension n*20*32, where n is the number of user name samples, its output dimension is 64, and it outputs the result of every time step, giving dimension n*20*64;
The third layer is a flatten layer, which converts the data to dimension n*1280;
The fourth layer is a fully connected layer with output dimension 64; after this layer the data has dimension n*64, and the activation function of this layer is ReLU;
The fifth layer is the output layer with output dimension 2 and Softmax activation; the loss function of the model is the cross-entropy loss, and the optimizer is the Adam optimization algorithm.
6. The registered user name detection method based on an LSTM binary classification neural network model according to claim 1 or 5, characterized in that false positives of the detection model on the test data are reduced by raising the abnormal probability threshold.
CN201910425791.5A 2019-05-21 2019-05-21 Registered user name detection method based on an LSTM binary classification neural network model Pending CN110135566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910425791.5A CN110135566A (en) 2019-05-21 2019-05-21 Registered user name detection method based on an LSTM binary classification neural network model

Publications (1)

Publication Number Publication Date
CN110135566A true CN110135566A (en) 2019-08-16

Family

ID=67572294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910425791.5A Pending CN110135566A (en) 2019-05-21 2019-05-21 Registered user name detection method based on an LSTM binary classification neural network model

Country Status (1)

Country Link
CN (1) CN110135566A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816