CN109657231A - Long SMS compression method and system - Google Patents

Long SMS compression method and system

Info

Publication number
CN109657231A
CN109657231A
Authority
CN
China
Prior art keywords
feature
word
vocabulary
long SMS
target feature word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811333876.2A
Other languages
Chinese (zh)
Other versions
CN109657231B (en)
Inventor
黄晓波
黄巨涛
林强
唐亮亮
陈守明
肖建毅
臧笑宇
王飞鸣
吴丽琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Information Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Information Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Information Center of Guangdong Power Grid Co Ltd
Priority to CN201811333876.2A
Publication of CN109657231A
Application granted
Publication of CN109657231B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The long SMS compression method provided herein comprises: performing word segmentation on a long SMS using a segmentation algorithm to obtain a corresponding feature word list and feature word space; obtaining a corresponding feature vector from the feature word list and feature word space; feeding the feature vector into a BP neural network for training to obtain an output vector; according to the output vector, replacing target feature words in the feature word list with equivalent abbreviated short words from a feature dictionary to form a target feature word list; and processing the target feature word list with the inverse of the segmentation algorithm to obtain the target long SMS. The method thus substitutes equivalent abbreviated short words from the feature dictionary for feature words in the long SMS, compressing the message, reducing the number of ordinary SMS messages it must be split into, and thereby saving cost. This application also provides a long SMS compression system, device, and computer-readable storage medium, all of which have the above beneficial effects.

Description

Long SMS compression method and system
Technical field
This application relates to the field of information and communication technology, and in particular to a long SMS compression method, system, device, and computer-readable storage medium.
Background art
An SMS platform is responsible for sending routine office messages and business service messages, such as power-outage notices, electricity-fee deduction notices, and typhoon warnings. The daily sending volume is large, and as the business grows, the number of messages sent keeps increasing.
However, the standard SMS protocol limits each message to a maximum of 140 bytes, i.e., at most 70 Chinese characters. In practice, the messages sent typically contain hundreds of bytes, and some are several thousand or even tens of thousands of bytes long. A long SMS must therefore first be split into ordinary messages before sending; after receiving them, the user's phone merges them according to the long SMS concatenation rules to recover the complete long SMS. SMS is a paid service provided by the operator and is billed per message; enterprises generally settle at about 0.05 yuan per message, so the more messages an enterprise sends, the higher its SMS bill.
How to compress a long SMS, reduce the number of ordinary messages it is split into, and thereby save cost is therefore a technical problem that those skilled in the art need to solve.
Summary of the invention
The purpose of this application is to provide a long SMS compression method, system, device, and computer-readable storage medium that can compress a long SMS, reduce the number of ordinary messages it is split into, and thereby save cost.
To solve the above technical problem, this application provides a long SMS compression method, comprising:
performing word segmentation on a long SMS using a segmentation algorithm to obtain a corresponding feature word list and feature word space;
obtaining a corresponding feature vector from the feature word list and the feature word space;
feeding the feature vector into a BP neural network for training to obtain an output vector;
according to the output vector, replacing target feature words in the feature word list with equivalent abbreviated short words from a feature dictionary to form a target feature word list;
processing the target feature word list with the inverse of the segmentation algorithm to obtain the target long SMS.
Preferably, performing word segmentation on the long SMS using the segmentation algorithm to obtain the corresponding feature word list and feature word space comprises:
performing word segmentation on the long SMS using the segmentation algorithm to obtain each feature word and its feature word space, and each stop word and its stop word space;
filtering out each stop word and its stop word space according to a stop-word dictionary to obtain the feature word list and the feature word space.
Preferably, after feeding the feature vector into the BP neural network for training, the method further comprises:
saving the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network.
Preferably, replacing the target feature words in the feature word list with the equivalent abbreviated short words from the feature dictionary comprises:
establishing, in the feature dictionary, a mapping between the equivalent abbreviated short words and the target feature words;
replacing the target feature words with the equivalent abbreviated short words.
This application also provides a long SMS compression system, comprising:
a word segmentation module, configured to perform word segmentation on a long SMS using a segmentation algorithm to obtain a corresponding feature word list and feature word space;
a feature vector module, configured to obtain a corresponding feature vector from the feature word list and the feature word space;
a BP neural network training module, configured to feed the feature vector into a BP neural network for training to obtain an output vector;
a target feature word replacement module, configured to replace, according to the output vector, target feature words in the feature word list with equivalent abbreviated short words from a feature dictionary to form a target feature word list;
a target feature word list processing module, configured to process the target feature word list with the inverse of the segmentation algorithm to obtain the target long SMS.
Preferably, the word segmentation module comprises:
a word segmentation unit, configured to perform word segmentation on the long SMS using the segmentation algorithm to obtain each feature word and its feature word space, and each stop word and its stop word space;
a filtering unit, configured to filter out each stop word and its stop word space according to a stop-word dictionary to obtain the feature word list and the feature word space.
Preferably, the long SMS compression system further comprises:
a storage module, configured to save the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network.
Preferably, the target feature word replacement module comprises:
a mapping unit, configured to establish, in the feature dictionary, a mapping between the equivalent abbreviated short words and the target feature words;
a target feature word replacement unit, configured to replace the target feature words with the equivalent abbreviated short words.
This application also provides a device, comprising:
a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the steps of the long SMS compression method described above.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the long SMS compression method described above.
The long SMS compression method provided herein comprises: performing word segmentation on a long SMS using a segmentation algorithm to obtain a corresponding feature word list and feature word space; obtaining a corresponding feature vector from the feature word list and the feature word space; feeding the feature vector into a BP neural network for training to obtain an output vector; according to the output vector, replacing the target feature words in the feature word list with equivalent abbreviated short words from a feature dictionary to form a target feature word list; and processing the target feature word list with the inverse of the segmentation algorithm to obtain the target long SMS.
Having obtained the corresponding feature word list and feature word space by segmenting the long SMS with the segmentation algorithm, the method obtains the corresponding feature vector from them, feeds the feature vector into the BP neural network for training to obtain the output vector, then, according to the output vector, replaces the target feature words in the feature word list with equivalent abbreviated short words from the feature dictionary to form the target feature word list, and finally processes the target feature word list with the inverse of the segmentation algorithm to obtain the target long SMS. The method thus substitutes equivalent abbreviated short words from the feature dictionary for the feature words in the long SMS, compressing the long SMS, reducing the number of ordinary messages it is split into, and thereby saving cost. This application also provides a long SMS compression system, device, and computer-readable storage medium, all of which have the above beneficial effects and are not described again here.
Description of the drawings
To explain the technical solutions in the embodiments of this application and in the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of this application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a long SMS compression method provided by an embodiment of this application;
Fig. 2 is a schematic diagram of the topology of the BP neural network provided by an embodiment of this application;
Fig. 3 is the BP neural network learning error curve provided by an embodiment of this application;
Fig. 4 is a structural block diagram of a long SMS compression system provided by an embodiment of this application.
Detailed description of the embodiments
The core of this application is to provide a long SMS compression method that can compress a long SMS, reduce the number of ordinary messages it is split into, and thereby save cost. Another core of this application is to provide a long SMS compression system, device, and computer-readable storage medium.
To make the purposes, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
As the business grows, the number of SMS messages sent keeps increasing, but the standard SMS protocol limits each message to a maximum of 140 bytes, i.e., at most 70 Chinese characters. A long SMS must therefore first be split into multiple ordinary messages before sending. Moreover, the operator bills per message, enterprises generally settle at about 0.05 yuan per message, and the more messages an enterprise sends, the higher its SMS bill. The embodiments of this application can compress a long SMS, reduce the number of ordinary messages it is split into, and thereby save cost. Referring to Fig. 1, which is a flowchart of a long SMS compression method provided by an embodiment of this application, the method specifically comprises:
S101: perform word segmentation on the long SMS using a segmentation algorithm to obtain a corresponding feature word list and feature word space;
In this embodiment, word segmentation is first applied to the long SMS using the segmentation algorithm to obtain the corresponding feature word list and feature word space. Word segmentation is the most mature part of natural language processing (NLP): the vast majority of common problems in industry can be solved with segmentation algorithms, usually requiring at most some dictionary tuning. The type of segmentation algorithm is not limited here and should be chosen by those skilled in the art according to the actual situation; it may, for example, be a forward segmentation algorithm or a reverse segmentation algorithm. The result of segmenting the long SMS is the corresponding feature word list and feature word space; the segmentation procedure itself is not limited. The feature word list contains one or more feature words; their number is likewise not limited here and should be set according to the actual situation. Further, if the feature word list contains multiple feature words, they are all distinct. Typically a reverse segmentation algorithm is used to segment the long SMS into the corresponding feature word list and feature word space. Its idea matches that of forward segmentation, except that cutting starts from the end of the text or sentence, and if a match fails the frontmost character of the window is dropped. For example, for the string "handle the failure that occurred in the machine", reverse segmentation yields: failure, occurred, machine, handle.
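The reverse (backward maximum matching) idea just described can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the dictionary contents, the window size, and the Chinese example string (a plausible reconstruction of the text's "handle the machine's failure" example) are all assumptions.

```python
def reverse_max_match(text: str, dictionary: set, max_len: int = 4) -> list:
    """Backward maximum matching: cut words from the END of the sentence,
    dropping the frontmost character of the window on each failed match."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end` first.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            if size == 1 or piece in dictionary:
                # Single characters always pass, so the loop terminates.
                words.append(piece)
                end -= size
                break
    return words  # words come out in reverse sentence order
```

Note the output order matches the patent's example, with the words listed from the end of the sentence to the front; the corresponding inverse algorithm only has to reverse and rejoin them.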
Further, to speed up reverse segmentation, this embodiment typically uses the BM (Boyer-Moore) algorithm for substring search. Among substring search algorithms, BM is one of the best-performing currently available, generally 3-5 times faster than the KMP algorithm. BM moves the pattern string across the text from left to right but compares characters from right to left. It actually combines two rules working in parallel, the bad-character rule and the good-suffix rule; the purpose of both is to let the pattern shift right as far as possible on each mismatch.
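The bad-character rule mentioned above can be sketched in a few lines. This is a minimal single-rule version for illustration (the good-suffix rule is omitted), and the function name and details are assumptions, not the patent's implementation.

```python
def bm_search(text: str, pattern: str) -> int:
    """Boyer-Moore substring search using only the bad-character rule.

    Returns the index of the first occurrence of pattern in text, or -1.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Bad-character table: last index of each character in the pattern.
    last = {ch: i for i, ch in enumerate(pattern)}
    i = 0  # alignment of the pattern's start within the text
    while i <= n - m:
        j = m - 1  # compare right to left
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i  # full match
        # Shift so the mismatched text character aligns with its last
        # occurrence in the pattern, or past the pattern entirely.
        shift = j - last.get(text[i + j], -1)
        i += max(1, shift)
    return -1
```

On mismatches against characters absent from the pattern, the window jumps by the full remaining pattern length, which is where the speedup over character-by-character search comes from.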
S102: obtain a corresponding feature vector from the feature word list and the feature word space;
In this embodiment, after word segmentation has produced the corresponding feature word list and feature word space, the corresponding feature vector is obtained from them. The feature vector is a two-dimensional space vector, usually denoted Vk, with components vk1, vk2, vk3, ..., vkn, where k is the message number and n is the number of feature words in the feature word list. Since neither the number of messages nor the number of feature words is limited, k and n range over [1, ∞).
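One plausible reading of the description above (consistent with the 500 × 1 Boolean matrix used later in the training example) is a Boolean vector over the feature word space; the sketch below assumes that reading, with component i set to 1 when the i-th word of the space occurs in the message.

```python
def build_feature_vector(feature_words, feature_space):
    """Build a Boolean feature vector over the feature word space:
    component i is 1 if the i-th word of the space occurs in the message."""
    present = set(feature_words)
    return [1 if w in present else 0 for w in feature_space]
```

The function and argument names are illustrative assumptions; the patent fixes only the vector notation Vk = (vk1, ..., vkn).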
S103: feed the feature vector into the BP neural network for training to obtain the output vector;
In this embodiment, after the corresponding feature vector has been obtained from the feature word list and feature word space, the feature vector is fed into the BP neural network for training to obtain the output vector. Since the feature vector is a two-dimensional space vector, the output vector is also a two-dimensional space vector. It is usually denoted Ok, with components ok1, ok2, ok3, ..., okn, where k is the message number and n is the number of feature words in the feature word list. Since neither the number of messages nor the number of feature words is limited, k and n range over [1, ∞). An output component is set to 1 where an equivalent abbreviated short word can be used for replacement, and to 0 otherwise.
In this embodiment, the topology of the BP neural network comprises an input layer, a hidden layer, and an output layer, as shown in Fig. 2, the schematic diagram of the BP neural network topology provided by this embodiment. In the figure, x0, x1, x2, ..., xn denote the individual feature vector components, Vn and Wn belong to the hidden layer, and y0 denotes the output vector. The numbers of input-layer and output-layer neurons are set according to the specific requirements of the application, and the number of input-layer neurons depends on the dimensionality of the data. When this embodiment compresses and optimizes a long SMS, the feature vectors corresponding to the feature words contained in the long SMS serve as the network input, so the number of input-layer neurons equals the dimension of the feature vector. For example, if the input is chosen as a 500 × 1 matrix, the number of input nodes is 500, equal to the number of input neurons. When a feature word of the long SMS has an equivalent replacement word, the corresponding position of the output neurons is 1 after the input passes through the network; if none can be found, 0 is output.
It is generally believed that adding hidden layers can reduce the network error and improve precision, but doing so also complicates the network, lengthens training, and encourages overfitting. The more hidden layers there are, the more slowly the neural network learns. Hornik et al. proved early on that if the input and output layers use linear transfer functions and the hidden layer uses the sigmoid transfer function, then, given a reasonable structure and appropriate weights, an MLP with a single hidden layer can approximate any reasonable function to arbitrary accuracy; this is an existence result. This can be kept in mind when designing a BP neural network: a 3-layer BP network (with 1 hidden layer) should be considered first. In general, reducing the error by increasing the number of hidden nodes is easier to achieve than by increasing the number of hidden layers.
In a BP neural network, the choice of the number of hidden nodes is extremely important: it strongly affects the performance of the resulting network model and is the immediate cause of overfitting during training, yet there is as yet no scientific and universal method for determining it. The formulas for the number of hidden nodes proposed in most of the literature assume arbitrarily many training samples, and most target the worst case, which is rarely met in ordinary engineering practice, so they should not be used. In fact, the hidden-node counts produced by the various formulas can differ by several-fold or even tens of times. To avoid overfitting during training as far as possible while ensuring sufficiently high network performance and generalization ability, the most basic principle for determining the number of hidden nodes is to adopt as compact a structure as possible while meeting the accuracy requirement, i.e., as few hidden nodes as possible. Studies show that the number of hidden nodes depends not only on the numbers of input and output nodes, but also on the complexity of the problem to be solved, the type of transfer function, and the characteristics of the sample data.
The following conditions must be satisfied when determining the number of hidden nodes:
1. The number of hidden nodes must be less than N-1 (where N is the number of training samples); otherwise the systematic error of the network model becomes independent of the characteristics of the training samples and tends to zero, i.e., the model has no generalization ability and no practical value.
2. The number of training samples must exceed the number of connection weights of the network model, generally by a factor of 2 to 10; otherwise the samples must be divided into several parts and trained in rotation for there to be any chance of obtaining a reliable neural network model.
In short, if there are too few hidden nodes, the network may not train at all or may perform very poorly; if there are too many, the systematic error of the network can be reduced, but the training time is lengthened, and training easily falls into a local minimum instead of reaching the optimum, which is the internal cause of overfitting during training.
The BP algorithm of the BP neural network proceeds as follows:
(1) Initialization: set the initial values of the weights and thresholds, usually random numbers in [0, 1];
(2) Input samples and expected outputs: provide the training samples and target outputs, and perform steps (3)-(5) for each sample;
(3) Compute the input of each layer;
(4) Compute the training error;
(5) Correct the weights and thresholds;
(6) Compute the performance index: when every sample in the sample set has gone through steps (3)-(5), one training epoch is complete; compute the error criterion (e.g., the mean squared error);
(7) If the error criterion meets the accuracy requirement, stop; otherwise return to (2) and continue with the next epoch until the requirement is met.
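The numbered steps above can be sketched as a minimal full-batch backpropagation loop with NumPy. This is an illustrative sketch under stated simplifications, not the patent's implementation: sigmoid activations at both layers, no bias/threshold terms, uniform random initialization in [0, 1] as step (1) suggests, and an SSE stopping criterion; the function names and learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, hidden=8, lr=0.5, max_epochs=500, sse_goal=0.01, seed=0):
    """Single-hidden-layer backpropagation following steps (1)-(7):
    initialize weights randomly, then repeat forward pass, error
    computation, and gradient-direction weight updates until the
    SSE error criterion meets the goal or the epoch cap is hit."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    V = rng.uniform(0.0, 1.0, (n_in, hidden))   # (1) input-to-hidden weights
    W = rng.uniform(0.0, 1.0, (hidden, n_out))  # (1) hidden-to-output weights
    sse = float("inf")
    for _ in range(max_epochs):
        H = sigmoid(X @ V)                  # (3) hidden-layer output
        Y = sigmoid(H @ W)                  # (3) network output
        E = T - Y                           # (4) training error
        sse = float(np.sum(E ** 2))         # (6) error criterion (SSE)
        if sse <= sse_goal:                 # (7) stop when the goal is met
            break
        dY = E * Y * (1.0 - Y)              # output-layer delta
        dH = (dY @ W.T) * H * (1.0 - H)     # hidden-layer delta
        W += lr * H.T @ dY                  # (5) correct the weights
        V += lr * X.T @ dH
    return V, W, sse
```

With `max_epochs=500` and `sse_goal=0.01` this mirrors the stopping rule described later in the text (stop at 500 iterations or when SSE falls to 0.01 or below).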
The training process of the BP neural network is the basis of long SMS compression optimization and directly determines the optimization rate. Training samples are delivered to the BP neural network for training, and the weights are adjusted repeatedly along the gradient direction to minimize the network's sum of squared errors. To give the network a degree of robustness to the input vectors, it can first be trained on noise-free samples until the sum of squared errors is minimal, and then trained on noisy samples, ensuring the network is insensitive to noise. When training is finished, a test long SMS is fed into the BP neural network for testing. For example, after segmentation a long SMS is built into a 500 × 1 Boolean matrix whose 500 elements form the column matrix of the feature words, i.e., the feature vector of the feature words. The 500 feature column vectors form a 500 × 1 input vector, written sample_group = [0, 1, 2, ..., 500], where 0, 1, ..., 500 stand for the feature column vectors of the message's feature words. The target vector paired with each input vector is the desired output after each feature word is fed into the neural network: if the feature word can be optimized, the corresponding position of the output neuron is 1, otherwise 0. The target vector is therefore taken as the 500 × 1 unit matrix with 1s on the diagonal, realized with the MATLAB command targets = eye(500, 1).
The target performance function used for training the BP neural network in this embodiment is SSE, and the error performance goal is set to 0.01; for example, training stops when the number of training iterations reaches the maximum of 500 or the network's sum of squared errors SSE drops to 0.01 or below. The training samples are used to train the network and build the long SMS optimization model, and the test samples are used to analyze the classification accuracy and generalization ability of the network model. This embodiment selects 100 long SMS messages as BP neural network training data and trains the network with the number of hidden nodes n = 1, 5, 15, 20, 25, 30, 35, 40, 45, 50, 51, 52, 53, 54 in turn, adjusting and optimizing the neuron thresholds and connection weights. With n = 54, the change in the mean squared error during BP neural network training is shown in Fig. 3, the BP neural network learning error curve provided by this embodiment. As can be seen in Fig. 3, the variance decreases gradually as the number of training iterations increases, and when the number of training iterations reaches 54, the mean squared error has essentially reached a stable value.
S104: according to the output vector, replace the target feature words in the feature word list with equivalent abbreviated short words from the feature dictionary to form the target feature word list;
In this embodiment, after the feature vector has been fed into the BP neural network for training and the output vector obtained, the target feature words in the feature word list are replaced, according to the output vector, with equivalent abbreviated short words from the feature dictionary to form the target feature word list. The replacement generally comprises: establishing, in the feature dictionary, a mapping between the equivalent abbreviated short words and the target feature words; and replacing the target feature words with those short words. As described above, an output vector component is set to 1 where an equivalent abbreviated short word can be used and to 0 otherwise, and the feature words whose component is 1 are the target feature words. The number of target feature words is not specifically limited here and should be set according to the actual situation. The feature dictionary is the basis of long SMS compression optimization, which can be implemented by trial and error: long words in the message, full-width characters, numeral usages with abbreviated equivalents, half-width symbols, unit-of-measure variants, and so on are all replaced. Character encoding must also be considered: even an identical word occupies different numbers of bytes under different encoding formats. For example, the Chinese character "中" occupies 2 bytes in GBK, 3 bytes in UTF-8, and 4 bytes in UTF-16. The feature dictionary therefore also stores the byte count of each feature word under the various encoding formats. Moreover, after the feature dictionary is initialized, and especially after the feature vector has been fed into the BP neural network for training, the method generally further comprises saving the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network; that is, BP neural network learning continuously enriches the content of the feature dictionary, improving the accuracy and reliability of long SMS compression optimization.
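The dictionary lookup and the per-encoding byte counts discussed above might be sketched as follows. The mapping entry (故障处理 → 检修, roughly "fault handling" → "overhaul") is a hypothetical example, not taken from the patent's feature dictionary; note that Python's `utf-16` codec prepends a 2-byte BOM, which reproduces the 4-byte figure the text gives for "中".

```python
# Hypothetical feature dictionary: target feature word -> equivalent short word.
FEATURE_DICT = {
    "故障处理": "检修",  # illustrative mapping only
}

def compress_words(words, feature_dict=FEATURE_DICT):
    """Replace each target feature word with its equivalent short word;
    words without a dictionary entry pass through unchanged."""
    return [feature_dict.get(w, w) for w in words]

def byte_cost(word, encoding):
    """Bytes a word occupies under a given encoding; the feature
    dictionary stores these so replacements reflect real byte savings."""
    return len(word.encode(encoding))
```

Ranking candidate replacements by `byte_cost(long_word, enc) - byte_cost(short_word, enc)` for the SMS encoding actually in use is one natural way to apply the stored counts.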
For example, a long SMS optimization study was carried out on the trained BP network with 100 training sample pairs and 100 test sample pairs. The optimization rate on the training samples was 100% (see Table 1, training sample experimental results), and the optimization rate on the test samples was 88% (see Table 2, test sample experimental results). Because the BP neural network learned from the training samples themselves, the model's optimization rate on them is higher. Analysis of the test messages that failed to be optimized shows that the main cause is an insufficiently rich feature dictionary: some feature words were not included in it, so some test samples could not be optimized successfully. This can be remedied by enriching the feature dictionary, which raises the optimization rate for long SMS.
Table 1: Training sample experimental results
Table 2: Test sample experimental results
S105: Process the target feature word list with the inverse algorithm corresponding to the segmentation algorithm to obtain the target long SMS.
In the embodiment of the present application, after the target feature words in the feature word list have been replaced with simplified equivalent short words from the feature dictionary according to the output vector, forming the target feature word list, the target feature word list is processed with the inverse algorithm corresponding to the segmentation algorithm to obtain the target long SMS. As noted above, this embodiment does not limit the type of segmentation algorithm, so the type of inverse algorithm is likewise not limited here; it need only be guaranteed that the inverse algorithm corresponds to the segmentation algorithm above. The target feature word list contains several feature words, which the inverse algorithm corresponding to the segmentation algorithm stitches back together to obtain the target long SMS.
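A minimal sketch of this inverse step, under the assumption that the segmentation algorithm splits the message into a word sequence: for Chinese text the corresponding inverse is plain concatenation without separators (the function name is illustrative):

```python
def inverse_segment(feature_words):
    # Stitch the feature words back into a single message string;
    # Chinese text uses no inter-word separator, so the inverse of
    # segmentation is concatenation.
    return "".join(feature_words)
```

A segmentation algorithm that inserted delimiters or positional markers would need a correspondingly different inverse, which is why the text requires the inverse to match the chosen segmentation algorithm.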
In the embodiment of the present application, a long SMS is segmented with a segmentation algorithm to obtain the corresponding feature word list and feature word space; a corresponding feature vector is then obtained from the feature word list and feature word space; the feature vector is fed into the BP neural network for training to obtain an output vector; according to the output vector, the target feature words in the feature word list are replaced with simplified equivalent short words from the feature dictionary to form the target feature word list; and finally the target feature word list is processed with the inverse algorithm corresponding to the segmentation algorithm to obtain the target long SMS. The method thus substitutes simplified equivalent short words from the feature dictionary for feature words in the long SMS, shortening the message, reducing the number of ordinary messages it splits into, and thereby saving cost. Moreover, because the embodiment uses a BP neural network, it computes faster and more accurately.
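The replacement step driven by the 0/1 output vector can be sketched as follows. This is a toy illustration under the assumption stated above that an output entry of 1 marks a target feature word; the dictionary contents are invented for the example:

```python
def replace_by_output_vector(words, output_vector, feature_dict):
    # An output entry of 1 marks a target feature word, which is
    # swapped for its simplified equivalent short word from the
    # feature dictionary; all other words pass through unchanged.
    return [feature_dict.get(w, w) if flag == 1 else w
            for w, flag in zip(words, output_vector)]
```

For example, with the invented mapping `{"尽快": "速"}`, the segmented message `["请", "尽快", "缴纳", "电费"]` and output vector `[0, 1, 0, 0]` yield a shorter word list that the inverse algorithm then joins into the target long SMS.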
In long SMS simplification, feature selection is a key issue: the recognition algorithm identifies messages according to the selected features. Whether the selected features are stable determines whether a long SMS can be optimized, and is key to the system's optimization rate. From the viewpoint of statistical pattern recognition, long SMS simplification is in fact a pattern recognition problem. Human recognition of natural objects is built on learning about the object and analyzing its features, and automatic feature recognition resembles that human process: through learning or other means, a knowledge base is formed in memory, and during pattern recognition a mapping from the object to the memorized knowledge base is expressed to obtain the recognition result. Accordingly, the embodiment of the present application implements long SMS simplification according to this feature-word optimization approach, and the simplification reduces SMS platform operating costs.
Based on the above embodiment, in this embodiment, segmenting a long SMS with a segmentation algorithm to obtain the corresponding feature word list and feature word space generally includes: segmenting the long SMS with the segmentation algorithm to obtain each feature word and its corresponding feature word space as well as each stop word and its corresponding stop word space; and filtering out each stop word and its corresponding stop word space according to a stop-word dictionary to obtain the feature word list and feature word space. Some words appear very frequently in messages but are of no help in shortening the long SMS, such as the function word "是" ("is") and similar words; such words are stop words. Filtering them out reduces the interference of irrelevant words with long SMS simplification, improves its accuracy, and speeds it up. Therefore, after segmentation, the feature word list is generally filtered against the stop words in a stop-word dictionary.
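A brief sketch of the stop-word filtering described above; the stop list here is illustrative, not the patent's actual stop-word dictionary:

```python
STOP_WORDS = {"的", "是", "了"}  # illustrative stop-word dictionary

def filter_stop_words(words):
    # Drop stop words so high-frequency function words do not
    # interfere with the simplification step.
    return [w for w in words if w not in STOP_WORDS]
```

In practice the stop-word dictionary would be a curated list loaded from a file, but the filtering step itself is this simple set-membership test applied after segmentation.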
The long SMS simplification system, device, and computer-readable storage medium provided by the embodiments of the present application are introduced below; the system, device, and storage medium described below and the long SMS simplification method described above may be cross-referenced with each other.
Referring to FIG. 4, which is a structural block diagram of a long SMS simplification system provided by an embodiment of the present application, the long SMS simplification system includes:
a word segmentation module 401, configured to segment a long SMS with a segmentation algorithm to obtain a corresponding feature word list and feature word space;
a feature vector obtaining module 402, configured to obtain a corresponding feature vector from the feature word list and the feature word space;
a BP neural network training module 403, configured to feed the feature vector into a BP neural network for training to obtain an output vector;
a target feature word replacement module 404, configured to replace, according to the output vector, the target feature words in the feature word list with simplified equivalent short words from a feature dictionary, forming a target feature word list;
a target feature word list processing module 405, configured to process the target feature word list with the inverse algorithm corresponding to the segmentation algorithm to obtain a target long SMS.
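The five modules of FIG. 4 could be wired together as in the following Python skeleton. The class and parameter names are invented for illustration, and the BP network is abstracted as a callable that returns the 0/1 output vector:

```python
class LongSmsSimplifier:
    def __init__(self, segment, network, feature_dict):
        self.segment = segment            # segmentation algorithm (module 401)
        self.network = network            # trained BP network (modules 402-403)
        self.feature_dict = feature_dict  # simplified equivalent short words

    def simplify(self, long_sms):
        words = self.segment(long_sms)                    # module 401
        output_vector = self.network(words)               # modules 402-403
        target = [self.feature_dict.get(w, w) if f == 1 else w
                  for w, f in zip(words, output_vector)]  # module 404
        return "".join(target)                            # module 405 (inverse)
```

Keeping the segmentation algorithm, network, and dictionary as injected dependencies mirrors the modular structure claimed for the system: each module can be replaced independently as long as the inverse algorithm still matches the segmentation algorithm.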
Based on the above embodiment, the word segmentation module 401 in this embodiment generally includes:
a word segmentation unit, configured to segment the long SMS with the segmentation algorithm to obtain each feature word and its corresponding feature word space as well as each stop word and its corresponding stop word space;
a filtering unit, configured to filter out each stop word and its corresponding stop word space according to a stop-word dictionary to obtain the feature word list and the feature word space.
Based on the above embodiment, the long SMS simplification system in this embodiment also typically includes:
a storage module, configured to save the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network.
Based on the above embodiment, the target feature word replacement module 404 in this embodiment generally includes:
a mapping establishment unit, configured to establish, in the feature dictionary, the mapping relationship between the simplified equivalent short words and the target feature words;
a target feature word replacement unit, configured to replace the target feature words with the simplified equivalent short words.
The present application also provides a device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor, when executing the computer program, implements the steps of the long SMS simplification method of any of the above embodiments.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the long SMS simplification method of any of the above embodiments.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the systems provided in the embodiments correspond to the methods they provide, their description is relatively brief; see the description of the method for relevant details.
Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions differently for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The long SMS simplification method, long SMS simplification system, device, and computer-readable storage medium provided herein have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application; the description of the embodiments is only intended to help understand the method of the present application and its core ideas. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (10)

1. A long SMS simplification method, characterized by comprising:
segmenting a long SMS with a segmentation algorithm to obtain a corresponding feature word list and feature word space;
obtaining a corresponding feature vector from the feature word list and the feature word space;
feeding the feature vector into a BP neural network for training to obtain an output vector;
replacing, according to the output vector, target feature words in the feature word list with simplified equivalent short words from a feature dictionary, forming a target feature word list;
processing the target feature word list with an inverse algorithm corresponding to the segmentation algorithm to obtain a target long SMS.
2. The long SMS simplification method according to claim 1, characterized in that segmenting a long SMS with a segmentation algorithm to obtain a corresponding feature word list and feature word space comprises:
segmenting the long SMS with the segmentation algorithm to obtain each feature word and its corresponding feature word space, and each stop word and its corresponding stop word space;
filtering out each stop word and its corresponding stop word space according to a stop-word dictionary to obtain the feature word list and the feature word space.
3. The long SMS simplification method according to claim 1, characterized in that, after feeding the feature vector into the BP neural network for training, the method further comprises:
saving the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network.
4. The long SMS simplification method according to claim 1, characterized in that replacing the target feature words in the feature word list with simplified equivalent short words from the feature dictionary comprises:
establishing, in the feature dictionary, a mapping relationship between the simplified equivalent short words and the target feature words;
replacing the target feature words with the simplified equivalent short words.
5. A long SMS simplification system, characterized by comprising:
a word segmentation module, configured to segment a long SMS with a segmentation algorithm to obtain a corresponding feature word list and feature word space;
a feature vector obtaining module, configured to obtain a corresponding feature vector from the feature word list and the feature word space;
a BP neural network training module, configured to feed the feature vector into a BP neural network for training to obtain an output vector;
a target feature word replacement module, configured to replace, according to the output vector, the target feature words in the feature word list with simplified equivalent short words from a feature dictionary, forming a target feature word list;
a target feature word list processing module, configured to process the target feature word list with an inverse algorithm corresponding to the segmentation algorithm to obtain a target long SMS.
6. The long SMS simplification system according to claim 5, characterized in that the word segmentation module comprises:
a word segmentation unit, configured to segment the long SMS with the segmentation algorithm to obtain each feature word and its corresponding feature word space, and each stop word and its corresponding stop word space;
a filtering unit, configured to filter out each stop word and its corresponding stop word space according to a stop-word dictionary to obtain the feature word list and the feature word space.
7. The long SMS simplification system according to claim 5, characterized by further comprising:
a storage module, configured to save the feature word list corresponding to the feature vector into the feature dictionary via the BP neural network.
8. The long SMS simplification system according to claim 5, characterized in that the target feature word replacement module comprises:
a mapping establishment unit, configured to establish, in the feature dictionary, a mapping relationship between the simplified equivalent short words and the target feature words;
a target feature word replacement unit, configured to replace the target feature words with the simplified equivalent short words.
9. A device, characterized by comprising:
Memory and processor;Wherein, the memory is for storing computer program, the processor by execute it is described based on The step of long SMS compressing methods as described in any item such as Claims 1-4 are realized when calculation machine program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the long SMS simplification method according to any one of claims 1 to 4.
CN201811333876.2A 2018-11-09 2018-11-09 Long short message simplifying method and system Active CN109657231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811333876.2A CN109657231B (en) 2018-11-09 2018-11-09 Long short message simplifying method and system


Publications (2)

Publication Number Publication Date
CN109657231A 2019-04-19
CN109657231B 2023-04-07

Family

ID=66110781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811333876.2A Active CN109657231B (en) 2018-11-09 2018-11-09 Long short message simplifying method and system

Country Status (1)

Country Link
CN (1) CN109657231B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930561A (en) * 2010-05-21 2010-12-29 电子科技大学 N-Gram participle model-based reverse neural network junk mail filter device
CN106803096A (en) * 2016-12-27 2017-06-06 上海大汉三通通信股份有限公司 A kind of short message type recognition methods, system and short message managing platform
CN107491434A (en) * 2017-08-10 2017-12-19 北京邮电大学 Text snippet automatic generation method and device based on semantic dependency
CN108062302A (en) * 2016-11-08 2018-05-22 北京国双科技有限公司 A kind of recognition methods of particular text information and device
CN108509409A (en) * 2017-02-27 2018-09-07 芋头科技(杭州)有限公司 A method of automatically generating semantic similarity sentence sample
CN108763191A (en) * 2018-04-16 2018-11-06 华南师范大学 A kind of text snippet generation method and system


Also Published As

Publication number Publication date
CN109657231B (en) 2023-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant