CN110610213A - Mail classification method, device, equipment and computer readable storage medium

Info

Publication number: CN110610213A
Application number: CN201910893789.0A
Authority: CN
Inventors: 张莉; 郑晓晗; 周伟达; 王邦军
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2019-09-20
Filing date: 2019-09-20
Publication date: 2019-12-24
Also published as: WO2021051764A1

Abstract

The invention discloses a mail classification method comprising the following steps: receiving mail data; processing the mail data with a predetermined linear discriminant function to obtain a discriminant function value, wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm; and classifying the mail data by using a classification rule and the discriminant function value. Because the discrimination parameters are obtained in advance with the L1-norm-based twin support vector machine classification algorithm, they reduce the influence of features with a small contribution on the classification result, which improves classification efficiency and generalization performance and thus the accuracy of spam filtering. The invention also discloses a mail classification device, equipment and a computer-readable storage medium, which achieve the same technical effects.

Description

Mail classification method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for classifying mails.

Background

Spam is highly harmful: it occupies network bandwidth and reduces the operating efficiency of the whole network; it is easily exploited by hackers, causing network congestion or even paralysis; and it is easily exploited by lawbreakers to spread harmful information, and so on. To maintain the healthy and safe development of the Internet, a safer and more effective spam filtering technology is urgently needed.

Jayadeva et al. proposed the Twin Support Vector Machine (TSVM), which can be used to filter spam. For a two-class problem, the TSVM seeks two non-parallel planes such that each plane is as close as possible to the samples of one class and as far as possible from the samples of the other class. However, the model constructed by this algorithm is not necessarily sparse; that is, when mail is classified with the model, unimportant features of the mail sample are also taken into account, which reduces the generalization performance of the classifier and thus the accuracy of spam filtering. Therefore, how to improve the accuracy of spam filtering is a problem to be solved by those skilled in the art.

Disclosure of Invention

The invention aims to provide a mail classification method, a mail classification device, mail classification equipment and a computer readable storage medium, so as to realize accurate recognition of junk mails.

In order to achieve the above object, the present invention provides a mail classification method, comprising:

receiving mail data to be classified;

processing the mail data by using a predetermined linear discriminant function to obtain a discriminant function value; wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm; the training set comprises mail training data of different categories;

and classifying the mail data by using a preset classification rule and the discrimination function value.

Optionally, the method for generating the discriminant parameter in the linear discriminant function includes:

acquiring a training set; determining a discrimination parameter in the linear discrimination function by using the training set and a preset condition;

the preset conditions include:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

wherein w₁ is the first weight vector in the discrimination parameters, w₂ is the second weight vector in the discrimination parameters, b₁ is the first function bias in the discrimination parameters, b₂ is the second function bias in the discrimination parameters, ξ₁ is the first slack variable, ξ₂ is the second slack variable, X₁ is the feature matrix of the non-spam data in the training set, X₂ is the feature matrix of the spam data in the training set, e₁ is a first all-ones vector, e₂ is a second all-ones vector, ‖·‖₁ is the L1 norm, and C₁, C₂, C₃ and C₄ are predetermined first, second, third and fourth auxiliary variables, respectively.

Optionally, the processing the mail data by using a predetermined linear discriminant function to obtain a discriminant function value includes:

obtaining a first discriminant function value f₁(x) by using a first linear discriminant function and the mail data x;

obtaining a second discriminant function value f₂(x) by using a second linear discriminant function and the mail data x;

wherein the first linear discriminant function is f₁(x) = x^T w₁ + b₁, and the second linear discriminant function is f₂(x) = x^T w₂ + b₂.

Optionally, the classifying the mail data by using a preset classification rule and the discrimination function value includes:

obtaining a classification result of the mail data by using a predetermined classification rule, the first discriminant function value f₁(x) and the second discriminant function value f₂(x);

the classification rule is as follows:

wherein, if the classification result is 1, the mail is judged to be non-spam, and if the classification result is -1, the mail is judged to be spam.

To achieve the above object, the present invention further provides a mail classification device comprising:

a data receiving module, configured to receive mail data to be classified;

a data processing module, configured to process the mail data by using a predetermined linear discriminant function to obtain a discriminant function value; wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm; the training set comprises mail training data of different categories;

and a data classification module, configured to classify the mail data by using a preset classification rule and the discriminant function value.

Optionally, the device further includes a discrimination parameter generation module; wherein the discrimination parameter generation module comprises:

a training set acquisition unit for acquiring a training set;

a discrimination parameter determining unit, configured to determine a discrimination parameter in the linear discrimination function by using the training set and a preset condition; the preset conditions include:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

Optionally, the data processing module includes:

a first processing unit, configured to obtain a first discriminant function value f₁(x) by using a first linear discriminant function and the mail data x;

a second processing unit, configured to obtain a second discriminant function value f₂(x) by using a second linear discriminant function and the mail data x; wherein the first linear discriminant function is f₁(x) = x^T w₁ + b₁, and the second linear discriminant function is f₂(x) = x^T w₂ + b₂.

Optionally, the data classification module is specifically configured to obtain a classification result of the mail data by using a predetermined classification rule, the first discriminant function value f₁(x) and the second discriminant function value f₂(x);

the classification rule is as follows:

To achieve the above object, the present invention further provides mail classification equipment, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the above mail classification method when executing the computer program.

To achieve the above object, the present invention further provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above mail classification method.

According to the above scheme, the mail classification method provided by the embodiment of the invention comprises: receiving mail data to be classified; processing the mail data by using a predetermined linear discriminant function to obtain a discriminant function value, wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm, and the training set comprises mail training data of different categories; and classifying the mail data by using a preset classification rule and the discriminant function value.

Thus, in this scheme, the discrimination parameters of the linear discriminant function used to classify mail data are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm. These parameters reduce the influence of features with a small contribution on the classification result, which improves classification efficiency and generalization performance and thus the accuracy of spam filtering. The invention also discloses a mail classification device, equipment and a computer-readable storage medium, which achieve the same technical effects.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a mail classification method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a mail classification device according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of mail classification equipment according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention discloses a mail classification method, a device, equipment and a computer readable storage medium, which are used for realizing accurate identification of junk mails.

Referring to fig. 1, a mail classification method provided in an embodiment of the present invention includes:

s101, receiving mail data to be classified;

In this embodiment, the input mail data x to be classified first needs to be normalized so that each of its features lies in the interval [0, 1]. In this embodiment, mail data is classified as either spam or non-spam, so mail classification in the present application can also be understood as spam recognition.
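The normalization step can be sketched as follows. The source only states that features are mapped into [0, 1]; min-max scaling with ranges fitted on the training data, the clipping of out-of-range values, and the names `fit_min_max` and `normalize` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fit_min_max(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return the per-feature minima and maxima observed in the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Scale each feature of x into [0, 1] using the fitted ranges (assumed min-max scaling)."""
    out = []
    for value, a, b in zip(x, lo, hi):
        if b == a:
            out.append(0.0)  # constant feature: there is no spread to scale by
        else:
            scaled = (value - a) / (b - a)
            out.append(min(1.0, max(0.0, scaled)))  # clip values outside the fitted range
    return out
```

Fitting the ranges once on the training set and reusing them for mail to be predicted keeps both in the same feature scale, which is one plausible reading of the two normalization steps described in this embodiment.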

S102, processing the mail data by using a predetermined linear discriminant function to obtain a discriminant function value; wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm; the training set comprises mail training data of different categories;

the method for generating the discriminant parameters in the linear discriminant function comprises the following steps:

the preset conditions include:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

In this embodiment, the samples are normalized to obtain a training set, the model is trained on the training set, and the trained model is then used to predict on the test set to obtain the final prediction result. Specifically, the collected spam-related data is first compiled into the training set of the system, D = X₁ ∪ X₂, where X₁ = {x_1i | x_1i ∈ R^m, y_1i = 1, i = 1, ..., n₁} is the non-spam data set and X₂ = {x_2i | x_2i ∈ R^m, y_2i = -1, i = 1, ..., n₂} is the spam data set. Each sample has m features; n₁ is the number of non-spam samples, n₂ is the number of spam samples, and n = n₁ + n₂ is the total number of training samples. R^m is the m-dimensional real space; X₁ and X₂ also denote the feature matrices of the non-spam and spam data, respectively. x_1i is the i-th non-spam sample and y_1i is its class label, so y_1i = 1; x_2i is the i-th spam sample and y_2i is its class label, so y_2i = -1.

In this embodiment, the classification result is obtained mainly by the following two linear discriminant functions:

f₁(x)＝x^Tw₁+b₁

f₂(x)＝x^Tw₂+b₂

wherein w₁ and w₂ are the first and second weight vectors of the two functions, respectively, and b₁ and b₂ are the first and second function biases of the two functions, respectively. To obtain the weight vectors and biases, the following two optimization problems need to be solved respectively:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

wherein C₁, C₂, C₃ and C₄ are four auxiliary variables that need to be determined in advance; X₁ and X₂ are the feature matrices of the non-spam and spam data, respectively; ξ₁ and ξ₂ are slack variables; e₁ and e₂ are all-ones vectors; and ‖·‖₁ is the L1 norm.

After solving the two optimization problems, w₁, w₂, b₁ and b₂ are obtained, so the two linear discriminant functions are determined. In addition, the smaller the value of an element of w₁ or w₂, the smaller the contribution of the corresponding feature to model training. Removing the features corresponding to the small-valued elements of w₁ and w₂ improves the classification efficiency and generalization performance of the model, and thus the accuracy of spam filtering. Therefore, in the present application, after the discrimination parameters of the linear discriminant functions are obtained, it may be determined whether the first weight vector and the second weight vector contain elements smaller than a predetermined threshold; if so, those elements are set to zero, thereby improving the classification performance and generalization capability of the model.
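The thresholding of small weights described above might look like the sketch below. It assumes "smaller than a predetermined threshold" refers to the magnitude of each element (the example later in this description zeroes a symmetric interval around zero); the function names and the threshold value in the usage note are hypothetical.

```python
from typing import List, Sequence


def zero_small_weights(w: Sequence[float], threshold: float) -> List[float]:
    """Set every weight whose magnitude is below the threshold to exactly zero."""
    return [0.0 if abs(value) < threshold else value for value in w]


def kept_features(w: Sequence[float]) -> List[int]:
    """Return the indices of the features that still carry a non-zero weight."""
    return [i for i, value in enumerate(w) if value != 0.0]
```

For instance, `zero_small_weights([0.8, -0.00002, 0.0003, -0.5], 1e-4)` zeroes only the second element, so the corresponding feature no longer influences the discriminant function value.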

It can be understood that, once the discrimination parameters w₁, w₂, b₁ and b₂ of the linear discriminant functions have been obtained in the above manner, the mail data can be processed with the predetermined linear discriminant functions to obtain the discriminant function values. The process specifically includes: obtaining a first discriminant function value f₁(x) by using the first linear discriminant function and the mail data x; and obtaining a second discriminant function value f₂(x) by using the second linear discriminant function and the mail data x; wherein the first linear discriminant function is f₁(x) = x^T w₁ + b₁, and the second linear discriminant function is f₂(x) = x^T w₂ + b₂.

That is, after the input mail data x to be predicted is acquired, it is normalized so that its features lie in the interval [0, 1]; the values of the two discriminant functions are then calculated to obtain the first discriminant function value f₁(x) and the second discriminant function value f₂(x), and the mail category is determined from these two values.
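Evaluating one linear discriminant function f(x) = x^T w + b can be sketched as below; the function name `discriminant` and the numbers in the usage note are illustrative only and are not taken from the patent's trained parameters.

```python
from typing import Sequence


def discriminant(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Compute f(x) = x^T w + b for one normalized mail feature vector x."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same number of features")
    return sum(xi * wi for xi, wi in zip(x, w)) + b
```

Calling it once with (w₁, b₁) and once with (w₂, b₂) yields f₁(x) and f₂(x); for example, `discriminant([1.0, 0.5], [2.0, -4.0], 0.5)` evaluates 2.0 - 2.0 + 0.5.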

S103, mail data is classified by using preset classification rules and discrimination function values.

The classifying the mail data by using a preset classification rule and the discrimination function value includes:

the classification rule is as follows:

It can be seen that, after the two discriminant function values are obtained, the type of the mail data can be determined according to the predetermined classification rule, that is, whether the mail data is spam.
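The classification rule itself is not reproduced in this text, so the sketch below is only an assumption: it uses the nearest-plane rule common to twin support vector machines, assigning the label of the plane whose discriminant value has the smaller magnitude, without normalizing by the weight norms. The name `classify` is hypothetical.

```python
def classify(f1: float, f2: float) -> int:
    """Assumed nearest-plane rule: return 1 (non-spam) or -1 (spam).

    f1 is the value of the discriminant function fitted close to the non-spam
    samples; f2 is the value of the one fitted close to the spam samples.
    """
    return 1 if abs(f1) <= abs(f2) else -1
```

Under this assumption, a mail with f₁(x) = 0.1 and f₂(x) = -1.2 lies closer to the non-spam plane and is labeled 1, while a mail with f₁(x) = -0.9 and f₂(x) = 0.05 is labeled -1.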

The present invention is described in detail below with reference to a specific example, which is implemented on the premise of the technical solution of the present invention, and detailed embodiments and procedures are given, but the application scope of the present invention is not limited to the following example.

In this embodiment, a test is performed on the Spambase dataset from the UCI repository, which labels each email as spam or non-spam. The data set contains 4601 samples, each with 57 features, most of which indicate how frequently a particular word or character occurs in the email, as shown in Table 1. A feature of type "word_freq_WORD" represents the percentage of words in the email that match WORD, namely 100 × (number of times WORD appears in the email) / (total number of words in the email);

"WORD" here may be any string of alphanumeric characters;

a feature of type "char_freq_CHAR" represents the percentage of characters in the email that match CHAR, namely 100 × (number of occurrences of CHAR in the email) / (total number of characters in the email);

"capital_run_length_average" represents the average length of uninterrupted sequences of capital letters;

"capital_run_length_longest" represents the length of the longest uninterrupted sequence of capital letters;

"capital_run_length_total" represents the total number of capital letters in the email.

In this data set, 1813 samples are non-spam and are labeled +1, and 2788 samples are spam and are labeled -1.

Table 1. Feature description of the Spambase dataset
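The feature types described above can be sketched for a raw email body as follows. This is an illustration only: the tokenization (runs of ASCII letters and digits), the case-insensitive word matching, and the function name `spambase_style_features` are assumptions and may differ from how the published dataset was built.

```python
import re
from typing import Dict, Iterable


def spambase_style_features(text: str, words: Iterable[str], chars: Iterable[str]) -> Dict[str, float]:
    """Compute Spambase-style frequency and capital-run features for one email body."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)  # a word is any alphanumeric string
    lowered = [t.lower() for t in tokens]
    features: Dict[str, float] = {}
    for word in words:
        count = lowered.count(word.lower())
        features[f"word_freq_{word}"] = 100.0 * count / len(tokens) if tokens else 0.0
    for ch in chars:
        features[f"char_freq_{ch}"] = 100.0 * text.count(ch) / len(text) if text else 0.0
    runs = [len(r) for r in re.findall(r"[A-Z]+", text)]  # uninterrupted capital runs
    features["capital_run_length_average"] = sum(runs) / len(runs) if runs else 0.0
    features["capital_run_length_longest"] = float(max(runs)) if runs else 0.0
    features["capital_run_length_total"] = float(sum(runs))
    return features
```

For the text "FREE money NOW! Call now", the capital runs are "FREE", "NOW" and "C", so the longest run is 4 and the total is 8, and the word "now" accounts for 2 of the 5 tokens.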

The specific implementation steps are as follows:

First, data preprocessing module

(1) Compile the collected spam-related data into the training set of the system. The Spambase dataset is used in this example.

(2) Input the training set D = X₁ ∪ X₂, where X₁ = {x_1i | x_1i ∈ R^m, y_1i = 1, i = 1, ..., n₁} is the non-spam data set and X₂ = {x_2i | x_2i ∈ R^m, y_2i = -1, i = 1, ..., n₂} is the spam data set. Each sample has m features; n₁ is the number of non-spam samples, n₂ is the number of spam samples, and n = n₁ + n₂ is the total number of samples in the training set. In this example, the number of features m is 57 and the sample set contains 4601 samples in total. 3680 samples were randomly drawn from the sample set as the training set, and the remaining 921 samples were used as the test set.
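The random split into 3680 training samples and 921 test samples can be sketched as below. The source does not state how the random draw was made; shuffling sample indices with a fixed seed, and the name `split_indices`, are assumptions for illustration.

```python
import random
from typing import List, Tuple


def split_indices(n_samples: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly partition sample indices into a training part and a test part."""
    if not 0 <= n_train <= n_samples:
        raise ValueError("n_train must lie between 0 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    return indices[:n_train], indices[n_train:]
```

With `split_indices(4601, 3680)` the two returned lists have 3680 and 921 entries and together cover every sample exactly once.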

Second, data training module

Two linear discriminant functions were determined using the present invention:

f₁(x)＝x^Tw₁+b₁

f₂(x)＝x^Tw₂+b₂

wherein w₁ and w₂ are the weight vectors of the functions, and b₁ and b₂ are the biases of the functions. To obtain the weight vectors and biases, the following two optimization problems are solved respectively:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

wherein C₁, C₂, C₃ and C₄ are auxiliary variables that need to be determined in advance; X₁ and X₂ are the feature matrices of the non-spam and spam data, respectively; ξ₁ and ξ₂ are slack variables; and e₁ and e₂ are all-ones vectors.

After solving the two optimization problems, w₁, w₂, b₁ and b₂ are obtained, so the two linear discriminant functions are determined. The smaller the value of an element of w₁ or w₂, the smaller the contribution of the corresponding feature to model training. Removing the features corresponding to the small-valued elements of w₁ and w₂ improves the classification efficiency and generalization performance of the model, and thus the accuracy of spam filtering.

Table 2 shows the values of w₁ and w₂ obtained in this example and their corresponding features.

Table 2. w₁ and w₂ values trained on the Spambase dataset and their corresponding features

As can be seen from Table 2, some strings of digits and some symbol features such as "(" and "[" contribute little to model training, whereas the features corresponding to the larger elements of w₁ and w₂, such as "meeting", "business" and "edu", contribute more to the model. In this example, the elements of w₁ and w₂ whose values lie in [-e^-4, e^-4] (i.e., the bolded data in the table) are set to 0.

Third, data prediction module

Inputting mail data x to be predicted, respectively calculating the value of discriminant function

f₁(x)＝x^Tw₁+b₁

f₂(x)＝x^Tw₂+b₂

Then, the mail category is judged according to the following rules:

If the classification result is 1, the mail is non-spam; otherwise, the mail is spam.

The TSVM is compared with the present invention. The present invention is tested in two variants: one sets the smaller values in w₁ and w₂ directly to zero; the other retains all of w₁ and w₂. The experimental results are shown in Table 3. The method reduces the influence of low-contribution features on the classification result, improves the generalization performance of classification, and thereby improves the accuracy of mail filtering.

Table 3. Comparison of test accuracy on the Spambase dataset

Method	Accuracy
The invention (small weights set to zero)	94.14%
The invention (all weights)	94.03%
TSVM	92.31%

It can be seen that the discrimination parameters of the linear discriminant function used to classify mail data are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm, and that these parameters reduce the influence of features with a small contribution on the classification result, which improves classification efficiency and generalization performance. Furthermore, by directly setting the smaller values in w₁ and w₂ to zero, the scheme can remove the influence of low-contribution features on the classification result altogether, which further improves the accuracy of spam filtering.

The mail classification device provided in the embodiment of the present invention is introduced below; the mail classification device described below and the mail classification method described above may be referred to each other.

Referring to FIG. 2, a mail classification device provided in an embodiment of the present invention includes:

a data receiving module 100, configured to receive mail data to be classified;

a data processing module 200, configured to process the mail data by using a predetermined linear discriminant function to obtain a discriminant function value; wherein the discrimination parameters in the linear discriminant function are obtained in advance by analyzing a training set with an L1-norm-based twin support vector machine classification algorithm; the training set comprises mail training data of different categories;

a data classification module 300, configured to classify the mail data by using a preset classification rule and the discriminant function value.

The device also comprises a discrimination parameter generation module; wherein, the discrimination parameter generation module comprises:

a training set acquisition unit for acquiring a training set;

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

Wherein the data processing module comprises:

a second processing unit, configured to obtain a second discriminant function value f₂(x) by using a second linear discriminant function and the mail data x; wherein the first linear discriminant function is f₁(x) = x^T w₁ + b₁, and the second linear discriminant function is f₂(x) = x^T w₂ + b₂.

Wherein the data classification module is specifically configured to obtain a classification result of the mail data by using a predetermined classification rule, the first discriminant function value f₁(x) and the second discriminant function value f₂(x);

the classification rule is as follows:

Referring to FIG. 3, an embodiment of the present invention also discloses a schematic structural diagram of mail classification equipment; the equipment may include:

a memory 11 for storing a computer program;

a processor 12 for implementing the steps of the mail classification method according to any of the above method embodiments when executing the computer program.

In the present embodiment, the device 1 may be a PC (Personal Computer), or may be a terminal device such as a smart phone, a tablet Computer, a palmtop Computer, or a portable Computer.

The device 1 may include a memory 11, a processor 12 and a bus 13.

The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the device 1, for example a hard disk of the device 1. The memory 11 may also be an external storage device of the device 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the device 1. Further, the memory 11 may also comprise both internal memory units of the device 1 and external memory devices. The memory 11 can be used not only for storing application software installed in the apparatus 1 and various types of data such as codes for executing a mail classification method, etc., but also for temporarily storing data that has been output or is to be output.

The processor 12, which in some embodiments may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor or other data processing chip, executes program code or processes data stored in the memory 11, such as the code for performing the mail classification method.

The bus 13 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 3, but this does not mean only one bus or one type of bus.

Further, the device may further comprise a network interface 14, and the network interface 14 may optionally comprise a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used for establishing a communication connection between the device 1 and other electronic devices.

Optionally, the device 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the device 1 and for displaying a visual user interface.

Fig. 3 only shows the device 1 with the components 11-14, and it will be understood by a person skilled in the art that the structure shown in fig. 3 does not constitute a limitation of the device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.

The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the mail classification method according to any method embodiment.

The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method of classifying mail, comprising:

receiving mail data to be classified;

2. The mail classification method according to claim 1, wherein the method for generating the discriminant parameters in the linear discriminant function comprises:

the preset conditions include:

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

3. The mail sorting method of claim 2, wherein the processing the mail data with a predetermined linear discriminant function to obtain a discriminant function value comprises:

wherein the first linear discriminant function is f₁(x) = x^T w₁ + b₁, and the second linear discriminant function is f₂(x) = x^T w₂ + b₂.

4. The mail classification method according to claim 3, wherein said classifying the mail data by using a predetermined classification rule and the discrimination function value includes:

obtaining a classification result of the mail data by using a predetermined classification rule, the first discriminant function value f₁(x) and the second discriminant function value f₂(x);

the classification rule is as follows:

5. A mail sorting apparatus, comprising:

the data receiving module is used for receiving the mail data to be classified;

6. The mail sorting apparatus of claim 5, further comprising a discrimination parameter generation module; wherein, the discrimination parameter generation module comprises:

a training set acquisition unit for acquiring a training set;

s.t.-(X₂w₁+e₂b₁)+ξ₂≥e₂,ξ₂≥0

s.t.(X₁w₂+e₁b₂)+ξ₁≥e₁,ξ₁≥0

7. The mail sorting device of claim 6, wherein said data processing module comprises:

8. The mail sorting device of claim 7, wherein the data classification module is specifically configured to obtain a classification result of the mail data by using a predetermined classification rule, the first discriminant function value f₁(x) and the second discriminant function value f₂(x);

the classification rule is as follows:

9. A mail sorting apparatus, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the mail sorting method according to any one of claims 1 to 4 when executing said computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the mail sorting method according to any one of claims 1 to 4.