CN110796200A - Data classification method, terminal, device and storage medium - Google Patents


Info

Publication number
CN110796200A
Authority
CN
China
Prior art keywords
data
sub
training set
verification
classification
Prior art date
Legal status
Granted
Application number
CN201911044522.0A
Other languages
Chinese (zh)
Other versions
CN110796200B (en)
Inventor
陈瑞钦
黄启军
李诗琦
唐兴兴
林冰垠
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201911044522.0A priority Critical patent/CN110796200B/en
Publication of CN110796200A publication Critical patent/CN110796200A/en
Application granted granted Critical
Publication of CN110796200B publication Critical patent/CN110796200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning


Abstract

The invention discloses a data classification method comprising the following steps: when a data classification instruction is received, a target feature identifier is obtained; the data in a data set are partitioned into blocks based on the target feature identifier; a classification operation is performed on each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set; and the sub-training set, the sub-verification set and the sub-test set are sent to the training set, the verification set and the test set respectively. The invention also discloses a device, a terminal and a storage medium. Data whose feature data corresponding to the target feature identifier take the same value are divided into one data block on the terminal, and each data block is then classified into the sub-training set, the sub-verification set and the sub-test set according to a preset proportion and sent out, so that the corresponding data are sent directly from the terminal to the training set, the verification set and the test set.

Description

Data classification method, terminal, device and storage medium
Technical Field
The present invention relates to the field of terminal technologies, and in particular, to a data classification method, a terminal, an apparatus, and a storage medium.
Background
We live in a big data era, and big data analysis is generally performed on distributed systems, for example a distributed machine learning system used to build machine learning models over big data. When training a machine learning model, the original data are usually split in a certain proportion into a training set, a verification set and a test set, so the original data stored on each computing node must be split as required. For example, if a user selects a feature x to split on, and that feature takes three values a, b and c, the data for each of the three values must be split separately, and the per-value split results are then merged to obtain the final training set, verification set and test set.
The current pain point is that big data are generally stored across multiple computing nodes. In the traditional map-reduce computing model, or in layered splitting methods based on it such as group-by operators, the data must first be re-partitioned by feature: the data stored on each computing node are redistributed according to the value of feature x, so that each computing node holds a single feature value (or the selected feature data). The feature data on each computing node are then split randomly in proportion, and the split results are gathered to obtain the final layered split. This splitting method requires a large amount of data to be transmitted between different computing nodes, resulting in poor computing performance.
Disclosure of Invention
The invention mainly aims to provide a data classification method, a terminal, a device and a storage medium, to solve the technical problem that, when classifying the sample data set used to train a machine learning model, existing approaches must move data between distributed terminals, which imposes a heavy processing burden on the system and makes data classification slow.
In order to achieve the above object, the present invention provides a data classification method applied to a terminal, the data classification method comprising the steps of:
when a data classification instruction is received, acquiring a target characteristic identifier;
partitioning the data in the data set based on the target feature identification to obtain a plurality of data blocks;
classifying each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set;
and respectively sending the sub-training set to a training set, the sub-verification set to a verification set and the sub-test set to a test set.
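The four steps above can be sketched for a single terminal as a short illustration (Python; the dict-based record layout, the function name and the 6:2:2 default ratio are assumptions for illustration, not part of the disclosure):

```python
import random
from collections import defaultdict

def classify_data(data_set, target_feature, ratios=(0.6, 0.2, 0.2), seed=0):
    """Block records by the target feature's value, then split each block."""
    # Step S20: one data block per distinct value of the target feature.
    blocks = defaultdict(list)
    for record in data_set:
        blocks[record[target_feature]].append(record)

    # Step S30: split every block into sub-training/verification/test parts
    # according to the preset proportion data.
    sub_train, sub_val, sub_test = [], [], []
    rng = random.Random(seed)
    for block in blocks.values():
        rng.shuffle(block)
        n_train = int(len(block) * ratios[0])
        n_val = int(len(block) * ratios[1])
        sub_train += block[:n_train]
        sub_val += block[n_train:n_train + n_val]
        sub_test += block[n_train + n_val:]

    # Step S40: here the three sub-sets would be sent on to the global
    # training, verification and test sets; this sketch simply returns them.
    return sub_train, sub_val, sub_test
```

Because every record carrying a given target-feature value is handled inside one block on one terminal, no inter-terminal transfer is needed before splitting.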
Further, in an embodiment, the data includes a target feature identifier, the feature data corresponding to the target feature identifier has m values, m is a positive integer, and the step of blocking the data in the data set based on the target feature identifier to obtain a plurality of data blocks includes:
and dividing the data with the same value of the characteristic data corresponding to the target characteristic identifier in the data set into one data block to obtain data blocks corresponding to the m values.
Further, in an embodiment, the step of performing a classification operation on each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set, and a sub-test set includes:
acquiring proportional data of the sub-training set, the sub-verification set and the sub-test set;
and traversing each data block, and correspondingly distributing each data block to the sub-training set, the sub-verification set or the sub-test set based on the proportion data.
Further, in an embodiment, the data set includes first data, and after the step of obtaining the target feature identifier when receiving the data classification instruction, the method further includes:
and when the feature data corresponding to the target feature identifier in the first data meets the admission condition of a training set, determining the first data as the training set data, and sending the training set data to the training set.
Further, in an embodiment, when the feature data corresponding to the target feature identifier of the first data satisfies a training set admission condition, the step of determining that the first data is training set data includes:
and when the quantity of the data required by the training set is greater than or equal to a threshold value and the feature data of the first data exists in the feature data of the required data, determining the first data as the training set data, wherein the required data comprises the feature data corresponding to the target feature identifier.
Further, in an embodiment, the data set includes second data, and after the step of obtaining the target feature identifier when receiving the data classification instruction, the method further includes:
and when the feature data corresponding to the target feature identifier in the second data meets the admission condition of a verification set, determining the second data as the verification set data, and sending the verification set data to the verification set.
Further, in an embodiment, the data set includes third data, and after the step of obtaining the target feature identifier when receiving the data classification instruction, the method further includes:
and when the feature data corresponding to the target feature identifier in the third data meets the test set admission condition, determining the third data as test set data, and sending the test set data to the test set.
In addition, to achieve the above object, the present invention further provides a data classification apparatus, the data classification apparatus including:
the acquisition module is used for acquiring the target characteristic identification when a data classification instruction is received;
the blocking module is used for blocking the data in the data set based on the target feature identification to obtain a plurality of data blocks;
the classification module is used for performing classification operation on each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set;
and the sending module is used for respectively sending the sub-training set to the training set, the sub-verification set to the verification set and the sub-test set to the test set.
In order to achieve the above object, the present invention further provides a terminal, including: a memory, a processor and a data classification program stored on the memory and executable on the processor, the data classification program when executed by the processor implementing the steps of the data classification method as described above.
In addition, to achieve the above object, the present invention further provides a storage medium having a data classification program stored thereon, the data classification program implementing the steps of any one of the data classification methods described above when executed by a processor.
The method obtains a target feature identifier when a data classification instruction is received, blocks the data in the data set based on the target feature identifier to obtain a plurality of data blocks, classifies each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set, and finally sends the sub-training set to the training set, the sub-verification set to the verification set and the sub-test set to the test set. Data whose feature data corresponding to the target feature identifier take the same value are divided into one data block, each data block is classified directly into training, verification and test portions according to the preset proportion, and the sub-training set, sub-verification set and sub-test set are then sent to their respective sets, so that the corresponding data are sent directly from the terminal to the training set, the verification set and the test set.
Drawings
Fig. 1 is a schematic structural diagram of a terminal in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a data classification method according to a first embodiment of the present invention;
FIG. 3 is a flow chart illustrating a prior art method of data classification according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a whole process of classification processing according to an embodiment of the data classification method of the present invention;
FIG. 5 is a flowchart illustrating a data classification method according to a second embodiment of the present invention;
FIG. 6 is a functional block diagram of an embodiment of a data classification apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a terminal in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the terminal may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002, where the communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g. a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g. a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like, with sensors such as light sensors and motion sensors. Specifically, the light sensors may include an ambient light sensor, which can adjust the brightness of the display screen according to the ambient light, and a proximity sensor, which can turn off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, an attitude sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when stationary, and can be used in applications that recognize the attitude of the mobile terminal (such as landscape/portrait switching, related games and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer and tapping). Of course, the terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here again.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module and a data classification program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and communicating with it; the user interface 1003 is mainly used for connecting a user terminal and communicating with it; and the processor 1001 may be used to invoke the data classification program stored in the memory 1005.
In this embodiment, the terminal includes: the system comprises a memory 1005, a processor 1001 and a data classification program which is stored in the memory 1005 and can be run on the processor 1001, wherein when the processor 1001 calls the data classification program stored in the memory 1005, the steps of the data classification method provided by each embodiment of the application are executed.
The invention also provides a data classification method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the data classification method of the invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein.
The data classification method of the first embodiment of the present invention is applied to a terminal, and a plurality of terminal devices are in communication connection with a server, and in this embodiment, the data classification method includes:
step S10, when a data classification instruction is received, a target feature identifier is obtained;
in this embodiment, in deep machine model learning, an available data set is often divided into a training set, a verification set, and a test set, where the training set refers to a sample set for training and is mainly used to train parameters in a neural network; the verification set is understood in a literal sense and is a sample set for verifying the performance of the model, and after training of different machine models on the training set is finished, the performance of each model is compared and judged through the verification set; the test set is used for objectively evaluating the performance of the machine model for the trained machine model.
In particular, the data set is stored on the terminal, and the data sets of the plurality of terminals can together be regarded as one original data set, which needs to be divided into a training set, a verification set and a test set for machine learning. Each data record consists of feature data and the target data corresponding to that feature data. The feature data comprise a plurality of features, and each feature can take several values. For example, in data characterizing a person, the features may include gender, height, weight, age range, etc., and the possible values of the feature "age range" are under 10 years, 10-18 years, 18-30 years, 30-50 years and over 50 years. Note that all records in the data set contain the same kinds of features, but the specific feature values differ. To divide the data set into a training set, a verification set and a test set, a certain feature must be chosen as the target feature, and each record is then finally assigned to the training set, verification set or test set according to the specific value that record takes for the target feature. Therefore, when a data classification instruction is received, the target feature identifier is obtained at the same time. The specific target feature identifier is determined by the actual situation; the target feature is generally a single feature, and the target feature identifier can be determined from historical data and expert experience.
Step S20, partitioning the data in the data set based on the target feature identification to obtain a plurality of data blocks;
in this embodiment, after the target feature identifier is determined, that is, the classification basis of the data is determined, the data set is partitioned according to the specific value of the target feature identifier corresponding to each data, so as to obtain a plurality of data blocks.
Further, as shown in fig. 3, suppose there are n terminals and the target feature x has m feature values. In the prior art, when classifying the data set, data are moved between terminals: the data in all terminals participating in the classification are redistributed according to the target feature so that records with the same target-feature value end up on the same terminal, the data blocks on each terminal are then split, and the split results are output to the training set, the verification set and the test set respectively. The data classification method provided by the invention instead performs blocking and classification within a single terminal, without any data movement between terminals. Compared with the prior art, it therefore avoids inter-terminal data movement during classification, which reduces system resource consumption, saves classification time and improves classification efficiency.
Specifically, step S20 includes: and dividing the data with the same value of the characteristic data corresponding to the target characteristic identifier in the data set into one data block to obtain data blocks corresponding to the m values.
In this embodiment, the target feature identifier designates a feature in the data, and a feature can take several values. The data are blocked according to the value of that feature, i.e. records in the data set whose feature data corresponding to the target feature identifier are identical are aggregated, yielding data blocks corresponding to the different feature values. Assuming the target feature x takes m different values, the data set can be divided into m data blocks according to the value of the target feature in each record, and the target-feature value of every record within one data block is the same.
For example, the feature data describing a person may include gender, height, weight, age range, and so on. Assume the feature "age range" can take 5 values: under 10 years, 10-18 years, 18-30 years, 30-50 years and over 50 years, and that a data set containing 1000 records uses "age range" as the target feature identifier. The records whose "age range" is under 10 years are divided into one data block; similarly, the records with "age range" 10-18 years form one data block, those with 18-30 years one data block, those with 30-50 years one data block, and those over 50 years one data block, so 5 data blocks are finally obtained. In general, when the data set contains a large number of records, the number of data blocks equals the number of values in the target feature's value range; that is, if the target feature has 5 possible values, the data set is divided into 5 data blocks.
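The "age range" example above can be reproduced in a few lines (an illustrative sketch; the record layout and group labels are hypothetical stand-ins for the patent's example):

```python
import random
from collections import defaultdict

# 1000 hypothetical records, each carrying an "age_group" target feature
# with 5 possible values, as in the example above.
rng = random.Random(42)
groups = ["<10", "10-18", "18-30", "30-50", ">50"]
data_set = [{"age_group": rng.choice(groups)} for _ in range(1000)]

# Blocking: records sharing the same target-feature value go into one block.
blocks = defaultdict(list)
for record in data_set:
    blocks[record["age_group"]].append(record)
```

With enough records, every possible value is represented, so the number of blocks equals the number of values in the feature's value range: 5.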
Step S30, classifying each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set;
in this embodiment, after the data set is partitioned according to the values of the target features of the data, the data blocks corresponding to different feature data are obtained, that is, the values of the target features corresponding to each data in one data block are the same. And then, continuously classifying each data block according to a preset classification rule to further obtain a sub-training set, a sub-verification set and a sub-test set, wherein the preset classification rule is as follows: after obtaining the proportional data of the sub-training set, the sub-verification set and the sub-test set, dividing the data in each data block into sub-training set data, sub-verification set data and sub-test set data according to the proportional data, and then dividing the sub-training set data into the sub-training set, the sub-verification set data into the sub-verification set and the sub-test set data into the sub-test set
Specifically, step S30 includes:
step S31, obtaining the proportion data of the sub training set, the sub verification set and the sub test set;
in this embodiment, the scale data commonly used for small scale data sets is a sub-training set: and (4) sub-verification set: subtest set 6: 2: 2, for example, 10000 data are in total, the sub training set is divided into 6000 data, the sub verification set is 2000 data, and the sub test set is 2000 data; for a large sample set, the proportion of the sub-verification set to the sub-test set is reduced a lot, because a certain sample size is sufficient to verify (compare) the model performance and the test model performance, for example, 10000000 samples are total, the training set is divided into 9980000 samples, the verification set is divided into 10000 samples, and the test set is divided into 10000 samples.
Specifically, the proportion data of the sub-training set, the sub-verification set and the sub-test set are obtained according to the quantity of data in the data set. Optionally, a preset proportion data list stores data quantities and the proportion data corresponding to each quantity, and the proportion data for the data set can be looked up in this list according to the quantity of data on the terminal.
Step S32, traversing each data block, and allocating each data block to the sub-training set, the sub-verification set or the sub-test set based on the proportion data.
In this embodiment, after the proportion data of the sub-training set, the sub-verification set and the sub-test set are obtained, all data blocks are traversed and each data block is allocated to the sub-training set, the sub-verification set or the sub-test set according to the proportion data. It should be noted that the data in each data block must be divided into sub-training set data, sub-verification set data and sub-test set data according to the proportion data, with the sub-training set data going to the sub-training set, the sub-verification set data to the sub-verification set and the sub-test set data to the sub-test set. Optionally, all data blocks in the data set are traversed in sequence according to the proportion data: the data block allocation of the sub-training set is completed first, then that of the sub-verification set, and finally that of the sub-test set. Alternatively, the currently traversed data are allocated to the sub-training set, the sub-verification set and the sub-test set simultaneously according to the proportion data, and once the number of data blocks in the sub-verification set or the sub-test set meets the requirement of the proportion data, no further data blocks are allocated to that set. Finally, after all data blocks have been traversed and allocated according to the proportion data, the sub-training set, the sub-verification set and the sub-test set are obtained.
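The second allocation option described above (blocks traversed once, with the sub-verification and sub-test quotas capped) can be sketched as follows; the function name and the list-of-lists block representation are illustrative assumptions:

```python
def allocate_blocks(blocks, ratios=(0.6, 0.2, 0.2)):
    """Traverse blocks once, filling the three sub-sets in proportion and
    capping the sub-verification/sub-test sets at their global quotas."""
    n_total = sum(len(b) for b in blocks)
    val_quota = int(n_total * ratios[1])
    test_quota = int(n_total * ratios[2])
    sub_train, sub_val, sub_test = [], [], []
    for block in blocks:
        n = len(block)
        # Proportional share of this block, but never exceed the quota.
        n_val = min(int(n * ratios[1]), val_quota - len(sub_val))
        n_test = min(int(n * ratios[2]), test_quota - len(sub_test))
        sub_val += block[:n_val]
        sub_test += block[n_val:n_val + n_test]
        sub_train += block[n_val + n_test:]  # remainder trains
    return sub_train, sub_val, sub_test
```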
Step S40, sending the sub-training set to the training set, the sub-verification set to the verification set and the sub-test set to the test set respectively.
In this embodiment, the data sets of the plurality of terminals are regarded as one original data set, which is finally divided into a training set, a verification set and a test set. Within one terminal, the data in the data set are blocked according to the values of the target feature, i.e. records whose feature data corresponding to the target feature identifier are identical are aggregated into data blocks corresponding to the different feature values. Then the proportion data of the sub-training set, sub-verification set and sub-test set for that terminal are obtained, all data blocks are traversed and allocated according to the proportion data, and the sub-training set, sub-verification set and sub-test set are obtained. Finally, the sub-training set, sub-verification set and sub-test set on the terminal are sent to the training set, verification set and test set respectively.
For example, referring to fig. 4, the data classification method of the present invention processes as follows:
the first step is as follows: storing an original data set D in n terminals, and partitioning data on each terminal according to the value of the target characteristic x;
the second step is that: each terminal divides the data blocks corresponding to different characteristic data into a sub-training set, a sub-verification set and a sub-test set according to a proportion;
the third step: each terminal outputs the sub-training set, the sub-verification set and the sub-test set;
the fourth step: and combining the sub-training set, the sub-verification set and the sub-test set of each terminal to obtain a training set, a verification set and a test set.
In the data classification method provided in this embodiment, when a data classification instruction is received, a target feature identifier is obtained; the data in the data set are blocked based on the target feature identifier to obtain a plurality of data blocks; each data block is then classified based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set; and finally the sub-training set, the sub-verification set and the sub-test set are sent to the training set, the verification set and the test set respectively. Data whose feature data corresponding to the target feature identifier take the same value are divided into one data block, each data block is classified directly into training, verification and test portions according to the preset proportion, and the sub-training set, sub-verification set and sub-test set are then sent to their respective sets, so that the corresponding data are sent directly from the terminal to the training set, the verification set and the test set.
A second embodiment of the data classification method of the present invention is proposed based on the first embodiment, with reference to fig. 5, and in this embodiment, after step S10, the method includes:
step S50, when the feature data corresponding to the target feature identifier in the first data meets the admission condition of the training set, determining that the first data is the training set data, and sending the training set data to the training set.
In this embodiment, when a data classification instruction is received, a target feature identifier is obtained, then it is determined whether feature data corresponding to the target feature identifier of first data meets an admission condition for a training set, when the feature data corresponding to the target feature identifier of the first data meets the admission condition for the training set, the first data is determined to be training set data, and the first data is sent to the training set.
Specifically, step S50 includes: when the quantity of data required by the training set is greater than or equal to a threshold and the feature data of the first data exists among the feature data of the required data, determining the first data to be training set data, where the required data comprises the feature data corresponding to the target feature identifier.
In this embodiment, after a data classification instruction is received, a target feature identifier is obtained, and the proportion data of the training set, the verification set and the test set corresponding to each distinct feature value is then obtained. Here, the feature values are all possible values corresponding to the target feature identifier; for example, if a target feature x takes m different values, the proportion data corresponding to each of the m values must be obtained. Optionally, all the proportion data of the training set, the verification set and the test set are stored in a preset proportion-data list and can be obtained by table lookup.
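The per-value quota calculation from the proportion data might look like this. This is illustrative only; `value_counts` and `proportions` are assumed input structures for the sketch, not the format of the patent's proportion-data list.

```python
def quotas_from_proportions(value_counts, proportions):
    """value_counts: {feature_value: total records with that value}.
    proportions: {feature_value: (train_ratio, verify_ratio, test_ratio)},
    e.g. looked up from a preset proportion-data list.
    Returns {feature_value: [n_train, n_verify, n_test]}."""
    quotas = {}
    for value, total in value_counts.items():
        t, v, _ = proportions[value]
        n_train = round(total * t)
        n_verify = round(total * v)
        # The test quota takes the remainder so the three counts sum to total.
        quotas[value] = [n_train, n_verify, total - n_train - n_verify]
    return quotas
```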
Further, after the proportion data corresponding to the different feature values are obtained, the total amount of data to be classified into the training set, the total amount to be classified into the verification set and the total amount to be classified into the test set are calculated for each feature value according to the proportion data. All data in the data set are then traversed in sequence; each datum is checked against the training set admission condition, and data meeting the condition are taken as training set data and sent to the training set. The training set admission condition is that the quantity of data currently required by the training set is greater than or equal to a threshold, which may be set to 1. Suppose the data set includes first data and the first data is the datum currently traversed: the feature value corresponding to the target feature of the first data is obtained, and when the feature data of the first data exists among the feature data of the required data, the first data meets the training set admission condition. The first data is then taken as training set data and sent to the training set, and the quantity of data required by the training set is updated, i.e., after the first data is processed, the quantity of data required by the training set is reduced by 1.
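The training-set admission condition and its quota update might be sketched like this. This is a hypothetical illustration: `required_train` stands in for the per-feature-value quantity of data still required by the training set, and the record format is assumed.

```python
def assign_training_set(dataset, feature, required_train):
    """required_train: {feature_value: remaining training-set quota}.
    A record is admitted when the quota for its feature value is >= 1
    (the threshold); each admission decrements the quota."""
    THRESHOLD = 1
    training_set, remainder = [], []
    for record in dataset:
        value = record[feature]
        if required_train.get(value, 0) >= THRESHOLD:
            training_set.append(record)
            required_train[value] -= 1  # update the required-data count
        else:
            remainder.append(record)    # left for the verification/test passes
    return training_set, remainder
```

The same loop shape applies to the verification and test sets in steps S60 and S70, with their own quota dictionaries.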
Step S60, when the feature data corresponding to the target feature identifier in the second data meets the verification set admission condition, determining that the second data is verification set data, and sending the verification set data to the verification set.
In this embodiment, the target feature identifier and the proportion data corresponding to each feature value are obtained in the same way as described under step S50 above.
Further, the verification set admission condition is checked in the same way: all data in the data set are traversed in sequence, and data meeting the verification set admission condition are taken as verification set data and sent to the verification set. The verification set admission condition is that the quantity of data currently required by the verification set is greater than or equal to a threshold, which may be set to 1. Suppose the data set includes second data and the second data is the datum currently traversed: the feature value corresponding to the target feature of the second data is obtained, and when the feature data of the second data exists among the feature data of the required data, the second data meets the verification set admission condition. The second data is then taken as verification set data and sent to the verification set, and the quantity of data required by the verification set is updated, i.e., reduced by 1 after the second data is processed.
Step S70, when the feature data corresponding to the target feature identifier in the third data meets the test set admission condition, determining that the third data is test set data, and sending the test set data to the test set.
In this embodiment, the target feature identifier and the proportion data corresponding to each feature value are likewise obtained as described under step S50 above.
Further, the test set admission condition is checked in the same way: all data in the data set are traversed in sequence, and data meeting the test set admission condition are taken as test set data and sent to the test set. The test set admission condition is that the quantity of data currently required by the test set is greater than or equal to a threshold, which may be set to 1. Suppose the data set includes third data and the third data is the datum currently traversed: the feature value corresponding to the target feature of the third data is obtained, and when the feature data of the third data exists among the feature data of the required data, the third data meets the test set admission condition. The third data is then taken as test set data and sent to the test set, and the quantity of data required by the test set is updated, i.e., reduced by 1 after the third data is processed.
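Taken together, steps S50 to S70 amount to a single traversal that tries the training, verification and test quotas in turn for each record. A minimal sketch follows; the quota layout (`{feature_value: [n_train, n_verify, n_test]}`) is an assumption for illustration, not the patent's data structure.

```python
def classify_dataset(dataset, feature, quotas):
    """quotas: {feature_value: [n_train, n_verify, n_test]} computed from
    the proportion data. Each record is sent to the first set whose quota
    for its feature value is still >= 1; the quota is then decremented."""
    sets = {"train": [], "verify": [], "test": []}
    names = ("train", "verify", "test")
    for record in dataset:
        counts = quotas[record[feature]]
        for i, name in enumerate(names):
            if counts[i] >= 1:          # admission condition: quota >= threshold
                sets[name].append(record)
                counts[i] -= 1          # update the required-data count
                break
    return sets
```

Because the quotas are kept per feature value, each of the three sets ends up with the preset proportion of every feature value without any record leaving the terminal.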
In the data classification method provided by this embodiment, the classification type of each datum is determined by traversing the data at the terminal and applying the training set, verification set or test set admission condition, and the result is sent to the server, so that the corresponding data are sent to the training set, the verification set and the test set directly at the terminal. Compared with the prior art, in which all terminals participating in data classification move data according to the target feature so that data with the same target feature value end up on the same terminal, no data needs to move between terminals during classification. The impact of inter-terminal data movement on system performance and processing speed is thus avoided, which reduces the resource consumption of the system, saves data classification time and improves data classification efficiency.
The present invention further provides a data classification device. Referring to fig. 6, fig. 6 is a functional module schematic diagram of an embodiment of the data classification device of the present invention.
The acquiring module 10 is configured to acquire a target feature identifier when a data classification instruction is received;
a blocking module 20, configured to block data in the data set based on the target feature identifier to obtain a plurality of data blocks;
the classification module 30 is configured to perform classification operations on the data blocks based on preset classification rules to obtain a sub-training set, a sub-verification set, and a sub-test set;
and a sending module 40, configured to send the sub-training set to the training set, the sub-verification set to the verification set, and the sub-test set to the test set, respectively.
Further, the blocking module 20 is further configured to:
dividing the data in the data set whose feature data corresponding to the target feature identifier take the same value into one data block, to obtain data blocks corresponding to the m values.
Further, the classification module 30 is further configured to:
acquiring proportional data of the sub-training set, the sub-verification set and the sub-test set;
and traversing each data block, and correspondingly distributing each data block to the sub-training set, the sub-verification set or the sub-test set based on the proportion data.
Further, the data classification device further includes:
and the first processing module is used for determining the first data as training set data when the characteristic data corresponding to the target characteristic identification in the first data meets the admission condition of a training set, and sending the training set data to the training set.
Further, the first processing module is further configured to:
and when the quantity of the data required by the training set is greater than or equal to a threshold value and the feature data of the first data exists in the feature data of the required data, determining the first data as the training set data, wherein the required data comprises the feature data corresponding to the target feature identifier.
Further, the data classification device further includes:
and the second processing module is used for determining the second data as verification set data when the feature data corresponding to the target feature identifier in the second data meets the verification set admission condition, and sending the verification set data to the verification set.
Further, the data classification device further includes:
and the third processing module is used for determining the third data as test set data when the feature data corresponding to the target feature identifier in the third data meets the test set admission condition, and sending the test set data to the test set.
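The modules described above can be sketched together as a single class. This is an illustrative, in-memory sketch assuming a dict-based record format and the proportional rule of the first embodiment; the class and method names are hypothetical, not the patent's API.

```python
class DataClassificationDevice:
    """Sketch of the acquiring (10), blocking (20), classification (30)
    and sending (40) modules, with 'sending' modeled as list appends."""

    def __init__(self, ratios=(0.6, 0.2, 0.2)):
        self.ratios = ratios
        self.train, self.verify, self.test = [], [], []

    def acquire(self, instruction):
        # Acquiring module 10: read the target feature identifier
        # from the data classification instruction.
        return instruction["target_feature"]

    def block(self, dataset, feature):
        # Blocking module 20: one block per distinct feature value.
        blocks = {}
        for record in dataset:
            blocks.setdefault(record[feature], []).append(record)
        return blocks

    def classify_and_send(self, blocks):
        # Classification module 30 + sending module 40: split each block
        # by the preset proportion and send the parts to the three sets.
        for block in blocks.values():
            n = len(block)
            a = int(n * self.ratios[0])
            b = a + int(n * self.ratios[1])
            self.train.extend(block[:a])
            self.verify.extend(block[a:b])
            self.test.extend(block[b:])
```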
In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a data classification program, and the data classification program, when executed by a processor, implements the steps of the data classification method in the foregoing embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be substantially or partially embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a system device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A data classification method, applied to a terminal, the data classification method comprising the following steps:
when a data classification instruction is received, acquiring a target characteristic identifier;
partitioning the data in the data set based on the target feature identification to obtain a plurality of data blocks;
classifying each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set;
and respectively sending the sub-training set to a training set, the sub-verification set to a verification set and the sub-test set to a test set.
2. The data classification method according to claim 1, wherein the data includes a target feature identifier, the feature data corresponding to the target feature identifier has m values, m is a positive integer, and the step of blocking the data in the data set based on the target feature identifier to obtain a plurality of data blocks includes:
and dividing the data with the same value of the characteristic data corresponding to the target characteristic identifier in the data set into one data block to obtain data blocks corresponding to the m values.
3. The data classification method according to claim 1, wherein the step of performing classification operation on each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set comprises:
acquiring proportional data of the sub-training set, the sub-verification set and the sub-test set;
and traversing each data block, and correspondingly distributing each data block to the sub-training set, the sub-verification set or the sub-test set based on the proportion data.
4. The data classification method of claim 1, wherein the data set includes first data, and wherein the step of obtaining the target feature identifier upon receiving the data classification command further comprises:
and when the feature data corresponding to the target feature identifier in the first data meets the admission condition of a training set, determining the first data as the training set data, and sending the training set data to the training set.
5. The data classification method according to claim 4, wherein the step of determining the first data as training set data when the feature data corresponding to the target feature identifier of the first data satisfies a training set admission condition comprises:
and when the quantity of the data required by the training set is greater than or equal to a threshold value and the feature data of the first data exists in the feature data of the required data, determining the first data as the training set data, wherein the required data comprises the feature data corresponding to the target feature identifier.
6. The data classification method of claim 1, wherein the data set includes second data, and wherein the step of obtaining the target feature identifier upon receiving the data classification command further comprises:
and when the feature data corresponding to the target feature identifier in the second data meets the admission condition of a verification set, determining the second data as the verification set data, and sending the verification set data to the verification set.
7. The data classification method according to claim 1, wherein the data set includes third data, and further comprising, after the step of obtaining the target feature identifier upon receiving the data classification command:
and when the feature data corresponding to the target feature identifier in the third data meets the test set admission condition, determining the third data as test set data, and sending the test set data to the test set.
8. A data classification device, characterized in that the data classification device comprises:
the acquisition module is used for acquiring the target characteristic identification when a data classification instruction is received;
the blocking module is used for blocking the data in the data set based on the target feature identification to obtain a plurality of data blocks;
the classification module is used for performing classification operation on each data block based on a preset classification rule to obtain a sub-training set, a sub-verification set and a sub-test set;
and the sending module is used for respectively sending the sub-training set to the training set, the sub-verification set to the verification set and the sub-test set to the test set.
9. A terminal, characterized in that the terminal comprises: memory, processor and a data classification program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data classification method according to any one of claims 1 to 7.
10. A storage medium having a data classification program stored thereon, the data classification program, when executed by a processor, implementing the steps of the data classification method according to any one of claims 1 to 7.
CN201911044522.0A 2019-10-30 2019-10-30 Data classification method, terminal, device and storage medium Active CN110796200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911044522.0A CN110796200B (en) 2019-10-30 2019-10-30 Data classification method, terminal, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911044522.0A CN110796200B (en) 2019-10-30 2019-10-30 Data classification method, terminal, device and storage medium

Publications (2)

Publication Number Publication Date
CN110796200A true CN110796200A (en) 2020-02-14
CN110796200B CN110796200B (en) 2022-11-25

Family

ID=69442256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911044522.0A Active CN110796200B (en) 2019-10-30 2019-10-30 Data classification method, terminal, device and storage medium

Country Status (1)

Country Link
CN (1) CN110796200B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416884A (en) * 2023-06-12 2023-07-11 深圳市彤兴电子有限公司 Testing device and testing method for display module

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678512A (en) * 2013-12-26 2014-03-26 大连民族学院 Data stream merge sorting method under dynamic data environment
CN104317658A (en) * 2014-10-17 2015-01-28 华中科技大学 MapReduce based load self-adaptive task scheduling method
WO2016118402A1 (en) * 2015-01-22 2016-07-28 Microsoft Technology Licensing, Llc Optimizing multi-class multimedia data classification using negative data
CN106228120A (en) * 2016-07-14 2016-12-14 南京航空航天大学 The extensive human face data mask method of query driven
CN106599798A (en) * 2016-11-25 2017-04-26 南京蓝泰交通设施有限责任公司 Face recognition method facing face recognition training method of big data processing
CN109255480A (en) * 2018-08-30 2019-01-22 中国平安人寿保险股份有限公司 Between servant lead prediction technique, device, computer equipment and storage medium
CN109858886A (en) * 2019-02-18 2019-06-07 国网吉林省电力有限公司电力科学研究院 It is a kind of that control success rate promotion analysis method is taken based on integrated study
CN110084291A (en) * 2019-04-12 2019-08-02 湖北工业大学 A kind of students ' behavior analysis method and device based on the study of the big data limit

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678512A (en) * 2013-12-26 2014-03-26 大连民族学院 Data stream merge sorting method under dynamic data environment
CN104317658A (en) * 2014-10-17 2015-01-28 华中科技大学 MapReduce based load self-adaptive task scheduling method
WO2016118402A1 (en) * 2015-01-22 2016-07-28 Microsoft Technology Licensing, Llc Optimizing multi-class multimedia data classification using negative data
CN106228120A (en) * 2016-07-14 2016-12-14 南京航空航天大学 The extensive human face data mask method of query driven
CN106599798A (en) * 2016-11-25 2017-04-26 南京蓝泰交通设施有限责任公司 Face recognition method facing face recognition training method of big data processing
CN109255480A (en) * 2018-08-30 2019-01-22 中国平安人寿保险股份有限公司 Between servant lead prediction technique, device, computer equipment and storage medium
CN109858886A (en) * 2019-02-18 2019-06-07 国网吉林省电力有限公司电力科学研究院 It is a kind of that control success rate promotion analysis method is taken based on integrated study
CN110084291A (en) * 2019-04-12 2019-08-02 湖北工业大学 A kind of students ' behavior analysis method and device based on the study of the big data limit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ROOZBEH RF ET AL: "Adaptive Incremental Ensemble of Extreme Learning Machines for Fault Diagnosis in Induction Motors", IEEE *
WU ZELUN (吴泽伦): "Research and Implementation of Parallelization of Data Mining Algorithms Based on Hadoop", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416884A (en) * 2023-06-12 2023-07-11 深圳市彤兴电子有限公司 Testing device and testing method for display module
CN116416884B (en) * 2023-06-12 2023-08-18 深圳市彤兴电子有限公司 Testing device and testing method for display module

Also Published As

Publication number Publication date
CN110796200B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN111222647A (en) Federal learning system optimization method, device, equipment and storage medium
CN110084317B (en) Method and device for recognizing images
CN110365503A (en) A kind of Index and its relevant device
CN109688183B (en) Group control equipment identification method, device, equipment and computer readable storage medium
CN111144584A (en) Parameter tuning method, device and computer storage medium
CN107807841B (en) Server simulation method, device, equipment and readable storage medium
US20170184410A1 (en) Method and electronic device for personalized navigation
CN106980571A (en) The construction method and equipment of a kind of test use cases
CN115145801B (en) A/B test flow distribution method, device, equipment and storage medium
CN112084959B (en) Crowd image processing method and device
CN108833515B (en) Block chain node optimization method and device and computer readable storage medium
CN114880310A (en) User behavior analysis method and device, computer equipment and storage medium
CN111385598A (en) Cloud device, terminal device and image classification method
CN110796200B (en) Data classification method, terminal, device and storage medium
CN110069997B (en) Scene classification method and device and electronic equipment
CN110580171B (en) APP classification method, related device and product
CN111814117A (en) Model interpretation method, device and readable storage medium
CN111368045B (en) User intention recognition method, device, equipment and computer readable storage medium
CN111368998A (en) Spark cluster-based model training method, device, equipment and storage medium
KR101966423B1 (en) Method for image matching and apparatus for executing the method
CN115797267A (en) Image quality evaluation method, system, electronic device, and storage medium
CN111339196B (en) Data processing method and system based on block chain and computer readable storage medium
CN108009393B (en) Data processing method, device and computer readable storage medium
CN114237182A (en) Robot scheduling method and system
CN112905792A (en) Text clustering method, device and equipment based on non-text scene and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant