CN113516205A - Data classification method, device, equipment and storage medium based on artificial intelligence - Google Patents


Info

Publication number
CN113516205A
CN113516205A (application CN202111029679.3A)
Authority
CN
China
Prior art keywords
data
feature
result
sample
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111029679.3A
Other languages
Chinese (zh)
Other versions
CN113516205B (en)
Inventor
任杰 (Ren Jie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111029679.3A priority Critical patent/CN113516205B/en
Publication of CN113516205A publication Critical patent/CN113516205A/en
Application granted granted Critical
Publication of CN113516205B publication Critical patent/CN113516205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to artificial intelligence and provides a data classification method, apparatus, device, and storage medium based on artificial intelligence. The method obtains a plurality of initial samples, each comprising a sample user's sample values over a plurality of data features and a user result; performs discretization on the sample values to obtain a discrete result; performs kernel density estimation analysis on the discrete result to obtain feature distribution information for each data feature; extracts information from each piece of feature distribution information to obtain a training sample; determines a feature weight for each data feature from the discrete result and the user result; generates a sample result from the training sample and the feature weights; adjusts a preset network based on the training sample and the sample result to obtain a classification model; obtains request features; and processes the request features with the classification model to obtain a classification result. The invention can improve the accuracy of the classification result. In addition, the invention also relates to blockchain technology: the classification result can be stored in a blockchain.

Description

Data classification method, device, equipment and storage medium based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data classification method, a data classification device, data classification equipment and a storage medium based on artificial intelligence.
Background
Evaluating employee stability is a key problem in enterprise employee relationship management research; accurate stability evaluation helps an enterprise better realize the value of its employees.
At present, schemes for classifying user stability generally train a classification model directly on acquired data samples and then evaluate user stability with that model. However, because the data samples used for training are collected directly from real scenarios, they suffer from imbalanced sample distribution, so the accuracy of the trained classification model is low and employee stability cannot be evaluated accurately.
Disclosure of Invention
In view of the above, it is desirable to provide a data classification method, apparatus, device and storage medium based on artificial intelligence, which can improve the accuracy of the classification result.
In one aspect, the present invention provides an artificial intelligence based data classification method, including:
obtaining a plurality of initial samples, wherein each initial sample comprises sample values of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
performing discrete processing on the sample value to obtain a discrete result;
performing kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic;
extracting information in each feature distribution information to obtain a training sample;
determining a feature weight of each data feature according to the discrete result and the user result;
generating a sample result of the training sample according to the training sample and the feature weight;
adjusting a preset network based on the training sample and the sample result to obtain a classification model;
when a classification request is received, acquiring request characteristics according to the classification request;
and processing the request characteristics according to the classification model to obtain a classification result of the classification request.
According to a preferred embodiment of the present invention, the discretizing the sample values to obtain a discretization result includes:
detecting a data type of the sample value;
selecting, from the sample values, those whose data type is numerical as first numerical values, and those whose data type is not numerical as information to be processed;
acquiring data characteristics to which the information to be processed belongs as target characteristics, and acquiring a plurality of preset ranges of the target characteristics;
discretizing the information to be processed into the numerical value corresponding to the preset range in which it falls, to obtain a second numerical value;
determining the first numerical value and the second numerical value as the discrete result.
According to a preferred embodiment of the present invention, the performing a kernel density estimation analysis on the discrete result based on the plurality of data features to obtain feature distribution information of each data feature includes:
for each data feature, acquiring an attribute feature of the data feature;
selecting a kernel function corresponding to the attribute characteristics from preset functions;
calculating the feature distribution information of the data feature according to the following formula based on the kernel function and the discrete result:
f̂(x) = (1/(n·h)) · Σ_{i=1}^{n} K((x − x_i)/h) ;
where f̂(x) refers to the feature distribution information, n refers to the number of discrete results, h refers to the bandwidth, x_i refers to the i-th discrete result, and K refers to the kernel function.
According to a preferred embodiment of the present invention, the extracting information in each of the feature distribution information to obtain a training sample includes:
randomly selecting any feature distribution information from the plurality of feature distribution information as target feature distribution information, and determining the plurality of feature distribution information except the target feature distribution information as the rest feature distribution information;
randomly extracting any feature data from the target feature distribution information, and extracting initial feature data from the rest feature distribution information;
calculating the coexistence probability of the arbitrary feature data and the initial feature data;
determining the initial characteristic data with the coexistence probability larger than a preset probability threshold value as target characteristic data;
and determining the combination of the arbitrary characteristic data and each target characteristic data as the training sample.
According to a preferred embodiment of the present invention, the calculating the coexistence probability of the arbitrary feature data and the initial feature data in the remaining feature distribution information includes:
calculating the information correlation degree of the target characteristic distribution information and the rest characteristic distribution information;
calculating a first data probability of the arbitrary feature data in the target feature distribution information, and calculating a second data probability of the initial feature data in the rest feature distribution information;
and calculating the product of the information correlation degree, the first data probability and the second data probability to obtain the coexistence probability.
According to the preferred embodiment of the present invention, the determining the feature weight of each data feature according to the discrete result and the user result includes:
discretizing the user result to obtain a labeling result;
based on the labeling result and the discrete result, calculating the feature weight according to the following formula:
y = w_1·x_1 + w_2·x_2 + … + w_k·x_k ;
where y refers to the labeling result, w_1, w_2, …, w_k refer to the feature weights, and x_1, x_2, …, x_k refer to the discrete results corresponding to the respective data features.
According to a preferred embodiment of the present invention, the adjusting the preset network based on the training samples and the sample results to obtain the classification model includes:
determining classification scenes corresponding to the plurality of initial samples;
acquiring a network matched with the classification scene from a classification network library as the preset network;
inputting the training sample into the preset network to obtain a prediction result;
calculating a network loss value of the preset network based on the prediction result and the sample result;
and adjusting the network parameters of the preset network according to the network loss value until the network loss value is not reduced any more, and obtaining the classification model.
On the other hand, the invention also provides a data classification device based on artificial intelligence, which comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of initial samples, and each initial sample comprises a sample value of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
the discrete unit is used for carrying out discrete processing on the sample value to obtain a discrete result;
the analysis unit is used for performing kernel density estimation analysis on the discrete result based on the plurality of data features to obtain feature distribution information of each data feature;
the extraction unit is used for extracting information in each feature distribution information to obtain a training sample;
a determining unit, configured to determine a feature weight of each data feature according to the discrete result and the user result;
the generating unit is used for generating a sample result of the training sample according to the training sample and the characteristic weight;
the adjusting unit is used for adjusting a preset network based on the training sample and the sample result to obtain a classification model;
the obtaining unit is further used for obtaining request characteristics according to the classification request when the classification request is received;
and the processing unit is used for processing the request characteristics according to the classification model to obtain a classification result of the classification request.
In another aspect, the present invention further provides an electronic device, including:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based data classification method.
In another aspect, the present invention also provides a computer-readable storage medium having computer-readable instructions stored therein, which are executed by a processor in an electronic device to implement the artificial intelligence based data classification method.
According to the above technical scheme, performing kernel density estimation analysis on the discrete results allows the feature distribution information to be distributed evenly over the data features, which improves the balance of the training samples and therefore the robustness of the classification model. Because the sample results can be generated from the training samples and the feature weights, the training samples do not need to be labeled repeatedly, which improves the efficiency of generating sample results; at the same time, analyzing the sample results from the data according to the sample probability and the feature weights improves the accuracy of the sample results, which in turn improves the accuracy of the classification model and thus the accuracy of the determined classification results.
Drawings
FIG. 1 is a flow chart of the data classification method based on artificial intelligence according to the preferred embodiment of the invention.
FIG. 2 is a functional block diagram of a preferred embodiment of the data classification apparatus based on artificial intelligence according to the present invention.
FIG. 3 is a schematic structural diagram of an electronic device implementing an artificial intelligence-based data classification method according to a preferred embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a preferred embodiment of the data classification method based on artificial intelligence according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.
The artificial intelligence based data classification method can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) refers to the theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The data classification method based on artificial intelligence is applied to one or more electronic devices, which are devices capable of automatically performing numerical calculation and/or information processing according to computer readable instructions set or stored in advance. The hardware of the electronic devices includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), a smart wearable device, and the like.
The electronic device may include a network device and/or a user device. Wherein the network device includes, but is not limited to, a single network electronic device, an electronic device group consisting of a plurality of network electronic devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network electronic devices.
The network in which the electronic device is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
S10, obtaining a plurality of initial samples, wherein each initial sample comprises sample values of a sample user in a plurality of data characteristics and a user result corresponding to the sample user.
In at least one embodiment of the invention, the sample user may be any employee of the enterprise.
The plurality of data features may include the number of clients contacted, the number of meetings attended, the frequency of company APP usage, job age, educational background, age, the capacity of the group the user belongs to, capacity ranking, and the like.
Accordingly, the sample value refers to information corresponding to the sample user in the plurality of data features.
The user result refers to the stability situation of the sample user; for example, the user result may be "high stability".
And S11, performing discrete processing on the sample values to obtain discrete results.
In at least one embodiment of the present invention, the discretization result includes information obtained by discretizing a non-numerical sample value. The discrete result also includes a sample value of a numerical type.
In at least one embodiment of the present invention, the electronic device performs a discretization process on the sample values, and obtaining a discretization result includes:
detecting a data type of the sample value;
selecting, from the sample values, those whose data type is numerical as first numerical values, and those whose data type is not numerical as information to be processed;
acquiring data characteristics to which the information to be processed belongs as target characteristics, and acquiring a plurality of preset ranges of the target characteristics;
discretizing the information to be processed into the numerical value corresponding to the preset range in which it falls, to obtain a second numerical value;
determining the first numerical value and the second numerical value as the discrete result.
Wherein the data types include: character type, numerical type, etc.
The preset ranges are generated according to a plurality of feature values of the target feature; for example, the preset ranges may include [primary school, junior college] and [bachelor's degree, graduate degree]. The plurality of feature values refer to the values the target feature can take; for example, if the target feature is educational background, the plurality of feature values may include primary school, bachelor's degree, and the like.
Discretizing only the information to be processed, without processing the first numerical values, improves discretization efficiency; at the same time, discretizing the information to be processed turns the information into data, improving analysis accuracy.
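As an illustrative sketch (not part of the patent text), the discretization step S11 can be expressed as follows; the feature values, range boundaries, and scores below are assumptions for demonstration only.

```python
# Preset ranges for a non-numerical target feature: each range maps a
# set of feature values to one discrete score (values are illustrative).
EDUCATION_RANGES = [
    ({"primary school", "junior college"}, 1.0),
    ({"bachelor's degree", "graduate degree"}, 2.0),
]

def discretize(value, preset_ranges):
    """Numerical sample values pass through unchanged (first numerical
    value); non-numerical values are mapped to the score of the preset
    range they fall in (second numerical value)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    for levels, score in preset_ranges:
        if value in levels:
            return score
    raise ValueError(f"no preset range covers {value!r}")

print(discretize(29, EDUCATION_RANGES))                   # 29.0
print(discretize("bachelor's degree", EDUCATION_RANGES))  # 2.0
```

Together, the unchanged numerical values and the mapped scores form the discrete result.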
And S12, performing kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic.
In at least one embodiment of the present invention, the characteristic distribution information may be a distribution curve of the discrete results on the data characteristic, for example, the characteristic distribution information may be a normal distribution curve.
In at least one embodiment of the present invention, the performing, by the electronic device, a kernel density estimation analysis on the discrete result based on the plurality of data features, and obtaining feature distribution information of each data feature includes:
for each data feature, acquiring an attribute feature of the data feature;
selecting a kernel function corresponding to the attribute characteristics from preset functions;
calculating the feature distribution information of the data feature according to the following formula based on the kernel function and the discrete result:
f̂(x) = (1/(n·h)) · Σ_{i=1}^{n} K((x − x_i)/h) ;
where f̂(x) refers to the feature distribution information, n refers to the number of discrete results, h refers to the bandwidth, x_i refers to the i-th discrete result, and K refers to the kernel function.
The attribute feature refers to a characteristic of the data feature. For example, if the data feature is age, the attribute feature is a distribution characteristic conforming to a normal distribution; if the data feature is a sales performance feature, the attribute feature is a fixed-price summation characteristic.
The preset functions may include a Gaussian function, a triangular function, and the like.
The kernel function is the preset function matched to the attribute feature. For example, if the data feature is age and the attribute feature is a distribution characteristic conforming to a normal distribution, the kernel function is a Gaussian function; if the data feature is a sales performance feature and the attribute feature is a fixed-price summation characteristic, the kernel function is a triangular function.
An appropriate kernel function can thus be accurately selected for each data feature through its attribute feature, improving the accuracy of the feature distribution information.
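The kernel density estimation of S12 can be sketched as follows, assuming the standard estimator f̂(x) = (1/(n·h)) · Σ K((x − x_i)/h); the bandwidth and the sample ages are illustrative choices, not values from the patent.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel, used for features with a normal-distribution attribute."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def triangle_kernel(u):
    """Triangular kernel, an alternative preset function."""
    return np.maximum(1.0 - np.abs(u), 0.0)

def kde(x, discrete_results, h, kernel):
    """Feature distribution value at x computed from the discrete results."""
    xi = np.asarray(discrete_results, dtype=float)
    n = xi.size
    return kernel((x - xi) / h).sum() / (n * h)

ages = [22, 25, 25, 28, 30, 41, 58]   # discrete results for the "age" feature
density = kde(27.0, ages, h=5.0, kernel=gaussian_kernel)
print(round(float(density), 4))
```

Evaluating this estimator over a grid of x values yields the feature distribution curve from which training data is later drawn.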
And S13, extracting the information in each feature distribution information to obtain a training sample.
In at least one embodiment of the present invention, the training sample contains information drawn from the feature distribution information.
Specifically, the training sample includes the arbitrary feature data and each of the target feature data.
In at least one embodiment of the present invention, the extracting, by the electronic device, information in each of the feature distribution information to obtain a training sample includes:
randomly selecting any feature distribution information from the plurality of feature distribution information as target feature distribution information, and determining the plurality of feature distribution information except the target feature distribution information as the rest feature distribution information;
randomly extracting any feature data from the target feature distribution information, and extracting initial feature data from the rest feature distribution information;
calculating the coexistence probability of the arbitrary feature data and the initial feature data;
determining the initial characteristic data with the coexistence probability larger than a preset probability threshold value as target characteristic data;
and determining the combination of the arbitrary characteristic data and each target characteristic data as the training sample.
Wherein the plurality of feature distribution information includes the target feature distribution information and the remaining feature distribution information.
The arbitrary feature data refers to any piece of information in the target feature distribution information; for example, if the target feature distribution information is the distribution of ages from 0 to 100, the arbitrary feature data may be 8 years old.
The initial feature data refers to any piece of information in the remaining feature distribution information; for example, if the remaining feature distribution information is the distribution of educational backgrounds from primary school to doctorate, the initial feature data may be a bachelor's degree.
The coexistence probability refers to a probability that the arbitrary feature data and the initial feature data exist in the sample user at the same time.
The preset probability threshold is set according to actual requirements.
The target feature data is the initial feature data whose coexistence probability with the arbitrary feature data is greater than the preset probability threshold. Selecting the target feature data according to the coexistence probability avoids mutually exclusive information between the arbitrary feature data and the target feature data; for example, if the arbitrary feature data is 2 years old, the target feature data cannot be a doctoral degree.
Selecting the target feature data from the initial feature data by coexistence probability, and then generating the training sample from the arbitrary feature data and the target feature data, improves the plausibility of the training sample and avoids mutually exclusive information between the arbitrary feature data and the target feature data.
Specifically, the calculating, by the electronic device, a coexistence probability of the arbitrary feature data and initial feature data in the remaining feature distribution information includes:
calculating the information correlation degree of the target characteristic distribution information and the rest characteristic distribution information;
calculating a first data probability of the arbitrary feature data in the target feature distribution information, and calculating a second data probability of the initial feature data in the rest feature distribution information;
and calculating the product of the information correlation degree, the first data probability and the second data probability to obtain the coexistence probability.
By analyzing the coexistence probability in combination with the information correlation degree between the data features, the accuracy of the coexistence probability can be improved.
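The coexistence-probability computation and the training-sample assembly of S13 can be sketched as follows; the candidate data, correlation degrees, probabilities, and threshold are illustrative assumptions.

```python
def coexistence_probability(info_correlation, p_first, p_second):
    """Product of the information correlation degree, the first data
    probability, and the second data probability."""
    return info_correlation * p_first * p_second

def build_training_sample(any_feature_data, candidates, threshold):
    """candidates: (initial_feature_data, correlation, p1, p2) tuples.
    Keep each candidate whose coexistence probability with the arbitrary
    feature data exceeds the preset probability threshold."""
    targets = [d for d, corr, p1, p2 in candidates
               if coexistence_probability(corr, p1, p2) > threshold]
    return [any_feature_data] + targets

candidates = [
    ("bachelor's degree", 0.9, 0.6, 0.5),   # probability 0.27 -> kept
    ("doctoral degree",   0.9, 0.6, 0.01),  # probability 0.0054 -> excluded
]
print(build_training_sample("age 28", candidates, threshold=0.05))
```

The excluded candidate illustrates how a low coexistence probability filters out mutually exclusive combinations.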
And S14, determining the feature weight of each data feature according to the discrete result and the user result.
In at least one embodiment of the present invention, the feature weight refers to a weight occupied by the data feature in terms of user stability.
In at least one embodiment of the present invention, the determining, by the electronic device, the feature weight of each data feature according to the discrete result and the user result includes:
discretizing the user result to obtain a labeling result;
based on the labeling result and the discrete result, calculating the feature weight according to the following formula:
y = w_1·x_1 + w_2·x_2 + … + w_k·x_k ;
where y refers to the labeling result, w_1, w_2, …, w_k refer to the feature weights, and x_1, x_2, …, x_k refer to the discrete results corresponding to the respective data features.
In this way, the feature weights can be accurately quantified based on the mathematical relationship between the discrete results and the user results.
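The exact weight formula is not legible in this text; assuming the labeling result is a linear combination of the discrete results, the feature weights can be recovered by ordinary least squares, as in the following sketch with synthetic data.

```python
import numpy as np

rng = np.random.default_rng(7)
true_w = np.array([0.5, 0.2, 0.3])   # hidden feature weights (illustrative)
X = rng.normal(size=(200, 3))        # discrete results, one row per sample user
y = X @ true_w                       # labeling results under the linear model

# Solve y = X w for the feature weights in the least-squares sense.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 3))
```

With noise-free linear data the recovered weights match the hidden ones; real discrete results would yield an approximate fit.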
And S15, generating a sample result of the training sample according to the training sample and the feature weight.
In at least one embodiment of the present invention, the specific manner in which the electronic device generates the sample result of the training sample from the training sample and the feature weights is the inverse of the specific manner in which it determines the feature weight of each data feature from the discrete result and the user result, and is not described in detail here.
And S16, adjusting a preset network based on the training sample and the sample result to obtain a classification model.
In at least one embodiment of the present invention, the classification model refers to a model obtained after the preset network is adjusted.
In at least one embodiment of the present invention, the adjusting, by the electronic device, a preset network based on the training sample and the sample result to obtain a classification model includes:
determining classification scenes corresponding to the plurality of initial samples;
acquiring a network matched with the classification scene from a classification network library as the preset network;
inputting the training sample into the preset network to obtain a prediction result;
calculating a network loss value of the preset network based on the prediction result and the sample result;
and adjusting the network parameters of the preset network according to the network loss value until the network loss value is not reduced any more, and obtaining the classification model.
Wherein the classification network base stores a plurality of networks for data classification.
The preset network can be generated after the unbalanced samples are adjusted by a pre-constructed learner.
The network parameter may include a learning rate of the preset network, and the like.
Since the preset network is matched with the classification scene, it can be adjusted directly without additional parameter tuning, which improves the adjustment efficiency of the classification model; meanwhile, adjusting the network parameters through the network loss value ensures the classification accuracy of the classification model.
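The adjustment loop above can be sketched as follows. A single linear layer with squared loss stands in for the preset network; both, along with the learning rate and stopping tolerance, are illustrative assumptions rather than the embodiment's actual network.

```python
import numpy as np

# Minimal sketch of the adjustment loop in step S16: repeatedly compute the
# prediction result, the network loss value, and a parameter update, stopping
# once the loss no longer decreases. The linear model and squared loss are
# stand-ins for the preset network, chosen only for illustration.
def adjust_network(X, y, lr=0.1, tol=1e-9):
    X, y = np.asarray(X, float), np.asarray(y, float)
    w = np.zeros(X.shape[1])                 # network parameters
    prev = float("inf")
    while True:
        pred = X @ w                         # prediction result
        loss = np.mean((pred - y) ** 2)      # network loss value
        if loss >= prev - tol:               # loss is no longer reduced
            return w                         # the classification model
        prev = loss
        grad = 2 * X.T @ (pred - y) / len(y)
        w -= lr * grad                       # adjust the network parameters
```

For instance, `adjust_network([[1.0], [2.0]], [2.0, 4.0])` converges to a weight near 2.0 before the loop detects that the loss has stopped decreasing.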
S17, when a classification request is received, request characteristics are obtained according to the classification request.
In at least one embodiment of the invention, the classification request may be triggered by any senior-management user in the enterprise.
The classification request carries a user identification code and the like.
The request characteristics refer to the information corresponding to the plurality of data characteristics for the user whose stability is to be classified in the classification request.
In at least one embodiment of the present invention, the obtaining, by the electronic device, the request feature according to the classification request includes:
analyzing the message of the classification request to obtain data information carried by the message;
extracting a user identification code from the data information;
and extracting information corresponding to the plurality of data characteristics from a user information base according to the user identification code to be used as the request characteristics.
The user identification code refers to a tag capable of identifying a user. For example, the user identification code may be a job number.
And S18, processing the request characteristics according to the classification model to obtain the classification result of the classification request.
In at least one embodiment of the present invention, the classification result refers to the stability situation of the user whose stability is to be analyzed in the classification request.
It is emphasized that the classification result may also be stored in a node of a blockchain in order to further ensure the privacy and security of the classification result.
In at least one embodiment of the present invention, the electronic device discretizes the request feature and inputs the discretized result into the classification model to obtain the classification result.
According to the technical scheme, performing kernel density estimation analysis on the discrete results allows the feature distribution information to be distributed uniformly over the data features, which improves the balance of the training sample and thereby the robustness of the classification model. The sample results can be generated based on the training sample and the feature weight without repeatedly labeling the training sample, which improves the generation efficiency of the sample results. Meanwhile, the sample results are derived from the data according to the sample probability and the feature weight, which improves the accuracy of the sample results and hence of the classification model, further improving the determination accuracy of the classification results.
FIG. 2 is a functional block diagram of a preferred embodiment of the data classifying apparatus based on artificial intelligence according to the present invention. The artificial intelligence based data classification device 11 includes an acquisition unit 110, a discretization unit 111, an analysis unit 112, an extraction unit 113, a determination unit 114, a generation unit 115, an adjustment unit 116, and a processing unit 117. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13 and perform a fixed function and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.
The obtaining unit 110 obtains a plurality of initial samples, each of which includes a sample value of a sample user in a plurality of data characteristics and a user result corresponding to the sample user.
In at least one embodiment of the invention, the sample user may be any employee of the enterprise.
The plurality of data characteristics may include the number of clients contacted, the number of meetings attended, the frequency of company APP usage, length of service, education level, age, the capacity of the group where the user is located, the capacity ranking, and the like.
Accordingly, the sample value refers to information corresponding to the sample user in the plurality of data features.
The user result refers to the stability situation of the sample user; for example, the user result may be "high stability".
The discretization unit 111 performs discretization processing on the sample values to obtain discrete results.
In at least one embodiment of the present invention, the discretization result includes information obtained by discretizing a non-numerical sample value. The discrete result also includes a sample value of a numerical type.
In at least one embodiment of the present invention, the discretizing unit 111 performs discretization on the sample values, and obtaining a discretization result includes:
detecting a data type of the sample value;
screening sample values with the data types being numerical types from the sample values to serve as first numerical values, and screening sample values with the data types not being the numerical types from the sample values to serve as information to be processed;
acquiring data characteristics to which the information to be processed belongs as target characteristics, and acquiring a plurality of preset ranges of the target characteristics;
dispersing the information to be processed into numerical values corresponding to the preset ranges to obtain a second numerical value;
determining the first numerical value and the second numerical value as the discrete result.
Wherein the data types include: character type, numerical type, etc.
The preset ranges are generated according to a plurality of feature values of the target feature; for example, the preset ranges may include [primary school, junior college], [bachelor's degree, postgraduate], and the like. The plurality of feature values refer to the values corresponding to the target feature; for example, if the target feature is education level, the plurality of feature values may include primary school, bachelor's degree, and the like.
The information to be processed is subjected to discrete processing without processing the first numerical value, so that the discrete efficiency can be improved, and meanwhile, the information to be processed is subjected to discrete processing, so that the information datamation can be realized, and the analysis accuracy is improved.
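The type-splitting and range-scoring procedure above can be sketched as follows. The feature name, the range table, and the score values are illustrative assumptions, not taken from the embodiment.

```python
# Hypothetical sketch of the discretization step: numeric sample values pass
# through unchanged (the first numerical values), while non-numeric values
# are mapped to the score of the preset range they fall into (the second
# numerical values). The range table below is an assumed example.
PRESET_RANGES = {
    "education_level": {           # range label -> (members, numerical value)
        "low":  ({"primary school", "middle school"}, 1),
        "high": ({"bachelor", "postgraduate"}, 2),
    },
}

def discretize(sample_values):
    """sample_values: list of (data_feature, value) pairs -> discrete result."""
    result = []
    for feature, value in sample_values:
        if isinstance(value, (int, float)):      # numerical type: keep as-is
            result.append(value)
        else:                                    # information to be processed
            for members, score in PRESET_RANGES[feature].values():
                if value in members:             # falls into a preset range
                    result.append(score)
                    break
    return result
```

For example, `discretize([("age", 30), ("education_level", "bachelor")])` yields the mixed numeric list `[30, 2]`.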
The analysis unit 112 performs kernel density estimation analysis on the discrete result based on the plurality of data features to obtain feature distribution information of each data feature.
In at least one embodiment of the present invention, the characteristic distribution information may be a distribution curve of the discrete results on the data characteristic, for example, the characteristic distribution information may be a normal distribution curve.
In at least one embodiment of the present invention, the analyzing unit 112 performs a kernel density estimation analysis on the discrete result based on the plurality of data features, and obtaining feature distribution information of each data feature includes:
for each data feature, acquiring an attribute feature of the data feature;
selecting a kernel function corresponding to the attribute characteristics from preset functions;
calculating the feature distribution information of the data feature according to the following formula based on the kernel function and the discrete result:
f(x) = (1/(n*h)) * Σ_{i=1}^{n} K((x - x_i)/h);
wherein f(x) refers to the feature distribution information, n refers to the number of the discrete results, x_1, x_2, ..., x_n refer to the discrete results, x_i refers to the i-th one of the discrete results, h refers to the bandwidth of the kernel function, and K refers to the kernel function.
Wherein the attribute feature refers to a characteristic of the data feature. For example, if the data feature is age, the attribute feature is a distribution characteristic conforming to a normal distribution; if the data feature is a sales performance characteristic, the attribute feature is a fixed-pricing summing characteristic.
The preset function may include a gaussian function, a Triangle function, a trigonometric function, and the like.
The kernel function is the preset function matched with the attribute feature. For example, if the data feature is age and the attribute feature is a distribution characteristic conforming to a normal distribution, the kernel function is a Gaussian function; if the data feature is a sales performance characteristic and the attribute feature is a fixed-pricing summing characteristic, the kernel function is a Triangle function.
And a proper kernel function can be accurately selected for the data characteristics through the attribute characteristics, so that the accuracy of the characteristic distribution information is improved.
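As a hedged illustration of this step, the sketch below selects a kernel from the attribute characteristic and evaluates the standard kernel density estimate over the discrete results. The attribute-to-kernel mapping keys and the default bandwidth are assumptions for illustration only.

```python
import numpy as np

# Sketch of the kernel density estimation step: a kernel is matched to the
# data feature's attribute characteristic, then the standard estimate
# f(x) = (1/(n*h)) * sum_i K((x - x_i)/h) is evaluated over the discrete
# results. The mapping keys and bandwidth h are illustrative assumptions.
def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def triangle_kernel(u):
    return np.maximum(1.0 - np.abs(u), 0.0)

KERNELS = {"normal": gaussian_kernel, "summing": triangle_kernel}

def kde(x, discrete_results, attribute, h=1.0):
    k = KERNELS[attribute]                       # kernel matched to attribute
    xs = np.asarray(discrete_results, dtype=float)
    return float(k((x - xs) / h).sum() / (len(xs) * h))
```

Evaluating `kde` over a grid of points yields the distribution curve that serves as the feature distribution information.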
The extracting unit 113 extracts information in each of the feature distribution information to obtain a training sample.
In at least one embodiment of the present invention, the training sample contains information extracted from each piece of feature distribution information.
Specifically, the training sample includes the arbitrary feature data and each piece of target feature data.
In at least one embodiment of the present invention, the extracting unit 113 extracts information in each of the feature distribution information, and obtaining the training sample includes:
randomly selecting any feature distribution information from the plurality of feature distribution information as target feature distribution information, and determining the plurality of feature distribution information except the target feature distribution information as the rest feature distribution information;
randomly extracting any feature data from the target feature distribution information, and extracting initial feature data from the rest feature distribution information;
calculating the coexistence probability of the arbitrary feature data and the initial feature data;
determining the initial characteristic data with the coexistence probability larger than a preset probability threshold value as target characteristic data;
and determining the combination of the arbitrary characteristic data and each target characteristic data as the training sample.
Wherein the plurality of feature distribution information includes the target feature distribution information and the remaining feature distribution information.
The arbitrary feature data refers to arbitrary information in the target feature distribution information, for example, if the target feature distribution information is data with an age of 0 to 100 years, the arbitrary feature data may be 8 years old.
The coexistence probability refers to a probability that the arbitrary feature data and the initial feature data exist in the sample user at the same time.
The preset probability threshold is set according to actual requirements.
The target feature data is the initial feature data whose coexistence probability with the arbitrary feature data is greater than the preset probability threshold. Selecting the target feature data according to the coexistence probability avoids exclusion information between the arbitrary feature data and the target feature data; for example, if the arbitrary feature data is an age of 2 years, the target feature data cannot be a doctoral student.
Selecting the target feature data from the initial feature data through the coexistence probability and then generating the training sample according to the arbitrary feature data and the target feature data improves the rationality of the training sample and avoids exclusion information between the arbitrary feature data and the target feature data.
Specifically, the calculating, by the extracting unit 113, a coexistence probability of the arbitrary feature data and the initial feature data in the remaining feature distribution information includes:
calculating the information correlation degree of the target characteristic distribution information and the rest characteristic distribution information;
calculating a first data probability of the arbitrary feature data in the target feature distribution information, and calculating a second data probability of the initial feature data in the rest feature distribution information;
and calculating the product of the information correlation degree, the first data probability and the second data probability to obtain the coexistence probability.
By analyzing the coexistence probability in combination with the information correlation degree between the data features, the accuracy of the coexistence probability can be improved.
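The product described above can be sketched as follows. Using the absolute Pearson correlation as the "information correlation degree" is an assumption, since the embodiment does not fix the measure; the candidate values in the usage example are likewise hypothetical.

```python
import numpy as np

# Hedged sketch of the coexistence-probability step: the probability that the
# arbitrary feature data and a piece of initial feature data occur in the same
# sample user is approximated by the product of the information correlation
# degree between the two feature distributions and the two data probabilities.
def coexistence_probability(target_dist, other_dist, p_first, p_second):
    corr = abs(np.corrcoef(target_dist, other_dist)[0, 1])  # assumed measure
    return corr * p_first * p_second

def select_target_data(candidates, threshold):
    """Keep initial feature data whose coexistence probability with the
    arbitrary feature data exceeds the preset probability threshold."""
    return [value for value, prob in candidates if prob > threshold]
```

For two perfectly correlated distributions, `coexistence_probability([1, 2, 3, 4], [2, 4, 6, 8], 0.5, 0.4)` reduces to the product of the two data probabilities, 0.2.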
The determining unit 114 determines a feature weight of each data feature according to the discrete result and the user result.
In at least one embodiment of the present invention, the feature weight refers to a weight occupied by the data feature in terms of user stability.
In at least one embodiment of the present invention, the determining unit 114 determines the feature weight of each data feature according to the discrete result and the user result, including:
discretizing the user result to obtain a labeling result;
based on the labeling result and the discrete result, calculating the feature weight according to the following formula:
y = w_1*x_1 + w_2*x_2 + ... + w_n*x_n;
wherein y refers to the labeling result, w_1, w_2, ..., w_n respectively refer to the feature weights, and x_1, x_2, ..., x_n refer to the discrete result value corresponding to each data feature.
Through this implementation, the feature weight can be accurately quantified based on the mathematical relationship between the discrete result and the user result.
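Assuming the linear relation y = w_1*x_1 + ... + w_n*x_n between the discrete results of the data features and the discretized labeling result, the feature weights can be recovered numerically. Solving that relation by least squares over all sample users is an illustrative choice, not a method mandated by the embodiment.

```python
import numpy as np

# Sketch of determining the feature weights from the discrete results and the
# labeling results, under the assumed linear relation y = sum_i w_i * x_i.
# Least squares is used here only for illustration.
def feature_weights(discrete_results, labels):
    """discrete_results: (n_samples, n_features); labels: (n_samples,)."""
    X = np.asarray(discrete_results, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # solve y = X @ w
    return w
```

The same relation, run forward with known weights, would generate a sample result from a training sample, which matches the inverse relationship described above.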
The generating unit 115 generates a sample result of the training sample according to the training sample and the feature weight.
In at least one embodiment of the present invention, the specific manner in which the generating unit 115 generates the sample result of the training sample according to the training sample and the feature weight is the inverse of the specific manner in which the determining unit 114 determines the feature weight of each data feature according to the discrete result and the user result, and is therefore not described in detail herein.
The adjusting unit 116 adjusts a preset network based on the training samples and the sample results to obtain a classification model.
In at least one embodiment of the present invention, the classification model refers to a model obtained after the preset network is adjusted.
In at least one embodiment of the present invention, the adjusting unit 116 adjusts a preset network based on the training samples and the sample results, and obtaining the classification model includes:
determining classification scenes corresponding to the plurality of initial samples;
acquiring a network matched with the classification scene from a classification network library as the preset network;
inputting the training sample into the preset network to obtain a prediction result;
calculating a network loss value of the preset network based on the prediction result and the sample result;
and adjusting the network parameters of the preset network according to the network loss value until the network loss value is not reduced any more, and obtaining the classification model.
Wherein the classification network base stores a plurality of networks for data classification.
The preset network can be generated after the unbalanced samples are adjusted by a pre-constructed learner.
The network parameter may include a learning rate of the preset network, and the like.
Since the preset network is matched with the classification scene, it can be adjusted directly without additional parameter tuning, which improves the adjustment efficiency of the classification model; meanwhile, adjusting the network parameters through the network loss value ensures the classification accuracy of the classification model.
When a classification request is received, the obtaining unit 110 obtains request characteristics according to the classification request.
In at least one embodiment of the invention, the classification request may be triggered by any senior-management user in the enterprise.
The classification request carries a user identification code and the like.
The request characteristics refer to the information corresponding to the plurality of data characteristics for the user whose stability is to be classified in the classification request.
In at least one embodiment of the present invention, the obtaining unit 110 obtains the request feature according to the classification request, including:
analyzing the message of the classification request to obtain data information carried by the message;
extracting a user identification code from the data information;
and extracting information corresponding to the plurality of data characteristics from a user information base according to the user identification code to be used as the request characteristics.
The user identification code refers to a tag capable of identifying a user. For example, the user identification code may be a job number.
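A hedged sketch of this lookup: parse the message of the classification request, extract the user identification code (e.g. a job number), and read that user's values for the plurality of data features from a user information base. The message layout, the feature names, and the in-memory "user information base" are all illustrative assumptions.

```python
import json

# Hypothetical request-feature extraction: the message is assumed to be a
# JSON payload carrying the user identification code, and the user
# information base is modeled as a dictionary keyed by that code.
USER_INFO_BASE = {
    "E1024": {"age": 30, "length_of_service": 5, "meetings_attended": 12},
}
DATA_FEATURES = ["age", "length_of_service", "meetings_attended"]

def get_request_features(message: str):
    data = json.loads(message)           # analyse the request message
    user_id = data["user_id"]            # extract the user identification code
    record = USER_INFO_BASE[user_id]     # user information base lookup
    return [record[f] for f in DATA_FEATURES]
```

The returned list is then discretized and fed to the classification model, as the following step describes.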
The processing unit 117 processes the request features according to the classification model to obtain a classification result of the classification request.
In at least one embodiment of the present invention, the classification result refers to the stability situation of the user whose stability is to be analyzed in the classification request.
It is emphasized that the classification result may also be stored in a node of a blockchain in order to further ensure the privacy and security of the classification result.
In at least one embodiment of the present invention, the processing unit 117 discretizes the request feature and inputs the discretized result into the classification model to obtain the classification result.
According to the technical scheme, performing kernel density estimation analysis on the discrete results allows the feature distribution information to be distributed uniformly over the data features, which improves the balance of the training sample and thereby the robustness of the classification model. The sample results can be generated based on the training sample and the feature weight without repeatedly labeling the training sample, which improves the generation efficiency of the sample results. Meanwhile, the sample results are derived from the data according to the sample probability and the feature weight, which improves the accuracy of the sample results and hence of the classification model, further improving the determination accuracy of the classification results.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing an artificial intelligence-based data classification method.
In one embodiment of the present invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as an artificial intelligence based data classification program, stored in the memory 12 and executable on the processor 13.
It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of the electronic device 1; it may comprise more or fewer components than shown, some components may be combined, or different components may be used. For example, the electronic device 1 may further comprise an input/output device, a network access device, a bus, and the like.
The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the electronic device 1, and is connected to each part of the whole electronic device 1 by various interfaces and lines, and executes an operating system of the electronic device 1 and various installed application programs, program codes, and the like.
Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be divided into an acquisition unit 110, a discrete unit 111, an analysis unit 112, an extraction unit 113, a determination unit 114, a generation unit 115, an adjustment unit 116, and a processing unit 117.
The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by executing or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.
The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by computer readable instructions instructing related hardware; the computer readable instructions may be stored in a computer readable storage medium, and when the computer readable instructions are executed by a processor, the steps of the method embodiments may be implemented.
Wherein the computer readable instructions comprise computer readable instruction code which may be in source code form, object code form, an executable file or some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying said computer readable instruction code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM).
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
With reference to fig. 1, the memory 12 in the electronic device 1 stores computer-readable instructions to implement an artificial intelligence based data classification method, and the processor 13 can execute the computer-readable instructions to implement:
obtaining a plurality of initial samples, wherein each initial sample comprises sample values of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
performing discrete processing on the sample value to obtain a discrete result;
performing kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic;
extracting information in each feature distribution information to obtain a training sample;
determining a feature weight of each data feature according to the discrete result and the user result;
generating a sample result of the training sample according to the training sample and the feature weight;
adjusting a preset network based on the training sample and the sample result to obtain a classification model;
when a classification request is received, acquiring request characteristics according to the classification request;
and processing the request characteristics according to the classification model to obtain a classification result of the classification request.
Specifically, the processor 13 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer readable instructions, which is not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:
obtaining a plurality of initial samples, wherein each initial sample comprises sample values of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
performing discrete processing on the sample value to obtain a discrete result;
performing kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic;
extracting information in each feature distribution information to obtain a training sample;
determining a feature weight of each data feature according to the discrete result and the user result;
generating a sample result of the training sample according to the training sample and the feature weight;
adjusting a preset network based on the training sample and the sample result to obtain a classification model;
when a classification request is received, acquiring request characteristics according to the classification request;
and processing the request characteristics according to the classification model to obtain a classification result of the classification request.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. An artificial intelligence based data classification method, characterized in that the artificial intelligence based data classification method comprises:
obtaining a plurality of initial samples, wherein each initial sample comprises sample values of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
performing discrete processing on the sample value to obtain a discrete result;
performing kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic;
extracting information in each feature distribution information to obtain a training sample;
determining a feature weight of each data feature according to the discrete result and the user result;
generating a sample result of the training sample according to the training sample and the feature weight;
adjusting a preset network based on the training sample and the sample result to obtain a classification model;
when a classification request is received, acquiring request characteristics according to the classification request;
and processing the request characteristics according to the classification model to obtain a classification result of the classification request.
2. The artificial intelligence based data classification method of claim 1, wherein the performing discrete processing on the sample value to obtain a discrete result comprises:
detecting a data type of the sample value;
screening sample values with the data types being numerical types from the sample values to serve as first numerical values, and screening sample values with the data types not being the numerical types from the sample values to serve as information to be processed;
acquiring data characteristics to which the information to be processed belongs as target characteristics, and acquiring a plurality of preset ranges of the target characteristics;
discretizing the information to be processed into the scores corresponding to the preset ranges to obtain second numerical values;
and determining the first numerical values and the second numerical values as the discrete result.
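The type-based split of claim 2 can be sketched as follows. The score mapping for non-numeric values is a hypothetical example, since the patent does not publish its actual preset ranges or scores.

```python
# Minimal sketch of the discretization in claim 2: numeric sample values pass
# through as "first numerical values"; non-numeric values are mapped to the
# score of the preset range they belong to ("second numerical values").

def discretize(sample_values, preset_scores):
    numeric = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    first = [v for v in sample_values if numeric(v)]                      # first numerical values
    second = [preset_scores[v] for v in sample_values if not numeric(v)]  # second numerical values
    return first + second                                                 # the discrete result

scores = {"low": 0, "mid": 1, "high": 2}  # hypothetical preset ranges/scores
print(discretize([3.5, "low", 7, "high"], scores))  # [3.5, 7, 0, 2]
```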
3. The artificial intelligence based data classification method of claim 1, wherein the performing a kernel density estimation analysis on the discrete results based on the plurality of data features to obtain feature distribution information of each data feature comprises:
for each data feature, acquiring an attribute feature of the data feature;
selecting a kernel function corresponding to the attribute characteristics from preset functions;
calculating the feature distribution information of the data feature according to the following formula based on the kernel function and the discrete result:
f(x) = (1/(n*h)) * Σ_{i=1..n} K((x − x_i)/h) ;
wherein f(x) refers to the feature distribution information, n refers to the number of the discrete results, x refers to a discrete result, x_i refers to the i-th discrete result, h refers to the bandwidth, and K refers to the kernel function.
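The formula of claim 3 is the standard kernel density estimate. A minimal sketch with a Gaussian kernel follows; the kernel choice and bandwidth value are illustrative assumptions, since the claim leaves the kernel function to be selected per attribute feature.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, a common choice of kernel function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, discrete_results, h, kernel=gaussian_kernel):
    """Estimate f(x) = 1/(n*h) * sum_i K((x - x_i)/h) over the discrete results."""
    n = len(discrete_results)
    return sum(kernel((x - xi) / h) for xi in discrete_results) / (n * h)

# Evaluate the estimated feature distribution at one point (h = 0.5 is an assumption).
density = kde(1.0, [0.0, 1.0, 2.0], h=0.5)
```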
4. The artificial intelligence based data classification method of claim 1, wherein the extracting information in each feature distribution information to obtain training samples comprises:
randomly selecting any feature distribution information from the plurality of feature distribution information as target feature distribution information, and determining the plurality of feature distribution information except the target feature distribution information as the rest feature distribution information;
randomly extracting any feature data from the target feature distribution information, and extracting initial feature data from the rest feature distribution information;
calculating the coexistence probability of the arbitrary feature data and the initial feature data;
determining the initial characteristic data with the coexistence probability larger than a preset probability threshold value as target characteristic data;
and determining the combination of the arbitrary characteristic data and each target characteristic data as the training sample.
5. The artificial intelligence based data classification method of claim 4, wherein the calculating the coexistence probability of the arbitrary feature data and the initial feature data in the remaining feature distribution information comprises:
calculating the information correlation degree of the target characteristic distribution information and the rest characteristic distribution information;
calculating a first data probability of the arbitrary feature data in the target feature distribution information, and calculating a second data probability of the initial feature data in the rest feature distribution information;
and calculating the product of the information correlation degree, the first data probability and the second data probability to obtain the coexistence probability.
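Claims 4 and 5 together amount to a product-and-threshold selection. The sketch below assumes the information correlation and the per-datum probabilities have already been computed, as the claims do not specify how.

```python
def coexistence_probability(info_correlation, first_prob, second_prob):
    """Claim 5: product of the information correlation and the two data probabilities."""
    return info_correlation * first_prob * second_prob

def select_target_data(info_correlation, first_prob, candidates, threshold):
    """Claim 4: keep the initial feature data whose coexistence probability with
    the arbitrary feature data exceeds the preset probability threshold.
    candidates: list of (feature_data, second_data_probability) pairs."""
    return [data for data, prob in candidates
            if coexistence_probability(info_correlation, first_prob, prob) > threshold]

# 0.8 * 0.5 * 0.9 = 0.36 > 0.2, so only "a" survives the threshold.
targets = select_target_data(0.8, 0.5, [("a", 0.9), ("b", 0.1)], threshold=0.2)  # ["a"]
```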
6. The artificial intelligence based data classification method of claim 1, wherein the determining a feature weight for each data feature according to the discretized results and the user results comprises:
discretizing the user result to obtain a labeling result;
based on the labeling result and the discrete result, calculating the feature weight according to the following formula:
[the formula is published only as embedded images and is not reproducible here];
wherein the symbols of the formula denote the labeling result, the discrete results, and the feature weight value corresponding to each data feature.
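Because the weight formula of claim 6 survives only as images, the sketch below illustrates the general idea (weights derived from the discrete results and the labeling result) with a common stand-in, normalized absolute Pearson correlation. This is an assumption, not the patented formula.

```python
import math

def feature_weights(feature_columns, labels):
    """Weight each data feature by |Pearson correlation| between its discrete
    values and the labeling results, normalized to sum to one (a stand-in)."""
    def corr(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy) if sx and sy else 0.0
    raw = [abs(corr(col, labels)) for col in feature_columns]
    total = sum(raw) or 1.0
    return [w / total for w in raw]

# A constant feature carries no information about the labels, so it gets weight 0.
weights = feature_weights([[1, 2, 3], [1, 1, 1]], [1, 2, 3])
```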
7. The artificial intelligence based data classification method of claim 1, wherein the adjusting a predetermined network based on the training samples and the sample results to obtain a classification model comprises:
determining classification scenes corresponding to the plurality of initial samples;
acquiring a network matched with the classification scene from a classification network library as the preset network;
inputting the training sample into the preset network to obtain a prediction result;
calculating a network loss value of the preset network based on the prediction result and the sample result;
and adjusting the network parameters of the preset network according to the network loss value until the network loss value no longer decreases, to obtain the classification model.
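The adjustment loop of claim 7 can be sketched as gradient descent that stops once the loss no longer decreases. The one-parameter "network" and squared-error loss below are illustrative assumptions standing in for the preset network selected from the classification network library.

```python
def train(samples, results, lr=0.1, max_epochs=100):
    """Adjust parameter w until the loss value stops decreasing (claim 7)."""
    w = 0.0                      # the single network parameter of this toy model
    prev_loss = float("inf")
    for _ in range(max_epochs):
        # mean squared error between predictions w*x and sample results y
        loss = sum((w * x - y) ** 2 for x, y in zip(samples, results)) / len(samples)
        if loss >= prev_loss:    # network loss value no longer decreasing: stop
            break
        prev_loss = loss
        grad = sum(2 * (w * x - y) * x for x, y in zip(samples, results)) / len(samples)
        w -= lr * grad           # adjust the network parameter
    return w

# With results = 2 * samples, training converges toward w = 2.
model_w = train([1.0, 2.0], [2.0, 4.0])
```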
8. An artificial intelligence based data classification apparatus, characterized in that the artificial intelligence based data classification apparatus comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a plurality of initial samples, and each initial sample comprises a sample value of a sample user in a plurality of data characteristics and a user result corresponding to the sample user;
the discrete unit is used for carrying out discrete processing on the sample value to obtain a discrete result;
the analysis unit is used for carrying out kernel density estimation analysis on the discrete result based on the plurality of data characteristics to obtain characteristic distribution information of each data characteristic;
the extraction unit is used for extracting information in each feature distribution information to obtain a training sample;
a determining unit, configured to determine a feature weight of each data feature according to the discrete result and the user result;
the generating unit is used for generating a sample result of the training sample according to the training sample and the characteristic weight;
the adjusting unit is used for adjusting a preset network based on the training sample and the sample result to obtain a classification model;
the obtaining unit is further used for obtaining request characteristics according to the classification request when the classification request is received;
and the processing unit is used for processing the request characteristics according to the classification model to obtain a classification result of the classification request.
9. An electronic device, characterized in that the electronic device comprises:
a memory storing computer readable instructions; and
a processor executing computer readable instructions stored in the memory to implement the artificial intelligence based data classification method of any of claims 1 to 7.
10. A computer-readable storage medium characterized by: the computer-readable storage medium has stored therein computer-readable instructions that are executed by a processor in an electronic device to implement the artificial intelligence based data classification method of any of claims 1 to 7.
CN202111029679.3A 2021-09-03 2021-09-03 Employee stability classification method based on artificial intelligence and related equipment Active CN113516205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111029679.3A CN113516205B (en) 2021-09-03 2021-09-03 Employee stability classification method based on artificial intelligence and related equipment


Publications (2)

Publication Number Publication Date
CN113516205A true CN113516205A (en) 2021-10-19
CN113516205B CN113516205B (en) 2021-12-14

Family

ID=78063201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111029679.3A Active CN113516205B (en) 2021-09-03 2021-09-03 Employee stability classification method based on artificial intelligence and related equipment

Country Status (1)

Country Link
CN (1) CN113516205B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850632A (en) * 2021-11-29 2021-12-28 平安科技(深圳)有限公司 User category determination method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103645249A (en) * 2013-11-27 2014-03-19 国网黑龙江省电力有限公司 Online fault detection method for reduced set-based downsampling unbalance SVM (Support Vector Machine) transformer
US20160366565A1 (en) * 2015-06-12 2016-12-15 Telefonaktiebolaget L M Ericsson (Publ) Grouping wireless devices in a communications network
CN108509982A (en) * 2018-03-12 2018-09-07 昆明理工大学 A method of the uneven medical data of two classification of processing
CN109087022A (en) * 2018-08-22 2018-12-25 北京三快在线科技有限公司 Analysis method, device, medium and the electronic equipment of user's stability
US20190347277A1 (en) * 2018-05-09 2019-11-14 Kabushiki Kaisha Toshiba Clustering device, clustering method, and computer program product
CN111598168A (en) * 2020-05-18 2020-08-28 腾讯科技(深圳)有限公司 Image classification method, device, computer equipment and medium
CN112801718A (en) * 2021-02-22 2021-05-14 平安科技(深圳)有限公司 User behavior prediction method, device, equipment and medium
CN112966768A (en) * 2021-03-19 2021-06-15 廊坊银行股份有限公司 User data classification method, device, equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FIRUZ KAMALOV: "Kernel density estimation based sampling for imbalanced class distribution", Information Sciences *
LI JUNLIN et al.: "Improved data classification algorithm based on kernel density estimation", Control and Decision (《控制与决策》) *


Also Published As

Publication number Publication date
CN113516205B (en) 2021-12-14

Similar Documents

Publication Publication Date Title
CN112417096B (en) Question-answer pair matching method, device, electronic equipment and storage medium
CN113656547B (en) Text matching method, device, equipment and storage medium
CN113689436B (en) Image semantic segmentation method, device, equipment and storage medium
CN114037545A (en) Client recommendation method, device, equipment and storage medium
CN114090794A (en) Event map construction method based on artificial intelligence and related equipment
CN115222443A (en) Client group division method, device, equipment and storage medium
CN113570391B (en) Community division method, device, equipment and storage medium based on artificial intelligence
CN114510487A (en) Data table merging method, device, equipment and storage medium
CN113516205B (en) Employee stability classification method based on artificial intelligence and related equipment
CN113705468A (en) Digital image identification method based on artificial intelligence and related equipment
CN113268597A (en) Text classification method, device, equipment and storage medium
CN113850632B (en) User category determination method, device, equipment and storage medium
CN116629423A (en) User behavior prediction method, device, equipment and storage medium
CN113420545B (en) Abstract generation method, device, equipment and storage medium
CN113627186B (en) Entity relation detection method based on artificial intelligence and related equipment
CN113343700B (en) Data processing method, device, equipment and storage medium
CN112949305B (en) Negative feedback information acquisition method, device, equipment and storage medium
CN114529209A (en) User allocation method, device, equipment and storage medium
CN113902302A (en) Data analysis method, device, equipment and storage medium based on artificial intelligence
CN114942749A (en) Development method, device and equipment of approval system and storage medium
CN113065947A (en) Data processing method, device, equipment and storage medium
CN113742455B (en) Resume searching method, device, equipment and storage medium based on artificial intelligence
CN114020687B (en) User retention analysis method, device, equipment and storage medium
CN114581177B (en) Product recommendation method, device, equipment and storage medium
CN113688119B (en) Medical database construction method based on artificial intelligence and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant