CN111191091A - Data classification method and system - Google Patents


Info

Publication number
CN111191091A
Authority
CN
China
Prior art keywords
classification
enterprise
data
memory
characteristic variables
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911395075.3A
Other languages
Chinese (zh)
Inventor
陈文�
林佳仪
巫源睿
周凡吟
曾途
吴桐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Business Big Data Technology Co Ltd
Original Assignee
Chengdu Business Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Business Big Data Technology Co Ltd
Priority to CN201911395075.3A
Publication of CN111191091A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201Market modelling; Market analysis; Collecting market data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Technology Law (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a data classification method and system. The method comprises the following steps: obtaining data to be classified, wherein the data comprises an enterprise name and a plurality of characteristic variables; determining the industry to which the enterprise belongs according to the enterprise name, and retrieving the classification rule for that industry from a rule base; and classifying the data according to the retrieved classification rule and the values of the plurality of characteristic variables. The method or system classifies enterprise data more accurately, so that a method or model suited to each class can be applied in subsequent processing, making the data processing results more accurate.

Description

Data classification method and system
Technical Field
The invention relates to the technical field of data processing, in particular to a data classification method and a data classification system suitable for enterprise credit risk early warning.
Background
In the era of big data, collecting and analyzing the data generated in enterprise operations can help enterprises and others create more value. For example, analyzing which types of a product sell well and who buys them can help an enterprise devise a more accurate marketing strategy. As another example, analyzing an enterprise's transaction data can help build its credit profile, which supports the enterprise in obtaining financing or loans. Taking enterprise credit assessment as an example, traditional credit assessment models evaluate an enterprise mainly by methods such as logistic regression and discriminant analysis. Although such models can evaluate enterprise credit risk, they rely mainly on transaction data. They are therefore reliable for large enterprises with a large amount of transaction data, but small companies that lack loan history and transaction activity are automatically treated as high credit risks because of their thin credit records, which in turn hampers their financing or borrowing. Classifying enterprises and applying different credit assessment models to different classes can therefore improve the accuracy of credit assessment. However, enterprises are currently classified by indicators such as number of employees, revenue and total assets. This is a coarse classification that is unsuited to applications such as enterprise credit assessment and does not improve the accuracy of the assessment results.
Disclosure of Invention
The invention aims to overcome the defects in the prior art and provides a data classification method and system suitable for enterprise credit risk early warning.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
A data classification method, comprising the following steps:
obtaining data to be classified from a database, wherein the data comprises an enterprise name and a plurality of characteristic variables;
determining the industry to which the enterprise belongs according to the enterprise name, and retrieving the classification rule for that industry from a rule base;
and classifying the data according to the retrieved classification rule and the values of the plurality of characteristic variables.
In this method, different classification rules are used for different industries. During classification, the industry to which the enterprise belongs is determined from the enterprise name, and the data is then classified using that industry's classification rule. This is more accurate than the traditional approach of applying a single classification by enterprise scale to all industries. Application processing based on the classified data in turn yields results that are more accurate and more useful as a reference.
As a preferred embodiment, the step of classifying according to the extracted classification rule and the numerical values of the plurality of feature variables includes:
examining the classes in order from high to low, ranked by how demanding their classification conditions are, and judging whether the values of all the characteristic variables simultaneously satisfy the classification condition of a given class; if they do, the data is classified into that class; if the value of any one of the characteristic variables does not satisfy the classification condition of that class, the data is classified into the next class below it.
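The rule above can be written compactly. The following is one possible formalization, not the patent's own notation: it assumes each class other than the lowest is defined by a minimum threshold per characteristic variable, and it follows the reading used in the worked examples of the detailed description, where a partial match on a class sends the data one class down. All symbols are introduced here for illustration.

```latex
% Assumed notation (not from the source): classes c_1, ..., c_K ordered from high to low,
% feature values x_1, ..., x_n, and a minimum threshold t_{k,j} for variable j in class c_k (k < K).
\[
m_k(x) = \bigl|\{\, j : x_j \ge t_{k,j} \,\}\bigr|, \qquad
k^{*} = \min\{\, k < K : m_k(x) \ge 1 \,\}
\]
\[
\operatorname{class}(x) =
\begin{cases}
c_{k^{*}}   & \text{if } m_{k^{*}}(x) = n \\
c_{k^{*}+1} & \text{if } 1 \le m_{k^{*}}(x) < n \\
c_{K}       & \text{if no class } k < K \text{ has } m_k(x) \ge 1
\end{cases}
\]
```

Here m_k(x) counts how many variables meet the thresholds of class c_k, and k* is the highest class for which at least one variable does.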
On the other hand, an embodiment of the present invention also provides a data classification system, including:
a first memory configured with a database for storing data to be classified, wherein the data comprises an enterprise name and a plurality of characteristic variables;
a second memory configured with a rule base for storing the classification rules of various industries;
and a classification device with a processor, in data communication with the first memory and the second memory respectively, and configured to acquire the data to be classified from the first memory, determine the industry to which the enterprise belongs according to the enterprise name, retrieve the classification rule for that industry from the rule base of the second memory, and classify the data according to the retrieved classification rule and the values of the characteristic variables.
Compared with the prior art, the method or the system can classify the enterprise data more accurately, and then a corresponding method or model can be adopted pertinently during further application, so that the data processing result is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of a data classification method according to an embodiment.
FIG. 2 is a flow chart of another data classification method in an embodiment.
FIG. 3 is a diagram of a data classification system according to an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 3, the present embodiment provides a data classification system, which includes a first memory, a second memory and a classification device with a processor, wherein the classification device is respectively in communication with the first memory and the second memory for data transmission. The classification device can be a PC, a notebook computer, a server and other devices with data processing functions.
The first memory is configured with a database for storing data to be classified, wherein the data comprises an enterprise name and a plurality of characteristic variables. The second memory is configured with a rule base for storing classification rules of various industries.
The classification device acquires the data to be classified from the first memory, determines the industry to which the enterprise belongs according to the enterprise name, retrieves the classification rule for that industry from the rule base of the second memory, and classifies the data according to the retrieved classification rule and the values of the characteristic variables.
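The division of labor among the two memories and the classification device can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `ClassificationDevice`, the `industry_of` resolver, and the in-memory dictionaries standing in for the database and rule base are all hypothetical, and the source does not say how the industry is derived from the enterprise name.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]
Rule = Callable[[Features], str]  # hypothetical rule signature: feature values -> class label


class ClassificationDevice:
    """Stands in for the processor-equipped classification device."""

    def __init__(
        self,
        database: Mapping[str, Features],   # first memory: enterprise name -> feature values
        rule_base: Mapping[str, Rule],      # second memory: industry -> classification rule
        industry_of: Callable[[str], str],  # hypothetical resolver: enterprise name -> industry
    ) -> None:
        self.database = database
        self.rule_base = rule_base
        self.industry_of = industry_of

    def classify(self, enterprise_name: str) -> str:
        features = self.database[enterprise_name]     # acquire the data to be classified
        industry = self.industry_of(enterprise_name)  # determine the industry from the name
        rule = self.rule_base[industry]               # retrieve that industry's rule
        return rule(features)                         # classify using the feature values


# Toy wiring with in-memory dictionaries; the industry assignment and the rule are placeholders.
database = {"AAA": {"registered_capital": 2_000_000, "first_degree_related_parties": 15}}
industry_registry = {"AAA": "manufacturing"}
rule_base = {"manufacturing": lambda features: "placeholder-class"}

device = ClassificationDevice(database, rule_base, industry_registry.__getitem__)
result = device.classify("AAA")
```

Passing the resolver and both stores in as arguments keeps the device independent of where the data and rules actually live, which mirrors the separate memories described above.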
The embodiment also provides a method for realizing data classification based on the system. Referring to fig. 2, the data classification method includes the following steps:
S100, obtaining data to be classified, wherein the data comprises an enterprise name and a plurality of characteristic variables.
The number of characteristic variables may vary between applications; for the enterprise credit assessment application mentioned above, the characteristic variables are registered capital and the number of first-degree related parties. Note that the data includes not only the names of the characteristic variables but also their specific values. For example, if an enterprise named AAA has a registered capital of 2 million (200 ten-thousands) and 15 first-degree related parties, the data in this step comprises: AAA, registered capital of 2 million, and 15 first-degree related parties.
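As an illustration of what one such data item might look like in code, the sketch below models the AAA example as a small record type. The type name `EnterpriseRecord`, the field names and the constant `WAN` are assumptions made for this example; the source only specifies that a record carries an enterprise name plus named feature values.

```python
from dataclasses import dataclass, field

WAN = 10_000  # one "ten thousand", the unit in which the text states registered capital


@dataclass(frozen=True)
class EnterpriseRecord:
    """One item of data to be classified: an enterprise name plus named feature values."""

    name: str
    features: dict[str, float] = field(default_factory=dict)


# The AAA example from the text: registered capital of 200 ten-thousands
# and 15 first-degree related parties.
aaa = EnterpriseRecord(
    name="AAA",
    features={
        "registered_capital": 200 * WAN,
        "first_degree_related_parties": 15,
    },
)
```

Keeping the feature values in a dictionary rather than fixed fields lets the same record type serve applications that use a different set of characteristic variables.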
S200, determining the industry to which the enterprise belongs according to the enterprise name, and retrieving the classification rule for that industry from the rule base.
Industries may be divided differently for different applications. For enterprise credit assessment, for example, industries are divided into 21 categories; for other applications, other divisions may be used, such as four major categories: production and manufacturing (agriculture, forestry, animal husbandry and fishery; mining; manufacturing; construction; finance; real estate), for-profit services (production and supply of electricity, heat, gas and water; wholesale and retail; transportation, warehousing and postal services; accommodation and catering; information transmission, software and information technology services; leasing and business services; culture, sports and entertainment), non-profit services (scientific research and technical services; water conservancy, environment and public facilities management; resident services, repair and other services; education; health and social work; public administration, social security and social organizations; international organizations), and others.
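The four-category division described above amounts to a lookup from industry to major category. A minimal sketch follows; the English category and industry strings are this example's own renderings, and falling back to "others" for an unrecognized industry is an assumption rather than something the source specifies.

```python
# Major category -> member industries, following the four-way division in the text.
MAJOR_CATEGORIES: dict[str, list[str]] = {
    "production and manufacturing": [
        "agriculture, forestry, animal husbandry and fishery",
        "mining",
        "manufacturing",
        "construction",
        "finance",
        "real estate",
    ],
    "for-profit services": [
        "production and supply of electricity, heat, gas and water",
        "wholesale and retail",
        "transportation, warehousing and postal services",
        "accommodation and catering",
        "information transmission, software and information technology services",
        "leasing and business services",
        "culture, sports and entertainment",
    ],
    "non-profit services": [
        "scientific research and technical services",
        "water conservancy, environment and public facilities management",
        "resident services, repair and other services",
        "education",
        "health and social work",
        "public administration, social security and social organizations",
        "international organizations",
    ],
    "others": ["others"],
}

# Reverse index: industry -> major category.
CATEGORY_OF: dict[str, str] = {
    industry: category
    for category, industries in MAJOR_CATEGORIES.items()
    for industry in industries
}


def major_category(industry: str) -> str:
    """Return the major category of an industry; unknown industries map to 'others' (assumed)."""
    return CATEGORY_OF.get(industry, "others")
```

A rule base keyed by major category instead of by individual industry would then need only four entries for such an application.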
S300, classifying the data according to the retrieved classification rule and the values of the plurality of characteristic variables.
During classification, the classes are examined in order from high to low, ranked by how demanding their classification conditions are. For each class, it is determined whether the values of all the characteristic variables simultaneously satisfy that class's classification condition; if they do, the data is classified into that class, and if any characteristic variable fails the condition, the data is classified into the next class below it.
The enterprise data may be classified based on different applications, and the data classification method of the present invention will be described below by taking the enterprise credit evaluation application as an example only.
Referring to fig. 1, the data classification method provided in the present embodiment includes the following steps based on the enterprise credit evaluation:
S10, obtaining data to be classified, wherein the data comprises an enterprise name, registered capital and the number of first-degree related parties. A related party is a party that has a relationship with the enterprise and may be a natural person or an enterprise; the relationship may be holding a position (such as manager or director), shareholding, investment, and so on. "First-degree" means that the related party is directly related to the enterprise; for example, company A is an investor in company B, or Zhang San is a shareholder of company B.
S20, determining the industry to which the enterprise belongs according to the enterprise name, and retrieving the classification rule for that industry. In this embodiment, industries are divided into 21 categories: agriculture, forestry, animal husbandry and fishery; mining; manufacturing; production and supply of electricity, heat, gas and water; construction; wholesale and retail; transportation, warehousing and postal services; accommodation and catering; information transmission, software and information technology services; finance; real estate; leasing and business services; scientific research and technical services; water conservancy, environment and public facilities management; resident services, repair and other services; education; health and social work; culture, sports and entertainment; public administration, social security and social organizations; international organizations; and others. Of course, industries may be divided differently for different applications.
S30, classifying the data according to the retrieved classification rule, the registered capital and the number of first-degree related parties.
For the enterprise credit assessment application, the characteristic variables used for classification are registered capital and the number of first-degree related parties, but different classification rules are used for different industries so that the data can be classified more accurately. During classification, the classes are examined in order from high to low, ranked by how demanding their classification conditions are, and it is determined whether the registered capital and the number of first-degree related parties simultaneously satisfy the classification condition of a given class. If both satisfy it, the data is classified into that class. If they cannot both satisfy it, that is, if either characteristic variable fails the condition, the data is classified into the next class below it. In other words, for each class, if only one (either one) of the two characteristic variables satisfies the class's condition, the data is classified into the class immediately below it.
For example, for agriculture, forestry, animal husbandry and fishery: if the registered capital is at least 20 million (2,000 ten-thousands) and the number of first-degree related parties is at least 38, the enterprise is classified as a large enterprise; if the registered capital is between 3.6 million and 20 million and the number of first-degree related parties is between 6 and 38, it is classified as a small or medium-sized enterprise; and if the registered capital is below 3.6 million and the number of first-degree related parties is below 6, it is classified as a small or micro enterprise. When only one variable meets the condition for large enterprises, the enterprise falls to the next class: with registered capital of at least 20 million and between 6 and 38 first-degree related parties, it is classified as a small or medium-sized enterprise; likewise, with registered capital of at least 20 million and fewer than 6 first-degree related parties, it is also classified as a small or medium-sized enterprise.
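The agriculture example can be traced in code. The sketch below is one reading of the rule that reproduces the cases stated above, assuming each class other than the lowest is defined by minimum thresholds and that a partial match sends the enterprise one class down. The function and constant names are hypothetical, and behavior for combinations the text does not mention follows from that assumption rather than from the source.

```python
WAN = 10_000  # one "ten thousand", the unit in which the text states registered capital

# Classes from high to low with their minimum thresholds (agriculture, forestry,
# animal husbandry and fishery, as given in the text). The lowest class has no thresholds.
AGRICULTURE_RULE = [
    ("large", {"registered_capital": 2000 * WAN, "first_degree_related_parties": 38}),
    ("small_medium", {"registered_capital": 360 * WAN, "first_degree_related_parties": 6}),
]
LOWEST_CLASS = "small_micro"


def classify(features, rule=AGRICULTURE_RULE, lowest=LOWEST_CLASS):
    """Walk the classes from high to low and return the assigned class label."""
    labels = [label for label, _ in rule] + [lowest]
    for position, (label, minimums) in enumerate(rule):
        met = [features[name] >= minimum for name, minimum in minimums.items()]
        if all(met):
            return label                 # every variable meets this class's condition
        if any(met):
            return labels[position + 1]  # partial match: drop to the next class down
    return lowest                        # no variable met any higher class's condition


def enterprise(capital_in_wan, related_parties):
    """Build a feature dictionary from capital (in ten-thousands) and a related-party count."""
    return {
        "registered_capital": capital_in_wan * WAN,
        "first_degree_related_parties": related_parties,
    }
```

For instance, `classify(enterprise(2000, 3))` returns `"small_medium"`, matching the last case in the paragraph above, while `classify(enterprise(100, 3))` returns `"small_micro"`.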
This embodiment gives classification rules for each industry, for the enterprise credit assessment application only, as shown in the following table. These rules were obtained experimentally and apply only to enterprise credit assessment; they are examples, and different rules can be formulated for different applications and requirements. The method of the invention provides only the general approach.
[Table of per-industry classification rules: published as images (Figures BDA0002346072480000071, BDA0002346072480000081 and BDA0002346072480000091) and not reproduced in this text.]
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A data classification method, comprising the following steps:
obtaining data to be classified from a database, wherein the data comprises an enterprise name and a plurality of characteristic variables;
determining the industry to which the enterprise belongs according to the enterprise name, and retrieving the classification rule for that industry from a rule base;
and classifying the data according to the retrieved classification rule and the values of the plurality of characteristic variables.
2. The method of claim 1, wherein the step of classifying the data according to the retrieved classification rule and the values of the plurality of characteristic variables comprises:
examining the classes in order from high to low, ranked by how demanding their classification conditions are, and judging whether the values of all the characteristic variables simultaneously satisfy the classification condition of a given class; if they do, classifying the data into that class; and if the value of any one of the characteristic variables does not satisfy the classification condition of that class, classifying the data into the next class below it.
3. The method of claim 1, wherein the characteristic variables comprise registered capital and the number of first-degree related parties.
4. A system for classifying data, comprising:
a first memory configured with a database for storing data to be classified, wherein the data comprises an enterprise name and a plurality of characteristic variables;
a second memory configured with a rule base for storing the classification rules of various industries;
and a classification device with a processor, in data communication with the first memory and the second memory respectively, and configured to acquire the data to be classified from the first memory, determine the industry to which the enterprise belongs according to the enterprise name, retrieve the classification rule for that industry from the rule base of the second memory, and classify the data according to the retrieved classification rule and the values of the characteristic variables.
5. The system of claim 4, wherein, when classifying, the classification device examines the classes in order from high to low, ranked by how demanding their classification conditions are, and judges whether the values of all the characteristic variables simultaneously satisfy the classification condition of a given class; if they do, it classifies the data into that class; and if the value of any one of the characteristic variables does not satisfy the classification condition of that class, it classifies the data into the next class below it.
6. The system of claim 4, wherein the characteristic variables comprise registered capital and the number of first-degree related parties.
CN201911395075.3A 2019-12-30 2019-12-30 Data classification method and system Withdrawn CN111191091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395075.3A CN111191091A (en) 2019-12-30 2019-12-30 Data classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911395075.3A CN111191091A (en) 2019-12-30 2019-12-30 Data classification method and system

Publications (1)

Publication Number Publication Date
CN111191091A true CN111191091A (en) 2020-05-22

Family

ID=70707890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395075.3A Withdrawn CN111191091A (en) 2019-12-30 2019-12-30 Data classification method and system

Country Status (1)

Country Link
CN (1) CN111191091A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079124A (en) * 2006-05-26 2007-11-28 辽宁三鑫发展有限公司 Method for converting enterprise information to electronic media and sequencing according to trade
CN105868272A (en) * 2016-03-18 2016-08-17 乐视网信息技术(北京)股份有限公司 Multimedia file classification method and apparatus
CN107193915A (en) * 2017-05-15 2017-09-22 北京因果树网络科技有限公司 A kind of company information sorting technique and device
CN107342882A (en) * 2016-05-03 2017-11-10 腾讯科技(深圳)有限公司 The sorting technique and sorter of a kind of terminal
CN109409677A (en) * 2018-09-27 2019-03-01 深圳壹账通智能科技有限公司 Enterprise Credit Risk Evaluation method, apparatus, equipment and storage medium
CN110598090A (en) * 2019-07-23 2019-12-20 平安科技(深圳)有限公司 Interest tag generation method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
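国家***: "Measures for the Statistical Classification of Large, Medium, Small and Micro Enterprises (2017)" (统计上大中小微型企业划分办法(2017)), 《国家***》 *
Wu Jianhua (武建华) et al.: "An Associative Classification Algorithm That Extracts Effective Rules" (提取有效规则的关联分类算法), Journal of Xi'an Jiaotong University (《西安交通大学学报》) *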

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200522