CN113902533B - Application method suitable for finance and tax field index self-definition and automatic operation

Info

Publication number
CN113902533B
Authority
CN
China
Prior art keywords
data
risk
tax
value
preprocessing
Prior art date
Legal status
Active
Application number
CN202111180467.5A
Other languages
Chinese (zh)
Other versions
CN113902533A (en)
Inventor
吴俊�
刘冬
黄友善
姜汉峰
Current Assignee
Tax And Security Technology Hangzhou Co ltd
Original Assignee
Tax And Security Technology Hangzhou Co ltd
Application filed by Tax And Security Technology Hangzhou Co ltd
Priority to CN202111180467.5A
Publication of CN113902533A
Application granted
Publication of CN113902533B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/10 Tax strategies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computers, and particularly relates to an application method suitable for index self-definition and automatic operation in the finance and tax field. The method comprises the following steps. Step 1: tax data is divided into a plurality of data groups according to set features, the set features including time features and/or money features; at the same time, the possible intersection parts of the data groups are counted. Step 2: data association is carried out within each data group, specifically by analyzing each piece of data in the group, calculating its association degree value with the other data, and connecting the associated data. By decomposing the tax data into data groups, associating the data through association degree values, and finally assessing the tax risk index by calculating risk weights separately and then integrating them, the method improves both efficiency and accuracy.

Description

Application method suitable for finance and tax field index self-definition and automatic operation
Technical Field
The invention belongs to the technical field of computers, and particularly relates to an application method suitable for index self-definition and automatic operation in the finance and tax field.
Background
In the finance and tax field, business specialists define indexes for enterprise tax risk control through a process of repeated formulation and verification. During this process the specialists must revise the risk-control index model many times, and a model goes into use only after it has been verified and delivered. The cycle is therefore long and struggles to meet the production needs of enterprises.
The shortest path problem is a classical algorithmic problem in graph theory: finding the shortest path between two nodes in a graph (which consists of nodes and the paths, or edges, between them). Its specific forms include the following. The single-source shortest path problem: the starting node is known and the shortest paths from it are sought. The single-destination shortest path problem: the reverse case, in which the terminating node is known and the shortest paths to it are sought. In an undirected graph this problem is exactly equivalent to the single-source problem; in a directed graph it is equivalent to the single-source problem on the graph obtained by reversing the direction of every path.
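The equivalence between the two forms can be illustrated with a small sketch. It uses Dijkstra's algorithm, which is one common way to solve the single-source problem for non-negative weights; the source text does not name a specific algorithm, and the graph, node names and function names below are illustrative assumptions.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source`; graph maps node -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def reverse_graph(graph):
    """Reverse the direction of every path in a directed graph."""
    rev = {}
    for u, nbrs in graph.items():
        rev.setdefault(u, {})
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev

def distances_to(graph, target):
    """Single-destination problem solved as single-source on the reversed graph."""
    return dijkstra(reverse_graph(graph), target)

# Hypothetical directed graph used only for illustration.
example_graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}
from_a = dijkstra(example_graph, "A")
to_c = distances_to(example_graph, "C")
```

Here `from_a` holds the distances from node A, and `to_c` holds the distances from every node to node C, obtained without writing a second algorithm.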
Patent No. CN202110518344.1A discloses a method for evaluating enterprise income tax risk based on machine learning. The method proceeds as follows: first, a feature set for the machine learning data set is planned, and 290 target sets for machine learning are selected from it according to the management characteristics of enterprise income tax; the data is then classified and extracted according to systems and forms with different characteristics; a final machine learning data set is formed from the user collection; decision tree and support vector machine models are integrated and connected to form a machine learning model suited to income tax; finally, the model is run to output a result, and the result is verified and fed back.
Analyzing and calculating a tax data set by machine learning can improve the efficiency of tax risk assessment, but processing slows markedly for enterprises with large data volumes. First, the relevant tax data cannot be located within the large volume of data. Second, when the tax data is acquired, no short path to it is found, which increases the processing workload.
Disclosure of Invention
Therefore, the main purpose of the invention is to provide an application method suitable for index self-definition and automatic operation in the finance and tax field. The method decomposes tax data into a plurality of data groups, associates the data through association degree values, and finally assesses the tax risk index by calculating risk weights separately and then integrating them, which improves both efficiency and accuracy.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
An application method suitable for finance and tax field index self-definition and automatic operation, the method comprising the following steps:
Step 1: dividing tax data into a plurality of data groups according to set features, the set features including time features and/or money features; at the same time, counting the possible intersection parts of the data groups;
Step 2: carrying out data association within each data group, specifically: analyzing each piece of data in the data group to calculate its association degree value with the other data, and connecting the associated data;
Step 3: carrying out data risk analysis within each data group using a preset data analysis model to obtain the risk weight of the data group; carrying out data risk analysis on each intersection part to obtain the risk weight of the intersection part; and deriving the risk weight of each non-intersection part from the risk weight of the data group and the risk weight of the intersection part;
Step 4: when a risk index generation command is received, parsing the command to obtain the tax data groups it refers to; after the tax data groups are screened out, using a discriminator to judge whether the screened tax data groups have intersection parts; if so, accumulating the risk weights of all intersection parts, calculating the risk weights of all non-intersection parts, sorting the non-intersection parts in descending order of risk weight, and screening out the three non-intersection parts with the highest risk weights;
Step 5: in the non-intersection parts and the intersection parts, finding an entry value based on the risk index generation command and starting a data search in each intersection part and non-intersection part from the entry data, specifically: starting from the entry data, repeatedly taking the data with the highest association degree value with the previous data as the next data, until the traversal reaches the end point;
Step 6: obtaining a final risk weight, used as the generated risk index, based on the proportion that the data found by the data search in the non-intersection parts and the intersection parts makes up of each non-intersection part or intersection part.
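The search in step 5 and the proportion used in step 6 can be sketched as follows. This is a minimal illustration under stated assumptions: the association map, the node names, the `visited` guard against revisiting data, and the function names are all hypothetical, and the text does not specify how the proportions are combined into the final risk weight, so the sketch stops at the proportion.

```python
def greedy_search(assoc, entry, end):
    """Follow the highest association degree value from `entry` until `end`.

    assoc maps each datum to {neighbor: association degree value}.
    The visited set is an assumption added here to avoid cycles.
    """
    path = [entry]
    visited = {entry}
    current = entry
    while current != end:
        candidates = {n: v for n, v in assoc.get(current, {}).items()
                      if n not in visited}
        if not candidates:
            break  # dead end before reaching the end point
        current = max(candidates, key=candidates.get)
        visited.add(current)
        path.append(current)
    return path

def visited_share(path, part):
    """Proportion of a part's data that the search actually visited."""
    return len(set(path) & part) / len(part)

# Hypothetical association degree values between data A..E.
assoc = {
    "A": {"C": 3, "D": 1},
    "C": {"A": 3, "E": 2, "D": 4},
    "D": {"B": 2},
    "E": {"B": 1},
}
path = greedy_search(assoc, "A", "B")
share = visited_share(path, {"A", "B", "C", "D", "E"})
```

Each step only inspects the neighbors of the current datum, so no global shortest-path computation is run during the search.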
Further, in step 1, the time features include the time at which the tax data was generated and/or the time at which it was recorded; the money features include the source and/or use of the tax data.
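A minimal sketch of this grouping, assuming each record carries a generation date and a source: records are grouped by generation month (a time feature) and by source (a money feature), and the pairwise intersection parts of the groups are then counted. The record layout, field names and grouping keys are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def group_records(records):
    """Group record ids by generation month and by source (assumed features)."""
    groups = defaultdict(set)
    for rec in records:
        groups[("generated", rec["generated"][:7])].add(rec["id"])
        groups[("source", rec["source"])].add(rec["id"])
    return dict(groups)

def pairwise_intersections(groups):
    """Return the non-empty intersection part of every pair of groups."""
    result = {}
    for a, b in combinations(sorted(groups), 2):
        common = groups[a] & groups[b]
        if common:
            result[(a, b)] = common
    return result

# Hypothetical tax records.
records = [
    {"id": 1, "generated": "2021-01-15", "source": "sales"},
    {"id": 2, "generated": "2021-01-20", "source": "payroll"},
    {"id": 3, "generated": "2021-02-03", "source": "sales"},
]
groups = group_records(records)
overlaps = pairwise_intersections(groups)
```

Because one record can belong to a time-based group and a money-based group at once, the intersection parts arise naturally from grouping on more than one feature.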
Further, in step 2, analyzing each piece of data in the data group to calculate its association degree value with the other data comprises the following steps: classifying the features of each piece of data in the data group to obtain feature classes, the features being one or more of the set features; labeling each class with a membership type; selecting as the associated feature, according to a preset priority order of membership types, the class whose membership type matches a preset membership type and has the highest priority; acquiring from the data group the data that contains the associated feature; and determining, from the acquired data, the data associated with the data to be associated in the data group.
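The selection of the associated feature by membership-type priority can be sketched as below. The class names, the membership types "time" and "money", the tie-break by class name, and the function names are assumptions; the text only states that the matching class with the highest priority is selected.

```python
def select_associated_feature(class_membership, priority_order):
    """Pick the class whose membership type ranks highest in priority_order.

    class_membership: {class_name: membership_type}
    priority_order: membership types, highest priority first.
    Ties within one membership type are broken by class name (an assumption).
    """
    for membership in priority_order:
        for cls in sorted(class_membership):
            if class_membership[cls] == membership:
                return cls
    return None

def records_with_feature(records, feature):
    """Acquire the data in a group that contains the associated feature."""
    return [r for r in records if r.get(feature) is not None]

# Hypothetical feature classes and their membership labels.
class_membership = {"generated": "time", "recorded": "time", "source": "money"}
feature = select_associated_feature(class_membership, ["money", "time"])
matching = records_with_feature(
    [{"id": 1, "source": "sales"}, {"id": 2, "generated": "2021-01-15"}],
    feature,
)
```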
Further, determining, from the acquired data, the data associated with the data to be associated in the data group specifically includes: determining the similarity between the data to be associated and each piece of acquired data according to their feature information; and determining the data associated with the data to be associated according to that similarity.
Further, if the features of the data include the generation time, the recording time, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to their feature information specifically includes: determining a generation-time similarity from the generation times of the two pieces of data; determining a recording-time similarity from their recording times; determining a source similarity from their sources; determining a use similarity from their uses; and determining the overall similarity between the data to be associated and the acquired data from the generation-time similarity, the recording-time similarity, the source similarity and the use similarity.
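The four-part similarity can be illustrated with a short sketch. The text does not define the individual similarity measures or how they are combined, so the choices here (a time similarity that decays with the gap in days, an exact-match similarity for source and use, and an unweighted mean) are assumptions, as are the field and function names.

```python
from datetime import date

def time_similarity(a, b):
    """Assumed measure: 1 for the same day, decreasing with the gap in days."""
    return 1.0 / (1.0 + abs((a - b).days))

def category_similarity(a, b):
    """Assumed measure: 1 if the values match exactly, otherwise 0."""
    return 1.0 if a == b else 0.0

def record_similarity(x, y):
    """Combine the four feature similarities with an unweighted mean (assumption)."""
    parts = [
        time_similarity(x["generated"], y["generated"]),
        time_similarity(x["recorded"], y["recorded"]),
        category_similarity(x["source"], y["source"]),
        category_similarity(x["use"], y["use"]),
    ]
    return sum(parts) / len(parts)

# Hypothetical records.
a = {"generated": date(2021, 1, 1), "recorded": date(2021, 1, 2),
     "source": "sales", "use": "vat"}
b = {"generated": date(2021, 1, 2), "recorded": date(2021, 1, 2),
     "source": "sales", "use": "income"}
similarity = record_similarity(a, b)
```

A weighted combination would fit the same structure by replacing the mean with a weighted sum.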
Further, in step 3, the data risk analysis carried out within each data group using the preset data analysis model comprises the following. The data analysis model is expressed by a formula (not reproduced in this text) in which R is the risk weight, Dm is the risk threshold, which is a set value, and D is the risk value of the data group. Once the risk value of the data group has been calculated, it is substituted into the data analysis model to obtain the risk weight.
Further, the risk value of the data group is calculated by the following steps: selecting one piece of data from the data group as the starting value and calculating its risk value using a formula (not reproduced in this text) in which T is the data value and K is the mean of the association degree values between that data and the other data associated with it; after the risk value of the starting value is obtained, repeating the calculation from the data with the highest association degree value with the starting value, and so on, until the risk values of all data in the data group have been calculated; and calculating the average of all risk values as the risk weight of the data group.
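The traversal order and the final averaging can be sketched without the per-datum formula, which this text does not reproduce. In the sketch below the formula is passed in as a caller-supplied function `risk_fn(T, K)`; the product `t * k` used in the example is purely a placeholder and not the patent's formula. The fallback to any unscored datum when no associated datum remains is also an assumption.

```python
from statistics import mean

def group_risk(values, assoc, start, risk_fn):
    """Score every datum, moving each time to the most strongly associated one.

    values: {datum: data value T}
    assoc:  {datum: {other datum: association degree value}}
    risk_fn(T, K): placeholder for the formula missing from the text.
    """
    risks = {}
    current = start
    while current is not None:
        links = assoc.get(current, {})
        k = mean(list(links.values())) if links else 0.0
        risks[current] = risk_fn(values[current], k)
        remaining = {n: v for n, v in links.items() if n not in risks}
        if remaining:
            current = max(remaining, key=remaining.get)
        else:
            # Assumed fallback: continue with any datum not yet scored.
            unscored = [n for n in values if n not in risks]
            current = unscored[0] if unscored else None
    return mean(list(risks.values())), risks

# Hypothetical data values and association degree values.
values = {"a": 1.0, "b": 1.5, "c": 3.0}
assoc = {"a": {"b": 2, "c": 4}, "b": {"a": 2}, "c": {"a": 4}}
average_risk, risks = group_risk(values, assoc, "a", lambda t, k: t * k)
```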
Further, the method also comprises a step of preprocessing the tax data before the data is divided into a plurality of data groups according to the set features. The step specifically comprises: structuring the tax data to be preprocessed to obtain structured tax data, the tax data comprising data fields to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rules belonging to each attribute; forming a preprocessing rule set from the preprocessing rules belonging to each attribute; and preprocessing the tax data based on the preprocessing rule set.
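One way to picture the rule set is as a mapping from each data field to the rules of its attribute. The sketch below assumes two attributes ("text" and "money") and two simple rules; the field names, attributes and rules are hypothetical and stand in for whatever rules a real deployment would define.

```python
def strip_whitespace(value):
    """Rule: trim surrounding whitespace from string values."""
    return value.strip() if isinstance(value, str) else value

def to_amount(value):
    """Rule: parse a money string such as '1,234.50' into a float."""
    return round(float(str(value).replace(",", "")), 2)

# Assumed attribute of each field and the rules belonging to each attribute.
FIELD_ATTRIBUTES = {"taxpayer": "text", "amount": "money"}
ATTRIBUTE_RULES = {"text": [strip_whitespace],
                   "money": [strip_whitespace, to_amount]}

def build_rule_set(field_attributes, attribute_rules):
    """Form the preprocessing rule set: field -> ordered list of rules."""
    return {field: attribute_rules[attr]
            for field, attr in field_attributes.items()}

def preprocess(record, rule_set):
    """Apply each field's rules in order and return a cleaned copy."""
    out = dict(record)
    for field, rules in rule_set.items():
        if field in out:
            for rule in rules:
                out[field] = rule(out[field])
    return out

rule_set = build_rule_set(FIELD_ATTRIBUTES, ATTRIBUTE_RULES)
cleaned = preprocess({"taxpayer": "  Acme ", "amount": " 1,234.50 "}, rule_set)
```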
Further, preprocessing the tax data based on the preprocessing rule set includes: acquiring the data volume of the tax data; when the data volume exceeds a preset threshold, generating a plurality of data preprocessing tasks according to the data volume, each task comprising a column subset and/or a row subset of the data fields that need preprocessing; configuring, from the preprocessing rule set, a corresponding preprocessing rule subset for each data preprocessing task; and executing all data preprocessing tasks in a distributed manner.
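The task-splitting idea can be sketched with row subsets and a local thread pool. The thread pool is only a stand-in for a distributed executor, and the threshold, chunk size and single shared rule are assumptions; per-task rule subsets would be passed alongside each chunk in a fuller version.

```python
from concurrent.futures import ThreadPoolExecutor

def make_tasks(rows, threshold, chunk_size):
    """Split rows into row-subset tasks only when the volume exceeds threshold."""
    if len(rows) <= threshold:
        return [rows]
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

def run_tasks(tasks, rule):
    """Run every task concurrently and merge results in the original order."""
    def run(chunk):
        return [rule(row) for row in chunk]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run, tasks))  # map preserves task order
    return [row for chunk in results for row in chunk]

# Hypothetical rows and a placeholder rule.
rows = list(range(10))
tasks = make_tasks(rows, threshold=4, chunk_size=3)
processed = run_tasks(tasks, lambda x: x * 2)
```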
The application method suitable for the customized and automatic operation of the indexes in the financial tax field has the following beneficial effects:
1. Higher efficiency: the invention carries out data searching in a shortest-path manner, which improves the efficiency of risk index generation. This is achieved in two ways. (1) The shortest path arises naturally: the data is not searched by running an algorithm each time; instead, every search follows a fixed pattern in which, starting from the entry data, the data with the highest association degree value with the previous data is taken as the next data until the traversal reaches the end point. Shortest-path data searching is thus achieved with few resources and little time. (2) The shortest-path approach improves data acquisition efficiency: the data required to generate the index is screened along the shortest path, which improves the efficiency of data acquisition during screening.
2. Higher accuracy: when calculating and generating the risk index, the invention does not simply calculate directly on the classified data groups but takes into account the intersection parts between data groups, which avoids inaccuracy caused by data being counted repeatedly.
Drawings
FIG. 1 is a flow chart of the application method suitable for finance and tax field index self-definition and automatic operation according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure formed by connecting associated data in the application method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure formed by connecting the parts after the three non-intersection parts with the highest risk weights have been screened out in the application method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of data searching in the application method according to an embodiment of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, an application method suitable for finance and tax field index self-definition and automatic operation performs the following steps:
Step 1: dividing tax data into a plurality of data groups according to set features, the set features including time features and/or money features; at the same time, counting the possible intersection parts of the data groups;
Step 2: carrying out data association within each data group, specifically: analyzing each piece of data in the data group to calculate its association degree value with the other data, and connecting the associated data;
Step 3: carrying out data risk analysis within each data group using a preset data analysis model to obtain the risk weight of the data group; carrying out data risk analysis on each intersection part to obtain the risk weight of the intersection part; and deriving the risk weight of each non-intersection part from the risk weight of the data group and the risk weight of the intersection part;
Step 4: when a risk index generation command is received, parsing the command to obtain the tax data groups it refers to; after the tax data groups are screened out, using a discriminator to judge whether the screened tax data groups have intersection parts; if so, accumulating the risk weights of all intersection parts, calculating the risk weights of all non-intersection parts, sorting the non-intersection parts in descending order of risk weight, and screening out the three non-intersection parts with the highest risk weights;
Step 5: in the non-intersection parts and the intersection parts, finding an entry value based on the risk index generation command and starting a data search in each intersection part and non-intersection part from the entry data, specifically: starting from the entry data, repeatedly taking the data with the highest association degree value with the previous data as the next data, until the traversal reaches the end point;
Step 6: obtaining a final risk weight, used as the generated risk index, based on the proportion that the data found by the data search in the non-intersection parts and the intersection parts makes up of each non-intersection part or intersection part.
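The screening in step 4 can be sketched as follows, taking the risk weights of the intersection and non-intersection parts as given inputs, since the model that produces them is not reproduced in this text. The part labels, the weights and the function name are hypothetical.

```python
def screen_parts(intersection_weights, non_intersection_weights, k=3):
    """Accumulate intersection weights and keep the k heaviest non-intersection parts.

    Both arguments map a part label to its risk weight.
    """
    total_intersection = sum(intersection_weights.values())
    ranked = sorted(non_intersection_weights.items(),
                    key=lambda item: item[1], reverse=True)
    return total_intersection, ranked[:k]

# Hypothetical risk weights.
intersection_weights = {"G1&G2": 0.25, "G2&G3": 0.5}
non_intersection_weights = {"G1": 0.4, "G2": 0.9, "G3": 0.1, "G4": 0.6}
total, top_three = screen_parts(intersection_weights, non_intersection_weights)
```

The default `k=3` mirrors the three non-intersection parts named in step 4.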
Referring to FIG. 2, the association degree values between the associated data in FIG. 2 range from 1 to 4. The data is connected according to the calculated association degree values. Each letter in the figure represents one piece of data.
Referring to FIG. 3, the discriminator in FIG. 3 determines from which part the data search should start. Within the three screened-out non-intersection parts and the intersection parts, the discriminator finds the entry data. Each number in the figure represents the risk weight of one piece of data.
Referring to FIG. 4, A is the starting data and B is the end point of the data in FIG. 4. Each number in between represents the risk weight of one data group.
Example 2
On the basis of the above embodiment, in step 1, the time features include the time at which the tax data was generated and/or the time at which it was recorded; the money features include the source and/or use of the tax data.
Example 3
On the basis of the above embodiment, in step 2, analyzing each piece of data in the data group to calculate its association degree value with the other data comprises the following steps: classifying the features of each piece of data in the data group to obtain feature classes, the features being one or more of the set features; labeling each class with a membership type; selecting as the associated feature, according to a preset priority order of membership types, the class whose membership type matches a preset membership type and has the highest priority; acquiring from the data group the data that contains the associated feature; and determining, from the acquired data, the data associated with the data to be associated in the data group.
Example 4
On the basis of the above embodiment, determining, from the acquired data, the data associated with the data to be associated in the data group specifically includes: determining the similarity between the data to be associated and each piece of acquired data according to their feature information; and determining the data associated with the data to be associated according to that similarity.
Specifically, data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Once data has been interpreted and given meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation, and transmission of data.
The basic purpose of data processing is to extract and derive, from large volumes of data that may be disorganized and hard to understand, the data that is valuable and meaningful to particular people.
Data processing is a fundamental link of system engineering and automatic control. Data processing extends throughout various areas of social production and social life. The development of data processing technology and the breadth and depth of application thereof greatly influence the progress of human society development.
Example 5
On the basis of the above embodiment, if the features of the data include the generation time, the recording time, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to their feature information specifically includes: determining a generation-time similarity from the generation times of the two pieces of data; determining a recording-time similarity from their recording times; determining a source similarity from their sources; determining a use similarity from their uses; and determining the overall similarity between the data to be associated and the acquired data from the generation-time similarity, the recording-time similarity, the source similarity and the use similarity.
Specifically, association analysis is also known as association mining, which is to search for frequent patterns, associations, correlations, or causal structures existing between sets of items or objects in transaction data, relationship data, or other information carriers.
In other words, the association analysis is the discovery of the association between different goods (items) in the transaction database.
Example 6
On the basis of the above embodiment, in step 3, the data risk analysis carried out within each data group using the preset data analysis model comprises the following. The data analysis model is expressed by a formula (not reproduced in this text) in which R is the risk weight, Dm is the risk threshold, which is a set value, and D is the risk value of the data group. Once the risk value of the data group has been calculated, it is substituted into the data analysis model to obtain the risk weight.
Specifically, association analysis is a simple and practical analysis technique that finds the associations or correlations present in large data sets and thereby describes the rules and patterns by which certain attributes occur together.
Association analysis discovers interesting associations and correlations between item sets in large amounts of data. A typical example is shopping basket analysis, which analyzes customers' purchasing habits by finding the connections between the different items they place in their shopping baskets. Knowing which items customers frequently buy together can help retailers formulate marketing strategies. Other applications include price-list design, product promotion, product placement, and customer segmentation based on purchasing patterns.
Association analysis can derive from a database rules of the form "certain events occur because other events occur". For example, given the rule "67% of customers who buy beer also buy diapers", a supermarket can improve its service quality and returns through sensible shelf placement or by bundling beer with diapers. Likewise, given the rule "students who excel in the C language course have an 88% likelihood of excelling in data structures", teaching outcomes can be improved by strengthening the study of C.
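Percentages such as those quoted above correspond to what association rule mining calls the confidence of a rule, computed from the support of item sets. The following sketch uses a tiny made-up list of baskets; the data and function names are illustrative and do not reproduce the figures in the text.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers"},
    {"beer", "diapers", "milk"},
    {"beer"},
    {"milk"},
]
beer_support = support(baskets, {"beer"})
beer_to_diapers = confidence(baskets, {"beer"}, {"diapers"})
```

In this toy data three of four baskets contain beer, and two of those three also contain diapers, so the rule "beer implies diapers" has a confidence of two thirds.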
Example 7
On the basis of the above embodiment, the risk value of the data group is calculated by the following steps: selecting one piece of data from the data group as the starting value and calculating its risk value using a formula (not reproduced in this text) in which T is the data value and K is the mean of the association degree values between that data and the other data associated with it; after the risk value of the starting value is obtained, repeating the calculation from the data with the highest association degree value with the starting value, and so on, until the risk values of all data in the data group have been calculated; and calculating the average of all risk values as the risk weight of the data group.
Example 8
On the basis of the above embodiment, the method also includes a step of preprocessing the tax data before the data is divided into a plurality of data groups according to the set features. The step specifically comprises: structuring the tax data to be preprocessed to obtain structured tax data, the tax data comprising data fields to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rules belonging to each attribute; forming a preprocessing rule set from the preprocessing rules belonging to each attribute; and preprocessing the tax data based on the preprocessing rule set.
Example 9
On the basis of the above embodiment, preprocessing the tax data based on the preprocessing rule set includes: acquiring the data volume of the tax data; when the data volume exceeds a preset threshold, generating a plurality of data preprocessing tasks according to the data volume, each task comprising a column subset and/or a row subset of the data fields that need preprocessing; configuring, from the preprocessing rule set, a corresponding preprocessing rule subset for each data preprocessing task; and executing all data preprocessing tasks in a distributed manner.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
It should be noted that the system provided in the foregoing embodiment is illustrated only by the division into the functional units described above. In practical applications, the functions may be allocated to different functional units as required; that is, the units or steps in the embodiments of the present invention may be further decomposed or combined. For example, the units in the foregoing embodiment may be merged into one unit or further split into multiple sub-units to carry out all or part of the functions described above. The names of the units and steps in the embodiments of the invention serve only to distinguish the units or steps and are not to be construed as unduly limiting the present invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Those skilled in the art will appreciate that the illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both, and that the programs corresponding to the software units and method steps may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, the illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints of the technical solution. Those skilled in the art may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The terms "first," "another portion," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions to the relevant technical features without departing from the principles of the present invention, and the technical solutions resulting from such changes or substitutions fall within the scope of protection of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.

Claims (9)

1. An application method suitable for finance and tax field index self-definition and automatic operation is characterized by comprising the following steps:
step 1: dividing tax data into a plurality of data groups according to set characteristics; the set features include: time characteristics and/or money characteristics; at the same time, counting the possible intersection parts of each data group;
step 2: carrying out data association in each data group, wherein the data association specifically comprises the following steps: analyzing each data in the data group to calculate the association degree value of the data and other data, and connecting the associated data;
step 3: carrying out data risk analysis by using a preset data analysis model in each data group to obtain a risk weight corresponding to the data group; carrying out data risk analysis on each intersection part to obtain a risk weight corresponding to the intersection part; based on the risk weight corresponding to the data group and the risk weight corresponding to the intersection part, solving the risk weight of the non-intersection part;
step 4: when a risk index generation command is received, parsing the risk index generation command to obtain the tax data groups specified by the command; after the tax data groups are screened out, using a discriminator to judge whether the screened tax data groups have intersection parts; if so, accumulating the risk weights corresponding to all intersection parts, calculating the risk weights of all non-intersection parts, sorting the non-intersection parts in descending order of the calculated risk weights, and screening out the non-intersection parts whose risk weights rank in the top three;
step 5: in the non-intersection part and the intersection part, generating a command based on the risk index, finding an entry value, and starting data search in each intersection part and the non-intersection part based on entry data, specifically including: starting from the entry data, finding the data with the highest association degree value with the previous data as the next data until the data is traversed to the end point;
step 6: obtaining a final risk weight, which is used as the generated risk index, based on the proportion that all data found by the data search in the non-intersection parts and the intersection parts occupies within each non-intersection part or intersection part.
2. The application method according to claim 1, wherein in the step 1, the time feature includes: time of tax data generation and/or time of recording; the money feature comprises: the source and/or use of tax data.
3. The application method according to claim 2, wherein the method of analyzing each data in the data group to calculate the association degree value of the data with other data in step 2 comprises the following steps: classifying the characteristics of each data in the data group to obtain the classification of the characteristics; the features are one or more of the set features; membership labeling is carried out on each category to obtain membership types of each category; selecting a class with the same membership type as the preset membership type and highest priority from all classes as an associated feature according to the preset membership type priority order; acquiring data containing the associated features from a data group; from the acquired data, determining associated data with data to be associated in the data group.
4. An application method according to claim 3, wherein determining, from the acquired data, the data associated with the data to be associated in the data group comprises: according to the characteristic information of the data to be correlated and the acquired data, determining the similarity of the data to be correlated and the acquired data; and determining the data associated with the data to be associated according to the similarity of the data.
5. The application method according to claim 4, wherein, if the features of the data include the time of data generation, the time of recording, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to the feature information of the data to be associated and of the acquired data specifically includes: determining a generation-time similarity according to the generation times of the data to be associated and of the acquired data; determining a recording-time similarity according to the recording times of the data to be associated and of the acquired data; determining a source similarity according to the sources of the data to be associated and of the acquired data; determining a use similarity according to the uses of the data to be associated and of the acquired data; and determining the similarity between the data to be associated and the acquired data according to the generation-time similarity, the recording-time similarity, the source similarity and the use similarity.
6. The application method according to claim 5, wherein in the step 3, the method of carrying out data risk analysis inside each data group by using a preset data analysis model performs the following steps: the data analysis model is expressed using the following formula: (formula omitted in the source) wherein R is a risk weight; Dm is a risk threshold and is a set value; D is a risk value of the data group; and after the risk value of the data group is obtained through calculation, the risk value is substituted into the data analysis model to obtain the risk weight.
7. The application method according to claim 6, wherein the calculation method of the risk value of the data group performs the following steps: selecting one piece of data from the data group as a starting value, and calculating a risk value of the starting value by using the following formula: (formula omitted in the source) wherein T is a data value, and K is the mean of the association degree values between the data and the other data associated with it; after the risk value of the starting value is obtained, repeating the risk value calculation from the other data value having the highest association degree value with the starting value, until the risk values of all data in the data group have been calculated; and calculating the average of all risk values as the risk weight of the data group.
8. The application method according to claim 7, wherein the method further comprises the step of performing data preprocessing on tax data before dividing the data into a plurality of data groups according to the set characteristics; the method specifically comprises the following steps: carrying out structuring treatment on tax data to be preprocessed to obtain structured tax data; the tax data comprises a data field to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rule subordinate to each attribute; forming a preprocessing rule set by utilizing preprocessing rules belonging to each attribute; preprocessing the tax data based on the preprocessing rule set.
9. The application method of claim 8, wherein the preprocessing the tax data based on the preprocessing rule set comprises: acquiring the data volume of the tax data; when the data quantity exceeds a preset threshold value, generating a plurality of data preprocessing tasks according to the data quantity; the data preprocessing task comprises a data field column subset and/or a data field row subset which need preprocessing; configuring a corresponding preprocessing rule subset for each data preprocessing task from the preprocessing rule set; all data preprocessing tasks are performed in a distributed manner.
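The data search in step 5 of claim 1 can be read as a greedy traversal over association degree values. The sketch below is one possible reading, not the claimed implementation: the association table `assoc`, the item names, and the stopping rule (stop when no unvisited associated data remains) are assumptions, and the final combination into a risk weight in step 6 is not reproduced because the claims do not give its formula. Only the proportion of data reached by the search is computed.

```python
from typing import Dict, List

# Hypothetical association degree values between data items in one part.
Association = Dict[str, Dict[str, float]]


def greedy_search(entry: str, assoc: Association) -> List[str]:
    """From the entry data, repeatedly move to the unvisited item with the
    highest association degree value relative to the current item."""
    path = [entry]
    visited = {entry}
    current = entry
    while True:
        candidates = {item: value
                      for item, value in assoc.get(current, {}).items()
                      if item not in visited}
        if not candidates:
            # Assumed end point: no unvisited associated data remains.
            return path
        current = max(candidates, key=candidates.get)
        visited.add(current)
        path.append(current)


assoc: Association = {
    "a": {"b": 0.9, "c": 0.4},
    "b": {"a": 0.9, "c": 0.7, "d": 0.2},
    "c": {"a": 0.4, "b": 0.7},
    "d": {"b": 0.2},
    "e": {},
}
path = greedy_search("a", assoc)

# Proportion of the part's data reached by the search (input to step 6).
share = len(path) / len(assoc)
```

Because the traversal always takes the single strongest link, weakly associated items such as `d` and isolated items such as `e` may never be reached, which is what makes the proportion informative.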
CN202111180467.5A 2021-10-11 2021-10-11 Application method suitable for finance and tax field index self-definition and automatic operation Active CN113902533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111180467.5A CN113902533B (en) 2021-10-11 2021-10-11 Application method suitable for finance and tax field index self-definition and automatic operation

Publications (2)

Publication Number Publication Date
CN113902533A CN113902533A (en) 2022-01-07
CN113902533B true CN113902533B (en) 2023-08-25

Family

ID=79191171

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111180467.5A Active CN113902533B (en) 2021-10-11 2021-10-11 Application method suitable for finance and tax field index self-definition and automatic operation

Country Status (1)

Country Link
CN (1) CN113902533B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983224A (en) * 1997-10-31 1999-11-09 Hitachi America, Ltd. Method and apparatus for reducing the computational requirements of K-means data clustering
CN106156294A (en) * 2016-06-29 2016-11-23 福建富士通信息软件有限公司 A kind of method of quick lookup associated data
CN106355405A (en) * 2015-07-14 2017-01-25 阿里巴巴集团控股有限公司 Method and device for identifying risks and system for preventing and controlling same
CN107180070A (en) * 2017-03-29 2017-09-19 暨南大学 A kind of risk information is classified, recognized and method for early warning and system automatically
CN107402927A (en) * 2016-05-19 2017-11-28 上海斯睿德信息技术有限公司 A kind of enterprise's incidence relation topology method for building up and querying method based on graph model
KR20180003674A (en) * 2016-06-30 2018-01-10 (주) 더존비즈온 Apparatus and method for managing vehicle information using financial management system
CN110851869A (en) * 2019-11-14 2020-02-28 深圳前海微众银行股份有限公司 Sensitive information processing method and device and readable storage medium
CN110852600A (en) * 2019-11-07 2020-02-28 江苏税软软件科技有限公司 Method for evaluating dynamic risk of market subject
CN111754044A (en) * 2020-06-30 2020-10-09 深圳前海微众银行股份有限公司 Employee behavior auditing method, device, equipment and readable storage medium
CN112016843A (en) * 2020-09-02 2020-12-01 税安科技(杭州)有限公司 Organizational finance and tax data risk analysis method and related device
CN112434048A (en) * 2021-01-26 2021-03-02 湖州市大数据运营有限公司 Data cross analysis method and device, computer equipment and storage medium
CN112528096A (en) * 2020-12-15 2021-03-19 航天信息股份有限公司 Enterprise analysis method, storage medium and electronic device
CN113205271A (en) * 2021-05-12 2021-08-03 国家税务总局山东省税务局 Method for evaluating enterprise income tax risk based on machine learning
WO2021196520A1 (en) * 2020-03-30 2021-10-07 西安交通大学 Tax field-oriented knowledge map construction method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010044734A1 (en) * 2000-09-01 2001-11-22 Audit Protection Insurance Services, Inc. Method, system, and software for providing tax audit insurance
US8374951B2 (en) * 2002-04-10 2013-02-12 Research Affiliates, Llc System, method, and computer program product for managing a virtual portfolio of financial objects
US20090210246A1 (en) * 2002-08-19 2009-08-20 Choicestream, Inc. Statistical personalized recommendation system
US9461876B2 (en) * 2012-08-29 2016-10-04 Loci System and method for fuzzy concept mapping, voting ontology crowd sourcing, and technology prediction
US10073974B2 (en) * 2016-07-21 2018-09-11 International Business Machines Corporation Generating containers for applications utilizing reduced sets of libraries based on risk analysis
US20210383489A1 (en) * 2018-08-20 2021-12-09 Shawn R. Hutchinson Scheduling, booking, and pricing engines

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
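Research on the Market-Entry Risk of Collectively Owned Commercial Construction Land in Sanwenbi Village (三文笔村集体经营性建设用地入市风险研究); Fu Guanghui et al.; Co-operative Economy & Science (合作经济与科技), No. 1; 30-32 *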

Also Published As

Publication number Publication date
CN113902533A (en) 2022-01-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant