CN113902533B

CN113902533B - Application method suitable for finance and tax field index self-definition and automatic operation

Info

Publication number: CN113902533B
Application number: CN202111180467.5A
Authority: CN
Inventors: 吴俊�; 刘冬; 黄友善; 姜汉峰
Original assignee: Tax And Security Technology Hangzhou Co ltd
Current assignee: Tax And Security Technology Hangzhou Co ltd
Priority date: 2021-10-11
Filing date: 2021-10-11
Publication date: 2023-08-25
Anticipated expiration: 2041-10-11
Also published as: CN113902533A

Abstract

The invention belongs to the technical field of computers, and particularly relates to an application method suitable for index customization and automatic operation in the finance and tax field. The method comprises the following steps. Step 1: dividing tax data into a plurality of data groups according to set features, the set features including time features and/or money features, and at the same time counting the possible intersection parts of the data groups. Step 2: carrying out data association within each data group, which specifically comprises analyzing each piece of data in the data group to calculate its association degree value with the other data, and connecting the associated data. The tax data is decomposed into a plurality of data groups, the data in the data groups is then associated through the association degree values, and finally the tax risk index is assessed by calculating the risk weights separately and then integrating them, so that efficiency and accuracy are improved.

Description

Application method suitable for finance and tax field index self-definition and automatic operation

Technical Field

The invention belongs to the technical field of computers, and particularly relates to an application method suitable for index self-definition and automatic operation in the financial and tax field.

Background

In the finance and tax field, formulating an index for the tax risk control of an enterprise is a process that professional business personnel must repeat many times, alternating between formulation and verification. During this process the business personnel repeatedly modify the risk control index model, and the model is put into use only after verification and delivery, so the cycle is long and it is difficult to meet the production requirements of enterprises.

The shortest path problem is a classical algorithmic problem in graph theory, which aims to find the shortest path between two nodes in a graph (consisting of nodes and paths). Its specific forms include the following. The single-source shortest path problem: the starting node is known and the shortest paths from it are sought. The single-destination shortest path problem: the reverse of the single-source problem, in which the terminal node is known and the shortest paths to it are sought. In an undirected graph this problem is exactly equivalent to the single-source problem; in a directed graph it is equivalent to the single-source problem after the direction of every path is reversed.
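As an illustration of the two problem forms described above, the following minimal Python sketch computes single-source shortest paths with Dijkstra's algorithm and handles the single-destination form by reversing every edge. The graph, the node names and the function names are hypothetical and do not come from the invention.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def reverse(graph):
    """Reverse the direction of every edge of a directed graph."""
    rev = {}
    for node, edges in graph.items():
        rev.setdefault(node, {})
        for nxt, w in edges.items():
            rev.setdefault(nxt, {})[node] = w
    return rev


def distances_to(graph, target):
    """Single-destination distances, computed as single-source on the reversed graph."""
    return dijkstra(reverse(graph), target)


# Hypothetical directed graph with non-negative edge weights
GRAPH = {"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}
```

Dijkstra's algorithm is chosen here only because it is a common single-source method for non-negative weights; the text itself does not name a specific algorithm.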

Patent No. CN202110518344.1A discloses a machine-learning-based method for evaluating enterprise income tax risk. The method specifically comprises: first, planning the feature set of a machine learning data set, and selecting 290 machine learning target sets from the feature set according to the management characteristics of enterprise income tax; classifying and extracting the data according to systems and forms with different characteristics; then forming the final machine learning data set according to the user collection; selecting decision tree and support vector machine algorithm models and integrating and connecting them to form a machine learning algorithm model suitable for income tax; and finally running the model to output a result and feeding back the verification of the result.

Analyzing and calculating a tax data set by machine learning can improve the efficiency of tax risk assessment, but the processing speed drops markedly for enterprises with a large data volume. First, the required tax data cannot be located quickly within a large amount of data; second, when the tax data is acquired, the processing workload increases because no shortest path is used.

Disclosure of Invention

Therefore, the main purpose of the invention is to provide an application method suitable for index customization and automatic operation in the finance and tax field, in which tax data is decomposed into a plurality of data groups, the data in the data groups is then associated through association degree values, and finally the tax risk index is assessed by calculating the risk weights separately and then integrating them, so that efficiency and accuracy are improved.

In order to achieve the above purpose, the technical scheme of the invention is realized as follows:

an application method suitable for finance and tax field index self-definition and automatic operation, the application method comprises the following steps:

step 1: dividing tax data into a plurality of data groups according to set characteristics; the set features include: time characteristics and/or money characteristics; at the same time, counting the possible intersection parts of each data group;

step 2: carrying out data association in each data group, wherein the data association specifically comprises the following steps: analyzing each data in the data group to calculate the association degree value of the data and other data, and connecting the associated data;

step 3: carrying out data risk analysis by using a preset data analysis model in each data group to obtain a risk weight corresponding to the data group; carrying out data risk analysis on each intersection part to obtain a risk weight corresponding to the intersection part; based on the risk weight corresponding to the data group and the risk weight corresponding to the intersection part, solving the risk weight of the non-intersection part;

step 4: when a risk index generation command is received, parsing the command to obtain the tax data groups it refers to; after the tax data groups are screened out, using a discriminator to judge whether the screened tax data groups have intersection parts; if so, accumulating the risk weights corresponding to all intersection parts, calculating the risk weights of all non-intersection parts, sorting the non-intersection parts in descending order of the calculated risk weights, and screening out the three non-intersection parts with the highest risk weights;

step 5: in the non-intersection parts and the intersection parts, finding entry data based on the risk index generation command, and starting a data search in each intersection part and non-intersection part from the entry data, which specifically includes: starting from the entry data, repeatedly taking the data with the highest association degree value with the previous data as the next data until the end point is reached;

step 6: obtaining a final risk weight, which is used as the generated risk index, based on the proportion that the data found by the data search occupies in each non-intersection part or intersection part.
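Step 6 can be sketched as follows. The text does not give the exact combination rule, so this sketch assumes that each part contributes its risk weight scaled by the share of its data reached by the search; the function name and the keys `weight`, `size` and `found` are hypothetical.

```python
def final_risk_weight(parts):
    """Combine per-part risk weights, scaled by the share of data the search reached.

    parts: list of dicts with 'weight' (risk weight of the part), 'size'
    (number of data items in the part) and 'found' (items reached by the search).
    Assumption: the final value is the sum of share * weight over all parts.
    """
    total = 0.0
    for part in parts:
        if part["size"] == 0:
            continue  # an empty part contributes nothing
        share = part["found"] / part["size"]
        total += share * part["weight"]
    return total


# Hypothetical input: one intersection part and one non-intersection part
PARTS = [
    {"weight": 0.8, "size": 10, "found": 5},
    {"weight": 0.4, "size": 20, "found": 5},
]
```

With the hypothetical input above, the first part contributes half of its weight and the second a quarter of its weight.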

Further, in the step 1, the time feature includes: time of tax data generation and/or time of recording; the money feature comprises: the source and/or use of tax data.
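A minimal sketch of step 1 under these features is given below. It assumes, for illustration only, that the time feature is reduced to the month of generation and the money feature to the source; the record fields `id`, `generated` and `source` are hypothetical. Because a record belongs to one time group and one money group, the groups can overlap, and the overlaps are the intersection parts.

```python
from collections import defaultdict
from itertools import combinations


def group_records(records):
    """Place each record id in one group per feature value (month, source)."""
    groups = defaultdict(set)
    for rec in records:
        groups[("period", rec["generated"][:7])].add(rec["id"])  # YYYY-MM prefix
        groups[("source", rec["source"])].add(rec["id"])
    return dict(groups)


def intersections(groups):
    """Return the non-empty pairwise overlaps between data groups."""
    result = {}
    for key_a, key_b in combinations(sorted(groups), 2):
        common = groups[key_a] & groups[key_b]
        if common:
            result[(key_a, key_b)] = common
    return result


# Hypothetical tax records
RECORDS = [
    {"id": 1, "generated": "2021-01-15", "source": "sales"},
    {"id": 2, "generated": "2021-01-20", "source": "payroll"},
    {"id": 3, "generated": "2021-02-03", "source": "sales"},
]
```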

Further, the method of analyzing each data in the data group in the step 2 to calculate the association degree value between the data and other data executes the following steps: classifying the characteristics of each data in the data group to obtain the classification of the characteristics; the features are one or more of the set features; membership labeling is carried out on each category to obtain membership types of each category; selecting a class with the same membership type as the preset membership type and highest priority from all classes as an associated feature according to the preset membership type priority order; acquiring data containing the associated features from a data group; from the acquired data, determining associated data with data to be associated in the data group.
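The selection of the associated feature can be sketched as follows. The membership type names, their priority order and the shape of the `categories` mapping are hypothetical choices made for this example; the text only states that a preset priority order exists.

```python
# Hypothetical preset priority order of membership types, highest priority first
PRIORITY = ["source", "use", "generation_time", "recording_time"]


def pick_associated_feature(categories):
    """Pick the category whose membership type ranks highest in PRIORITY.

    categories maps a feature category name to its membership type label.
    Returns None when no category carries a preset membership type.
    """
    ranked = [
        (PRIORITY.index(mtype), name)
        for name, mtype in categories.items()
        if mtype in PRIORITY
    ]
    if not ranked:
        return None
    return min(ranked)[1]  # smallest index means highest priority


def records_with_feature(group, feature):
    """Return the records of the data group that carry a value for the feature."""
    return [rec for rec in group if rec.get(feature) is not None]
```

The records returned by `records_with_feature` are the candidates from which the associated data is then determined.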

Further, determining, from the acquired data, associated data with data to be associated in the data group specifically includes: according to the characteristic information of the data to be correlated and the acquired data, determining the similarity of the data to be correlated and the acquired data; and determining the data associated with the data to be associated according to the similarity of the data.

Further, if the features of the data include the time of data generation, the time of recording, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to their feature information specifically includes: determining a generation time similarity according to the generation times of the data to be associated and of the acquired data; determining a recording time similarity according to their recording times; determining a source similarity according to their sources; determining a use similarity according to their uses; and determining the similarity between the data to be associated and the acquired data according to the generation time similarity, the recording time similarity, the source similarity and the use similarity.
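One possible way to combine the four component similarities is sketched below. The text does not specify the component measures or how they are combined, so everything here is an assumption: time similarity decays with the gap in days, source and use are compared by exact match, and the overall similarity is an equally weighted mean. The field names `generated`, `recorded`, `source` and `use` are hypothetical.

```python
from datetime import date


def time_similarity(a, b, scale_days=30.0):
    """Map the gap between two dates to (0, 1]; 1 means the same day."""
    return 1.0 / (1.0 + abs((a - b).days) / scale_days)


def label_similarity(a, b):
    """Exact-match similarity for categorical fields such as source or use."""
    return 1.0 if a == b else 0.0


def similarity(x, y, weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted mean of generation time, recording time, source and use similarity."""
    parts = (
        time_similarity(x["generated"], y["generated"]),
        time_similarity(x["recorded"], y["recorded"]),
        label_similarity(x["source"], y["source"]),
        label_similarity(x["use"], y["use"]),
    )
    return sum(w * p for w, p in zip(weights, parts))


# Hypothetical records: Y differs from X only by a 30-day later generation date
X = {"generated": date(2021, 1, 1), "recorded": date(2021, 1, 2),
     "source": "sales", "use": "vat"}
Y = dict(X, generated=date(2021, 1, 31))
```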

Further, in the step 3, the method for performing data risk analysis by using a preset data analysis model in each data group performs the following steps: the data analysis model is expressed using the following formula: wherein R is a risk weight; dm is a risk threshold and is a set value; d is a risk value of the data group; and after the risk value of the data group is obtained through calculation, substituting the risk value into the data analysis model, so that the risk weight value can be obtained.

Further, the method for calculating the risk value of the data group executes the following steps: selecting one data from the data group as a starting value, and calculating a risk value of the starting value by using the following formula: wherein T is a data value, and K is a mean value of association degree values of the data and other associated data; after the risk value of the initial value is obtained, repeatedly calculating the risk value from other data values with highest association degree values with the initial value until all the risk values of the data in the data group are calculated; and calculating the average value of all risk values as the risk weight of the data group.
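The traversal order described above can be sketched as follows. Because the risk value formula is not reproduced in this text, the sketch takes it as a caller-supplied function `risk_fn(T, K)`; that parameter, the function name and the data layout are hypothetical.

```python
def group_risk(values, assoc, start, risk_fn):
    """Average the per-item risk values over one data group.

    values: item -> data value T
    assoc: item -> {other item: association degree value}
    risk_fn: callable (T, K) -> risk value; a stand-in for the formula,
             which is not reproduced in the text
    """
    # Visit the start item first, then the others by descending association with it
    start_links = assoc.get(start, {})
    others = sorted(
        (item for item in values if item != start),
        key=lambda item: start_links.get(item, 0),
        reverse=True,
    )
    risks = []
    for item in [start] + others:
        links = assoc.get(item, {})
        # K is the mean association degree value with the associated data
        k = sum(links.values()) / len(links) if links else 0.0
        risks.append(risk_fn(values[item], k))
    return sum(risks) / len(risks)
```

A caller would pass the actual formula as `risk_fn`; any function of T and K can be substituted to try the traversal.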

Further, the method further comprises the step of preprocessing the tax data before the data is divided into a plurality of data groups according to the set characteristics; the method specifically comprises the following steps: carrying out structuring treatment on tax data to be preprocessed to obtain structured tax data; the tax data comprises a data field to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rule subordinate to each attribute; forming a preprocessing rule set by utilizing preprocessing rules belonging to each attribute; preprocessing the tax data based on the preprocessing rule set.
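The construction of a preprocessing rule set from field attributes can be sketched as follows. The attribute names, the concrete rules and the field names are hypothetical examples; the text only requires that each data field has an attribute and that each attribute has its own preprocessing rules.

```python
# Hypothetical preprocessing rules keyed by field attribute
RULES_BY_ATTRIBUTE = {
    "amount": [lambda v: v.replace(",", ""), float],
    "date": [str.strip],
    "text": [str.strip, str.lower],
}

# Hypothetical mapping from data field to attribute
FIELD_ATTRIBUTES = {"total": "amount", "issued": "date", "payer": "text"}


def build_rule_set(field_attributes, rules_by_attribute):
    """Collect, per field, the rules that belong to the field's attribute."""
    return {
        field: rules_by_attribute.get(attr, [])
        for field, attr in field_attributes.items()
    }


def preprocess(record, rule_set):
    """Apply each field's rules in order; fields without rules pass through."""
    cleaned = dict(record)
    for field, rules in rule_set.items():
        if field in cleaned:
            for rule in rules:
                cleaned[field] = rule(cleaned[field])
    return cleaned
```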

Further, the preprocessing the tax data based on the preprocessing rule set includes: acquiring the data volume of the tax data; when the data quantity exceeds a preset threshold value, generating a plurality of data preprocessing tasks according to the data quantity; the data preprocessing task comprises a data field column subset and/or a data field row subset which need preprocessing; configuring a corresponding preprocessing rule subset for each data preprocessing task from the preprocessing rule set; all data preprocessing tasks are performed in a distributed manner.
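The splitting into preprocessing tasks can be sketched as follows. A local thread pool stands in for a distributed executor, the threshold and chunk size are arbitrary illustrative values, and a single whitespace-stripping rule applied to a column subset stands in for the preprocessing rule subset; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 4  # hypothetical row count above which the work is split
CHUNK = 2      # hypothetical number of rows per task


def make_tasks(rows, columns):
    """Split the rows into row subsets when the data volume exceeds the threshold."""
    if len(rows) <= THRESHOLD:
        return [(rows, columns)]
    return [(rows[i:i + CHUNK], columns) for i in range(0, len(rows), CHUNK)]


def run_task(task):
    """Strip whitespace in the selected columns of one row subset."""
    rows, columns = task
    return [
        {k: (v.strip() if k in columns else v) for k, v in row.items()}
        for row in rows
    ]


def preprocess_all(rows, columns):
    """Run all tasks concurrently and reassemble the rows in their original order."""
    tasks = make_tasks(rows, columns)
    with ThreadPoolExecutor() as pool:
        results = pool.map(run_task, tasks)  # map yields results in task order
        return [row for chunk in results for row in chunk]
```

In a real deployment the thread pool would be replaced by whatever distributed task runner is in use; the split and reassembly logic stays the same.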

The application method suitable for the customized and automatic operation of the indexes in the financial tax field has the following beneficial effects:

1. Higher efficiency: the invention performs data searching in a shortest-path manner, which improves the efficiency of risk index generation. This is achieved mainly in two respects. (1) Natural realization of the shortest path: the shortest path is not obtained by running a search algorithm on every query; instead, every data search follows the same fixed rule, namely starting from the entry data and taking the data with the highest association degree value with the previous data as the next data until the end point is reached. Data searching based on the shortest path is therefore realized with few resources and in a short time. (2) Improved data acquisition efficiency: the invention screens the data required for index generation along the shortest path, and using the shortest path in the screening process improves the efficiency of data acquisition.
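The fixed search rule can be sketched as a greedy walk over the association degree values. The graph below, the node names and the function name are hypothetical, and the sketch additionally assumes that data already visited is not revisited, which the text does not state.

```python
def greedy_walk(assoc, entry, end):
    """Follow the highest association degree value from entry until end is reached."""
    path = [entry]
    visited = {entry}
    current = entry
    while current != end:
        candidates = {
            node: value
            for node, value in assoc.get(current, {}).items()
            if node not in visited
        }
        if not candidates:
            break  # dead end before reaching the end point
        current = max(candidates, key=candidates.get)
        visited.add(current)
        path.append(current)
    return path


# Hypothetical association degree values between pieces of data
ASSOC = {
    "A": {"C": 4, "D": 2},
    "C": {"A": 4, "E": 3, "D": 1},
    "D": {"A": 2, "C": 1, "B": 2},
    "E": {"C": 3, "B": 4},
    "B": {},
}
```

Each step needs only the neighbors of the current data, which is where the low cost comes from. A greedy walk of this kind is not guaranteed to return a minimum-cost path in the graph-theoretic sense; the sketch only mirrors the rule as described.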

2. Higher accuracy: when calculating and generating the risk index, the invention does not simply calculate directly on the classified data groups but takes the intersection parts between data groups into account, which avoids inaccuracies caused by data being counted repeatedly.
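The handling of intersection parts in steps 3 and 4 can be sketched as follows. The text does not state how the risk weight of a non-intersection part is derived, so the sketch assumes it is the group's risk weight minus the weights of the overlaps the group takes part in, floored at zero; the function name and data layout are hypothetical.

```python
def split_weights(group_weights, intersection_weights):
    """Accumulate intersection weights and pick the three largest non-intersection parts.

    group_weights: group -> risk weight of the whole data group
    intersection_weights: (group_a, group_b) -> risk weight of the overlap
    Assumption: non-intersection weight = group weight minus the weights of
    the overlaps that involve the group, never below zero.
    """
    intersection_total = sum(intersection_weights.values())
    non_intersection = {}
    for group, weight in group_weights.items():
        shared = sum(w for pair, w in intersection_weights.items() if group in pair)
        non_intersection[group] = max(weight - shared, 0.0)
    # Sort in descending order of risk weight and keep the first three
    top_three = sorted(non_intersection, key=non_intersection.get, reverse=True)[:3]
    return intersection_total, top_three
```

Subtracting the overlap before ranking is what keeps data that appears in two groups from being counted twice.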

Drawings

FIG. 1 is a flow chart of an application method suitable for finance and tax field index customization and automatic operation according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the structure of the connected associated data in the application method suitable for finance and tax field index customization and automatic operation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the structure in which the three non-intersection parts with the highest risk weights, once screened out, are connected in the application method suitable for finance and tax field index customization and automatic operation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the data search in the application method suitable for finance and tax field index customization and automatic operation according to an embodiment of the present invention.

Detailed Description

The method of the present invention will be described in further detail with reference to the accompanying drawings.

Example 1

As shown in FIG. 1, an application method suitable for finance and tax field index customization and automatic operation performs the following steps:

Referring to FIG. 2, the association degree values between the associated data in FIG. 2 range from 1 to 4. The data are connected according to the calculated association degree values. Each letter in the figure represents one piece of data.

Referring to FIG. 3, the discriminator in FIG. 3 is used to determine from which data group the data search should start. Within the screened intersection parts and non-intersection parts, the entry data can be found by the discriminator. Each number in the figure represents the risk weight of a piece of data.

Referring to FIG. 4, A is the start data and B is the end data in FIG. 4. Each number in between represents the risk weight of a data group.

Example 2

On the basis of the above embodiment, in the step 1, the time feature includes: time of tax data generation and/or time of recording; the money feature comprises: the source and/or use of tax data.

Example 3

On the basis of the above embodiment, the method in step 2 for analyzing each data in the data group to calculate the association degree value between the data and other data performs the following steps: classifying the characteristics of each data in the data group to obtain the classification of the characteristics; the features are one or more of the set features; membership labeling is carried out on each category to obtain membership types of each category; selecting a class with the same membership type as the preset membership type and highest priority from all classes as an associated feature according to the preset membership type priority order; acquiring data containing the associated features from a data group; from the acquired data, determining associated data with data to be associated in the data group.

Example 4

On the basis of the above embodiment, determining, from the acquired data, associated data with data to be associated in the data group specifically includes: according to the characteristic information of the data to be correlated and the acquired data, determining the similarity of the data to be correlated and the acquired data; and determining the data associated with the data to be associated according to the similarity of the data.

In particular, data is a representation of facts, concepts or instructions that can be processed by manual or automated means. Once data is interpreted and given a certain meaning, it becomes information. Data processing is the collection, storage, retrieval, processing, transformation and transmission of data.

The basic purpose of data processing is to extract and derive, from large amounts of possibly disorganized and hard-to-understand data, the data that is valuable and meaningful to particular people.

Data processing is a fundamental link of system engineering and automatic control. Data processing extends throughout various areas of social production and social life. The development of data processing technology and the breadth and depth of application thereof greatly influence the progress of human society development.

Example 5

On the basis of the above embodiment, if the features of the data include the time of data generation, the time of recording, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to their feature information specifically includes: determining a generation time similarity according to the generation times of the data to be associated and of the acquired data; determining a recording time similarity according to their recording times; determining a source similarity according to their sources; determining a use similarity according to their uses; and determining the similarity between the data to be associated and the acquired data according to the generation time similarity, the recording time similarity, the source similarity and the use similarity.

Specifically, association analysis is also known as association mining, which is to search for frequent patterns, associations, correlations, or causal structures existing between sets of items or objects in transaction data, relationship data, or other information carriers.

In other words, the association analysis is the discovery of the association between different goods (items) in the transaction database.

Example 6

On the basis of the above embodiment, the method for performing data risk analysis in step 3 by using a preset data analysis model inside each data group performs the following steps: the data analysis model is expressed using the following formula: wherein R is a risk weight; dm is a risk threshold and is a set value; d is a risk value of the data group; and after the risk value of the data group is obtained through calculation, substituting the risk value into the data analysis model, so that the risk weight value can be obtained.

Specifically, association analysis is a simple and practical analysis technique that finds associations or correlations that exist in a large number of data sets, thereby describing the rules and patterns in which certain attributes appear simultaneously in a thing.

Association analysis discovers interesting associations and correlations between item sets in a large amount of data. A typical example is shopping basket analysis, which analyzes customers' purchasing habits by finding relationships between the different items that customers place in their shopping baskets. Discovering such associations can help retailers formulate marketing strategies by revealing which items customers frequently purchase together. Other applications include price list design, commodity promotion, commodity placement, and customer segmentation based on purchasing patterns.

Association analysis can extract from a database rules of the form "certain events occur because other events occur". For example, "67% of customers who buy beer also buy diapers", so a supermarket can improve its service quality and profit through sensible shelf placement or bundled sales of beer and diapers. As another example, "students who do well in the C language course have an 88% likelihood of doing well in data structures", so the teaching effect can be improved by strengthening the study of the C language.
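A percentage such as the one in the beer and diapers rule is the confidence of an association rule. The following minimal sketch computes support and confidence over a small set of transactions; the baskets and function names are hypothetical and the figures they produce are not those quoted above.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimate how often the consequent appears in baskets containing the antecedent."""
    base = support(baskets, antecedent)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(baskets, set(antecedent) | set(consequent)) / base


# Hypothetical transactions
BASKETS = [
    {"beer", "diapers"},
    {"beer", "diapers", "chips"},
    {"beer"},
    {"milk"},
]
```

Here three of the four baskets contain beer and two of those also contain diapers, so the rule "beer implies diapers" has a confidence of two thirds.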

Example 7

On the basis of the above embodiment, the method for calculating the risk value of the data group comprises the following steps: selecting one data from the data group as a starting value, and calculating a risk value of the starting value by using the following formula: wherein T is a data value, and K is a mean value of association degree values of the data and other associated data; after the risk value of the initial value is obtained, repeatedly calculating the risk value from other data values with highest association degree values with the initial value until all the risk values of the data in the data group are calculated; and calculating the average value of all risk values as the risk weight of the data group.

Example 8

On the basis of the above embodiment, the method further includes a step of preprocessing the tax data before the data is divided into a plurality of data groups according to the set characteristics; the method specifically comprises the following steps: carrying out structuring treatment on tax data to be preprocessed to obtain structured tax data; the tax data comprises a data field to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rule subordinate to each attribute; forming a preprocessing rule set by utilizing preprocessing rules belonging to each attribute; preprocessing the tax data based on the preprocessing rule set.

Example 9

On the basis of the above embodiment, the preprocessing the tax data based on the preprocessing rule set includes: acquiring the data volume of the tax data; when the data quantity exceeds a preset threshold value, generating a plurality of data preprocessing tasks according to the data quantity; the data preprocessing task comprises a data field column subset and/or a data field row subset which need preprocessing; configuring a corresponding preprocessing rule subset for each data preprocessing task from the preprocessing rule set; all data preprocessing tasks are performed in a distributed manner.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated here.

It should be noted that the system provided in the foregoing embodiment is illustrated only by the division of the foregoing functional units. In practical application, the foregoing functions may be allocated to different functional units as required, that is, the units or steps in the embodiment of the present invention may be further decomposed or combined; for example, the units in the foregoing embodiment may be combined into one unit, or may be further split into multiple sub-units, so as to complete all or part of the functions described above. The names of the units and steps involved in the embodiments of the invention are merely for distinguishing the units or steps, and are not to be construed as undue limitations of the present invention.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

Those of skill in the art will appreciate that the various illustrative elements, method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the program(s) corresponding to the software elements, method steps may be embodied in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not intended to be limiting.

The terms "first," "another portion," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit/apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit/apparatus.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention.

Claims

1. An application method suitable for finance and tax field index self-definition and automatic operation is characterized by comprising the following steps:

2. The application method according to claim 1, wherein in the step 1, the time feature includes: time of tax data generation and/or time of recording; the money feature comprises: the source and/or use of tax data.

3. The application method according to claim 2, wherein the method of analyzing each data in the data group to calculate the association degree value of the data with other data in step 2 comprises the following steps: classifying the characteristics of each data in the data group to obtain the classification of the characteristics; the features are one or more of the set features; membership labeling is carried out on each category to obtain membership types of each category; selecting a class with the same membership type as the preset membership type and highest priority from all classes as an associated feature according to the preset membership type priority order; acquiring data containing the associated features from a data group; from the acquired data, determining associated data with data to be associated in the data group.

4. An application method according to claim 3, wherein determining, from the acquired data, the data associated with the data to be associated in the data group comprises: according to the characteristic information of the data to be correlated and the acquired data, determining the similarity of the data to be correlated and the acquired data; and determining the data associated with the data to be associated according to the similarity of the data.

5. The application method according to claim 4, wherein if the features of the data include the time of data generation, the time of recording, and the source and use of the tax data, determining the similarity between the data to be associated and each piece of acquired data according to their feature information specifically includes: determining a generation time similarity according to the generation times of the data to be associated and of the acquired data; determining a recording time similarity according to their recording times; determining a source similarity according to their sources; determining a use similarity according to their uses; and determining the similarity between the data to be associated and the acquired data according to the generation time similarity, the recording time similarity, the source similarity and the use similarity.

6. The application method according to claim 5, wherein in the step 3, the method for performing data risk analysis by using a preset data analysis model inside each data group performs the following steps: the data analysis model is expressed using the following formula: wherein R is a risk weight; dm is a risk threshold and is a set value; d is a risk value of the data group; and after the risk value of the data group is obtained through calculation, substituting the risk value into the data analysis model, so that the risk weight value can be obtained.

7. The application method according to claim 6, wherein the calculation method of risk values of the data group performs the steps of: selecting one data from the data group as a starting value, and calculating a risk value of the starting value by using the following formula: wherein T is a data value, and K is a mean value of association degree values of the data and other associated data; after the risk value of the initial value is obtained, repeatedly calculating the risk value from other data values with highest association degree values with the initial value until all the risk values of the data in the data group are calculated; and calculating the average value of all risk values as the risk weight of the data group.

8. The application method according to claim 7, wherein the method further comprises the step of performing data preprocessing on tax data before dividing the data into a plurality of data groups according to the set characteristics; the method specifically comprises the following steps: carrying out structuring treatment on tax data to be preprocessed to obtain structured tax data; the tax data comprises a data field to be preprocessed; determining the attribute corresponding to each data field and the preprocessing rule subordinate to each attribute; forming a preprocessing rule set by utilizing preprocessing rules belonging to each attribute; preprocessing the tax data based on the preprocessing rule set.

9. The application method of claim 8, wherein the preprocessing the tax data based on the preprocessing rule set comprises: acquiring the data volume of the tax data; when the data quantity exceeds a preset threshold value, generating a plurality of data preprocessing tasks according to the data quantity; the data preprocessing task comprises a data field column subset and/or a data field row subset which need preprocessing; configuring a corresponding preprocessing rule subset for each data preprocessing task from the preprocessing rule set; all data preprocessing tasks are performed in a distributed manner.