CN113761297A - Method and device for determining field relevancy in database table - Google Patents


Info

Publication number
CN113761297A
Authority
CN
China
Legal status
Pending
Application number
CN202011248181.1A
Other languages
Chinese (zh)
Inventor
张蒙
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202011248181.1A
Publication of CN113761297A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for determining field relevancy in a database table, and relates to the technical field of computers. One embodiment of the method comprises: for any two fields to be analyzed in a database table, determining the field type of each field according to its elements, where the field types include a numeric field and a categorical field, and the elements of a categorical field belong to at least two element categories; when one of the two fields is a numeric field and the other is a categorical field, identifying the elements of the categorical field that belong to the same element category and forming an analysis group from the corresponding elements of the numeric field; and determining the inter-group variance and the intra-group variance of the analysis groups and obtaining the correlation index of the two fields from them. This embodiment can quantitatively calculate the correlation between numeric fields and categorical fields in any database table, which helps achieve a unified analysis of correlation across different field types.

Description

Method and device for determining field relevancy in database table
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for determining field relevancy in a database table.
Background
There is a need in many scenarios to determine the degree of relatedness between different fields in a database table. For example, in data analysis, information between data providers and data consumers is often asymmetric and database tables are complex, which leads to ambiguous data requirements and frequent data corrections. Analyzing the correlation between different fields of a database table in such cases gives data consumers a valuable reference and can markedly improve work efficiency. In the prior art, correlation is calculated with methods such as the Pearson correlation coefficient, chosen according to whether the fields to be analyzed are numeric or categorical.
In the process of implementing the invention, the inventor found at least the following problems in the prior art. First, when faced with a large number of fields of unknown type, the prior art cannot identify field types quickly and accurately. Second, when calculating the correlation between a numeric field and some categorical fields (such as gender), the prior art can describe the correlation only qualitatively, which does not meet the needs of some applications. Third, the prior art lacks a uniform standard for analyzing the relevancy of database table fields across these different situations.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining field relevancy in a database table, which can quantitatively calculate the relevancy between a numeric field and a categorical field in any database table and help achieve unified relevancy analysis across different field types.
To achieve the above objects, according to one aspect of the present invention, there is provided a method for determining the relatedness of fields in a database table.
The method for determining field relevancy in a database table includes the following steps: for any two fields to be analyzed in a database table, determining the field type of each field according to its elements, where the field types include a numeric field and a categorical field, and the elements of a categorical field belong to at least two element categories; when one of the two fields is a numeric field and the other is a categorical field, identifying the elements of the categorical field that belong to the same element category and forming an analysis group from the corresponding elements of the numeric field; and determining the inter-group variance and the intra-group variance of the analysis groups and obtaining the correlation index of the two fields from them.
Optionally, determining the field type of each field according to its elements includes: for any field to be analyzed, judging whether the proportion of elements in the field that match a preset first regular expression is not less than a first threshold; if so, determining the field to be a numeric field, where the first regular expression matches floating-point numbers. If the proportion of matching elements is smaller than the first threshold, judging whether the number of distinct elements in the field after deduplication is greater than 1 and not greater than a second threshold; if so, determining the field to be a categorical field, where the second threshold depends on, and is smaller than, the total number of elements in the field.
Optionally, determining the field type of each field according to its elements includes: for any field to be analyzed, judging whether the number of distinct elements in the field after deduplication is greater than 1 and not greater than a second threshold; if so, determining the field to be a categorical field, where the second threshold depends on, and is smaller than, the total number of elements in the field. If the number of distinct elements is 1 or greater than the second threshold, judging whether the proportion of elements in the field that match a preset second regular expression is not less than a third threshold; if so, determining the field to be a numeric field, where the second regular expression matches both floating-point numbers and integers.
Optionally, obtaining the correlation index of the two fields from the inter-group variance and the intra-group variance includes: dividing the inter-group variance by the intra-group variance to obtain an initial correlation value of the two fields, taking the natural logarithm of the initial value as the intermediate correlation value, and transforming the intermediate value into the interval from zero to one to form the correlation index of the two fields.
Optionally, transforming the intermediate correlation value into the interval from zero to one includes: when the intermediate value is less than zero, setting the correlation index to zero; when the intermediate value is greater than a first value, setting the correlation index to one, where the first value is a real number greater than one; and when the intermediate value is not less than zero and not greater than the first value, setting the correlation index to the product of the intermediate value and a second value, where the second value is the reciprocal of the first value.
Optionally, the method further includes: when both fields to be analyzed are numeric fields, using the absolute value of their Spearman correlation coefficient as their correlation index; when both fields are categorical fields, using their Cramér correlation coefficient (Cramér's V) as their correlation index.
Optionally, the method further includes: after the correlation index of every pair of fields to be analyzed in the database table is obtained, entering the indices into a preset correlation matrix. The numbers of rows and columns of the matrix both equal the total number of fields to be analyzed; the rows and columns correspond, in a preset order, to the identifiers of those fields; each element of the matrix is the correlation index between the field of its row and the field of its column; and the gray value of each element is positively correlated with its correlation index.
Optionally, the method further includes: after the correlation index of every pair of fields to be analyzed is obtained, entering the indices into a preset weighted connection graph. The graph contains nodes arranged around a circle, each representing a field to be analyzed, and a connecting line between any two nodes representing their correlation index. Nodes are colored by field type and lines are colored by correlation index type; the width and color depth of each line are positively correlated with the correlation index it represents. The correlation index types are: the index between two numeric fields, the index between two categorical fields, and the index between a numeric field and a categorical field.
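The text names Cramér's coefficient for two categorical fields without giving its formula in this section. As a minimal sketch, assuming the standard Cramér's V definition (the square root of χ² divided by n·(k - 1), where k is the smaller number of categories of the two fields), it can be computed from a contingency count in plain Python; the function name `cramers_v` is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(x, y):
    """Cramér's V between two categorical fields of equal length (textbook definition)."""
    if len(x) != len(y) or not x:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(x)
    row_totals = Counter(x)          # count of each category in field x
    col_totals = Counter(y)          # count of each category in field y
    observed = Counter(zip(x, y))    # contingency table: (x_category, y_category) -> count
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        raise ValueError("each categorical field needs at least two categories")
    chi2 = 0.0
    for a, count_a in row_totals.items():
        for b, count_b in col_totals.items():
            expected = count_a * count_b / n       # expected count under independence
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))                    # lies in [0, 1]
```

Because V is already bounded by zero and one, it can be placed directly alongside the other correlation indices without further transformation.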
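A minimal sketch of filling such a matrix, assuming the pairwise indices are already available in a dictionary keyed by unordered field pairs. The diagonal value of 1.0 and the linear 0 to 255 gray mapping are illustrative assumptions, and both function names are hypothetical.

```python
def build_correlation_matrix(field_ids, pair_index):
    """Square matrix whose rows and columns follow the preset order in field_ids.

    pair_index maps frozenset({field_a, field_b}) -> correlation index in [0, 1].
    """
    size = len(field_ids)
    matrix = [[0.0] * size for _ in range(size)]
    for i, a in enumerate(field_ids):
        for j, b in enumerate(field_ids):
            # Assumption: a field compared with itself is given index 1.0.
            matrix[i][j] = 1.0 if i == j else pair_index[frozenset((a, b))]
    return matrix


def to_gray(matrix, levels=255):
    """Hypothetical linear mapping, so the gray value rises with the index."""
    return [[round(levels * value) for value in row] for row in matrix]
```

Keying by `frozenset` makes the lookup order-independent, so the matrix comes out symmetric without storing each pair twice.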
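The layout rules of the weighted connection graph can be sketched without any plotting library: nodes are spaced evenly around a circle, and each line's color depends on the pair of field types while its width and opacity grow with the index. The specific colors, the maximum width, and the function names below are hypothetical choices; the text only requires distinct colors and a positive correlation with the index.

```python
import math

# Hypothetical colour choices; the text only requires distinct colours per type.
NODE_COLORS = {"numeric": "blue", "categorical": "orange"}
EDGE_COLORS = {
    ("numeric", "numeric"): "blue",
    ("categorical", "categorical"): "orange",
    ("categorical", "numeric"): "green",  # mixed pair, key stored in sorted order
}


def circular_layout(field_ids, radius=1.0):
    """Place one node per field at evenly spaced angles on a circle."""
    count = len(field_ids)
    return {
        field: (radius * math.cos(2 * math.pi * k / count),
                radius * math.sin(2 * math.pi * k / count))
        for k, field in enumerate(field_ids)
    }


def node_style(field_type):
    """Node colour encodes the field type."""
    return {"color": NODE_COLORS[field_type]}


def edge_style(index, type_a, type_b, max_width=6.0):
    """Colour by index type; width and opacity grow linearly with the index in [0, 1]."""
    kind = tuple(sorted((type_a, type_b)))
    return {"color": EDGE_COLORS[kind], "width": max_width * index, "alpha": index}
```

The returned coordinates and style dictionaries could then be handed to any drawing library.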
To achieve the above object, according to another aspect of the present invention, there is provided an apparatus for determining relevancy of fields in a database table.
The apparatus for determining field relevancy in a database table according to an embodiment of the present invention may include: a field type determination unit, configured to determine, for any two fields to be analyzed in a database table, the field type of each field according to its elements, where the field types include a numeric field and a categorical field, and the elements of a categorical field belong to at least two element categories; a grouping unit, configured to, when one of the two fields is a numeric field and the other is a categorical field, identify the elements of the categorical field that belong to the same element category and form an analysis group from the corresponding elements of the numeric field; and a correlation calculation unit, configured to determine the inter-group variance and the intra-group variance of the analysis groups and obtain the correlation index of the two fields from them.
To achieve the above object, according to still another aspect of the present invention, there is provided an electronic apparatus.
An electronic device of the present invention includes one or more processors and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method for determining field relevancy in a database table provided by the present invention.
To achieve the above object, according to still another aspect of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of the present invention stores a computer program which, when executed by a processor, implements the method for determining field relevancy in a database table provided by the present invention.
The technical solution above has the following advantages or beneficial effects. When relevancy analysis is performed on the fields of a database table, the field types are first determined quickly and accurately from the field elements and preset regular expressions. The correlation index is then calculated with a method suited to the field types: when both fields are numeric, the absolute value of their Spearman correlation coefficient is used; when both are categorical, their Cramér correlation coefficient is used. In particular, when one field is numeric and the other is categorical, the embodiment first divides the elements of the numeric field into analysis groups according to the element categories of the categorical field, then calculates the inter-group variance and the intra-group variance of the analysis groups and derives the correlation index from their quotient, thereby achieving a quantitative correlation analysis of a numeric field and a categorical field (the principle is described below). In summary, the embodiment provides a unified analysis standard from field type determination to correlation calculation: for two numeric fields, two categorical fields, or a numeric field and a categorical field, a quantitative correlation index between zero and one can be obtained.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of the method for determining the relevancy of the fields in the database table according to the embodiment of the present invention;
FIG. 2 is a global schematic diagram of a saturated logarithm function according to an embodiment of the present invention;
FIG. 3 is a partial schematic diagram of a saturated logarithm function according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a specific implementation of the method for determining relevancy of fields in a database table according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a correlation matrix according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a weighted connection graph according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main components of an apparatus for determining the relevancy of fields in a database table according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 9 is a schematic structural diagram of an electronic device for implementing the method for determining the relevancy of the fields in the database table according to the embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with one another where they do not conflict.
Example one
FIG. 1 is a diagram illustrating the main steps of a method for determining the relevancy of fields in a database table according to an embodiment of the present invention. As shown in fig. 1, the method for determining the field relevancy in the database table according to the embodiment of the present invention may be specifically performed according to the following steps:
Step S101: For any two fields to be analyzed in the database table, determine the field type of each field according to its elements.
In this step, the two fields to be analyzed may belong to the same database table, or may be any fields from multiple database tables that can be associated with each other. For example, if the employee performance assessment table and the employee basic information table can be associated through a common employee name field, relevancy analysis can be performed on any two fields across the two tables. In embodiments of the present invention, the field types may include numeric fields, categorical fields, and other fields that are neither numeric nor categorical.
Elements of a numeric field (that is, the values the database table holds in that field) are integers, decimals, or other numeric values whose magnitudes usually have practical meaning and can be compared with one another; for example, the payroll amount field and the overtime hours field in the employee performance assessment table are typically numeric fields. Elements of a categorical field can generally be divided into at least two element categories; for example, the gender field and the age group field in the employee basic information table are typically categorical fields. The gender field contains two element categories, "male" and "female", while the age group field may contain categories such as "10 to 20 years old", "20 to 30 years old", "30 to 40 years old", and "40 to 50 years old". Categorical fields can be further divided into ordered and unordered categorical fields: the element categories of an ordered categorical field can be compared with one another (for example, by magnitude) and sorted according to their practical meaning, whereas those of an unordered categorical field cannot. For example, the gender field is an unordered categorical field, and the age group field is an ordered categorical field. A mobile phone number field belongs to the other fields, because phone numbers generally have no practical meaning as numeric values and do not reflect membership in distinct categories; in embodiments of the present invention, relevancy analysis may be skipped for such other fields. An age field, whose elements carry both numeric and categorical meaning, may be treated as either a numeric field or a categorical field.
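When the two fields come from different tables, their elements first have to be paired through the associating field. A minimal sketch, assuming each table is a list of row dictionaries and the associating key (for example an employee name) is unique in the second table; `align_fields` and the row layout are hypothetical.

```python
def align_fields(left_rows, right_rows, key, left_field, right_field):
    """Pair left_field and right_field values from records that share the same key.

    Assumes `key` is unique in right_rows; left records without a match are skipped.
    """
    right_by_key = {row[key]: row for row in right_rows}
    xs, ys = [], []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            xs.append(row[left_field])
            ys.append(match[right_field])
    return xs, ys
```

The two returned lists are index-aligned, so element i of one field corresponds to element i of the other, which is the pairing the later steps rely on.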
In one alternative, the field type of any field may be determined as follows. For any field to be analyzed, first judge whether the proportion of elements in the field that match a preset first regular expression is not less than a first threshold, that is, whether the following condition holds:
$$n_{num} \geq \eta \cdot n$$
where $n_{num}$ is the number of elements matching the first regular expression and $n$ is the total number of elements in the field. $\eta$ is a preset first threshold, a number greater than zero and less than one. The first regular expression may be one that matches floating-point numbers (numbers containing a decimal point), such as "\\ d", in which the first backslash serves as an escape character; the expression matches elements that have a digit after a decimal point.
If the condition holds, the field is determined to be a numeric field. Otherwise, judge whether the number of distinct elements in the field after deduplication is greater than 1 and not greater than a second threshold, that is, whether:
$$1 < n_{dedup} \leq \beta$$
where $n_{dedup}$ is the number of elements in the field after deduplication and $\beta$ is a preset second threshold.
If this condition holds, the field is determined to be a categorical field; otherwise, it is determined to be another type of field. The second threshold $\beta$ depends on, and is smaller than, the total number of elements $n$ in the field; for example, $\beta = n^{\alpha}$, where $\alpha$ is a positive number less than 1.
In practice, most elements of a numeric field are floating-point numbers. This method therefore first identifies as numeric any field in which a sufficient proportion of elements are floating-point numbers, then identifies as categorical any field whose number of distinct values is small, or far smaller, relative to the total number of elements (the rule for "far smaller" can be set flexibly), and treats every remaining field as another type. Field types can thus be determined quickly and accurately, which addresses the difficulty of classifying a large number of unknown fields in time. Note that in a few cases this method may classify a numeric field whose values are integers as another type of field; because most numeric fields take floating-point values, this limitation has little effect in practice.
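The float-first rule can be sketched as follows. The source's first regular expression is garbled in this text, so the pattern below (an optional sign, optional digits, a decimal point, then at least one digit) is an assumed stand-in, and the values η = 0.9 and α = 0.5 (so β = n^0.5) are illustrative defaults rather than values given in the source.

```python
import re

# Assumed stand-in for the "first regular expression": a decimal number with a point.
FLOAT_PATTERN = re.compile(r"[+-]?\d*\.\d+")


def classify_float_first(values, eta=0.9, alpha=0.5):
    """Return "numeric", "categorical" or "other" using the float-first rule."""
    texts = [str(v).strip() for v in values]
    n = len(texts)
    if n == 0:
        return "other"
    n_num = sum(1 for t in texts if FLOAT_PATTERN.fullmatch(t))
    if n_num >= eta * n:                 # n_num >= eta * n
        return "numeric"
    n_dedup = len(set(texts))            # distinct element values
    beta = n ** alpha                    # second threshold, smaller than n
    if 1 < n_dedup <= beta:
        return "categorical"
    return "other"
```

An integer-only salary column would fail the float test here and fall through to the deduplication test, which is the limitation the paragraph above acknowledges.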
In another alternative, the field type of any field may be determined as follows. For any field to be analyzed, first judge whether the number of distinct elements in the field after deduplication is greater than 1 and not greater than the second threshold; if so, the field is determined to be a categorical field. Otherwise, judge whether the proportion of elements in the field that match a preset second regular expression is not less than a third threshold, that is, whether:
$$n_0 \geq \tau \cdot n$$
where $n_0$ is the number of elements matching the second regular expression and $\tau$ is a preset third threshold, a number greater than zero and less than one. The second regular expression may be one that matches both floating-point numbers and integers, such as "\ d", which matches elements containing digits.
If this condition holds, the field is determined to be a numeric field; otherwise, it is determined to be another type of field.
It can be understood that this method, which performs the categorical check before the numeric check, also determines field types accurately and quickly. In a particular case, other fields whose values are integers (such as mobile phone number fields) may be classified as numeric fields; because such cases are rare, the practical use of the method is not greatly affected.
Step S102: When one of the two fields is a numeric field and the other is a categorical field, identify the elements of the categorical field that belong to the same element category, and form an analysis group from the corresponding elements of the numeric field.
In the prior art, when one field to be analyzed is numeric and the other is categorical, only a qualitative correlation analysis can be performed; this embodiment therefore provides a quantitative method. Specifically, the elements of the categorical field that belong to the same element category are identified first, and the corresponding elements of the numeric field are then formed into an analysis group. Here, a first element in the categorical field corresponds to a second element in the numeric field if the two elements are in the same record (when both fields belong to the same database table), or if they correspond to the same element of an associating field (when the fields belong to different database tables). For example, suppose the two fields to be analyzed are a payroll amount field and a gender field, each with five elements from record 1 to record 5 (listed from top to bottom):
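The categorical-first variant can be sketched the same way. The second regular expression is not legible in this text, so the integer-or-decimal pattern below is an assumed stand-in; τ = 0.9 and α = 0.5 are illustrative defaults, and fields that pass neither test are labelled "other" here, consistent with the first alternative.

```python
import re

# Assumed stand-in for the "second regular expression": integers or decimals.
NUMBER_PATTERN = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def classify_categorical_first(values, tau=0.9, alpha=0.5):
    """Return "categorical", "numeric" or "other" using the categorical-first rule."""
    texts = [str(v).strip() for v in values]
    n = len(texts)
    if n == 0:
        return "other"
    n_dedup = len(set(texts))
    if 1 < n_dedup <= n ** alpha:        # few distinct values: categorical
        return "categorical"
    n_0 = sum(1 for t in texts if NUMBER_PATTERN.fullmatch(t))
    if n_0 >= tau * n:                   # n_0 >= tau * n
        return "numeric"
    return "other"
```

Integer-valued salaries are now recognised as numeric, at the cost of also labelling phone numbers numeric, matching the trade-off described above.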
Payroll amount    Gender
1223.12           Male
2154.56           Female
1896.51           Male
3021.55           Female
2136.96           Female
Then 1223.12 and 1896.51, which correspond to the element category "male", form one analysis group, and 2154.56, 3021.55, and 2136.96, which correspond to the element category "female", form another.
Step S103: Determine the inter-group variance and the intra-group variance of the analysis groups, and obtain the correlation index of the two fields from them.
In this embodiment, the intra-group variance indicates the degree of data dispersion within the analysis groups, and the inter-group variance indicates the degree of data dispersion between the analysis groups. In general, the inter-group variance MSB and the intra-group variance MSE may be calculated by the following equations:
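The grouping step amounts to bucketing numeric values by the category found in the same position of the categorical field. A minimal sketch, assuming the two fields are already index-aligned lists; `build_analysis_groups` is a hypothetical name.

```python
def build_analysis_groups(numeric_values, categories):
    """Map each element category to the list of numeric values in the same records."""
    if len(numeric_values) != len(categories):
        raise ValueError("fields must have the same number of elements")
    groups = {}
    for value, category in zip(numeric_values, categories):
        groups.setdefault(category, []).append(value)   # keeps first-seen category order
    return groups


payroll = [1223.12, 2154.56, 1896.51, 3021.55, 2136.96]
gender = ["male", "female", "male", "female", "female"]
groups = build_analysis_groups(payroll, gender)
```

Applied to the payroll and gender sample, this yields the "male" and "female" analysis groups described above.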
$$MSB = \frac{1}{r-1}\sum_{i=1}^{r} n_i\left(\bar{x}_i - \bar{x}\right)^2$$
$$MSE = \frac{1}{n-r}\sum_{i=1}^{r}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2$$
where $r$ is the number of element categories of the categorical field to be analyzed, $n_i$ is the number of elements in the $i$-th element category of the categorical field, $n = \sum_{i} n_i$ is the total number of elements, $x_{ij}$ is the $j$-th element of the numeric field corresponding to the $i$-th element category, $\bar{x}_i$ is the mean of the numeric-field elements corresponding to the $i$-th element category, $\bar{x}$ is the mean of all elements in the numeric field, and $i$ and $j$ are positive integers.
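A minimal sketch of the two variances, assuming the standard one-way analysis-of-variance definitions (MSB as the between-group sum of squares divided by r - 1, MSE as the within-group sum of squares divided by n - r); `anova_variances` is a hypothetical name.

```python
def anova_variances(groups):
    """Return (MSB, MSE) for a list of analysis groups, each a non-empty list of numbers."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    if r < 2 or n <= r:
        raise ValueError("need at least two groups and more elements than groups")
    means = [sum(g) / len(g) for g in groups]           # group means, one per category
    grand_mean = sum(sum(g) for g in groups) / n        # mean of the whole numeric field
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return ssb / (r - 1), sse / (n - r)


msb, mse = anova_variances([[1223.12, 1896.51], [2154.56, 3021.55, 2136.96]])
f_value = msb / mse   # initial correlation value F for the payroll/gender sample
```

For the payroll and gender sample, working the arithmetic by hand gives MSB ≈ 924797.42, MSE ≈ 246073.56 and F ≈ 3.758.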
After the inter-group variance and the intra-group variance are obtained, the correlation index of the two fields can be derived from them. Preferably, the inter-group variance MSB is first divided by the intra-group variance MSE to obtain the initial correlation value $F$ of the two fields; the natural logarithm $\ln F$ of the initial value is taken as the intermediate correlation value; and finally $\ln F$ is transformed into the interval from zero to one to form the correlation index of the two fields. In a preferred solution, the transformation is performed by the following formula:
$$R_{12} = \begin{cases} 0, & \ln F < 0 \\ \dfrac{\ln F}{\mu}, & 0 \leq \ln F \leq \mu \\ 1, & \ln F > \mu \end{cases}$$
This function is a saturated logarithm function of $F$, where $F = MSB/MSE$, $R_{12}$ is the correlation index, and $\mu$ is a preset first value greater than 1 (for example, 10).
That is, when the intermediate value $\ln F$ is less than zero, the correlation index is zero; when the intermediate value is greater than the first value, the correlation index is one; and when the intermediate value is not less than zero and not greater than the first value, the correlation index is $\ln F$ multiplied by a second value, the reciprocal of the first value. FIG. 2 is a global schematic diagram of the saturated logarithm function according to an embodiment of the present invention, and FIG. 3 is a partial schematic diagram of it; in both figures, the abscissa is the F value and the ordinate is the correlation index $R_{12}$.
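The saturated logarithm rule translates directly into a clamp on ln F. A minimal sketch, assuming μ = 10 as in the example value; F = 0 (all group means identical) is treated as ln F tending to minus infinity, which maps to zero. The function name is hypothetical.

```python
import math


def saturated_log_index(f_value, mu=10.0):
    """Map F = MSB / MSE to a correlation index in [0, 1] via a saturated logarithm."""
    if f_value < 0:
        raise ValueError("F cannot be negative")
    if f_value == 0:
        return 0.0                 # ln F -> -inf, below the lower bound
    middle = math.log(f_value)     # intermediate value ln F
    if middle < 0:
        return 0.0                 # lower saturation
    if middle > mu:
        return 1.0                 # upper saturation
    return middle / mu             # ln F times the second value 1 / mu
```

With the default μ, an F of about 3.758 maps to an index of roughly 0.132, while any F below 1 maps to zero.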
The principle of calculation of the above correlation index is that, for a plurality of analysis groups configured according to the element classes in the classification field, the inter-group variance MSB is determined by the individual difference E of the elements in the numerical field and the processing factor difference T existing between different analysis groups, the intra-group variance MSE is determined by the individual difference E of the elements within the analysis groups, and the above processing factor difference T is determined by the degree of correlation between the elements in the analysis groups and the element classes of the corresponding classification fields since the different analysis groups correspond to different element classes of the classification fields. Thus, as the correlation of elements within an analysis group with the corresponding element class increases (i.e., the correlation of numeric fields with typing fields increases), the difference in processing factors, T, that exists between different analysis groups increases, resulting in an increase in the ratio, F, of the inter-group variance, MSB, to the intra-group variance, MSE; conversely, as the degree of correlation of the numeric field with the categorical field decreases, the difference in processing factor T that exists between different analysis groups decreases, resulting in a decrease in F; when the numerical type field is not correlated with the typing field, the difference in processing factor, T, existing between different analysis groups is zero and F is equal to 1. Therefore, F can be used to accurately measure the degree of correlation of two fields.
In practice, the F value varies over a very wide range, so the range is compressed logarithmically. Since $\log_e 20000 \approx 10$, and extensive tests have shown that the F values of numeric and categorical field pairs lie within the interval [1, 20000] in most cases, the correlation index can be described by the logarithmic function $0.1\log_e F$. For the rare F values outside this common interval, for which the logarithmic function falls outside [0, 1], the result is saturated to the corresponding boundary value. F is thereby converted by a saturated logarithmic function into a correlation index $R_{12}$ within the desired interval,

$$R_{12} = \min\left(1, \max\left(0, 0.1\log_e F\right)\right),$$

which enables quantitative analysis of the degree of correlation between a numeric field and a categorical field. In some alternative implementations, the initial correlation value F may be used directly as the correlation index, or another function of the between-group variance MSB and the within-group variance MSE, such as (MSB - MSE)/MSE, may be taken as the initial correlation value F; likewise, the correlation intermediate value $\log_e F$ may be transformed by any other suitable method, not limited to the saturated logarithmic function described above.
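As a sketch of the numeric-versus-categorical case, the Python code below groups the numeric elements by the element categories of the categorical field, computes F = MSB/MSE, and applies the saturated logarithmic function with a first value of 10. It assumes the standard one-way ANOVA definitions MSB = SSB/(k - 1) and MSE = SSW/(N - k), which are not restated in this section; the function names and the handling of a zero within-group variance are hypothetical choices, not the patent's own implementation.

```python
import math
from collections import defaultdict


def anova_f(numeric, categorical):
    """Return F = MSB / MSE for numeric values grouped by category.

    Assumes standard one-way ANOVA degrees of freedom (k - 1 and N - k).
    """
    if len(numeric) != len(categorical):
        raise ValueError("fields must have the same number of elements")
    groups = defaultdict(list)  # one analysis group per element category
    for value, category in zip(numeric, categorical):
        groups[category].append(value)
    k, n = len(groups), len(numeric)
    if k < 2 or n <= k:
        raise ValueError("need at least two categories and more elements than categories")
    grand_mean = sum(numeric) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ssb = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ssw = sum((x - means[c]) ** 2 for c, g in groups.items() for x in g)
    msb, mse = ssb / (k - 1), ssw / (n - k)
    if mse == 0:
        # Hypothetical convention: perfect separation gives an infinite F;
        # a constant numeric field gives F = 1 (no information).
        return math.inf if msb > 0 else 1.0
    return msb / mse


def saturated_log_index(f, first_value=10.0):
    """Map F into [0, 1] with the saturated logarithm described above."""
    if f <= 0:
        return 0.0  # log undefined; treat as no correlation
    mid = math.log(f)  # correlation intermediate value log_e F
    if mid < 0:
        return 0.0
    if mid > first_value:
        return 1.0
    return mid * (1.0 / first_value)  # second value = reciprocal of first value


f = anova_f([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])  # 121.5
print(saturated_log_index(f))  # approximately 0.48
```

Two well-separated groups give a large F, which the logarithm compresses into the middle of the [0, 1] scale rather than pinning it at the boundary.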
In this embodiment, when any two fields X and Y to be analyzed in the database table are both numeric fields, the absolute value of the Spearman correlation coefficient of the two fields is determined as their correlation index. Specifically, the elements in X and Y are first sorted in ascending order, the sorted lists are denoted $X_0$ and $Y_0$, and the Spearman correlation coefficient $r_{XY}$ is then calculated according to the following formula:
$$r_{XY} = 1 - \frac{6\sum_{i=1}^{n}\left(R_{X_0}(x_i) - R_{Y_0}(y_i)\right)^2}{n\left(n^2 - 1\right)}$$

where $R_{X_0}(x_i)$ denotes the position of the i-th element of X in $X_0$, $R_{Y_0}(y_i)$ denotes the position of the i-th element of Y in $Y_0$, and n denotes the number of elements in either field (the fields to be analyzed contain the same number of elements). Finally, $|r_{XY}|$ is taken as the correlation index. It will be appreciated that this correlation index lies in the interval from zero to one.
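The Spearman index can be sketched in Python with the rank-difference formula above. Positions come from a stable ascending sort, so tied elements simply receive consecutive positions in their original order; that tie handling and the function names are assumptions, since the text does not say how ties are ranked.

```python
def sorted_positions(values):
    """1-based position of each element in the ascending-sorted list."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    positions = [0] * len(values)
    for position, index in enumerate(order, start=1):
        positions[index] = position
    return positions


def spearman_index(x, y):
    """Absolute Spearman coefficient |r_XY| of two equal-length numeric fields."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("fields must have equal length of at least 2")
    rx, ry = sorted_positions(x), sorted_positions(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    r_xy = 1 - 6 * d_squared / (n * (n * n - 1))
    return abs(r_xy)


print(spearman_index([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # approximately 0.8
```

Taking the absolute value means a perfectly decreasing relationship scores 1, the same as a perfectly increasing one.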
When any two fields to be analyzed in the database table (e.g., field 1 and field 2) are both categorical fields, the Cramér correlation coefficient of the two fields is determined as the correlation index. Specifically, a chi-square test of independence is first performed on field 1 and field 2. Let field 1 and field 2 have s and c element categories respectively, where s and c are both not less than 2. An s × c contingency table of observed frequencies is built from the elements of field 1 and field 2, in which the value $f_{ij}$ of the cell in row i and column j is the number of records that take the i-th element category in field 1 and the j-th element category in field 2. The expected frequency of each cell of this table is then calculated to produce an s × c table of expected frequencies, whose cell in row i and column j is:
$$e_{ij} = \frac{\left(\sum_{k=1}^{c} f_{ik}\right)\left(\sum_{k=1}^{s} f_{kj}\right)}{N}$$

where k is a positive integer summation index, and N denotes the number of elements of field 1 or field 2 (the two fields contain the same number of elements).
The chi-square statistic $\chi^2$ of field 1 and field 2 is then calculated. If every $e_{ij}$ is not less than 5:

$$\chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{c}\frac{\left(f_{ij} - e_{ij}\right)^2}{e_{ij}}$$

If any $e_{ij}$ is less than 5, then:

$$\chi^2 = \sum_{i=1}^{s}\sum_{j=1}^{c}\frac{\left(\left|f_{ij} - e_{ij}\right| - 0.5\right)^2}{e_{ij}}$$
Finally, the Cramér correlation coefficient of field 1 and field 2, i.e., Cramér's V, is calculated as the correlation index using the following formula:

$$V = \sqrt{\frac{\chi^2}{N\left(\min(s, c) - 1\right)}}$$

It will be appreciated that the correlation index calculated according to the above steps lies in the interval from zero to one.
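A Python sketch of the Cramér's V calculation follows. It builds the observed and expected frequency tables, switches to the corrected $\chi^2$ form with the 0.5 term whenever any expected frequency is below 5 (mirroring the two formulas above), and returns V. The function name is hypothetical, and categories are taken as whatever distinct values appear in each field.

```python
import math
from collections import Counter


def cramers_v(field1, field2):
    """Cramér's V of two equal-length categorical fields."""
    n = len(field1)
    if n != len(field2) or n == 0:
        raise ValueError("fields must be non-empty and of equal length")
    row_totals, col_totals = Counter(field1), Counter(field2)
    s, c = len(row_totals), len(col_totals)
    if s < 2 or c < 2:
        raise ValueError("each field needs at least two element categories")
    observed = Counter(zip(field1, field2))  # f_ij; absent cells count as 0
    expected = {(a, b): row_totals[a] * col_totals[b] / n
                for a in row_totals for b in col_totals}  # e_ij
    use_correction = any(e < 5 for e in expected.values())
    chi2 = 0.0
    for cell, e in expected.items():
        diff = abs(observed[cell] - e)
        if use_correction:
            diff -= 0.5  # corrected form used when some e_ij < 5
        chi2 += diff * diff / e
    return math.sqrt(chi2 / (n * (min(s, c) - 1)))


print(cramers_v(["a"] * 10 + ["b"] * 10, ["x"] * 10 + ["y"] * 10))  # 1.0
```

Here the two fields determine each other exactly, so every expected frequency is 5, no correction is applied, and V reaches its maximum.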
With the above arrangement, once the field types have been quickly distinguished, the correlation indexes between two numeric fields, between two categorical fields, and between a numeric field and a categorical field can all be calculated accurately, providing a unified standard for correlation analysis of database table fields. After the correlation indexes of every pair of fields in the database table are obtained, they can be presented through various data visualization methods.
Example two
FIG. 4 is a diagram illustrating a specific implementation of the method for determining the relevancy of fields in a database table according to an embodiment of the present invention. As shown in FIG. 4, the method may include three parts: preprocessing, correlation analysis, and result output.
In the preprocessing part, data cleaning is first performed on the database table. Illustratively, if the database table is not in CSV (Comma-Separated Values) format, the header row and each record row need to be split into strings; the formats of the elements in each field are unified; redundant spaces, punctuation, garbled characters and the like are removed; and invalid values such as NULL and None, as well as missing values, are uniformly converted to null values. The field type of each field to be analyzed can then be determined according to the method in the first embodiment. Finally, the correlation matrix and the weight connection graph are initialized (both are used to visualize the correlation indexes, as described later).
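The cleaning step might look roughly like the Python sketch below. The tab separator, the exact set of invalid tokens, and the control-character filter standing in for "garbled characters" are all assumptions, and punctuation is left untouched here because stripping it blindly would break values such as "3.5".

```python
import re

# Hypothetical token set; the text names NULL and None as examples of invalid values.
INVALID_TOKENS = {"", "null", "none", "nan", "n/a"}


def split_record(line, separator="\t"):
    """Split one header or record line of a non-CSV table (separator is assumed)."""
    return line.rstrip("\r\n").split(separator)


def clean_element(raw):
    """Unify one element: drop control characters, collapse spaces, map invalid values to None."""
    if raw is None:
        return None
    # Stand-in for removing garbled characters: strip control codes except tab/newline.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", str(raw))
    text = re.sub(r"\s+", " ", text).strip()  # remove redundant whitespace
    if text.lower() in INVALID_TOKENS:
        return None  # unified null value
    return text


def clean_table(lines, separator="\t"):
    """Return {field name: [cleaned elements]} from raw text lines."""
    header = [clean_element(h) for h in split_record(lines[0], separator)]
    columns = {name: [] for name in header}
    for line in lines[1:]:
        cells = split_record(line, separator)
        for i, name in enumerate(header):
            # Short rows are padded with the null value.
            columns[name].append(clean_element(cells[i]) if i < len(cells) else None)
    return columns


print(clean_table(["id\tname", "1\t  Alice  ", "2\tNULL", "3"]))
```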
In the correlation analysis part, correlation analysis is performed on every pair of fields in the database table. Before the analysis, it is first checked whether the number of elements of each field to be analyzed is greater than a preset threshold: if so, the subsequent analysis is performed; otherwise, the field is discarded. The correlation index is then calculated for each case according to the method described in the first embodiment.
In the result output part, the correlation indexes may be entered into a preset correlation matrix and/or weight connection graph. Fig. 5 is a schematic diagram of a correlation matrix according to an embodiment of the present invention, and Fig. 6 is a schematic diagram of a weight connection graph according to an embodiment of the present invention. As shown in Fig. 5, the numbers of rows and columns of the correlation matrix are both equal to the total number of fields to be analyzed in the database table. The columns from left to right, and likewise the rows from top to bottom, correspond to the identifiers (for example, field names) of the fields to be analyzed, arranged in a preset order. Any element of the correlation matrix is the correlation index between the field corresponding to its row and the field corresponding to its column, and the gray value of the element is positively correlated with that correlation index. Note that the values in the correlation matrix of Fig. 5 are expressed as percentages with the percent sign omitted.
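The threshold check and the matrix filling described above can be sketched in Python as follows. The pairwise index function is passed in as a hypothetical callable (it would dispatch to the Spearman, Cramér's V, or saturated-log F calculation by field type), the threshold of 30 non-missing elements is an assumed value, and setting the diagonal to 1.0 is an assumption the text does not state. The preset order is simply the insertion order of the input dictionary.

```python
from itertools import combinations


def correlation_matrix(fields, index_fn, min_elements=30):
    """Fill a symmetric correlation matrix for the fields that pass the size check.

    fields: {field identifier: list of elements, None marking a missing value}
    index_fn: hypothetical callable returning the correlation index (0 to 1) of two fields
    """
    # Keep only fields whose non-missing element count exceeds the preset threshold.
    kept = [name for name, values in fields.items()
            if sum(v is not None for v in values) > min_elements]
    size = len(kept)
    # Diagonal set to 1.0 by assumption (a field compared with itself).
    matrix = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for i, j in combinations(range(size), 2):
        value = index_fn(fields[kept[i]], fields[kept[j]])
        matrix[i][j] = matrix[j][i] = value  # the index is symmetric
    return kept, matrix


def as_percent_labels(matrix):
    """Render values as percentages with the percent sign omitted, as in Fig. 5."""
    return [[str(round(v * 100)) for v in row] for row in matrix]


names, m = correlation_matrix({"a": [1] * 40, "b": [2] * 40, "c": [None] * 40},
                              lambda x, y: 0.5)
print(names, as_percent_labels(m))
```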
As shown in Fig. 6, the weight connection graph includes nodes arranged along the circumference, each representing a field to be analyzed in the database table, and connecting lines between pairs of nodes, each representing a correlation index. The nodes are given different colors to represent different field types, and the connecting lines are given different colors to represent different correlation index types; the width and color depth (i.e., the combined gray level of the three channels of the color) of a connecting line are positively correlated with the correlation index it represents. The correlation index types include: the correlation index between two numeric fields, the correlation index between two categorical fields, and the correlation index between a numeric field and a categorical field. Because Fig. 6 cannot be reproduced in color, the different node and line colors are shown schematically as different gray levels. The database table corresponding to the correlation matrix of Fig. 5 and the weight connection graph of Fig. 6 has the following fields: system, level-1 department, level-2 department, whether organization head, whether high-potential, whether core talent, job level sequence, job title, department age, gender, zodiac sign, highest education level, whether high performer, working-hours type, nationality, ethnicity, political affiliation, marital status, province, city, department age, time in current position, promotion interval, promotion speed, training duration, work saturation, and performance level.
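A minimal Python sketch of the weight connection graph layout is shown below: nodes are spaced evenly around a unit circle, and each field pair gets a connecting line whose color depends on the pair of field types. The specific colors, the maximum width, and the linear mapping from index to width and opacity (used here as a stand-in for color depth) are hypothetical; the text only requires distinct colors and a positive correlation with the index.

```python
import math

# Hypothetical color choices; the text only requires distinct colors per type.
NODE_COLORS = {"numeric": "#1f77b4", "categorical": "#ff7f0e"}
EDGE_COLORS = {
    ("numeric", "numeric"): "#2ca02c",
    ("categorical", "categorical"): "#d62728",
    ("categorical", "numeric"): "#9467bd",
}


def circular_layout(names):
    """Place one node per field evenly around the unit circle."""
    n = len(names)
    return {name: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k, name in enumerate(names)}


def edge_styles(names, field_types, matrix, max_width=6.0):
    """One connecting line per field pair; width and opacity grow linearly with the index."""
    styles = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Sorting makes (numeric, categorical) and (categorical, numeric) the same type.
            kind = tuple(sorted((field_types[names[i]], field_types[names[j]])))
            index = matrix[i][j]
            styles.append({
                "nodes": (names[i], names[j]),
                "color": EDGE_COLORS[kind],
                "width": max_width * index,
                "opacity": index,  # stand-in for color depth
            })
    return styles
```

The output is plain data, so it can be handed to any plotting library to draw the nodes and lines.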
According to the technical solution of the embodiments of the present invention, when correlation analysis is performed on fields in a database table, the field type is determined quickly and accurately from the field elements and preset regular expressions. The correlation index is then calculated by a method suited to the field types: when both fields to be analyzed are numeric fields, the absolute value of their Spearman correlation coefficient is used as the correlation index; when both are categorical fields, their Cramér correlation coefficient is used as the correlation index. In particular, when one of the two fields is a numeric field and the other is a categorical field, the embodiments first divide the elements of the numeric field into several analysis groups according to the element categories of the categorical field, then calculate the between-group variance and the within-group variance over the analysis groups and derive the correlation index from their quotient, thereby achieving quantitative correlation analysis of a numeric field and a categorical field. In summary, the embodiments provide a unified analysis standard from field type determination through correlation calculation: for two numeric fields, two categorical fields, or a numeric field and a categorical field, quantitative correlation analysis can be performed to obtain a correlation index between zero and one.
It should be noted that, for convenience of description, the foregoing method embodiments are described as a series of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of the acts described, and that some steps may in fact be performed in other orders or concurrently. Moreover, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred, and that the acts and modules involved are not necessarily required by the invention.
To facilitate a better implementation of the above-described aspects of embodiments of the present invention, the following also provides relevant means for implementing the above-described aspects.
Referring to fig. 7, an apparatus 700 for determining the relevance of a field in a database table according to an embodiment of the present invention may include: a field type determination unit 701, a grouping unit 702, and a correlation calculation unit 703.
The field type determination unit 701 may be configured to: for any two fields to be analyzed in a database table, determine the field type of each field from its elements, where the field types include a numeric field and a categorical field, the elements of the categorical field belonging to at least two element categories. The grouping unit 702 may be configured to: when one of the two fields is a numeric field and the other is a categorical field, determine the elements belonging to the same element category in the categorical field and form an analysis group from the corresponding elements in the numeric field. The correlation calculation unit 703 may be configured to determine the between-group variance and the within-group variance of the analysis groups and obtain the correlation index of the two fields from them.
In this embodiment of the present invention, the field type determination unit 701 may be further configured to: for any field to be analyzed, judge whether the proportion of elements in the field that match a preset first regular expression is not less than a first threshold, and if so, determine the field to be a numeric field, the first regular expression being used to match floating-point numbers; if the proportion of elements matching the first regular expression is less than the first threshold, judge whether the number of distinct elements in the field after de-duplication is greater than 1 and not greater than a second threshold, and if so, determine the field to be a categorical field, the second threshold being related to, and less than, the total number of elements in the field.
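A Python sketch of this float-first classification is given below. The regular expression, the first threshold of 0.9, and the rule second threshold = 0.5 × element count are all hypothetical values. The pattern is assumed to require a decimal point, so low-cardinality integer codes such as 0/1 fall through to the categorical check; elements already unified to None are ignored.

```python
import re

# Hypothetical "first regular expression": decimal numbers with a decimal point,
# optionally with an exponent. Plain integers are deliberately not matched.
FLOAT_PATTERN = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+)(?:[eE][+-]?\d+)?")


def field_type_float_first(values, first_threshold=0.9, ratio=0.5):
    """Classify a field as 'numeric', 'categorical', or None (neither).

    first_threshold and ratio (second threshold = ratio * element count) are assumed values.
    """
    elements = [v for v in values if v is not None]  # ignore unified null values
    if not elements:
        return None
    share = sum(FLOAT_PATTERN.fullmatch(v) is not None for v in elements) / len(elements)
    if share >= first_threshold:
        return "numeric"
    distinct = len(set(elements))  # element count after de-duplication
    if 1 < distinct <= ratio * len(elements):
        return "categorical"
    return None


print(field_type_float_first(["0", "1", "0", "1", "0", "1"]))  # categorical
```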
In an alternative, the field type determination unit 701 may be further configured to: for any field to be analyzed, judge whether the number of distinct elements in the field after de-duplication is greater than 1 and not greater than a second threshold, and if so, determine the field to be a categorical field, the second threshold being related to, and less than, the total number of elements in the field; if the number of distinct elements is 1 or greater than the second threshold, judge whether the proportion of elements in the field that match a preset second regular expression is not less than a third threshold, and if so, determine the field to be a numeric field, the second regular expression being used to match both floating-point numbers and integers.
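The alternative order can be sketched in the same way. Here the distinct-count check runs first, and the hypothetical "second regular expression" accepts both integers and decimal numbers. The third threshold of 0.9 and the 0.5 ratio for the second threshold are again assumed values.

```python
import re

# Hypothetical "second regular expression": integers and decimal numbers.
NUMBER_PATTERN = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def field_type_distinct_first(values, third_threshold=0.9, ratio=0.5):
    """Classify a field as 'categorical', 'numeric', or None (neither).

    third_threshold and ratio (second threshold = ratio * element count) are assumed values.
    """
    elements = [v for v in values if v is not None]  # ignore unified null values
    if not elements:
        return None
    distinct = len(set(elements))  # element count after de-duplication
    if 1 < distinct <= ratio * len(elements):
        return "categorical"
    share = sum(NUMBER_PATTERN.fullmatch(v) is not None for v in elements) / len(elements)
    if share >= third_threshold:
        return "numeric"
    return None


print(field_type_distinct_first(["10", "20", "30", "40"]))  # numeric
```

Checking cardinality first lets integer-coded categories be recognized before the broader numeric pattern can claim them.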
In a specific application, the correlation calculation unit 703 may be further configured to: divide the between-group variance by the within-group variance to obtain the initial correlation value of the two fields, take the natural logarithm of the initial correlation value as the correlation intermediate value, and transform the correlation intermediate value into the interval from zero to one to form the correlation index of the two fields.
In practical applications, the correlation calculation unit 703 may be further configured to: determine the correlation index as zero when the correlation intermediate value is less than zero; determine the correlation index as one when the correlation intermediate value is greater than a first value, the first value being a real number greater than one; and, when the correlation intermediate value is neither less than zero nor greater than the first value, determine the correlation index as the product of the correlation intermediate value and a second value, the second value being the reciprocal of the first value.
As a preferred solution, the correlation calculation unit 703 may be further configured to: when both fields to be analyzed in the database table are numeric fields, determine the absolute value of their Spearman correlation coefficient as the correlation index of the two fields; and when both fields to be analyzed are categorical fields, determine their Cramér correlation coefficient as the correlation index of the two fields.
Preferably, the apparatus 700 may further comprise a first visualization unit for: after obtaining the relevance indexes of any two fields to be analyzed in the database table, inputting the relevance indexes into a preset relevance matrix; the row number and the column number of the correlation matrix are both equal to the total number of the fields to be analyzed of the database table, each row and each column respectively correspond to the identifiers of the fields to be analyzed in the database table which are arranged in a preset sequence, any element in the correlation matrix is a correlation index between the field corresponding to the row where the element is located and the field corresponding to the column where the element is located, and the gray value of the element is positively correlated with the correlation index.
Furthermore, in an embodiment of the present invention, the apparatus 700 may further comprise a second visualization unit for: after obtaining the relevance indexes of any two fields to be analyzed in the database table, inputting the relevance indexes into a preset weight connection diagram; the weight connection graph comprises nodes which are arranged along the circumferential direction and used for representing fields to be analyzed in the database table, and connecting lines which are positioned between any two nodes and used for representing correlation indexes; the nodes are configured with different colors for representing different field types, the connecting lines are configured with different colors for representing different correlation index types, the width and the color depth of the connecting line are positively correlated with the correlation index represented by the connecting line, and the correlation index types comprise: a relevance indicator between two numeric fields, a relevance indicator between two categorical fields, and a relevance indicator between a numeric field and a categorical field.
FIG. 8 illustrates an exemplary system architecture 800 for a method of determining the relevance of fields in a database table or an apparatus for determining the relevance of fields in a database table to which embodiments of the present invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804 and a server 805 (this architecture is merely an example, and the components included in a particular architecture may be adapted according to the application specific circumstances). The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. Various client applications, such as applications that perform relevance statistics (for example only), may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, for example (merely as an example) a computing server that supports the correlation statistics applications operated by users on the terminal devices 801, 802, 803. The computing server may process a received correlation calculation request and return the processing result (for example, the calculated correlation index) to the terminal devices 801, 802, 803.
It should be noted that the method for determining the field relevancy in the database table provided by the embodiment of the present invention is generally executed by the server 805, and accordingly, the apparatus for determining the field relevancy in the database table is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The invention further provides an electronic device. The electronic device of the embodiment of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for determining field relevancy in a database table provided by the present invention.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use in implementing an electronic device of an embodiment of the present invention. The electronic device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in Fig. 9, the computer system 900 includes a central processing unit (CPU) 901 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the computer system 900. The CPU 901, the ROM 902, and the RAM 903 are connected to one another via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed.
In particular, the processes described in the main step diagrams above may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the main step diagram. In the above-described embodiment, the computer program can be downloaded and installed from the network via the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the central processing unit 901, performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, which may be described as: a processor including a field type determination unit, a grouping unit, and a correlation calculation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the field type determination unit may also be described as "a unit that provides field types to the grouping unit".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to perform steps comprising: for any two fields to be analyzed in a database table, determining the field type of each field from its elements, wherein the field types include a numeric field and a categorical field, the elements of the categorical field belonging to at least two element categories; when one of the two fields is a numeric field and the other is a categorical field, determining the elements belonging to the same element category in the categorical field and forming an analysis group from the corresponding elements in the numeric field; and determining the between-group variance and the within-group variance of the analysis groups and obtaining a correlation index of the two fields from them.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method for determining the relevance of fields in a database table, comprising:
for any two fields to be analyzed in a database table, determining the field type of each field according to the elements of the field; wherein the field types include a numeric field and a categorical field, the elements of a categorical field belonging to at least two element categories;
when one of the two fields is a numeric field and the other is a categorical field, determining the elements of the categorical field that belong to the same element category, and forming an analysis group from the elements of the numeric field that correspond to those elements;
determining an inter-group variance and an intra-group variance of the analysis groups, and obtaining a correlation index of the two fields according to the inter-group variance and the intra-group variance.
2. The method of claim 1, wherein determining the field type of each field according to the elements of the field comprises:
for any field to be analyzed, determining whether the proportion of elements in the field that match a preset first regular expression is not less than a first threshold: if so, determining the field to be a numeric field; wherein the first regular expression is used for matching floating point numbers;
if the proportion of elements in the field that match the first regular expression is less than the first threshold, determining whether the number of elements in the field after de-duplication is greater than 1 and not greater than a second threshold: if so, determining the field to be a categorical field; wherein the second threshold is related to, and less than, the total number of elements in the field.
3. The method of claim 1, wherein determining the field type of each field according to the elements of the field comprises:
for any field to be analyzed, determining whether the number of elements in the field after de-duplication is greater than 1 and not greater than a second threshold: if so, determining the field to be a categorical field; wherein the second threshold is related to, and less than, the total number of elements in the field;
if the number of elements in the field after de-duplication is 1 or is greater than the second threshold, determining whether the proportion of elements in the field that match a preset second regular expression is not less than a third threshold: if so, determining the field to be a numeric field; wherein the second regular expression is used for matching floating point numbers and integers.
4. The method of claim 1, wherein obtaining the correlation index of the two fields according to the inter-group variance and the intra-group variance comprises:
dividing the inter-group variance by the intra-group variance to obtain an initial correlation value of the two fields, and taking the natural logarithm of the initial correlation value as an intermediate correlation value;
transforming the intermediate correlation value into the interval from zero to one to form the correlation index of the two fields.
5. The method of claim 4, wherein transforming the intermediate correlation value into the interval from zero to one to form the correlation index of the two fields comprises:
when the intermediate correlation value is less than zero, determining the correlation index to be zero;
when the intermediate correlation value is greater than a first value, determining the correlation index to be one; wherein the first value is a real number greater than one;
when the intermediate correlation value is not less than zero and not greater than the first value, determining the correlation index to be the product of the intermediate correlation value and a second value; wherein the second value is the reciprocal of the first value.
6. The method of claim 4, further comprising:
when both fields to be analyzed in the database table are numeric fields, determining the absolute value of the Spearman correlation coefficient of the two fields as their correlation index;
when both fields to be analyzed in the database table are categorical fields, determining the Cramér correlation coefficient of the two fields as their correlation index.
7. The method of claim 6, further comprising:
after obtaining the correlation index of each pair of fields to be analyzed in the database table, entering the correlation indexes into a preset correlation matrix; wherein
the number of rows and the number of columns of the correlation matrix are both equal to the total number of fields to be analyzed in the database table; the rows and the columns correspond respectively to the identifiers of the fields to be analyzed, arranged in a preset order; each element of the correlation matrix is the correlation index between the field corresponding to its row and the field corresponding to its column; and the gray value of each element is positively correlated with its correlation index.
8. The method of claim 6, further comprising:
after obtaining the correlation index of each pair of fields to be analyzed in the database table, entering the correlation indexes into a preset weighted connection graph; wherein
the weighted connection graph comprises nodes arranged along a circle to represent the fields to be analyzed in the database table, and connecting lines between pairs of nodes to represent correlation indexes. The nodes are given different colors to represent different field types, and the connecting lines are given different colors to represent different correlation index types. The width and the color depth of each connecting line are positively correlated with the correlation index it represents. The correlation index types comprise: the correlation index between two numeric fields, the correlation index between two categorical fields, and the correlation index between a numeric field and a categorical field.
9. An apparatus for determining the relevance of fields in a database table, comprising:
a field type determination unit configured to: for any two fields to be analyzed in a database table, determine the field type of each field according to the elements of the field; wherein the field types include a numeric field and a categorical field, the elements of a categorical field belonging to at least two element categories;
a grouping unit configured to: when one of the two fields is a numeric field and the other is a categorical field, determine the elements of the categorical field that belong to the same element category, and form an analysis group from the elements of the numeric field that correspond to those elements;
and a correlation calculation unit configured to determine the inter-group variance and the intra-group variance of the analysis groups and obtain the correlation index of the two fields according to the inter-group variance and the intra-group variance.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
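Claims 6 and 7 above cover the same-type indices and the correlation matrix. The following sketch computes the absolute Spearman coefficient as the Pearson correlation of average ranks. It also computes Cramér's V, which is the usual reading of the "Cramér correlation coefficient". Neither uses third-party libraries. Pairwise indices are then assembled into a square matrix. Several parts are assumptions, not taken from the patent:

- the diagonal value of 1.0;
- the `index_fn` callback that supplies each pairwise index;
- the linear gray-level rule.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_abs(x, y):
    """|Spearman rho| as the Pearson correlation of rank vectors (non-constant fields)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return abs(cov / math.sqrt(vx * vy))


def cramers_v(x, y):
    """Cramér's V from the chi-square statistic of the x-by-y contingency table."""
    n = len(x)
    row_tot, col_tot, cell = Counter(x), Counter(y), Counter(zip(x, y))
    chi2 = 0.0
    for r, nr in row_tot.items():
        for c, nc in col_tot.items():
            expected = nr * nc / n
            chi2 += (cell[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row_tot), len(col_tot)) - 1)))


def correlation_matrix(columns, index_fn):
    """Square matrix of pairwise indices; rows and columns follow the order of `columns`."""
    names = list(columns)
    size = len(names)
    matrix = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for i, j in combinations(range(size), 2):
        value = index_fn(columns[names[i]], columns[names[j]])
        matrix[i][j] = matrix[j][i] = value  # the index is symmetric
    return names, matrix


def gray_levels(matrix, levels=255):
    """Hypothetical rendering rule: gray value grows linearly with the index."""
    return [[round(v * levels) for v in row] for row in matrix]
```

In a full pipeline, `index_fn` would dispatch on the two field types. It would choose `spearman_abs`, `cramers_v`, or the variance-quotient index for a mixed pair.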
CN202011248181.1A 2020-11-10 2020-11-10 Method and device for determining field relevancy in database table Pending CN113761297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011248181.1A CN113761297A (en) 2020-11-10 2020-11-10 Method and device for determining field relevancy in database table

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011248181.1A CN113761297A (en) 2020-11-10 2020-11-10 Method and device for determining field relevancy in database table

Publications (1)

Publication Number Publication Date
CN113761297A 2021-12-07

Family

ID=78786034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011248181.1A Pending CN113761297A (en) 2020-11-10 2020-11-10 Method and device for determining field relevancy in database table

Country Status (1)

Country Link
CN (1) CN113761297A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103825784A (en) * 2014-03-24 2014-05-28 PLA Information Engineering University Non-public protocol field identification method and system
CN104714969A (en) * 2013-12-16 2015-06-17 Alibaba Group Holding Ltd. Detection method and device for attribute values
CN109117440A (en) * 2017-06-23 2019-01-01 *** Communications Group Corporation Metadata information acquisition method, system and computer-readable storage medium
CN109240882A (en) * 2018-08-30 2019-01-18 GF Securities Co., Ltd. Financial data consistency detection system and method
CN109784407A (en) * 2019-01-17 2019-05-21 Jingdong Digital Technology Holding Co., Ltd. Method and apparatus for determining the type of a table field
CN111104466A (en) * 2019-12-25 2020-05-05 Aerospace Science and Industry Network Information Development Co., Ltd. Method for rapidly classifying massive database tables
US10705695B1 (en) * 2016-09-26 2020-07-07 Splunk Inc. Display of interactive expressions based on field name selections

Similar Documents

Publication Publication Date Title
CN109634941B (en) Medical data processing method and device, electronic equipment and storage medium
WO2021164231A1 (en) Official document abstract extraction method and apparatus, and device and computer readable storage medium
CN108897874B (en) Method and apparatus for processing data
CN113327136B (en) Attribution analysis method, attribution analysis device, electronic equipment and storage medium
CN112579621B (en) Data display method and device, electronic equipment and computer storage medium
CN109614327B (en) Method and apparatus for outputting information
JP2023036681A (en) Task processing method, processing device, electronic equipment, storage medium, and computer program
CN112181936A (en) Database detection method and device
CN112418721A (en) Index determination method and device
CN115409419A (en) Value evaluation method and device of business data, electronic equipment and storage medium
CN115422924A (en) Information matching method and device, electronic equipment and storage medium
CN114741392A (en) Data query method and device, electronic equipment and storage medium
Weine et al. Application of equal local levels to improve QQ plot testing bands with R package qqconf
CN113987086A (en) Data processing method, data processing device, electronic device, and storage medium
CN107291923B (en) Information processing method and device
CN113326255A (en) Method and device for screening effective test data, terminal equipment and storage medium
CN112965943A (en) Data processing method and device, electronic equipment and storage medium
CN113761297A (en) Method and device for determining field relevancy in database table
CN115809228A (en) Data comparison method and device, storage medium and electronic equipment
CN116414814A (en) Data checking method, device, equipment, storage medium and program product
CN115563310A (en) Method, device, equipment and medium for determining key service node
CN112579673A (en) Multi-source data processing method and device
CN113934894A (en) Data display method based on index tree and terminal equipment
CN113704236A (en) Government affair system data quality evaluation method, device, terminal and storage medium
CN113434490A (en) Quality detection method and device for offline imported data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination