CN116383498A - Data matching method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116383498A
Authority
CN
China
Prior art keywords
data
processed
field
matching
determining
Legal status
Pending
Application number
CN202310348655.7A
Other languages
Chinese (zh)
Inventor
李登高
Current Assignee
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Application filed by Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202310348655.7A
Publication of CN116383498A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Collating Specific Patterns (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a data matching method and device, an electronic device and a storage medium. The method comprises: acquiring a plurality of pieces of data to be processed, and dividing them into data groups to be processed according to a preset rule; determining the field similarity between the data contents of the two pieces of data to be processed in each data group under the same feature field; determining a matching result for the current feature field according to its field similarity and a corresponding preset field similarity threshold, and determining a matching attribute value for the current feature field based on the matching result and a corresponding similarity influence parameter; and determining a data matching attribute value for the current data group to be processed according to the matching attribute values of its two pieces of data to be processed under each feature field, so as to determine a data matching result for the current data group based on the data matching attribute value. The technical solution improves data matching precision.

Description

Data matching method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data management technologies, and in particular, to a data matching method, a data matching device, an electronic device, and a storage medium.
Background
With the development of information technology, emerging technologies such as the Internet, cloud services and the Internet of Things have been integrated into the medical industry, and big data systems in particular have brought unprecedented technological breakthroughs to the healthcare industry. As big data technology continues to develop, data governance is becoming increasingly important in the medical industry.
In the data management process, when the treatment data of the same user held in different medical systems are integrated, the records of that user may differ somewhat between systems, so that the data cannot be unified into the same form.
Currently, when multiple pieces of medical data are matched, a similarity algorithm is generally used to determine the field coincidence rate between the pieces of data, and whether they are similar is decided based on this rate. However, data that a person would judge to be similar may still yield a low field coincidence rate under the similarity algorithm, so that the computer judges them to be dissimilar. Computer-based matching therefore has low accuracy, which affects data governance efficiency.
Disclosure of Invention
The invention provides a data matching method, a data matching device, an electronic device and a storage medium, which improve data matching accuracy, and thereby data governance efficiency, while maintaining data matching efficiency.
According to an aspect of the present invention, there is provided a data matching method, the method comprising:
acquiring a plurality of pieces of data to be processed, and dividing them into at least one data group to be processed according to a preset rule, wherein each data group to be processed comprises two pieces of data to be processed, and each piece of data to be processed comprises data content corresponding to each feature field;
determining, for each data group to be processed, the field similarity between the data contents of its two pieces of data to be processed under the same feature field;
for each feature field, determining a matching result for the current feature field according to its field similarity and a corresponding preset field similarity threshold, and determining a matching attribute value for the current feature field based on the matching result and a corresponding similarity influence parameter, wherein the similarity influence parameter comprises a positive influence parameter or a negative influence parameter; and
for each data group to be processed, determining a data matching attribute value for the current data group according to the matching attribute values of its two pieces of data to be processed under each feature field, so as to determine a data matching result for the current data group based on the data matching attribute value.
According to another aspect of the present invention, there is provided a data matching apparatus comprising:
a data group dividing module, configured to acquire a plurality of pieces of data to be processed and divide them into at least one data group to be processed according to a preset rule, wherein each data group to be processed comprises two pieces of data to be processed, and each piece of data to be processed comprises data content corresponding to each feature field;
a field similarity determining module, configured to determine, for each data group to be processed, the field similarity between the data contents of its two pieces of data to be processed under the same feature field;
a matching attribute value determining module, configured to, for each feature field, determine a matching result for the current feature field according to its field similarity and a corresponding preset field similarity threshold, and determine a matching attribute value for the current feature field based on the matching result and a corresponding similarity influence parameter, wherein the similarity influence parameter comprises a positive influence parameter or a negative influence parameter; and
a data matching attribute value determining module, configured to, for each data group to be processed, determine a data matching attribute value for the current data group according to the matching attribute values of its two pieces of data to be processed under each feature field, so as to determine a data matching result for the current data group based on the data matching attribute value.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data matching method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data matching method according to any embodiment of the present invention.
According to the technical solution of the embodiments of the present invention, a plurality of pieces of data to be processed are acquired and divided into at least one data group to be processed according to the preset rule; the field similarity between the data contents of the two pieces of data to be processed in each data group under the same feature field is determined; a matching result for the current feature field is determined according to its field similarity and the corresponding preset field similarity threshold, and a matching attribute value for the current feature field is determined based on the matching result and the corresponding similarity influence parameter; finally, a data matching attribute value for the current data group is determined from the matching attribute values of its two pieces of data under each feature field, and the data matching result of the data group is determined based on that value. This solves the prior-art problem that data are judged dissimilar by a computer because the field coincidence rate computed by a similarity algorithm is very low, and improves the accuracy of computer-based data matching while maintaining matching efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data matching method according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data matching device according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device implementing a data matching method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It can be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
Example 1
FIG. 1 is a flowchart of a data matching method according to the first embodiment of the present invention. This embodiment is applicable when no single feature field uniquely identifies a record, so that a plurality of pieces of data to be processed must be identified and matched using several feature fields. The method may be performed by a data matching device, which may be implemented in hardware and/or software and configured in a terminal and/or a server. As shown in FIG. 1, the method includes:
S110, acquiring a plurality of pieces of data to be processed, and dividing the plurality of pieces of data to be processed into at least one data group to be processed according to a preset rule.
It should be noted that, before the technical solution of this embodiment is used, the user should be informed, as required by law, of the information to be acquired and its intended use, and the user's authorization should be obtained.
In this embodiment, the data to be processed may be data that is stored in a plurality of service systems and needs to be processed. In practice, data belonging to the same user may be stored in different service systems, possibly in somewhat different formats. When the data stored for the same user in different service systems are integrated, the original data acquired from these systems serve as the data to be processed. The data to be processed comprises data content corresponding to each feature field. A feature field may be a field that characterizes a specific attribute of the user; for example, the feature fields may include "name", "identity", "gender", "age" and "address". Correspondingly, the data content may be the data value of the data to be processed under a feature field. The preset rule may be a rule, set in advance, for grouping the data to be processed; it may be set according to user requirements and updated during subsequent use. A data group to be processed may be a group comprising two pieces of data to be processed.
In a specific implementation, the data to be processed stored in the plurality of service systems can be pooled. After the plurality of pieces of data to be processed are obtained, they can be divided into groups according to the preset rule, yielding at least one data group to be processed that contains two pieces of data to be processed. The advantage of this arrangement is that it makes it convenient to compare the similarity of any two pieces of data to be processed, which improves data processing efficiency.
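The source leaves the preset grouping rule open. As a minimal sketch, assuming the rule is either "pair every two records" or "pair every two records that share a blocking key" (the blocking key and the function name `divide_into_groups` are hypothetical, not from the source), the division into two-record data groups could look like this in Python, with each record modelled as a dictionary from feature field name to data content:

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Optional, Tuple

# A piece of data to be processed: feature field name -> data content (None = null).
Record = Dict[str, Optional[str]]
DataGroup = Tuple[Record, Record]


def divide_into_groups(
    records: List[Record],
    block_key: Optional[Callable[[Record], Hashable]] = None,
) -> List[DataGroup]:
    """Divide records into data groups of exactly two records each.

    Hypothetical preset rule: without block_key, every unordered pair is a
    group; with block_key, only records sharing the same key are paired.
    """
    if block_key is None:
        return list(combinations(records, 2))
    buckets: Dict[Hashable, List[Record]] = {}
    for record in records:
        buckets.setdefault(block_key(record), []).append(record)
    groups: List[DataGroup] = []
    for bucket in buckets.values():
        groups.extend(combinations(bucket, 2))
    return groups
```

Pairing all records grows quadratically with the number of records, which is one plausible reason a practical preset rule would restrict which records are paired.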
S120, determining field similarity of data contents corresponding to two pieces of data to be processed, which are included in each data group to be processed, under the same characteristic field.
In this embodiment, the field similarity may be the degree to which the data values of two pieces of data to be processed agree under the same feature field. In practice, to determine the field similarity of the two pieces of data to be processed in a data group, for each feature field the data contents of the two pieces under the current feature field are first determined; the two data contents are then matched to count the characters with the same data value; finally, the ratio of this count to the number of characters in the longer of the two data contents is taken as the field similarity of the current feature field.
In a specific implementation, after at least one data group to be processed is obtained, for each data group the two pieces of data to be processed may first be processed with a preset similarity algorithm to determine the number of coinciding characters between their data contents under the same feature field. The ratio of this number to the character count of the longer of the two data contents under that feature field is then determined and used as the field similarity of the two pieces of data to be processed under that feature field.
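The ratio described above can be sketched in Python as follows. The source does not say how coinciding characters are counted, so this sketch assumes a multiset overlap of characters (order ignored); a positional or longest-common-subsequence count would be equally consistent with the text. The function name `field_similarity` is hypothetical.

```python
from collections import Counter


def field_similarity(a: str, b: str) -> float:
    """Coinciding characters divided by the length of the longer content.

    Assumption: coinciding characters are counted as the multiset overlap
    of the two strings, so character order is ignored.
    """
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0  # empty contents are left to the separate null-value rule
    overlap = sum((Counter(a) & Counter(b)).values())
    return overlap / longer


print(field_similarity("Zhang San", "Zhang Shan"))  # 9 coinciding / 10 -> 0.9
```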
S130, for each feature field, determining a matching result for the current feature field according to its field similarity and a corresponding preset field similarity threshold, and determining a matching attribute value for the current feature field based on the matching result and a corresponding similarity influence parameter.
In this embodiment, the preset field similarity threshold may be a boundary value, set in advance, for deciding whether the data contents of the data to be processed under the current feature field match. The preset field similarity threshold is set per feature field, that is, different feature fields may have different preset field similarity thresholds. The matching result describes whether the data contents of the data to be processed under the current feature field match; optionally, it may be a field match or a field mismatch. The similarity influence parameter may be a parameter that determines how a field's matching result influences the match. Optionally, the similarity influence parameters may include a positive influence parameter and a negative influence parameter. The positive influence parameter may be a preset parameter that increases the matching attribute value of the data contents of the data to be processed under the same feature field; it may be any positive number, optionally 1. The negative influence parameter may be a preset parameter that decreases the matching attribute value of the data contents of the data to be processed under the same feature field; it may be any negative number, optionally -1. The matching attribute value may be a numerical value characterizing the degree of matching of the data contents of the data to be processed under the feature field.
Optionally, determining the matching result for the current feature field according to its field similarity and the corresponding preset field similarity threshold includes, for each feature field: if the field similarity of the current feature field is greater than or equal to the corresponding preset field similarity threshold, determining that the matching result is a field match; and if the field similarity of the current feature field is smaller than the corresponding preset field similarity threshold, determining that the matching result is a field mismatch.
In a specific implementation, after the field similarity of the two pieces of data to be processed under each feature field is obtained, and since the preset field similarity threshold may differ between feature fields, the threshold of the current feature field is determined first and the field similarity of the current feature field is compared with it. If the field similarity is greater than or equal to the threshold, the matching result of the two pieces of data to be processed under the current feature field is a field match; otherwise, it is a field mismatch.
For example, if the field similarity of the two pieces of data to be processed under the current feature field is 0.75 and the preset field similarity threshold of the current feature field is 0.8, the matching result under the current feature field is a field mismatch; if the field similarity is 0.85 and the threshold is 0.8, the matching result is a field match.
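A minimal Python sketch of the threshold comparison. The per-field threshold values in `THRESHOLDS` and the names `MatchResult` and `match_result` are hypothetical; the source only states that thresholds may differ between feature fields. A similarity equal to the threshold counts as a field match.

```python
from enum import Enum
from typing import Dict


class MatchResult(Enum):
    FIELD_MATCH = "field match"
    FIELD_MISMATCH = "field mismatch"


# Hypothetical per-field preset field similarity thresholds.
THRESHOLDS: Dict[str, float] = {"name": 0.8, "address": 0.7, "gender": 1.0}


def match_result(field: str, similarity: float) -> MatchResult:
    # ">=" follows the rule above: equality with the threshold is a match.
    if similarity >= THRESHOLDS[field]:
        return MatchResult.FIELD_MATCH
    return MatchResult.FIELD_MISMATCH
```

With these assumed thresholds, a similarity of 0.75 is a field mismatch for "name" but a field match for "address", which is the point of keeping thresholds per field.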
Further, after the matching results of the two pieces of data to be processed under the same feature field are determined, their matching attribute values under that feature field can be determined from the matching results and the corresponding similarity influence parameters. Different matching results are handled in different ways, as described below.
Optionally, determining the matching attribute value for the current feature field based on the matching result and the corresponding similarity influence parameter includes: if the matching result is a field match, determining the matching attribute value of the feature field based on the field similarity of the feature field and the corresponding positive influence parameter.
In a specific implementation, for each feature field, if the matching result of the two pieces of data to be processed under the current feature field is a field match, the field similarity and the positive influence parameter of the current feature field may be determined, and the matching attribute value of the two pieces of data under the current feature field is determined from them.
Optionally, determining the matching attribute value of the feature field based on the field similarity of the feature field and the corresponding positive influence parameter includes: multiplying the field similarity of the feature field by the corresponding positive influence parameter to obtain the matching attribute value.
In a specific implementation, for each feature field whose matching result is a field match, the field similarity and the corresponding positive influence parameter of the current feature field may be determined and multiplied, and the product is used as the matching attribute value of the current feature field.
For example, if the matching result of the two pieces of data to be processed under the current feature field is a field match, the field similarity is 0.85 and the positive influence parameter is 2, the matching attribute value of the two pieces of data under the current feature field is 0.85 × 2 = 1.7.
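Using notation introduced here only for illustration (s_f for the field similarity of feature field f, τ_f for its preset field similarity threshold, p_f^+ for its positive influence parameter and v_f for its matching attribute value), the field-match case and the worked example read:

```latex
v_f = s_f \cdot p_f^{+} \qquad \text{when } s_f \ge \tau_f,
\qquad \text{e.g. } v_f = 0.85 \times 2 = 1.7
```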
Optionally, determining the matching attribute value for the current feature field based on the matching result and the corresponding similarity influence parameter includes: if the matching result is a field mismatch, taking the negative influence parameter of the feature field as its matching attribute value.
In a specific implementation, if the matching result of the two pieces of data to be processed under the current feature field is a field mismatch, the negative influence parameter of the current feature field can be determined and used as the matching attribute value of the current feature field.
For example, if the field similarity of the two pieces of data to be processed under the current feature field is 0.75 and the preset field similarity threshold is 0.8, the matching result under the current feature field is a field mismatch; if the negative influence parameter of the current feature field is -1, the matching attribute value of the current feature field is -1.
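The field-match and field-mismatch cases above can be combined into one function. This is a minimal Python sketch; `FieldParams` and `matching_attribute_value` are hypothetical names, and the example values are those used in the two worked examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    """Hypothetical per-field settings; names are illustrative only."""
    threshold: float  # preset field similarity threshold
    positive: float   # positive influence parameter, > 0
    negative: float   # negative influence parameter, < 0


def matching_attribute_value(similarity: float, params: FieldParams) -> float:
    if similarity >= params.threshold:
        # Field match: similarity weighted by the positive influence parameter.
        return similarity * params.positive
    # Field mismatch: the negative influence parameter is the value itself.
    return params.negative


params = FieldParams(threshold=0.8, positive=2.0, negative=-1.0)
print(matching_attribute_value(0.75, params))  # -1.0 (field mismatch)
```

Note that a mismatch discards the similarity entirely, so 0.1 and 0.79 contribute the same value under a threshold of 0.8.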
It should be noted that, in practice, the data content of either of the two pieces of data to be processed in a data group may be null under some feature field. In this case, the matching attribute value of the two pieces of data under that feature field may be determined from the similarity influence parameters of the feature field.
On the basis of the above technical solutions, the method further includes: for each feature field, if the data content of the data to be processed under the current feature field is detected to be a null value, determining the matching attribute value of the current feature field according to the positive influence parameter and the negative influence parameter of the current feature field.
In a specific implementation, for each feature field, if the data content of at least one of the two pieces of data to be processed in the data group is a null value, the positive and negative influence parameters of the current feature field can be determined; the absolute value of each is taken, and the average of the two absolute values is used as the matching attribute value of the current feature field.
For example, if one of the two pieces of data to be processed in the data group has a null data content under the feature field "gender", and the positive influence parameter of "gender" is 0.5 and the negative influence parameter is -1, the matching attribute value of this feature field is (|0.5| + |-1|) / 2 = 0.75.
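A minimal Python sketch of the null-value rule. What counts as "null" is not defined in the source; treating both `None` and blank strings as null is an assumption, and the function names are hypothetical.

```python
from typing import Optional


def is_null(content: Optional[str]) -> bool:
    """Assumption: None and blank strings both count as null data content."""
    return content is None or content.strip() == ""


def null_matching_attribute_value(positive: float, negative: float) -> float:
    # Average of the absolute values of the two influence parameters.
    return (abs(positive) + abs(negative)) / 2


print(null_matching_attribute_value(0.5, -1.0))  # 0.75, as in the "gender" example
```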
And S140, determining a data matching attribute value corresponding to the current data set to be processed according to the matching attribute values of the two pieces of data to be processed included in the current data set to be processed under each characteristic field aiming at each data set to be processed, so as to determine a data matching result corresponding to the current data set to be processed based on the data matching attribute values.
In this embodiment, the data matching data value may be a numerical value for characterizing the overall matching condition of two pieces of data to be processed included in each data set to be processed, that is, the determined matching data value on the premise of determining the matching condition of the data contents corresponding to all the feature fields. The data matching result may be a matching condition of two pieces of data to be processed included in each data set to be processed. Alternatively, the data matching result may be data matching, data non-matching, data suspected matching, or the like.
In practice, once the matching attribute values of the two pieces of data to be processed under each feature field have been determined, the data matching attribute value of each data set to be processed can be computed from these per-field values. The data matching result of the two pieces of data in that data set is then determined from the data matching attribute value.
Optionally, determining the data matching attribute value of the current data set to be processed from the matching attribute values of its two pieces of data to be processed under each feature field includes: determining the average of these matching attribute values, and taking this average matching attribute value as the data matching attribute value of the data set to be processed.
In this embodiment, for each data set to be processed, the matching attribute values of its two pieces of data to be processed under each feature field are summed, and the ratio of this sum to the number of feature fields is taken as the average matching attribute value.
In a specific implementation, this average matching attribute value is then used as the data matching attribute value of the two pieces of data to be processed in the current data set to be processed.
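A minimal sketch of this averaging step, assuming the per-field matching attribute values of one data set are already available as a list of floats. The function name `data_match_value` is hypothetical.

```python
from typing import Sequence


def data_match_value(field_values: Sequence[float]) -> float:
    """Data matching attribute value of one data group (illustrative sketch).

    Sums the per-field matching attribute values and divides by the
    number of feature fields, i.e. takes their arithmetic mean.
    """
    if not field_values:
        raise ValueError("at least one feature field is required")
    return sum(field_values) / len(field_values)
```

For four feature fields with values 0.5, 0.75, -1.0 and 0.25, the sum is 0.5 and the data matching attribute value is 0.5 / 4 = 0.125.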
Further, once the data matching attribute value of the current data set to be processed is obtained, it can be compared with a preset data matching threshold and a preset data mismatch threshold. If the data matching attribute value is greater than or equal to the data matching threshold, the data matching result is data matching. If it is smaller than the data mismatch threshold, the result is data mismatch. If it lies between the data mismatch threshold and the data matching threshold, the result is data suspected matching.
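The three-way threshold comparison can be sketched as follows. The text does not give concrete threshold values, so those in the example are hypothetical. `classify` and `MatchResult` are illustrative names.

```python
from enum import Enum


class MatchResult(Enum):
    MATCH = "data matching"
    SUSPECTED = "data suspected matching"
    MISMATCH = "data mismatch"


def classify(value: float, match_threshold: float,
             mismatch_threshold: float) -> MatchResult:
    """Map a data matching attribute value to a data matching result."""
    if mismatch_threshold > match_threshold:
        raise ValueError("mismatch threshold must not exceed match threshold")
    if value >= match_threshold:
        return MatchResult.MATCH
    if value < mismatch_threshold:
        return MatchResult.MISMATCH
    # Remaining case: mismatch_threshold <= value < match_threshold.
    return MatchResult.SUSPECTED
```

Suppose the thresholds are 0.6 (matching) and 0.3 (mismatch). A value of 0.6 is classified as data matching, 0.3 as data suspected matching, and 0.1 as data mismatch.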
In practice, once the data matching result of the two pieces of data to be processed in the current data set is known, the way those two pieces of data are processed can be chosen according to that result.
On this basis, the above technical solution further includes: determining a target data processing mode for the data set to be processed based on the data matching result, and processing the data to be processed in that data set according to the target data processing mode.
In this embodiment, the target data processing mode may be the way in which the data to be processed in the data set is managed. Optionally, it may include data merging, data group splitting, and other data processing modes.
In a specific implementation, each data set to be processed is handled according to its data matching result. If the result is data matching, the two pieces of data to be processed in the data set can be merged and marked with a data matching identifier. If the result is data mismatch, the two pieces of data can be split apart and marked with a data mismatch identifier. A new data set to be processed can then be constructed for each of them according to the identifier, and the data matching steps are executed again. If the result is data suspected matching, the two pieces of data are likewise split and marked with a data suspected matching identifier. New data sets to be processed are then constructed for each of them according to the identifier, and the data matching steps are executed again.
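The mapping from data matching result to processing mode might look like the sketch below. The text does not say how two records are merged, so the merge rule here is an assumption: keep the first record's values and fill its null fields from the second. The `_match_flag` identifier key, the result strings, and the function name are also hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A record maps feature-field names to their data content (None = null).
Record = Dict[str, Optional[str]]


def process_group(pair: Tuple[Record, Record], result: str) -> List[Record]:
    """Apply a target data processing mode to one data group (sketch).

    result is "match", "mismatch" or "suspected" (hypothetical labels).
    Split records are returned separately so the caller can build new
    data groups for them and run the matching steps again.
    """
    first, second = pair
    if result == "match":
        # Data merging (assumed rule): copy the first record, then fill its
        # null or missing fields with the second record's content.
        merged = dict(first)
        for field, content in second.items():
            if merged.get(field) is None:
                merged[field] = content
        merged["_match_flag"] = "match"  # data matching identifier
        return [merged]
    if result in ("mismatch", "suspected"):
        # Data splitting: mark both records with the corresponding identifier.
        return [dict(first, _match_flag=result), dict(second, _match_flag=result)]
    raise ValueError(f"unknown data matching result: {result!r}")
```

The input records are copied rather than modified, so the original data to be processed is kept intact for regrouping.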
According to the technical solution of this embodiment, a plurality of pieces of data to be processed are obtained and divided into at least one data set to be processed according to a preset rule. Next, the field similarity is determined between the data contents of the two pieces of data to be processed in each data set under the same feature field. For each feature field, the matching result is determined from the field similarity and the corresponding preset field similarity threshold. The matching attribute value is then determined from the matching result and the corresponding similarity influence parameter. Finally, the data matching attribute value of each data set is determined from the matching attribute values of its two pieces of data under each feature field, and the data matching result of the data set is determined based on that value. This solves a problem in the prior art: when the field coincidence rate is low, similarity-based algorithms cause the computer to judge the data as dissimilar. The solution improves the efficiency of computer-based data matching and the accuracy of the matching results.
Example two
Fig. 2 is a schematic structural diagram of a data matching device according to a second embodiment of the present invention. As shown in fig. 2, the apparatus includes: a data group dividing module 210, a field similarity determining module 220, a matching attribute value determining module 230, and a data matching attribute value determining module 240.
The data group dividing module 210 is configured to obtain a plurality of pieces of data to be processed, and divide the plurality of pieces of data to be processed into at least one data group to be processed according to a preset rule; each data group to be processed comprises two pieces of data to be processed, and the data to be processed comprises data content corresponding to each characteristic field;
a field similarity determining module 220, configured to determine field similarity of data contents corresponding to two pieces of data to be processed included in each data set to be processed under the same characteristic field;
a matching attribute value determining module 230, configured to determine, for each of the feature fields, a matching result corresponding to a current feature field according to a field similarity of the current feature field and a corresponding preset field similarity threshold, and determine a matching attribute value corresponding to the current feature field based on the matching result and a corresponding similarity influence parameter; wherein the similarity-affecting parameter comprises a positive-affecting parameter or a negative-affecting parameter;
The data matching attribute value determining module 240 is configured to determine, for each of the to-be-processed data sets, a data matching attribute value corresponding to the current to-be-processed data set according to the matching attribute values of the two pieces of to-be-processed data included in the current to-be-processed data set under each of the feature fields, so as to determine a data matching result corresponding to the current to-be-processed data set based on the data matching attribute values.
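Modules 210 to 240 could fit together as a single Python class, shown in the end-to-end sketch below. Several parts are stand-ins because the text leaves them open. The "preset rule" for forming data groups is replaced by pairing every two records. The field similarity is computed with the standard library's `difflib.SequenceMatcher.ratio()`, which is not necessarily the similarity algorithm the patent intends. The class name, method names, and per-field configuration format are all hypothetical.

```python
import difflib
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
# feature field -> (preset field similarity threshold, positive, negative)
FieldConfig = Dict[str, Tuple[float, float, float]]


class DataMatchingDevice:
    """Hypothetical sketch of modules 210-240 plus null value detection."""

    def __init__(self, fields: FieldConfig, match_threshold: float,
                 mismatch_threshold: float) -> None:
        self.fields = fields
        self.match_threshold = match_threshold
        self.mismatch_threshold = mismatch_threshold

    def divide_groups(self, records: List[Record]) -> List[Tuple[Record, Record]]:
        # Module 210: stand-in "preset rule" that pairs every two records.
        return list(combinations(records, 2))

    @staticmethod
    def field_similarity(x: Optional[str], y: Optional[str]) -> Optional[float]:
        # Module 220: difflib ratio in [0, 1]; None signals a null value.
        if x is None or y is None:
            return None
        return difflib.SequenceMatcher(None, x, y).ratio()

    def field_value(self, field: str, a: Record, b: Record) -> float:
        # Module 230: per-field matching attribute value.
        threshold, positive, negative = self.fields[field]
        similarity = self.field_similarity(a.get(field), b.get(field))
        if similarity is None:  # null value detection
            return (abs(positive) + abs(negative)) / 2
        return similarity * positive if similarity >= threshold else negative

    def match_group(self, a: Record, b: Record) -> Tuple[float, str]:
        # Module 240: average the per-field values, then apply both thresholds.
        values = [self.field_value(field, a, b) for field in self.fields]
        score = sum(values) / len(values)
        if score >= self.match_threshold:
            return score, "match"
        if score < self.mismatch_threshold:
            return score, "mismatch"
        return score, "suspected"

    def run(self, records: List[Record]) -> List[Tuple[Record, Record, float, str]]:
        # Returns (record_a, record_b, data matching attribute value, result).
        return [(a, b, *self.match_group(a, b))
                for a, b in self.divide_groups(records)]
```

Consider a configuration where "name" has threshold 0.8 and influence parameters (1.0, -1.0), and "gender" has threshold 1.0 and parameters (0.5, -1.0). Two records with the same name, one of which has a null gender, then score (1.0 × 1.0 + 0.75) / 2 = 0.875. With a matching threshold of 0.6, they are classified as a match.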
According to the technical solution of this embodiment, the data to be processed is divided into data sets of two pieces each according to a preset rule. For each feature field, the field similarity is compared with the corresponding preset field similarity threshold to obtain a matching result. A matching attribute value is then derived from that result and the corresponding similarity influence parameter. The per-field values are combined into a data matching attribute value, from which the data matching result of each data set is determined. This solves a problem in the prior art: when the field coincidence rate is low, similarity-based algorithms cause the computer to judge the data as dissimilar. The solution improves the efficiency of computer-based data matching and the accuracy of the matching results.
Optionally, the apparatus further includes a null value detection module.
The null value detection module is configured to handle each feature field as follows. If the data content of the data to be processed under the current feature field is detected to be null, the module determines the matching attribute value of that field from the field's positive influence parameter and negative influence parameter.
Optionally, the matching attribute value determining module 230 includes a field matching determining unit and a field mismatch determining unit, which operate on each feature field as follows.
The field matching determining unit is configured to determine that the matching result is field match if the field similarity of the current feature field is detected to be greater than or equal to the corresponding preset field similarity threshold;
the field mismatch determining unit is configured to determine that the matching result is field mismatch if the field similarity of the current feature field is detected to be smaller than the corresponding preset field similarity threshold.
Optionally, the matching attribute value determining module 230 includes a first matching attribute value determining unit and a second matching attribute value determining unit.
The first matching attribute value determining unit is configured to determine the matching attribute value of the feature field if the matching result is field match. The value is based on the field similarity of the feature field and the corresponding positive influence parameter.
The second matching attribute value determining unit is configured to take the negative influence parameter of the feature field as its matching attribute value if the matching result is field mismatch.
Optionally, the first matching attribute value determining unit is specifically configured to multiply the field similarity of the feature field by the corresponding positive influence parameter to obtain the matching attribute value.
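The per-field and per-group computations described in this document can be summarized as the formulas below. The symbols are introduced here for illustration only and do not appear in the original text. For feature field $i$ of $m$ fields, $s_i$ is the field similarity, $t_i$ the preset field similarity threshold, and $p_i$ and $n_i$ the positive and negative influence parameters. $v_i$ is the matching attribute value, $V$ the data matching attribute value, and $T_{\mathrm{match}}$ and $T_{\mathrm{mismatch}}$ the preset data matching and data mismatch thresholds.

```latex
v_i =
\begin{cases}
s_i \, p_i, & s_i \ge t_i \quad \text{(field match)} \\
n_i, & s_i < t_i \quad \text{(field mismatch)} \\
\dfrac{\lvert p_i \rvert + \lvert n_i \rvert}{2}, & \text{content under field } i \text{ is null}
\end{cases}
\qquad
V = \frac{1}{m} \sum_{i=1}^{m} v_i

\text{result} =
\begin{cases}
\text{data matching}, & V \ge T_{\mathrm{match}} \\
\text{data suspected matching}, & T_{\mathrm{mismatch}} \le V < T_{\mathrm{match}} \\
\text{data mismatch}, & V < T_{\mathrm{mismatch}}
\end{cases}
```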
Optionally, the data matching attribute value determining module 240 is specifically configured to determine the average of the matching attribute values and take this average matching attribute value as the data matching attribute value of the data set to be processed.
Optionally, the apparatus further includes a target data processing mode determining module.
The target data processing mode determining module is configured to determine a target data processing mode for the data group to be processed based on the data matching result. The data to be processed in that data group is then processed according to the target data processing mode.
The data matching device provided by the embodiment of the invention can execute the data matching method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example three
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic equipment may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data matching method.
In some embodiments, the data matching method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data matching method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data matching method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose. It can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host. It is a host product in a cloud computing service system that overcomes the drawbacks of traditional physical hosts and VPS services, namely difficult management and weak service scalability.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method of data matching, comprising:
acquiring a plurality of pieces of data to be processed, and dividing the plurality of pieces of data to be processed into at least one data group to be processed according to a preset rule; each data group to be processed comprises two pieces of data to be processed, and the data to be processed comprises data content corresponding to each characteristic field;
determining field similarity of data contents corresponding to two pieces of data to be processed, which are included in each data set to be processed, under the same characteristic field;
For each characteristic field, determining a matching result corresponding to the current characteristic field according to the field similarity of the current characteristic field and a corresponding preset field similarity threshold value, and determining a matching attribute value corresponding to the current characteristic field based on the matching result and a corresponding similarity influence parameter; wherein the similarity-affecting parameter comprises a positive-affecting parameter or a negative-affecting parameter;
for each data set to be processed, determining a data matching attribute value corresponding to the current data set to be processed according to the matching attribute values of two pieces of data to be processed included in the current data set to be processed under each characteristic field, so as to determine a data matching result corresponding to the current data set to be processed based on the data matching attribute values;
wherein the determining, based on the matching result and the corresponding similarity-affecting parameter, a matching attribute value corresponding to the current feature field includes:
if the matching result is field match, determining the matching attribute value corresponding to the characteristic field based on the field similarity of the characteristic field and the corresponding positive influence parameter;
if the matching result is field mismatch, taking the negative influence parameter corresponding to the characteristic field as the matching attribute value corresponding to the characteristic field.
2. The method as recited in claim 1, further comprising:
for each characteristic field, if the data content corresponding to the data to be processed under the current characteristic field is detected to be null, determining the matching attribute value corresponding to the current characteristic field according to the positive influence parameter and the negative influence parameter corresponding to the current characteristic field.
3. The method according to claim 1, wherein the determining the matching result corresponding to the current feature field according to the field similarity of the current feature field and the corresponding preset field similarity threshold includes:
for each of the feature fields:
if the field similarity of the current characteristic field is detected to be greater than or equal to a corresponding preset field similarity threshold value, determining that the matching result is field matching;
and if the field similarity of the current characteristic field is detected to be smaller than the corresponding preset field similarity threshold value, determining that the matching result is field mismatch.
4. The method of claim 1, wherein determining the matching attribute value corresponding to the characteristic field based on the field similarity of the characteristic field and the corresponding positive influence parameter comprises:
multiplying the field similarity of the characteristic field by the corresponding positive influence parameter to obtain the matching attribute value.
5. The method according to claim 1, wherein determining the data matching attribute value corresponding to the current data set to be processed according to the matching attribute values of the two pieces of data to be processed included in the current data set to be processed under each of the feature fields includes:
determining the average of the matching attribute values, and taking the average matching attribute value as the data matching attribute value corresponding to the data set to be processed.
6. The method as recited in claim 1, further comprising:
determining a target data processing mode corresponding to the data set to be processed based on the data matching result, so as to process the data to be processed included in the data set to be processed based on the target data processing mode.
7. A data matching apparatus, comprising:
the data group dividing module is used for acquiring a plurality of pieces of data to be processed and dividing the plurality of pieces of data to be processed into at least one data group to be processed according to a preset rule; each data group to be processed comprises two pieces of data to be processed, and the data to be processed comprises data content corresponding to each characteristic field;
the field similarity determining module is used for determining the field similarity of the data content corresponding to the two pieces of data to be processed, which are included in each data set to be processed, under the same characteristic field;
the matching attribute value determining module is used for determining a matching result corresponding to the current characteristic field according to the field similarity of the current characteristic field and a corresponding preset field similarity threshold value for each characteristic field, and determining a matching attribute value corresponding to the current characteristic field based on the matching result and a corresponding similarity influence parameter; wherein the similarity-affecting parameter comprises a positive-affecting parameter or a negative-affecting parameter;
the data matching attribute value determining module is used for determining, for each data set to be processed, a data matching attribute value corresponding to the current data set to be processed according to the matching attribute values of two pieces of data to be processed, included in the current data set to be processed, under each characteristic field, so as to determine a data matching result corresponding to the current data set to be processed based on the data matching attribute values;
wherein the matching attribute value determining module includes a first matching attribute value determining unit and a second matching attribute value determining unit;
the first matching attribute value determining unit is used for determining, if the matching result is field match, the matching attribute value corresponding to the characteristic field based on the field similarity of the characteristic field and the corresponding positive influence parameter;
the second matching attribute value determining unit is used for taking, if the matching result is field mismatch, the negative influence parameter corresponding to the characteristic field as the matching attribute value corresponding to the characteristic field.
8. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data matching method of any one of claims 1-6.
9. A computer readable storage medium storing computer instructions for causing a processor to perform the data matching method of any one of claims 1-6.
CN202310348655.7A 2023-04-03 2023-04-03 Data matching method and device, electronic equipment and storage medium Pending CN116383498A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310348655.7A CN116383498A (en) 2023-04-03 2023-04-03 Data matching method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116383498A true CN116383498A (en) 2023-07-04

Family

ID=86962910

Country Status (1)

Country Link
CN (1) CN116383498A (en)


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination