CN116070601B - Data splicing method and device, electronic equipment and storage medium - Google Patents

Data splicing method and device, electronic equipment and storage medium

Info

Publication number
CN116070601B
Authority
CN
China
Prior art keywords
data
sub
processed
splicing
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310308456.3A
Other languages
Chinese (zh)
Other versions
CN116070601A (en)
Inventor
李登高
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Application filed by Lianren Healthcare Big Data Technology Co Ltd filed Critical Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202310308456.3A priority Critical patent/CN116070601B/en
Publication of CN116070601A publication Critical patent/CN116070601A/en
Application granted granted Critical
Publication of CN116070601B publication Critical patent/CN116070601B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data splicing method and device, an electronic device, and a storage medium. The method comprises: acquiring at least one group of data to be processed corresponding to a target user; determining a splicing attribute value for each piece of sub-data to be processed under its corresponding standardized elements; determining, based on the splicing attribute values, the target splicing sub-data to be processed that corresponds to a target splicing attribute value; and, for each piece of sub-data to be processed other than the target splicing sub-data, splicing the current piece with the target splicing sub-data, updating the target splicing sub-data according to the splicing result, and processing the next piece based on the updated target splicing sub-data, until the current piece is the last piece of sub-data to be processed, thereby obtaining the target data corresponding to the target user. This technical scheme automatically splices sub-data from multiple data sources.

Description

Data splicing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data management technologies, and in particular, to a data splicing method, a data splicing device, an electronic device, and a storage medium.
Background
With the rapid development of medical and health informatization, medical institutions have built application systems for their various business requirements, and large medical institutions may run hundreds of them. Even when a user visits a single medical institution, several different systems are used over the course of treatment, and each system stores only the data related to its own treatment link. Users also commonly seek treatment at different medical institutions, so the data of one user ends up scattered across the systems of each institution. Medical data governance therefore needs to splice data fragments from multiple sources back into complete information so that the value of the data can be mined and used.
At present, the prior art generally collects data fragments from multiple sources and relies on manual work to screen and splice them, which leads to low working efficiency and low splicing accuracy.
Disclosure of Invention
The invention provides a data splicing method and device, an electronic device, and a storage medium, which automatically splice sub-data from multiple data sources, reduce manual involvement in the data governance process, improve data governance efficiency, and facilitate mining and applying the value of the data.
According to an aspect of the present invention, there is provided a data splicing method, including:
acquiring at least one group of data to be processed corresponding to a target user; the data to be processed comprises at least one piece of sub data to be processed, wherein the sub data to be processed is data with missing information;
determining a splicing attribute value corresponding to each sub-data to be processed under the corresponding standardized element;
determining target splicing to-be-processed sub-data corresponding to the target splicing attribute values based on the splicing attribute values;
and for each piece of sub-data to be processed, which is different from the target splicing sub-data to be processed, splicing the current piece of sub-data to be processed with the target splicing sub-data to be processed, updating the target splicing sub-data to be processed according to a splicing result, and processing the next piece of sub-data to be processed of the current piece of sub-data to be processed based on the updated target splicing sub-data to be processed until the current piece of sub-data to be processed is the last piece of sub-data to be processed, thereby obtaining target data corresponding to the target user.
According to another aspect of the present invention, there is provided a data splicing apparatus comprising:
The data to be processed acquisition module is used for acquiring at least one group of data to be processed corresponding to the target user; the data to be processed comprises at least one piece of sub data to be processed, wherein the sub data to be processed is data with missing information;
the splicing attribute value determining module is used for determining splicing attribute values corresponding to all the sub-data to be processed under the corresponding standardized elements;
the target splicing pending sub-data determining module is used for determining target splicing pending sub-data corresponding to the target splicing attribute values based on the splicing attribute values;
and the target data determining module is used for splicing the current sub-data to be processed with the target splicing sub-data for each sub-data to be processed, which is different from the target splicing sub-data, and updating the target splicing sub-data to be processed according to the splicing result so as to process the next sub-data to be processed of the current sub-data to be processed based on the updated target splicing sub-data to be processed until the current sub-data to be processed is the last sub-data to be processed, thereby obtaining target data corresponding to the target user.
According to another aspect of the present invention, there is provided an electronic apparatus including:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data splicing method of any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a data splicing method according to any embodiment of the present invention.
According to the technical scheme of the embodiments of the invention, at least one group of data to be processed corresponding to a target user is acquired; the splicing attribute value of each piece of sub-data to be processed under its corresponding standardized elements is determined; the target splicing sub-data to be processed corresponding to the target splicing attribute value is determined based on the splicing attribute values; and each piece of sub-data to be processed other than the target splicing sub-data is spliced with the target splicing sub-data in turn, with the target splicing sub-data updated according to each splicing result and the next piece processed against the updated target splicing sub-data, until the current piece is the last one, which yields the target data corresponding to the target user. This addresses the low efficiency and low accuracy of manual screening and splicing in the prior art and automatically splices sub-data from multiple data sources.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a data splicing method according to a first embodiment of the present invention;
fig. 2 is a schematic structural diagram of a data splicing device according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device implementing a data splicing method according to an embodiment of the present invention.
Detailed Description
So that those skilled in the art can better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data splicing method according to a first embodiment of the present invention. The method is applicable to splicing a group of sub-data to be processed corresponding to a target user into complete target data. It may be performed by a data splicing device, which may be implemented in hardware and/or software and may be configured in a terminal and/or a server. As shown in fig. 1, the method includes:
S110, at least one group of data to be processed corresponding to the target user is obtained.
In this embodiment, the target user may be a user whose data is stored in a dispersed manner across at least one service system, where the data stored in each service system has missing information. The data to be processed may be raw data that has not undergone data governance. The data to be processed comprises at least one piece of sub-data to be processed, and each piece of sub-data to be processed is data with missing information. A piece of sub-data to be processed may be the data that a given service system stores for the target user about the corresponding service link.
In practice, different service systems serve different service needs, so each system stores only the data generated in its own link. When a target user goes through service flows at service organizations in different areas, the service data associated with the target user is scattered across the service systems of each organization. During data governance, when the complete service data of the target user needs to be analyzed to determine the user's overall situation, the sub-data to be processed that is associated with the target user and stored in the different service systems can be acquired, and the data to be processed can be constructed from it.
The technical scheme provided by the embodiment of the invention can be applied to a target user seeking treatment at a medical institution. For example, a visit to a single medical institution may include links such as registration, consultation, examination, laboratory testing, and payment. Each link has its own service system, and each system stores only the data related to that link, so the medical data associated with the target user is scattered across different service systems. If the overall treatment situation of the target user needs to be analyzed, the sub-data to be processed stored in the different service systems can be gathered to obtain a group of sub-data to be processed corresponding to the target user.
S120, determining splicing attribute values corresponding to all sub-data to be processed under corresponding standardized elements.
In this embodiment, a standardized element may be a preset field with distinct data features. Optionally, the standardized elements may include space, time, subject, object, resource, and the like. The space may be a specific geographic location, for example, XX department of XX medical institution in XX district of XX city; the subject may be a user performing an item, for example, a consultation user, a pharmacist, or a laboratory staff member; the object may be the object of the subject's activity, for example, a visiting user; the resource may be a resource used in executing any item.
In this embodiment, the splicing attribute value may be a numerical value for characterizing a splicing order of each sub-data to be processed when splicing is performed. Illustratively, the splice attribute value may be an integrity score corresponding to the sub-data to be processed, and may be used to characterize the information integrity of each sub-data to be processed.
It should be noted that, determining the splicing attribute value of each sub-data to be processed under the corresponding standardized element can be understood as: different sub-data to be processed may include different standardized elements, and thus, when determining the splicing attribute value, the splicing attribute value corresponding to the corresponding sub-data to be processed may be determined according to the standardized element included in each sub-data to be processed.
In practice, a different weight value can be set for each standardized element in advance. After at least one piece of sub-data to be processed corresponding to the target user is obtained, it can first be determined, for each piece, whether the current piece includes any standardized element. When the current piece is detected to include at least one standardized element, the weight corresponding to each such element is determined, and the splicing attribute value of the current piece is determined based on the standardized elements and their weight values.
Optionally, determining the splicing attribute value corresponding to each sub-data to be processed under the corresponding standardized element includes: for each piece of sub-data to be processed, determining the at least one standardized element it includes and the weight value corresponding to each standardized element; and determining the splicing attribute value of that piece of sub-data based on the at least one standardized element and the corresponding weight values.
In this embodiment, weight values corresponding to the standardized elements may be predetermined, and these weight values may also be modified based on the splicing success rate in the subsequent splicing process.
In a specific implementation, for each sub-data to be processed, whether the current sub-data to be processed includes a standardized element or not may be detected based on a preset standardized element detection criterion, and when it is detected that the current sub-data to be processed includes at least one standardized element, a weight value corresponding to each standardized element included in the current sub-data to be processed may be determined according to a mapping relationship between the standardized element and the weight value, and further, a splicing attribute value corresponding to the current sub-data to be processed may be determined according to at least one standardized element and the corresponding weight value.
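The detection-and-lookup step can be illustrated with a minimal Python sketch. It assumes each piece of sub-data is a dictionary keyed by standardized element name, and it uses the plain sum of the weights of the detected elements as the score, which is only one simple reading of the text. The names `ELEMENT_WEIGHTS`, `present_elements`, and `presence_score`, and the weight values themselves, are hypothetical.

```python
# Hypothetical element-to-weight mapping; the text only says weights are preset.
ELEMENT_WEIGHTS = {"space": 2, "time": 2, "subject": 3, "object": 3, "resource": 1}


def present_elements(sub_data):
    """Return the standardized elements that carry at least one field."""
    return [name for name in ELEMENT_WEIGHTS if sub_data.get(name)]


def presence_score(sub_data):
    """Sum the weights of the detected elements (an assumed scoring rule)."""
    return sum(ELEMENT_WEIGHTS[name] for name in present_elements(sub_data))


record = {"space": {"city": "XX"}, "object": {"name": "XX"}, "time": {}}
print(present_elements(record), presence_score(record))
```

An empty data segment (such as `time` above) is treated as absent, which is also an assumption of this sketch.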
In practice, several pieces of sub-data to be processed may include the same standardized element but with different numbers of fields. A standardized element with a different number of fields carries a different degree of information integrity, so the corresponding splicing attribute values should also differ. The splicing attribute value of a piece of sub-data to be processed may therefore also be determined according to the number of fields of each standardized element it contains.
Optionally, determining the splicing attribute value of a piece of sub-data to be processed based on the at least one standardized element and the corresponding weight values includes: for each standardized element, determining the data segment of the current standardized element in the corresponding sub-data to be processed, and determining a standard attribute value for the current standardized element based on the field matching degree between the data segment and the preset data information corresponding to the current standardized element; multiplying each standard attribute value by the corresponding weight value to obtain at least one attribute value to be processed; and adding all the attribute values to be processed to obtain the splicing attribute value.
In this embodiment, the data segment may be the portion of the sub-data to be processed that is occupied by the standardized element. The preset data information may be the preset complete data information corresponding to the standardized element. For example, when the standardized element is space, the corresponding preset data information may be XX department of XX medical institution in XX district of XX city, XX province. The field matching degree may be the ratio of the number of fields that match between the data segment and the preset data information to the total number of fields in the preset data information.
In practice, once the at least one standardized element included in a piece of sub-data to be processed has been determined, each standardized element can be handled as follows. The current standardized element is located in the sub-data according to a preset standardized element identification method, which gives its data segment. The preset data information corresponding to the current standardized element is then determined and field-matched against the data segment. The field matching degree is computed from the number of matching fields and the total number of fields in the preset data information, and the standard attribute value of the current standardized element is determined from a pre-constructed mapping between field matching degree and standard attribute value.
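A minimal Python sketch of this weighted calculation follows. It rests on several assumptions that the text does not fix: a field counts as matched when it is present and non-empty, the preset data information is modelled as a list of expected field names, and the mapping from field matching degree to standard attribute value is a linear 0 to 100 scale. `PRESET_FIELDS`, `WEIGHTS`, and all function names are hypothetical.

```python
# Hypothetical preset data information: the complete field list per element.
PRESET_FIELDS = {
    "space": ["province", "city", "district", "institution", "department"],
    "time": ["date", "hour"],
}
# Hypothetical weights per standardized element.
WEIGHTS = {"space": 3.0, "time": 1.0}


def field_matching_degree(segment, preset_fields):
    """Matched fields divided by the total number of preset fields."""
    matched = sum(1 for name in preset_fields if segment.get(name) not in (None, ""))
    return matched / len(preset_fields)


def standard_attribute_value(degree):
    """Assumed mapping from matching degree to standard value (linear, 0..100)."""
    return 100.0 * degree


def splice_attribute_value(sub_data):
    """Sum of weight * standard attribute value over the elements present."""
    total = 0.0
    for element, preset in PRESET_FIELDS.items():
        segment = sub_data.get(element)
        if not segment:
            continue  # element absent: it contributes nothing
        degree = field_matching_degree(segment, preset)
        total += WEIGHTS[element] * standard_attribute_value(degree)
    return total


record = {
    "space": {"province": "P", "city": "C"},
    "time": {"date": "2023-03-01", "hour": "09"},
}
print(splice_attribute_value(record))
```

With these assumed values, a record that fills two of the five space fields and both time fields scores lower than one that fills every field, which matches the idea that the score reflects information integrity.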
S130, determining target splicing pending sub-data corresponding to the target splicing attribute values based on the splicing attribute values.
In this embodiment, the target splicing attribute value may be a splicing attribute value that satisfies a preset attribute value screening condition. The preset attribute value screening condition may be a screening condition set in advance according to user requirements. It may be any screening condition; optionally, it may select the highest of the splicing attribute values. Correspondingly, the sub-data to be processed corresponding to the target splicing attribute value is the target splicing sub-data to be processed.
In practice, when data splicing is performed on the pieces of sub-data to be processed, one piece can be selected as the target splicing sub-data to be processed, and the other pieces are spliced with it to determine the complete data corresponding to the target user. In general, a higher splicing attribute value indicates higher information integrity of the corresponding sub-data to be processed, so the target splicing sub-data can be selected according to the splicing attribute values.
Optionally, determining, based on each splicing attribute value, target splicing pending sub-data corresponding to the target splicing attribute value includes: and taking the splicing attribute value corresponding to the highest value in the splicing attribute values as a target splicing attribute value, and taking the sub-data to be processed corresponding to the target splicing attribute value as target splicing sub-data to be processed.
In a specific implementation, after the splicing attribute values are obtained, they are sorted from high to low. The highest splicing attribute value is taken as the target splicing attribute value, and the sub-data to be processed corresponding to it is taken as the target splicing sub-data to be processed.
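The selection can be sketched in a few lines of Python. The function name `pick_target` is hypothetical, and breaking ties by original position is an assumption, since the text does not say how equal scores are handled.

```python
def pick_target(scores):
    """Return (index of the highest score, remaining indices ranked high to low)."""
    if not scores:
        raise ValueError("at least one splicing attribute value is required")
    # sorted() is stable even with reverse=True, so ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[0], ranked[1:]


target_index, remaining = pick_target([3, 9, 5])
print(target_index, remaining)
```

Returning the remaining indices in ranked order is a convenience of this sketch; it keeps the scores and the pieces of sub-data aligned by index.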
And S140, for each piece of sub-data to be processed, which is different from the target splicing sub-data to be processed, splicing the current piece of sub-data to be processed with the target splicing sub-data to be processed, updating the target splicing sub-data to be processed according to the splicing result, and processing the next piece of sub-data to be processed of the current piece of sub-data to be processed based on the updated target splicing sub-data to be processed until the current piece of sub-data to be processed is the last piece of sub-data to be processed, thereby obtaining the target data corresponding to the target user.
In this embodiment, after the target splicing sub-data to be processed is determined, the other pieces of sub-data to be processed can be spliced with it. The order in which they are spliced may also be determined from their splicing attribute values: specifically, the pieces may be arranged by splicing attribute value from high to low, and this order is used as the splicing order. The target data may be the data obtained after splicing all the sub-data to be processed associated with the target user, that is, the data obtained by consolidating everything the service systems store for the target user.
In practice, for the pieces of sub-data to be processed other than the target splicing sub-data, the current piece is selected according to the predetermined splicing order and spliced with the target splicing sub-data to be processed; a splicing result is determined, and the target splicing sub-data is processed according to that result. The splicing result has two cases: the two pieces can be spliced, or they cannot. The result can be determined from the consistency between the current piece and the target splicing sub-data: the higher the consistency, the more similar the data information in the two pieces. Consistency can in turn be determined from the degree of field overlap of the two pieces under the same standardized element, so the splicing result can be determined from that field overlap.
Optionally, splicing the current sub-data to be processed with the target splicing sub-data to be processed and updating the target splicing sub-data according to the splicing result includes: determining the number of matching fields between the sub-data to be processed and the target splicing sub-data under the corresponding standardized elements, so as to determine the overlap ratio based on the number of matching fields; and determining the splicing result based on the overlap ratio, so as to update the target splicing sub-data based on the splicing result.
In this embodiment, an element identifier may be set for each standardized element in advance. When the target splicing sub-data to be processed is determined, the element identifiers it includes may be determined. When a piece of sub-data to be processed is received for splicing with the target splicing sub-data, the element identifiers it includes may first be determined, so that the standardized elements that the two pieces have in common can be identified from the element identifiers. Then, for each shared standardized element, the number of matching fields is determined from the data segment of that element in the sub-data to be processed and the data segment of that element in the target splicing sub-data, and the ratio of this number to the total number of fields of that element in the target splicing sub-data is taken as the overlap ratio for that element. The overlap ratios of the individual standardized elements may then be added to obtain the overlap ratio of the two pieces, the splicing result is determined based on it, and the target splicing sub-data is processed based on the splicing result.
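The overlap calculation can be sketched in Python as follows. The sketch assumes that each piece of sub-data is a dictionary mapping an element identifier to a dictionary of fields, and that two fields match when they have the same name and the same value. The per-element ratios are added, as the text describes. The names `element_overlap` and `overlap_ratio` are hypothetical.

```python
def element_overlap(candidate_segment, target_segment):
    """Matching fields divided by the target's field count for one element."""
    if not target_segment:
        return 0.0
    matches = sum(
        1 for name, value in target_segment.items()
        if candidate_segment.get(name) == value
    )
    return matches / len(target_segment)


def overlap_ratio(candidate, target):
    """Add the per-element ratios over the elements both records contain."""
    shared = [element for element in target if element in candidate]
    return sum(element_overlap(candidate[e], target[e]) for e in shared)


target = {"object": {"name": "Li", "id": "001"}, "time": {"date": "2023-03-01"}}
candidate = {"object": {"name": "Li", "id": "001"}, "resource": {"drug": "X"}}
print(overlap_ratio(candidate, target))
```

Because the ratios are added, the total can exceed 1 when several elements are shared; an implementation that compares the result against a percentage threshold might instead average the ratios, which would be a further assumption.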
Optionally, determining the splicing result based on the overlap ratio to update the target splicing pending sub-data based on the splicing result includes: if the detected overlap ratio is greater than or equal to a preset overlap ratio threshold value, the splicing result is that splicing is possible, the data segments of the sub-data to be processed under the corresponding standardized elements are updated to the data segments corresponding to the same standardized elements in the target splicing sub-data, and the rest data segments in the sub-data to be processed are updated to the target splicing sub-data to be processed, so that the next sub-data to be processed of the current sub-data to be processed is processed based on the updated target splicing sub-data to be processed.
In this embodiment, the preset overlap ratio threshold may be a preset value for determining the overlap ratio between any two pieces of sub-data to be processed. The preset overlap threshold may be any value, and optionally, may be 90%. The corresponding standardized element may be a standardized element included in both the sub-data to be processed and the sub-data to be processed for target splicing. The remaining data segments may be other respective data segments that are distinct from the data segments under the corresponding standardized elements.
In the practical application process, after determining the overlap ratio between the sub-data to be processed and the sub-data to be processed of the target splicing, when detecting that the overlap ratio is greater than or equal to a preset overlap ratio threshold, the sub-data to be processed and the sub-data to be processed of the target splicing can be indicated to be in a state of being spliced, further, data fragments under at least one standardized element included in the sub-data to be processed and the sub-data to be processed of the target splicing can be determined, and each data fragment is updated to a data fragment corresponding to the same standardized element of the sub-data to be processed of the target splicing, namely, the data fragments of the sub-data to be processed and the sub-data to be processed of the target splicing are combined, and meanwhile, the rest data fragments in the data to be processed are updated to the sub-data to be processed of the target splicing, so that the next sub-data to be processed of the current sub-data to be processed can be processed based on the updated sub-data to be processed of the target splicing.
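The threshold check and merge can be sketched in Python. The sketch takes an already computed overlap ratio as input. It assumes that, when both pieces carry the same field, the value already in the target is kept, because the text does not specify a conflict rule. `OVERLAP_THRESHOLD`, `merge_into_target`, and `splice_if_matching` are hypothetical names.

```python
import copy

OVERLAP_THRESHOLD = 0.9  # the text mentions 90% as one optional value


def merge_into_target(candidate, target):
    """Return a new target holding the union of both records' data segments."""
    merged = copy.deepcopy(target)
    for element, segment in candidate.items():
        merged_segment = merged.setdefault(element, {})
        for name, value in segment.items():
            # Assumed conflict rule: keep the target's value, fill only gaps.
            merged_segment.setdefault(name, value)
    return merged


def splice_if_matching(candidate, target, overlap, threshold=OVERLAP_THRESHOLD):
    """Merge when the overlap reaches the threshold; otherwise leave the target."""
    if overlap >= threshold:
        return merge_into_target(candidate, target), True
    return target, False


target = {"object": {"name": "Li"}}
candidate = {"object": {"name": "Li", "id": "001"}, "resource": {"drug": "X"}}
print(splice_if_matching(candidate, target, overlap=1.0))
```

Returning a new dictionary rather than mutating the target keeps the earlier state available for inspection, which is a design choice of this sketch rather than something the text requires.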
It should be noted that if the splicing result for the current sub-data to be processed and the target splicing sub-data to be processed is that they can be spliced, then, when the next sub-data to be processed is received, it can be taken as the current sub-data to be processed and spliced with the merged target splicing sub-data to be processed. The splicing result is then determined again, and the target splicing sub-data to be processed is updated based on that result. This continues until the current sub-data to be processed is the last sub-data to be processed, at which point the target data corresponding to the target user is obtained.
In practical application, when the overlap ratio is smaller than the preset overlap ratio threshold, this indicates that the current sub-data to be processed and the target splicing sub-data to be processed are different pieces of sub-data. In this case, the current sub-data to be processed can be taken as another target splicing sub-data to be processed, so that when the next sub-data to be processed is received, it can be spliced with each target splicing sub-data to be processed in turn.
On the basis of the technical schemes, the method further comprises the following steps: if the detected overlap ratio is smaller than the preset overlap ratio threshold value, the splicing result is that splicing is impossible, and the current sub-data to be processed is updated into a data table to which the target splicing sub-data belongs, so that the next sub-data to be processed of the current sub-data to be processed is processed based on each target splicing sub-data included in the data table.
In practical application, after the overlap ratio between the current sub-data to be processed and the target splicing sub-data to be processed is determined, if the overlap ratio is detected to be smaller than the preset overlap ratio threshold, this indicates that the two may be different pieces of sub-data. In this case, the current sub-data to be processed can be updated into the data table to which the target splicing sub-data to be processed belongs, where it serves as another target splicing sub-data to be processed parallel to the existing one. When the next sub-data to be processed is received, it can be spliced with each target splicing sub-data to be processed included in the data table in turn, and the overlap ratio determination step is repeated until the current sub-data to be processed is the last sub-data to be processed.
For example, suppose the target splicing sub-data to be processed is event A. If the overlap ratio between the current sub-data to be processed and event A is smaller than the preset overlap ratio threshold, the current sub-data to be processed can be taken as event B. When the next sub-data to be processed is received, it is first spliced with event A, and it is determined whether the overlap ratio reaches the preset overlap ratio threshold. If so, that sub-data is updated into event A; if not, it is spliced with event B, and the same determination is made. If the threshold is reached, the sub-data is updated into event B; if not, it is taken as event C. This is iterated until the current sub-data to be processed is the last sub-data to be processed, at which point the data splicing process is complete and the target data is output.
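The event A / event B / event C iteration can be sketched in Python as follows. This is only an illustrative sketch under simplifying assumptions: every field is treated as a standardized element, the overlap ratio is taken as the share of fields filled in on both sides whose segments are equal, and the names `overlap_ratio` and `splice_all` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]  # hypothetical: field name -> data segment


def overlap_ratio(a: Record, b: Record) -> float:
    """Assumed definition: matching fields / fields filled in on both sides."""
    shared = [k for k in a if a[k] and b.get(k)]
    if not shared:
        return 0.0
    return sum(1 for k in shared if a[k] == b[k]) / len(shared)


def splice_all(target: Record, others: List[Record],
               threshold: float = 0.9) -> List[Record]:
    """Splice each piece of sub-data into the first event it overlaps with."""
    table = [dict(target)]  # the data table initially holds only event A
    for current in others:
        for i, event in enumerate(table):
            if overlap_ratio(current, event) >= threshold:
                # Splicing is possible: merge the non-empty segments.
                filled = {k: v for k, v in current.items() if v}
                table[i] = {**event, **filled}
                break
        else:
            # No existing event overlaps enough: open a new event (B, C, ...).
            table.append(dict(current))
    return table
```

For instance, with event A holding a visit dated 2023-01-01, a record with the same name and date is merged into event A, while a record dated 2023-02-10 opens event B and later records with that date are merged into event B.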
It should be noted that, when the sub-data to be processed is spliced with the target splicing sub-data to be processed, the splicing order is determined by the splicing attribute value of each piece of sub-data to be processed. Sub-data to be processed that does not include any standardized element may therefore fall in the last few positions of the splicing order. When such sub-data is received, its specific position in the target splicing sub-data to be processed can be determined from the number of overlapping fields between it and the target splicing sub-data to be processed.
According to the technical scheme, at least one group of to-be-processed data corresponding to a target user is obtained, then, the splicing attribute value corresponding to each to-be-processed sub-data under the corresponding standardized element is determined, further, the target splicing to-be-processed sub-data corresponding to the target splicing attribute value is determined based on each splicing attribute value, finally, the to-be-processed sub-data different from the target splicing to-be-processed sub-data are subjected to splicing processing, the target splicing to-be-processed sub-data are updated according to the splicing result, the next to-be-processed sub-data of the current to-be-processed sub-data is processed based on the updated target splicing to-be-processed sub-data until the current to-be-processed sub-data is the last to-be-processed sub-data, and the target data corresponding to the target user is obtained.
Example II
Fig. 2 is a schematic structural diagram of a data splicing device according to a second embodiment of the present invention. As shown in Fig. 2, the device includes: a to-be-processed data acquisition module 210, a splicing attribute value determining module 220, a target splicing to-be-processed sub-data determining module 230, and a target data determining module 240.
The data to be processed obtaining module 210 is configured to obtain at least one group of data to be processed corresponding to the target user; the data to be processed comprises at least one piece of sub data to be processed, wherein the sub data to be processed is data with missing information;
a splicing attribute value determining module 220, configured to determine a splicing attribute value corresponding to each sub-data to be processed under a corresponding standardized element;
the target splicing pending sub-data determining module 230 is configured to determine target splicing pending sub-data corresponding to the target splicing attribute values based on the splicing attribute values;
the target data determining module 240 is configured to splice the current sub-data to the target splicing pending sub-data for each sub-data to be processed, and update the target splicing pending sub-data according to the splicing result, so as to process the next sub-data to be processed of the current sub-data to be processed based on the updated target splicing pending sub-data until the current sub-data to be processed is the last sub-data to be processed, thereby obtaining target data corresponding to the target user.
According to the technical scheme, at least one group of to-be-processed data corresponding to a target user is obtained, then, the splicing attribute value corresponding to each to-be-processed sub-data under the corresponding standardized element is determined, further, the target splicing to-be-processed sub-data corresponding to the target splicing attribute value is determined based on each splicing attribute value, finally, the to-be-processed sub-data different from the target splicing to-be-processed sub-data are subjected to splicing processing, the target splicing to-be-processed sub-data are updated according to the splicing result, the next to-be-processed sub-data of the current to-be-processed sub-data is processed based on the updated target splicing to-be-processed sub-data until the current to-be-processed sub-data is the last to-be-processed sub-data, and the target data corresponding to the target user is obtained.
Optionally, the splicing attribute value determining module 220 includes a standardized element determining unit and a splicing attribute value determining unit.
A standardized element determining unit configured to determine, for each of the sub-data to be processed, at least one standardized element included in the current sub-data to be processed, and a weight value corresponding to each of the standardized elements;
and the splicing attribute value determining unit is used for determining the splicing attribute value corresponding to the corresponding sub-data to be processed based on the at least one standardized element and the corresponding weight value.
Optionally, the splicing attribute value determining unit includes a standard attribute value determining subunit, a to-be-processed attribute value determining subunit, and a splicing attribute value determining subunit.
A standard attribute value determining subunit, configured to determine, for each standardized element, a data segment of a current standardized element in the sub-data to be processed, and determine a standard attribute value corresponding to the current standardized element based on a field matching degree between the data segment and preset data information corresponding to the current standardized element;
the attribute value to be processed determining subunit is used for multiplying each standard attribute value by a corresponding weight value to obtain at least one attribute value to be processed;
And the splicing attribute value determining subunit is used for adding the attribute values to be processed to obtain the splicing attribute value.
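The calculation performed by the three subunits (standard attribute value per standardized element, multiplication by the weight value, and summation) can be sketched in Python as follows. The way the field matching degree is computed is not specified here, so the sketch assumes that the preset data information is a regular expression and that the matching degree is 1.0 on a full match and 0.0 otherwise; the names `field_matching_degree` and `splice_attribute_value` are hypothetical.

```python
import re
from typing import Dict


def field_matching_degree(segment: str, preset_pattern: str) -> float:
    """Assumed matching rule: 1.0 if the segment fully matches the preset pattern."""
    return 1.0 if re.fullmatch(preset_pattern, segment) else 0.0


def splice_attribute_value(record: Dict[str, str],
                           preset_patterns: Dict[str, str],
                           weights: Dict[str, float]) -> float:
    """Weighted sum of the standard attribute values of one piece of sub-data."""
    total = 0.0
    for element, pattern in preset_patterns.items():
        if element not in record:
            continue  # this standardized element is missing from the sub-data
        standard_value = field_matching_degree(record[element], pattern)
        # Attribute value to be processed = standard attribute value * weight.
        total += standard_value * weights.get(element, 0.0)
    return total
```

With hypothetical weights of 0.6 for an identity number and 0.4 for a visit date, a piece of sub-data whose identity number matches its pattern but whose date is malformed would receive a splicing attribute value of 0.6.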
Optionally, the target splicing to-be-processed sub-data determining module 230 is specifically configured to take the highest of the splicing attribute values as the target splicing attribute value, and to take the sub-data to be processed corresponding to the target splicing attribute value as the target splicing sub-data to be processed.
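Selecting the target splicing sub-data to be processed can be sketched in Python as follows. The sketch additionally orders the remaining sub-data by descending splicing attribute value, on the assumption that a higher value means an earlier position in the splicing order; the function name `select_target` and the tie-breaking rule (the first piece with the highest value wins) are illustrative choices.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_target(records: Sequence[T],
                  splice_values: Sequence[float]) -> Tuple[T, List[T]]:
    """Return the piece with the highest splicing attribute value and the rest in order."""
    # max() returns the first index holding the highest value.
    target_index = max(range(len(records)), key=lambda i: splice_values[i])
    rest = [i for i in range(len(records)) if i != target_index]
    # Assumed splicing order: descending splicing attribute value.
    # sorted() is stable, so ties keep their original relative order.
    rest.sort(key=lambda i: splice_values[i], reverse=True)
    return records[target_index], [records[i] for i in rest]
```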
Optionally, the target data determining module 240 includes an overlap ratio determining unit and a splicing result determining unit.
The overlap ratio determining unit is used for determining the field matching quantity of the sub-data to be processed and the target splicing sub-data under the corresponding standardized elements so as to determine the overlap ratio based on the field matching quantity;
and the splicing result determining unit is used for determining a splicing result based on the overlap ratio so as to update the target splicing sub-data to be processed based on the splicing result.
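The two units can be sketched in Python as follows. How the overlap ratio is derived from the field matching quantity is not spelled out here, so the sketch assumes that it is the number of matching fields divided by the number of standardized elements filled in on both sides; the names `overlap_ratio` and `can_splice` are hypothetical, and the default threshold of 0.9 follows the optional 90% value mentioned in the description.

```python
from typing import Dict, Iterable

Record = Dict[str, str]  # hypothetical: field name -> data segment


def overlap_ratio(current: Record, target: Record,
                  standardized_elements: Iterable[str]) -> float:
    """Assumed formula: field matching quantity / shared standardized elements."""
    shared = [e for e in standardized_elements
              if current.get(e) and target.get(e)]
    if not shared:
        return 0.0
    matched = sum(1 for e in shared if current[e] == target[e])
    return matched / len(shared)


def can_splice(current: Record, target: Record,
               standardized_elements: Iterable[str],
               threshold: float = 0.9) -> bool:
    """Splicing result: possible when the overlap ratio reaches the threshold."""
    return overlap_ratio(current, target, standardized_elements) >= threshold
```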
Optionally, the splicing result determining unit is specifically configured to, if the detected overlap ratio is greater than or equal to a preset overlap ratio threshold, determine that the splicing result is that the sub-data to be processed can be spliced, update a data segment of the sub-data to be processed under a corresponding standardized element to a data segment corresponding to the same standardized element in the sub-data to be processed of the target splicing, and update a remaining data segment in the sub-data to be processed to the sub-data to be processed of the target splicing, so as to process the sub-data to be processed next to the current sub-data to be processed based on the updated sub-data to be processed of the target splicing.
Optionally, the device further includes a data table updating module.
The data table updating module is configured to, if the overlap ratio is detected to be smaller than the preset overlap ratio threshold, determine that the splicing result is that splicing is impossible, and update the current sub-data to be processed into the data table to which the target splicing sub-data to be processed belongs, so that the next sub-data to be processed after the current sub-data to be processed is processed based on each target splicing sub-data to be processed included in the data table.
The data splicing device provided by the embodiment of the invention can execute the data splicing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example III
Fig. 3 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data splicing method.
In some embodiments, the data stitching method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data stitching method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data stitching method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data stitching, comprising:
acquiring at least one group of data to be processed corresponding to a target user; the data to be processed comprises at least one piece of sub data to be processed, wherein the sub data to be processed is data with missing information;
determining a splicing attribute value corresponding to each sub-data to be processed under the corresponding standardized element; the standardized element is a preset field with obvious data characteristics; the splicing attribute value is used for representing the splicing sequence of each sub-data to be processed when being spliced;
determining target splicing sub-data to be processed corresponding to a target splicing attribute value based on the splicing attribute values; the target splicing attribute value is a splicing attribute value meeting a preset attribute value screening condition;
and for each piece of sub-data to be processed, which is different from the target splicing sub-data to be processed, splicing the current piece of sub-data to be processed with the target splicing sub-data to be processed, updating the target splicing sub-data to be processed according to a splicing result, and processing the next piece of sub-data to be processed of the current piece of sub-data to be processed based on the updated target splicing sub-data to be processed until the current piece of sub-data to be processed is the last piece of sub-data to be processed, thereby obtaining target data corresponding to the target user.
2. The method according to claim 1, wherein determining the splice attribute value corresponding to each sub-data to be processed under the corresponding standardized element includes:
determining at least one standardized element included in the current sub-data to be processed and a weight value corresponding to each standardized element for each sub-data to be processed;
and determining a splicing attribute value corresponding to the corresponding sub-data to be processed based on the at least one standardized element and the corresponding weight value.
3. The method of claim 2, wherein determining the splicing attribute value corresponding to the corresponding sub-data to be processed based on the at least one standardized element and the corresponding weight value comprises:
for each standardized element, determining a data segment of the current standardized element in the sub-data to be processed, and determining a standard attribute value corresponding to the current standardized element based on field matching degree between the data segment and preset data information corresponding to the current standardized element;
multiplying each standard attribute value by a corresponding weight value to obtain at least one attribute value to be processed;
and adding the attribute values to be processed to obtain the splicing attribute value.
4. The method of claim 1, wherein determining the target splice pending sub-data corresponding to the target splice attribute value based on each splice attribute value comprises:
and taking the splicing attribute value corresponding to the highest value in the splicing attribute values as a target splicing attribute value, and taking the sub-data to be processed corresponding to the target splicing attribute value as target splicing sub-data to be processed.
5. The method of claim 1, wherein updating the target splice pending sub-data based on the splice result comprises:
determining field matching quantity of the sub-data to be processed and the target splicing sub-data to be processed under corresponding standardized elements, so as to determine the coincidence degree based on the field matching quantity;
and determining a splicing result based on the overlapping ratio so as to update the target splicing waiting sub-data based on the splicing result.
6. The method of claim 5, wherein determining a splice result based on the overlap ratio to update the target splice pending sub-data based on the splice result comprises:
if the detected overlap ratio is greater than or equal to a preset overlap ratio threshold, the splicing result is that splicing is possible, the data segments of the sub-data to be processed under the corresponding standardized elements are updated to the data segments corresponding to the same standardized elements in the target splicing sub-data to be processed, and the rest data segments in the sub-data to be processed are updated to the target splicing sub-data to be processed, so that the next sub-data to be processed of the current sub-data to be processed is processed based on the updated target splicing sub-data to be processed.
7. The method as recited in claim 5, further comprising:
if the overlap ratio is detected to be smaller than a preset overlap ratio threshold, the splicing result is that splicing is impossible, and the current sub-data to be processed is updated into a data table to which the target splicing sub-data to be processed belongs, so that the next sub-data to be processed of the current sub-data to be processed is processed based on each target splicing sub-data to be processed included in the data table.
8. A data stitching device, comprising:
the data to be processed acquisition module is used for acquiring at least one group of data to be processed corresponding to the target user; the data to be processed comprises at least one piece of sub data to be processed, wherein the sub data to be processed is data with missing information;
the splicing attribute value determining module is used for determining splicing attribute values corresponding to all the sub-data to be processed under the corresponding standardized elements; the standardized element is a preset field with obvious data characteristics; the splicing attribute value is used for representing the splicing sequence of each sub-data to be processed when being spliced;
the target splicing pending sub-data determining module is used for determining target splicing pending sub-data corresponding to the target splicing attribute values based on the splicing attribute values; the target splicing attribute value is a splicing attribute value meeting a preset attribute value screening condition;
and the target data determining module is used for splicing, for each piece of sub-data to be processed which is different from the target splicing sub-data to be processed, the current sub-data to be processed with the target splicing sub-data to be processed, and updating the target splicing sub-data to be processed according to the splicing result, so as to process the next sub-data to be processed of the current sub-data to be processed based on the updated target splicing sub-data to be processed until the current sub-data to be processed is the last sub-data to be processed, thereby obtaining target data corresponding to the target user.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data stitching method of any of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data stitching method of any one of claims 1-7.
CN202310308456.3A 2023-03-28 2023-03-28 Data splicing method and device, electronic equipment and storage medium Active CN116070601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310308456.3A CN116070601B (en) 2023-03-28 2023-03-28 Data splicing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310308456.3A CN116070601B (en) 2023-03-28 2023-03-28 Data splicing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116070601A CN116070601A (en) 2023-05-05
CN116070601B true CN116070601B (en) 2023-06-13

Family

ID=86178774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310308456.3A Active CN116070601B (en) 2023-03-28 2023-03-28 Data splicing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116070601B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010129599A1 (en) * 2009-05-04 2010-11-11 Oblong Industries, Inc. Gesture-based control systems including the representation, manipulation, and exchange of data
CN110895534A (en) * 2018-08-24 2020-03-20 北京京东尚科信息技术有限公司 Data splicing method, device, medium and electronic equipment
WO2022089652A1 (en) * 2020-11-02 2022-05-05 第四范式(北京)技术有限公司 Method and system for processing data tables and automatically training machine learning model

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6055552A (en) * 1997-10-31 2000-04-25 Hewlett Packard Company Data recording apparatus featuring spatial coordinate data merged with sequentially significant command data
US20150143210A1 (en) * 2013-11-18 2015-05-21 Microsoft Corporation Content Stitching Templates
CN112817965B (en) * 2019-11-18 2023-10-17 百度在线网络技术(北京)有限公司 Data splicing method and device, electronic equipment and storage medium
CN110955661B (en) * 2019-11-29 2023-03-21 北京明略软件***有限公司 Data fusion method and device, readable storage medium and electronic equipment
CN112115138A (en) * 2020-08-19 2020-12-22 第四范式(北京)技术有限公司 Method, device and equipment for determining association relation between data tables
CN113312890B (en) * 2021-06-16 2024-04-12 第四范式(北京)技术有限公司 Multi-table splicing method and device, electronic equipment and storage medium
CN113704342A (en) * 2021-07-30 2021-11-26 济南浪潮数据技术有限公司 Method, system, equipment and storage medium for trace accompanying analysis
CN114416695A (en) * 2022-01-19 2022-04-29 中国平安人寿保险股份有限公司 Data splicing function migration method and device, computer equipment and storage medium
CN114254008B (en) * 2022-03-01 2022-05-24 广东电网有限责任公司惠州供电局 Information generation method and device and electronic equipment
CN115168362A (en) * 2022-07-25 2022-10-11 抖音视界有限公司 Data processing method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
CN116070601A (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN115242731A (en) Message processing method, device, equipment and storage medium
CN115422924A (en) Information matching method and device, electronic equipment and storage medium
CN115687406B (en) Sampling method, device, equipment and storage medium for call chain data
CN115048352B (en) Log field extraction method, device, equipment and storage medium
CN116070601B (en) Data splicing method and device, electronic equipment and storage medium
CN116545905A (en) Service health detection method and device, electronic equipment and storage medium
CN116303013A (en) Source code analysis method, device, electronic equipment and storage medium
CN115599687A (en) Method, device, equipment and medium for determining software test scene
CN115344627A (en) Data screening method and device, electronic equipment and storage medium
CN112887426B (en) Information stream pushing method and device, electronic equipment and storage medium
CN115495151A (en) Rule engine migration method, device, equipment, storage medium and program product
CN114896418A (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN114866437A (en) Node detection method, device, equipment and medium
CN114490408A (en) Test case generation method, device, equipment, storage medium and product
CN114693116A (en) Method and device for detecting code review validity and electronic equipment
CN116089459B (en) Data retrieval method, device, electronic equipment and storage medium
CN112560992B (en) Method, device, electronic equipment and storage medium for optimizing picture classification model
CN116244324B (en) Task data relation mining method and device, electronic equipment and storage medium
CN117632741A (en) Determination method and device of regression test case library, electronic equipment and storage medium
CN118012936A (en) Data extraction method, device, equipment and storage medium
CN115983222A (en) EasyExcel-based file data reading method, device, equipment and medium
CN116502841A (en) Event processing method and device, electronic equipment and medium
CN117806702A (en) Target software acquisition method and device, electronic equipment and storage medium
CN115455060A (en) Data processing method, device, equipment and medium
CN117453746A (en) Method, device, equipment and medium for data cycle screening

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant