CN113672609A - Method for generating resident pregnancy model label based on multi-source data fusion - Google Patents

Method for generating resident pregnancy model label based on multi-source data fusion

Info

Publication number
CN113672609A
Authority
CN
China
Prior art keywords
data
pregnancy
information
date
birth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111025631.5A
Other languages
Chinese (zh)
Inventor
承孝敏
赵勇
水新莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Information Intelligence Innovation Research Institute
Original Assignee
Yangtze River Delta Information Intelligence Innovation Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Information Intelligence Innovation Research Institute
Priority to CN202111025631.5A
Publication of CN113672609A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Tourism & Hospitality (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for generating a resident pregnancy model label based on multi-source data fusion, comprising the following steps: step 1, data aggregation analysis; step 2, aggregating birth registration information, pregnancy information, birth information, Health Commission/public security birth information, household registration information and medical birth certificate information according to business analysis requirements; step 3, data verification; step 4, data cleaning; step 5, data fusion; step 6, fusing the multi-source data into a pregnancy model; step 7, generating a pregnancy label based on the pregnancy model; step 8, using the pregnancy label through intelligent label generation based on multi-source data fusion; and step 9, data quality inspection. The method can quickly identify pregnant women and track and monitor them, reduces maternal and perinatal mortality, facilitates population information management, and supports the management and verification of deliveries.

Description

Method for generating resident pregnancy model label based on multi-source data fusion
Technical Field
The invention relates to a method for generating a resident pregnancy model label based on multi-source data fusion.
Background
The following problems can arise in the day-to-day verification of pregnancy-related business:
First, the maternity model has multiple sources and catalogue ownership is unclear: the maternity resource information catalogue spans the catalogues of several departments, ownership of each department's information resource items is unclear, and responsibility for maintaining the data items is unclear; at the same time, many items have multiple sources. For example, the Health Commission holds the resource catalogue of the whole-population system and also holds birth registration information, pregnancy information, birth information, public security birth information, the Health Commission's healthy-family all-in-one card, and so on.
Second, data standards are not unified and statistical calibers are inconsistent: because departments do not agree on maternity data items and data standards, data sharing and exchange suffer from conflicting calibers and from one or more inconsistent values for the same item. For example, the pregnancy result dictionary entries in the whole-population system are: live birth, vaginal delivery, cesarean section, termination of pregnancy, spontaneous abortion, induced abortion within 12 weeks, induced abortion after 12 weeks, stillbirth and others; whereas the pregnancy outcome follow-up dictionary entries are: full-term live birth, spontaneous abortion, induced abortion, miscarriage, hydatidiform mole, ectopic pregnancy, therapeutic induction of labor, low birth weight infant, and premature delivery.
Third, the traceability mechanism for the data exchange process is incomplete: the parties involved in sharing and exchanging pregnancy information are departments such as the Health Commission and public security, the data business process is long, and there are many approval steps. Moreover, the application, approval, exchange and use of data cannot be audited and traced.
Fourth, business collaboration is shallow and data is unavailable: business collaboration is at an early stage, information islands exist, and much of the information cannot be used. There are many government departments, and maternity data is scattered across multiple information systems, forming data silos ("data chimneys"). For example, the birth population figure may have several values, such as the birth data published by the statistics bureau, the birth data registered in household records by public security, and the birth data of the Health Commission.
Therefore, a method for generating a resident pregnancy model label based on multi-source data fusion is urgently needed to solve the above problems.
Disclosure of Invention
The invention aims to provide a method for generating a resident pregnancy model label based on multi-source data fusion, which can quickly identify pregnant women, track and monitor pregnancies, reduce maternal and perinatal mortality, facilitate population information management, and support the management and verification of deliveries.
To achieve this aim, the invention provides a method for generating a resident pregnancy model label based on multi-source data fusion, comprising the following steps:
step 1, data aggregation analysis;
step 2, aggregating birth registration information, pregnancy information, birth information, Health Commission/public security birth information, household registration information and medical birth certificate information according to business analysis requirements;
step 3, data verification;
step 4, data cleaning;
step 5, data fusion;
step 6, fusing the multi-source data into a pregnancy model;
step 7, generating a pregnancy label based on the pregnancy model;
step 8, using the pregnancy label through intelligent label generation based on multi-source data fusion;
and step 9, data quality inspection.
Preferably, step 1 comprises: defining a set of maternity business data exchange standards that reflect the data structures required by each business application; the standards also define data provider information, that is, which business department and which business system the current data is requested from;
the data aggregation in step 1 comprises data entry and data collection and aggregation; wherein,
data entry is the entry of maternity-specific business through page management operations, with template-based entry management of community information such as pregnant women according to the unified "Maternity Business Standard Specification";
data collection and aggregation connects to the information platforms of the business departments: for the information systems identified through investigation, analysis and sorting, a data artificial-intelligence robot exports the data of each unit's existing system in a guided mode of operation, and the data is then imported into the base data warehouse according to the data import standard.
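As a rough illustration of the exchange standard described in step 1, the following Python sketch models a standard as a set of field definitions plus data provider information. The class names, field names and sample values are all hypothetical; the source does not specify how the standard is structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One data item of the exchange standard (hypothetical layout)."""
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class ExchangeStandard:
    """Data structure required by a business application, plus its provider."""
    dataset: str
    provider_department: str  # which business department supplies the data
    provider_system: str      # which business system the data is requested from
    fields: tuple

    def violations(self, record: dict) -> list:
        """Return the names of fields that are missing or of the wrong type."""
        bad = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    bad.append(spec.name)
            elif not isinstance(value, spec.dtype):
                bad.append(spec.name)
        return bad


# Illustrative instance; the values are assumptions, not taken from the source.
PREGNANCY_INFO = ExchangeStandard(
    dataset="pregnancy_info",
    provider_department="Health Commission",
    provider_system="whole-population service",
    fields=(
        FieldSpec("name", str),
        FieldSpec("id_card", str),
        FieldSpec("pregnancy_date", str, required=False),
    ),
)
```

Keeping the provider department and system next to the field definitions means every imported record can be traced back to the party that supplied it.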
Preferably, in step 2,
the aggregation loading strategy for birth registration information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is birth registration management in the newly issued certificate system; second, the daily increment is selected by application date, where the start date is the current date minus 7 days and the end date is the current date;
the aggregation loading strategy for pregnancy information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is the pregnancy information query under the whole-population business query in the whole-population service; second, the daily increment is selected by pregnancy date, where the start date is the current date minus 1 year and the end date is the current date;
the aggregation loading strategy for birth information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is the birth information query under the whole-population business query in the whole-population service; second, the daily increment is selected by birth date, where the start date is the current date minus half a year and the end date is the current date;
the aggregation loading strategy for Health Commission/public security birth information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is selected by data write time, namely the current time minus 1 day; the birth registration type is birth registration and the name of the party is reported;
the aggregation loading strategy for household registration information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is selected by data write time, namely the current time minus 1 day; the birth registration type is birth registration and the name of the party is reported;
the aggregation loading strategy for medical birth certificate information is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the PSN_BITTH_CERTIFICATE_INFO data table in the big data center database; second, the daily increment is selected by update time, namely the current time minus 1 day.
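The incremental windows listed for step 2 can be sketched in Python as follows. This is a minimal sketch rather than the applicant's implementation: the dictionary keys and function names are hypothetical, and "1 year" and "half a year" are approximated as 365 and 182 days.

```python
from datetime import date, timedelta

# Look-back window in days for each source, following the rules above.
# The keys are hypothetical labels, not names used by the source systems.
LOOKBACK_DAYS = {
    "birth_registration": 7,         # application date
    "pregnancy_info": 365,           # pregnancy date, "1 year" approximated
    "birth_info": 182,               # birth date, "half a year" approximated
    "health_police_birth": 1,        # data write time
    "household_registration": 1,     # data write time
    "birth_medical_certificate": 1,  # update time
}


def incremental_window(source: str, today: date):
    """Return the (start, end) dates of the daily incremental pull."""
    return today - timedelta(days=LOOKBACK_DAYS[source]), today


def select_increment(rows, source, date_field, today):
    """Keep only rows whose date_field falls inside the source's window."""
    start, end = incremental_window(source, today)
    return [row for row in rows if start <= row[date_field] <= end]
```

Because the windows overlap from one day to the next, the same record can be pulled more than once, which is one reason the later verification and deduplication steps are needed.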
Preferably, in step 3:
data verification analyses the verification object at fine-grained dimensions; for the maternity-specific data, the verification module performs a data uniqueness check and a foreign key integrity check; the checked content includes type, length, nullability, precision, range, format and the like, and records that do not conform are filtered out; for erroneous data, the output includes the error reason and the sequence number of the erroneous field;
when data is aggregated, each business department transmits the collected data entries to be verified to the grass-roots community base data warehouse, and the data base verifies whether the entries are consistent; if they are, a successful comparison is returned; otherwise a comparison error is returned, and the information collected and aggregated by each business department is returned to the automatic access module; when business department data is synchronized, data verification compares the imported data with the business department's data: if they are consistent, the data need not be imported again; if not, the inconsistent information is overwritten with the most recent data.
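The field-level checks described for step 3 might look like the sketch below, which reports an error reason together with the sequence number of the offending field and filters out non-conforming records. The rule table, field names and error codes are assumptions made for illustration only.

```python
import re

# Hypothetical rules: (field name, required, maximum length, format pattern).
RULES = [
    ("name", True, 50, None),
    ("id_card", True, 18, re.compile(r"\d{17}[\dX]")),
    ("pregnancy_date", False, 10, re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def check_record(record):
    """Return a list of (field sequence number, field name, error reason)."""
    errors = []
    for seq, (field, required, max_len, pattern) in enumerate(RULES, start=1):
        value = record.get(field)
        if value is None or value == "":
            if required:
                errors.append((seq, field, "empty"))
        elif not isinstance(value, str):
            errors.append((seq, field, "type"))
        elif len(value) > max_len:
            errors.append((seq, field, "length"))
        elif pattern is not None and not pattern.fullmatch(value):
            errors.append((seq, field, "format"))
    return errors


def filter_records(records):
    """Split records into passed and rejected, enforcing id_card uniqueness."""
    passed, rejected, seen = [], [], set()
    for rec in records:
        errors = check_record(rec)
        key = rec.get("id_card")
        if not errors and key in seen:
            errors.append((2, "id_card", "duplicate"))
        if errors:
            rejected.append((rec, errors))
        else:
            seen.add(key)
            passed.append(rec)
    return passed, rejected
```

Foreign key integrity and precision or range checks are omitted here; they would follow the same pattern of one rule per field plus a reported reason.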
Preferably, the data cleaning in step 4 cleans the data extracted by the artificial-intelligence collection and aggregation robot, and includes functions such as data filtering, data deduplication, type conversion, code mapping, file splitting and merging, and dimension conversion, which are used for converting inconsistent data, converting data granularity, removing dirty data and computing conversion rules; data that does not meet requirements comprises incomplete data, erroneous data and duplicate data;
for incomplete data, the data is filtered and, where necessary, completed by algorithm or by manual association according to its business attributes;
for erroneous data, such as numeric values entered as full-width digits or values with invisible characters before or after them, the erroneous data is located by writing SQL statements and repair is attempted by algorithm; where a date is incorrectly formatted or out of range, the date format is verified before the value is repaired;
for duplicate data, each record from the source business system is examined by the table's primary key and identifying subject: records with a repeated primary key field, a repeated subject name, or that other business rules identify as the same subject are treated as duplicates; the latest correct record is determined, all fields of the duplicate records are exported for confirmation and sorting by the departments, the duplicates are deleted, and one correct record is retained.
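Two of the cleaning rules above, repairing full-width digits and invisible characters and keeping the latest record among duplicates, can be sketched in Python as follows. The source locates erroneous data with SQL and has departments confirm duplicates before deletion; this sketch covers only the automatic part, and the field names `id_card` and `update_time` are assumptions.

```python
import unicodedata


def normalize_numeric(text: str) -> str:
    """Fold full-width digits to ASCII and drop invisible characters."""
    # NFKC maps full-width forms such as U+FF11 to their ASCII equivalents.
    folded = unicodedata.normalize("NFKC", text)
    # str.isprintable() is False for control and format characters.
    return "".join(ch for ch in folded if ch.isprintable()).strip()


def deduplicate(records, key="id_card", updated="update_time"):
    """Keep, for each key, the record with the greatest update time."""
    latest = {}
    for rec in records:
        current = latest.get(rec[key])
        if current is None or rec[updated] > current[updated]:
            latest[rec[key]] = rec
    return list(latest.values())
```

Update times are compared as given, so ISO-formatted strings or `datetime` values both work as long as a single representation is used throughout.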
Preferably, the data fusion in step 5 generates new data from the maternity-specific data according to conversion rules and stores it in the data warehouse of the data base, where data conversion supports one-to-many, many-to-one and many-to-many mappings between data fields;
for a single source, data that is trusted can be loaded into the database directly; for multiple sources, trusted data is produced according to a data quality index evaluation method and data survivorship rules and is then loaded; multi-source data fusion comprises data-level fusion, feature-level fusion and decision-level fusion; data-level fusion associates and fuses the lightly processed original data directly with SQL and extracts data features after fusion; feature-level fusion first extracts data features and then associates and fuses the data using a correlation algorithm; decision-level fusion makes a decision on each data source and then associates and fuses those decisions to obtain a consistent decision result.
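Data-level fusion, in which lightly processed source tables are associated directly with SQL, can be illustrated with an in-memory SQLite database. The table names, column names and sample rows below are hypothetical; the platform described in this application uses Hive rather than SQLite, so this is only a small stand-in for the join itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pregnancy_info (id_card TEXT, name TEXT, pregnancy_start TEXT);
    CREATE TABLE birth_info (mother_id_card TEXT, birth_date TEXT);
    INSERT INTO pregnancy_info VALUES
        ('ID001', 'A', '2021-01-10'),
        ('ID002', 'B', '2021-05-01');
    INSERT INTO birth_info VALUES ('ID001', '2021-10-15');
    """
)

# Associate the two sources on the woman's identity card number.
# A LEFT JOIN keeps pregnancies that have no matching birth record yet.
fused = conn.execute(
    """
    SELECT p.id_card, p.name, p.pregnancy_start, b.birth_date
    FROM pregnancy_info AS p
    LEFT JOIN birth_info AS b ON b.mother_id_card = p.id_card
    ORDER BY p.id_card
    """
).fetchall()
```

Rows whose `birth_date` is `None` after the join are pregnancies for which no birth has been recorded in the second source.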
Preferably, in step 6, the pregnancy report data of the maternity-specific pregnancy model is generated by fusing the multi-source data from birth registration management, pregnancy information, birth information, Health Commission/public security birth registration information, household registration information and the medical birth certificate; the fused fields comprise the woman's name, the woman's identity card number, pregnancy status, pregnancy start date, pregnancy termination date, delivery result and pregnancy result.
Preferably, in step 7,
the calculation caliber of the not-delivered label is: the pregnancy start date has a value, the pregnancy start date is less than 280 days before the current date, and the pregnancy termination date has no value, all at the same time; or the pregnancy start date has a value, the pregnancy start date is less than 280 days before the current date, and the pregnancy termination date is a zero value; or, in the birth registration, the end date has a value, the status at registration is "pregnancy application", the end date is less than 280 days before the current date, and the pregnancy termination date has no value;
the calculation caliber of the delivered label is: the pregnancy termination date has a value and is greater than the pregnancy termination date; or the pregnancy termination date is greater than the last birth date of the current pregnant woman; or the Health Commission/public security birth registration information and the household registration birth information table are associated, and the mother/husband information obtained for the newborn is used to judge whether the current pregnant woman's pregnancy has terminated, with the infant's birth date taken as the pregnancy termination date;
the calculation caliber of the suspected-delivery label is: the current pregnant woman's pregnancy start date has a value, the pregnancy termination date is a certain value, and the pregnancy start date is more than 280 days before the current date, all at the same time.
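The label rules can be sketched as a small Python function. The translated rules are ambiguous in places, so the sketch makes explicit assumptions that are not confirmed by the source: "delivered" is taken to mean a termination date later than the pregnancy start date, and "suspected delivery" is taken to mean that no termination date has been recorded although more than 280 days have elapsed. The birth registration and newborn-association variants are omitted.

```python
from datetime import date
from typing import Optional

GESTATION_DAYS = 280  # threshold used by the label rules above


def pregnancy_label(start: Optional[date], end: Optional[date],
                    today: date) -> Optional[str]:
    """Classify one pregnancy record; returns None when no rule applies."""
    if start is None:
        return None
    if end is not None:
        # Assumption: a termination date after the start date means delivered.
        return "delivered" if end > start else None
    elapsed = (today - start).days
    if elapsed < GESTATION_DAYS:
        return "not_delivered"
    if elapsed > GESTATION_DAYS:
        # Assumption: overdue with no recorded termination date.
        return "suspected_delivery"
    return None
```

A record at exactly 280 days matches neither "less than" nor "more than" as the rules are worded, so the sketch returns `None` for it instead of guessing.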
Preferably, in step 8,
the label logic is: first, the woman's identity card number in the pregnancy report is associated with the identity card number in the community's existing population management to obtain the resident ID and the grass-roots unit ID; second, if the value of the delivery result is "not delivered", the pregnant woman label is "no"; then, the pregnancy information is added as special data in special information management; finally, if the woman's identity card number in the pregnancy report is not found in the community's population management, a corresponding task to be verified is generated;
the pregnancy label includes the resident ID, pregnancy start date, pregnancy termination date, expected date of delivery (EDD), pregnancy result, grass-roots unit ID, and update time.
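The association step of the label logic can be sketched as a lookup of each pregnancy report against the community population register, producing either a label record or a task to be verified. The dictionary keys, the register layout and the task format are hypothetical.

```python
def build_labels(reports, population_index, now):
    """Match reports to residents by identity card number.

    population_index maps an identity card number to a dict holding the
    resident ID and grass-roots unit ID (a hypothetical layout).
    """
    labels, tasks = [], []
    for report in reports:
        resident = population_index.get(report["id_card"])
        if resident is None:
            # Not found in community population management: queue a check.
            tasks.append({"id_card": report["id_card"], "task": "to_be_verified"})
            continue
        labels.append({
            "resident_id": resident["resident_id"],
            "unit_id": resident["unit_id"],
            "pregnancy_start": report.get("pregnancy_start"),
            "pregnancy_end": report.get("pregnancy_end"),
            "edd": report.get("edd"),
            "pregnancy_result": report.get("pregnancy_result"),
            "update_time": now,
        })
    return labels, tasks
```

The output record mirrors the label fields listed above; values that a report does not carry are left as `None` and are not derived.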
Preferably, in step 9,
data quality inspection formulates data quality rules and, according to those rules and the data standards, performs completeness checks, data format comparison, duplicate checks and relationship checks on the data during data governance processes such as data aggregation and data fusion, or during quality inspection; valid data is loaded into the base warehouse, and problem data is fed back to the corresponding departments for dispatch and rectification, forming a closed loop;
in addition, defects in the business data are summarized by analysing the maternity business data, and on that basis a data integrity verification model, a data format verification model, a data range verification model, a data duplicate-check model, a relationship verification model and a data desensitization model are established.
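A minimal Python sketch of the closed loop described for step 9 follows: records that pass the completeness and relationship checks go to the warehouse, and problem records are routed back to the department that supplied them. A simple masking function stands in for the desensitization model. The field names, reason codes and masking layout are assumptions.

```python
def mask_id_card(id_card: str) -> str:
    """Desensitize an identity number: keep the first 6 and last 4 characters."""
    if len(id_card) <= 10:
        return "*" * len(id_card)
    return id_card[:6] + "*" * (len(id_card) - 10) + id_card[-4:]


def quality_split(records, known_ids):
    """Separate valid records from problem records with feedback routing."""
    valid, problems = [], []
    for rec in records:
        reasons = []
        if not rec.get("id_card"):
            reasons.append("completeness")
        elif rec["id_card"] not in known_ids:
            # Relationship check: the person must exist in the reference set.
            reasons.append("relation")
        if reasons:
            problems.append({
                "record": rec,
                "reasons": reasons,
                "department": rec.get("source_department"),
            })
        else:
            valid.append(rec)
    return valid, problems
```

Each problem entry carries the originating department so that rectification can be dispatched to it and the record resubmitted afterwards.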
According to the above technical solution, an HDFS + Hive + Kylin big data platform is built. Based on Big Data Integration (BDI), it combines the Health Commission's whole-population data as the primary source with multi-source supplementary data for fusion, such as manually supplemented data and data supplemented by the big data center, which together are multi-source, dynamic and heterogeneous. By connecting data services such as data aggregation, data verification, data cleaning and data fusion, the pregnancy information data is sorted according to the maternity business; automatic centralized management and data governance are adopted; and the pregnancy model is sorted out on the basis of feature-level data fusion and a data-service mode, providing data enablement for pregnancy scenarios.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a method for generating a resident pregnancy model label based on multi-source data fusion according to the present invention;
FIG. 2 is a schematic diagram of multi-source data fusion in the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
Referring to FIG. 1, the invention provides a method for generating a resident pregnancy model label based on multi-source data fusion, which uses multi-source data fusion intelligent labels to generate a verified resident pregnancy model, provides whole-course service and management for pregnant women who meet the legal conditions for childbearing, and establishes pregnancy verification and follow-up visits. The method comprises the following steps:
step 1, data aggregation analysis;
step 2, aggregating birth registration information, pregnancy information, birth information, Health Commission/public security birth information, household registration information and medical birth certificate information according to business analysis requirements;
step 3, data verification;
step 4, data cleaning;
step 5, data fusion;
step 6, fusing the multi-source data into a pregnancy model;
step 7, generating a pregnancy label based on the pregnancy model;
step 8, using the pregnancy label through intelligent label generation based on multi-source data fusion;
and step 9, data quality inspection.
Specifically, step 1 comprises: according to the requirements analysis of the maternity business, defining a set of maternity business data exchange standards that reflect the data structures required by each business application; the standards also define data provider information, that is, which business department and which business system the current data is requested from;
the data aggregation in step 1 comprises data entry and data collection and aggregation; wherein,
data entry is the entry of maternity-specific business through page management operations, with template-based entry management of community information such as pregnant women according to the unified "Maternity Business Standard Specification";
data collection and aggregation connects to the information platforms of the business departments: for the information systems identified through investigation, analysis and sorting, a data artificial-intelligence robot exports the data of each unit's existing system in a guided mode of operation, and the data is then imported into the base data warehouse according to the data import standard. Supported import file types include text files, Excel files, XML files, databases and the like.
In step 2,
the aggregation system for birth registration information is the population information comprehensive service platform, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is birth registration management in the newly issued certificate system; second, the daily increment is selected by application date, where the start date is the current date minus 7 days and the end date is the current date;
the aggregation system for pregnancy information is the population information comprehensive service platform, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is the pregnancy information query under the whole-population business query in the whole-population service; second, the daily increment is selected by pregnancy date, where the start date is the current date minus 1 year and the end date is the current date;
the aggregation system for birth information is the population information comprehensive service platform, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is the birth information query under the whole-population business query in the whole-population service; second, the daily increment is selected by birth date, where the start date is the current date minus half a year and the end date is the current date;
the aggregation system for Health Commission/public security birth information is the city big data center, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is selected by data write time, namely the current time minus 1 day; the birth registration type is birth registration and the name of the party is reported;
the aggregation system for household registration information is the city big data center, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is selected by data write time, namely the current time minus 1 day; the birth registration type is birth registration and the name of the party is reported;
the aggregation system for medical birth certificate information is the city big data center, the aggregation loading strategy is incremental append, and the aggregation logic is as follows: first, the location of the aggregated data is all fields of the PSN_BITTH_CERTIFICATE_INFO data table in the big data center database; second, the daily increment is selected by update time, namely the current time minus 1 day;
Further, in step 3:
data verification analyses the verification object at fine-grained dimensions; for the maternity-specific data, the verification module performs a data uniqueness check and a foreign key integrity check; the checked content includes type, length, nullability, precision, range, format and the like, and records that do not conform are filtered out; for erroneous data, the output includes the error reason and the sequence number of the erroneous field;
when data is aggregated, each business department transmits the collected data entries to be verified to the grass-roots community base data warehouse, and the data base verifies whether the entries are consistent; if they are, a successful comparison is returned; otherwise a comparison error is returned, and the information collected and aggregated by each business department is returned to the automatic access module; when business department data is synchronized, data verification compares the imported data with the business department's data: if they are consistent, the data need not be imported again; if not, the inconsistent information is overwritten with the most recent data.
Maternal data are extracted from the health and public security business systems and include historical data. Cleaning, conversion and loading of the data are implemented because some of the extracted data are erroneous and some conflict with each other; such data are not wanted, and data that do not meet the requirements must be filtered out by the data cleaning module. On the one hand, this allows the collected and aggregated data to be loaded to the destination correctly, completely and in a standardized form; on the other hand, it provides an exception handling mechanism for the data integration process, for example for transmission exceptions, data loading exceptions, and data structure and quality exceptions.
Data cleaning is the core of the data base, through which the system provides basic data services; its main steps are data extraction, data cleaning, data conversion and data loading. The data cleaning and integration application comprises a series of predefined basic data processing services. Specifically, the data cleaning in step 4 cleans the data extracted by the artificial intelligence acquisition and aggregation robot and includes data filtering, deduplication, type conversion, code mapping, file splitting and merging, and dimension conversion, which are used for converting inconsistent data, converting data granularity, removing dirty data and computing conversion rules. Inconsistent-data conversion is a data integration process that focuses on handling data of the same type from different business systems in a uniform way; data in the source business systems that are ambiguous, duplicated, incomplete, or in violation of business or logic rules are handled uniformly, which generally includes NULL value processing, date format conversion and data type conversion. Data that do not meet the requirements comprise incomplete data, erroneous data and duplicate data;
incomplete data are mainly null values caused by missing information that should be present, such as a person's name or address. If the department's business system does not make such a field mandatory, some records lose their meaning: they cannot be used in later comparison and have no value for sharing. Such data are filtered out and completed where necessary, by an algorithm or by manual association according to business attributes.
Erroneous data arise because the business system is not robust enough: input is written directly to the back-end database without validation, for example numeric data entered as full-width digit characters, a carriage return after string data, an incorrect date format, or a date out of range. Such data are classified by type. Full-width characters and invisible characters before or after a value can be found simply by writing SQL statements, and an algorithm then attempts to repair them. Errors such as an incorrect date format or an out-of-range date can leave the data incomplete because the latest data cannot be obtained; these need to be verified by the client before a repair is attempted.
For duplicate data, each record from the source business system is examined according to the table's primary key and identifying subject: records with a repeated primary key field, a repeated subject name, or that other business rules identify as the same subject are treated as duplicates. The latest correct record is determined, all fields of the duplicate records are exported for the departments to confirm and sort out, the duplicates are deleted, and one correct record is retained.
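The repairs described for erroneous data can be sketched in Python as follows. This is only one possible approach, shown here in Python rather than SQL: Unicode NFKC normalization is used to turn full-width characters into their ASCII forms, and the accepted date formats and the lower date bound are assumptions chosen for the example.

```python
import unicodedata
from datetime import date, datetime

# Assumed input formats; a real system would list the formats it actually sees.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")


def clean_text(value):
    """Convert full-width characters to ASCII and drop invisible characters."""
    normalized = unicodedata.normalize("NFKC", value)
    visible = "".join(ch for ch in normalized if ch.isprintable())
    return visible.strip()


def parse_date(value, today, earliest=date(1900, 1, 1)):
    """Return a date, or None if the format is unknown or the date is out of range."""
    text = clean_text(value)
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        return parsed if earliest <= parsed <= today else None
    return None
```

A value for which `parse_date` returns `None` is a candidate for the manual verification mentioned above rather than for automatic repair.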
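A minimal Python sketch of this deduplication step is given below. It assumes that "latest" can be decided by an update-time field, which the text does not specify; the function and field names are hypothetical. The superseded records are returned separately so that they can be exported for departmental confirmation before deletion.

```python
def deduplicate(records, key_fields, updated_field):
    """Keep the most recently updated record per key; return (kept, duplicates)."""
    latest = {}
    duplicates = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        kept = latest.get(key)
        if kept is None:
            latest[key] = record
        elif record[updated_field] > kept[updated_field]:
            # The new record is more recent: the previously kept one becomes a duplicate.
            duplicates.append(kept)
            latest[key] = record
        else:
            duplicates.append(record)
    return list(latest.values()), duplicates
```

Passing several names in `key_fields` (for example an identity card number and a subject name) approximates the "same subject" rule described above.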
In step 5, data fusion generates new data from the pregnancy-specific data according to conversion rules and stores them in the data warehouse of the data base, wherein the data conversion supports one-to-many, many-to-one and many-to-many mappings between data fields;
as shown in fig. 2, for a single source, the pregnancy outcome information provided by the Health Commission is designated as trusted data and can be written to the database directly; likewise, the female identity card information provided by public security is trusted data and can be written to the database directly. For multiple sources, several data providers are designated to supply data, trusted data are generated according to a data quality index evaluation method and a data survivorship rule, and the data are then written to the database. Multi-source data fusion includes data-level fusion, feature-level fusion and decision-level fusion: data-level fusion applies simple processing to the raw data, associates and fuses them directly with SQL, and extracts data features after fusion; feature-level fusion first extracts data features and then associates and fuses the data with a correlation algorithm; decision-level fusion makes a decision on each data source and then associates and fuses the decisions to obtain a final consistent decision result.
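The text does not spell out the data survivorship rule, so the following Python sketch substitutes a simple one for illustration: sources are ranked by an assumed trust priority, and each field takes the first non-empty value in that order. The source names, field names and the priority list are hypothetical.

```python
def fuse(records_by_source, priority):
    """Fuse one person's records from several sources into a single record.

    `priority` lists source names from most to least trusted (an assumed
    ranking standing in for a data quality index evaluation).
    """
    fused = {}
    for source in priority:
        for field, value in records_by_source.get(source, {}).items():
            # A field survives from the first source that supplies a non-empty value.
            if value not in (None, "") and field not in fused:
                fused[field] = value
    return fused
```

For example, with `priority = ["health", "police"]`, a pregnancy outcome supplied by the health source is kept, while a name left empty there is filled from the public security source.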
In step 6, pregnancy report data are generated by fusing the multi-source data in the pregnancy-specific model, namely the birth registration management, pregnancy information, birth information, Health Commission/public security birth registration information, household registration information and medical birth certificate; the fused fields comprise the woman's name, the woman's identity card, the pregnancy status, the pregnancy start date, the pregnancy termination date, the delivery result and the pregnancy outcome.
For the delivery result label in step 7,
the calculation criterion for the not-delivered label is: the pregnancy date has a value, the pregnancy date is less than 280 days before the current date, and the pregnancy termination date has no value; or the pregnancy start date has a value, the pregnancy start date is less than 280 days before the current date, and the pregnancy termination date is null; or, in the birth registration, the end date has a value, the status at registration is a pregnancy application, the end date is less than 280 days before the current date, and the pregnancy termination date has no value;
the calculation criterion for the delivered label is: the pregnancy termination date has a value and is greater than the pregnancy termination date; or the pregnancy termination date is greater than the current pregnant woman's last birth date; or the Health Commission/public security birth registration information is associated with the household registration birth information table, and the pregnant woman/husband information obtained for the infant's birth is used to judge whether the current pregnant woman's pregnancy has terminated, the infant's birth date being equal to the pregnancy termination date;
the calculation criterion for the suspected-delivery label is: the current pregnant woman's pregnancy start date has a value, the pregnancy termination date has a value, and the pregnancy start date is more than 280 days before the current date.
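The 280-day rule can be sketched in Python as below. The wording of the criteria is ambiguous in places, so this sketch makes several simplifying assumptions that should not be read as the patented logic: a termination date later than the start date means "delivered"; with no termination date, fewer than 280 elapsed days means "not delivered" and 280 or more means "suspected delivery"; the birth-registration and infant-record conditions are omitted. All names are hypothetical.

```python
from datetime import date

GESTATION_DAYS = 280  # threshold used in the criteria above


def delivery_label(pregnancy_start, pregnancy_end, today):
    """Classify one pregnancy record; returns a label string or None."""
    if pregnancy_start is None:
        return None  # nothing can be decided without a start date
    if pregnancy_end is not None:
        # Assumption: a termination date after the start date counts as delivered.
        return "delivered" if pregnancy_end > pregnancy_start else None
    elapsed_days = (today - pregnancy_start).days
    if elapsed_days < GESTATION_DAYS:
        return "not delivered"
    # Assumption: no termination date and 280 or more days elapsed.
    return "suspected delivery"


if __name__ == "__main__":
    today = date(2021, 9, 2)
    print(delivery_label(date(2021, 6, 1), None, today))
    print(delivery_label(date(2020, 9, 1), None, today))
    print(delivery_label(date(2020, 9, 1), date(2021, 6, 1), today))
```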
In step 8,
the tag logic is: first, the female identity card in the pregnancy report is associated with the identity cards in the community's existing population management to obtain the resident ID and the basic unit ID; second, if the value of the delivery result is that the pregnant woman is not delivered, the pregnant woman label is not; then, the pregnancy-specific data are added under the pregnancy information in the special information management; finally, if the female identity card in the pregnancy report is not in the community's population management, a corresponding to-be-verified task is generated;
the pregnancy label includes a resident ID, a pregnancy start date, a pregnancy termination date, an expected date of delivery (EDD), a pregnancy outcome, a basic unit ID, and an update time.
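The association step and the fallback to a verification task can be illustrated with the Python sketch below. The dictionary keys are hypothetical, and the expected date of delivery is computed as the start date plus 280 days purely as an assumption; the text does not state how that field is derived.

```python
from datetime import date, timedelta


def build_labels(reports, residents_by_id_card, update_date: date):
    """Join pregnancy reports to the community population register.

    Returns (labels, tasks): a label for every matched resident and a
    to-be-verified task for every identity card not found in the register.
    """
    labels, tasks = [], []
    for report in reports:
        id_card = report["female_id_card"]
        resident = residents_by_id_card.get(id_card)
        if resident is None:
            tasks.append({"female_id_card": id_card, "task": "to be verified"})
            continue
        start = report["pregnancy_start"]
        labels.append({
            "resident_id": resident["resident_id"],
            "basic_unit_id": resident["basic_unit_id"],
            "pregnancy_start": start,
            "pregnancy_end": report.get("pregnancy_end"),
            # Assumption: EDD taken as start date + 280 days.
            "edd": start + timedelta(days=280),
            "pregnancy_result": report.get("pregnancy_result"),
            "update_time": update_date,
        })
    return labels, tasks
```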
In step 9,
the data quality check formulates data quality rules and, according to those rules and the data standard, performs completeness checks, data format comparison, duplicate checks and relation checks on data during data governance processes such as data aggregation and data fusion, or during quality inspection; valid data are loaded into the base warehouse, and problem data are fed back to the responsible departments so that issues are dispatched, rectified and closed in a loop;
meanwhile, defects in the business data are summarized by analyzing the maternal business data, and on that basis a data completeness verification model, a data format verification model, a data range verification model, a data duplicate-check model, a relation verification model and a data desensitization model are established.
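A possible shape for such a quality check is sketched below in Python: completeness, duplicate and relation checks decide whether a record goes to the warehouse or back to its department, and a small masking helper stands in for the desensitization model. The specific rules, the masking layout and all field names are assumptions made for the example.

```python
from datetime import date


def mask_id_card(id_card, keep_head=6, keep_tail=4):
    """Desensitize an identity card number by masking its middle characters."""
    if len(id_card) <= keep_head + keep_tail:
        return "*" * len(id_card)
    hidden = len(id_card) - keep_head - keep_tail
    return id_card[:keep_head] + "*" * hidden + id_card[-keep_tail:]


def quality_check(records):
    """Return (valid records, problems grouped by source department)."""
    valid, feedback, seen = [], {}, set()
    for record in records:
        problems = []
        id_card = record.get("female_id_card")
        if not id_card:
            problems.append("completeness: missing identity card")
        elif id_card in seen:
            problems.append("duplicate: identity card already loaded")
        start, end = record.get("pregnancy_start"), record.get("pregnancy_end")
        if isinstance(start, date) and isinstance(end, date) and end < start:
            problems.append("relation: termination date before start date")
        if problems:
            department = record.get("source_dept", "unknown")
            feedback.setdefault(department, []).append((record, problems))
        else:
            seen.add(id_card)
            valid.append(record)
    return valid, feedback
```

The `feedback` mapping groups problem records by department, which matches the closed loop of dispatching issues and having them rectified at the source.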
In summary, in the big data era, traditional data sharing methods cannot keep up with the timeliness required for sharing massive pregnancy data, which affects the timeliness and intelligence of pregnancy verification visits. The method fuses multi-source data from business departments such as health and public security to generate a pregnancy model, and then generates pregnancy labels through intelligent tagging, which supports general community workers in visiting and verifying residents' pregnancy-specific information. It also improves the degree to which fused data are used: feature-level fusion is applied to the shared characteristics extracted from the various heterogeneous data sources, pregnancy data and birth information are extracted for fusion, predictive features such as suspected delivery are computed algorithmically from the daily data, and a pregnancy/birth model label is generated.
Therefore, the method allows pregnant women to be identified quickly and brought into early pregnancy health care management, realizes population information management for women of childbearing age, and makes full use of population information; tracking and monitoring pregnant women can reduce maternal and perinatal mortality and protect the health and safety of mothers and infants.
In addition, there is a large married population of childbearing age among residents who live apart from their registered household, and their circumstances are complex, which creates difficulties and problems for current urban family planning management services.
Meanwhile, shared data from health, statistics and public security are acquired through data governance, and a data quality inspection mechanism is established; quality control of the health, statistics and public security data improves the data quality of the system, and the whole-population information and public health service data are governed and then provided to grassroots units for use, which enables management and verification of maternal deliveries.
The preferred embodiments of the present invention have been described in detail with reference to the accompanying drawings, however, the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the technical idea of the present invention, and these simple modifications are within the protective scope of the present invention.
It should be noted that the technical features described in the above embodiments can be combined in any suitable manner provided there is no contradiction; to avoid unnecessary repetition, the possible combinations are not described separately.
In addition, any combination of the various embodiments of the present invention is also possible, and the same should be considered as the disclosure of the present invention as long as it does not depart from the spirit of the present invention.

Claims (10)

1. A method for generating a resident pregnancy model label based on multi-source data fusion is characterized by comprising the following steps:
step 1, data aggregation analysis;
step 2, aggregating birth registration information, pregnancy information, birth information, Health Commission/public security birth information, household registration information and medical birth certificate information according to business analysis requirements;
step 3, checking data;
step 4, cleaning data;
step 5, fusing data;
step 6, fusing multi-source data into a pregnancy model;
step 7, generating a pregnancy label based on the pregnancy model;
step 8, applying the pregnancy label through intelligent tag generation based on multi-source data fusion;
step 9, checking the data quality.
2. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein step 1 comprises: defining a set of maternal business data exchange standards for the maternal business, which reflect the data structure required by each business application; the standard also defines the data provider information, namely which business department and which business system the current data are requested from;
the data aggregation in step 1 comprises data entry and data acquisition and aggregation; wherein
data entry records the pregnancy-specific business through page management operations, and the community's pregnant-woman and related information is entered and managed through templates according to the unified 'standard specification of the pregnant woman business';
data acquisition and aggregation connect to the information platforms of the business departments: for information systems that have been surveyed, analyzed and sorted out, the data artificial intelligence robot exports the data of the unit's existing system through wizard-style operation, and the data are then imported into the base data warehouse according to the data import standard.
3. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein in step 2,
the loading strategy for aggregating the birth registration information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is the birth registration management in the newly-issued certificate system; second, the daily increment is determined by the application date, with a start date of the current time minus 7 days and an end date of the current day;
the loading strategy for aggregating the pregnancy information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is the pregnancy information query under the whole-population business query of the whole-population service; second, the daily increment is determined by the pregnancy date, with a start date of the current time minus 1 year and an end date of the current day;
the loading strategy for aggregating the birth information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is the birth information query under the whole-population business query of the whole-population service; second, the daily increment is determined by the birth date, with a start date of the current time minus half a year and an end date of the current day;
the loading strategy for aggregating the Health Commission/public security birth information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is determined by the data write time, starting from the current time minus 1 day; the birth registration type is birth registration and the name of the party concerned is reported;
the loading strategy for aggregating the household registration information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is all fields of the GZK_SC_WJW_GACSDJXX data table in the big data center database; second, the daily increment is determined by the data write time, starting from the current time minus 1 day; the birth registration type is birth registration and the name of the party concerned is reported;
the loading strategy for aggregating the medical birth certificate information is incremental append, and the aggregation logic is as follows: first, the source of the aggregated data is all fields of the PSN_BITTH_CERTIFICATE_INFO data table in the big data center database; second, the daily increment is determined by the update time, starting from the current time minus 1 day.
4. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein in step 3:
data verification analyzes the verification object at fine granularity; for the pregnancy-specific data, the verification module performs a data uniqueness check and a foreign-key integrity check; the items checked include type, length, whether the value is empty, precision, range and format, and records that do not conform are filtered out; for each erroneous record, the error data are output together with the error reason and the sequence number of the erroneous field;
when data aggregation occurs, each business department transmits the data it has collected, together with the entry information to be verified, to the grassroots community base data warehouse; the data base verifies whether the entry information is consistent and, if it is, returns a successful comparison; otherwise it returns a comparison error and returns the information collected and aggregated by the business department to the automatic access module; when business-department data are synchronized, data verification compares the imported data with the business department's data for consistency: if they are consistent, the data need not be imported again; if not, the inconsistent information is overwritten with the most recent data.
5. The method for generating the resident pregnancy model label based on multi-source data fusion according to claim 1, wherein the data cleaning in step 4 cleans the data extracted by the artificial intelligence acquisition and aggregation robot and includes data filtering, deduplication, type conversion, code mapping, file splitting and merging, and dimension conversion, which are used for converting inconsistent data, converting data granularity, removing dirty data and computing conversion rules; data that do not meet the requirements comprise incomplete data, erroneous data and duplicate data;
incomplete data are filtered out and completed where necessary, by an algorithm or by manual association according to business attributes;
for erroneous data, where numeric data have been entered as full-width digit characters or invisible characters exist before or after a value, the erroneous data are found by writing SQL statements and an algorithm attempts to repair them; where the date format is incorrect or the date is out of range, the data are verified first and then repaired;
for duplicate data, each record from the source business system is examined according to the table's primary key and identifying subject: records with a repeated primary key field, a repeated subject name, or that other business rules identify as the same subject are treated as duplicates; the latest correct record is determined, all fields of the duplicate records are exported for the departments to confirm and sort out, the duplicates are deleted, and one correct record is finally retained.
6. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein the data fusion in step 5 generates new data from the pregnancy-specific data according to conversion rules and stores them in the data warehouse of the data base, wherein the data conversion supports one-to-many, many-to-one and many-to-many mappings between data fields;
for a single source, if the source is trusted data, the data can be written to the database directly; for multiple sources, trusted data are produced according to a data quality index evaluation method and a data survivorship rule, and the data are then written to the database; multi-source data fusion includes data-level fusion, feature-level fusion and decision-level fusion: data-level fusion applies simple processing to the raw data, associates and fuses them directly with SQL, and extracts data features after fusion; feature-level fusion first extracts data features and then associates and fuses the data with a correlation algorithm; decision-level fusion makes a decision on each data source and then associates and fuses the decisions to obtain a final consistent decision result.
7. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein in step 6 pregnancy report data are generated by fusing the multi-source data in the pregnancy-specific model, namely the birth registration management, pregnancy information, birth information, Health Commission/public security birth registration information, household registration information and medical birth certificate, and the fused fields comprise the woman's name, the woman's identity card, the pregnancy status, the pregnancy start date, the pregnancy termination date, the delivery result and the pregnancy outcome.
8. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein for the label in step 7,
the calculation criterion for the not-delivered label is: the pregnancy date has a value, the pregnancy date is less than 280 days before the current date, and the pregnancy termination date has no value; or the pregnancy start date has a value, the pregnancy start date is less than 280 days before the current date, and the pregnancy termination date is null; or, in the birth registration, the end date has a value, the status at registration is a pregnancy application, the end date is less than 280 days before the current date, and the pregnancy termination date has no value;
the calculation criterion for the delivered label is: the pregnancy termination date has a value and is greater than the pregnancy termination date; or the pregnancy termination date is greater than the current pregnant woman's last birth date; or the Health Commission/public security birth registration information is associated with the household registration birth information table, and the pregnant woman/husband information obtained for the infant's birth is used to judge whether the current pregnant woman's pregnancy has terminated, the infant's birth date being equal to the pregnancy termination date;
the calculation criterion for the suspected-delivery label is: the current pregnant woman's pregnancy start date has a value, the pregnancy termination date has a value, and the pregnancy start date is more than 280 days before the current date.
9. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein in step 8,
the tag logic is: first, the female identity card in the pregnancy report is associated with the identity cards in the community's existing population management to obtain the resident ID and the basic unit ID; second, if the value of the delivery result is that the pregnant woman is not delivered, the pregnant woman label is not; then, the pregnancy-specific data are added under the pregnancy information in the special information management; finally, if the female identity card in the pregnancy report is not in the community's population management, a corresponding to-be-verified task is generated;
the pregnancy label includes a resident ID, a pregnancy start date, a pregnancy termination date, an expected date of delivery (EDD), a pregnancy outcome, a basic unit ID, and an update time.
10. The method for generating the resident pregnancy model label based on multi-source data fusion as claimed in claim 1, wherein in step 9,
the data quality check formulates data quality rules and, according to those rules and the data standard, performs completeness checks, data format comparison, duplicate checks and relation checks on data during data governance processes such as data aggregation and data fusion, or during quality inspection; valid data are loaded into the base warehouse, and problem data are fed back to the responsible departments so that issues are dispatched, rectified and closed in a loop;
meanwhile, defects in the business data are summarized by analyzing the maternal business data, and on that basis a data completeness verification model, a data format verification model, a data range verification model, a data duplicate-check model, a relation verification model and a data desensitization model are established.
CN202111025631.5A 2021-09-02 2021-09-02 Method for generating resident pregnancy model label based on multi-source data fusion Withdrawn CN113672609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111025631.5A CN113672609A (en) 2021-09-02 2021-09-02 Method for generating resident pregnancy model label based on multi-source data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111025631.5A CN113672609A (en) 2021-09-02 2021-09-02 Method for generating resident pregnancy model label based on multi-source data fusion

Publications (1)

Publication Number Publication Date
CN113672609A true CN113672609A (en) 2021-11-19

Family

ID=78548288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111025631.5A Withdrawn CN113672609A (en) 2021-09-02 2021-09-02 Method for generating resident pregnancy model label based on multi-source data fusion

Country Status (1)

Country Link
CN (1) CN113672609A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631640A (en) * 2023-07-14 2023-08-22 福州康为网络技术有限公司 Method and platform for generating personalized demand scheme of pregnant woman
CN116756162A (en) * 2023-06-28 2023-09-15 蝉鸣科技(西安)有限公司 Method and system for guaranteeing data consistency

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116756162A (en) * 2023-06-28 2023-09-15 蝉鸣科技(西安)有限公司 Method and system for guaranteeing data consistency
CN116756162B (en) * 2023-06-28 2024-03-12 蝉鸣科技(西安)有限公司 Method and system for guaranteeing data consistency
CN116631640A (en) * 2023-07-14 2023-08-22 福州康为网络技术有限公司 Method and platform for generating personalized demand scheme of pregnant woman
CN116631640B (en) * 2023-07-14 2024-05-31 福州康为网络技术有限公司 Method and platform for generating personalized demand scheme of pregnant woman

Similar Documents

Publication Publication Date Title
CN110781236A (en) Method for constructing government affair big data management system
US11334599B2 (en) Systems and methods for electronic data record synchronization
CN111046035B (en) Data automation processing method, system, computer equipment and readable storage medium
CN113672609A (en) Method for generating resident pregnancy model label based on multi-source data fusion
CN111078780A (en) AI optimization data management method
CN102246174A (en) Automated assertion reuse for improved record linkage in distributed & autonomous healthcare environments with heterogeneous trust models
CN112231333A (en) Ecological environment data sharing and exchanging method and system
CN106663101A (en) Ontology mapping method and apparatus
CN101436200A (en) Standardized information management system and standard updating method thereof
CN109542967A (en) Smart city data-sharing systems and method based on XBRL standard
CN110109908B (en) Analysis system and method for mining potential relationship of person based on social basic information
CN112687399A (en) Infectious disease monitoring and early warning system based on artificial intelligence informatization
CN111126957B (en) Inspection service lineage data acquisition and integration method for inspection service collaborative flow
CN111191153A (en) Information technology consultation service display device
CN111899132B (en) Method for automatically identifying case not found in specified period
CN105677745A (en) General efficient self-service data search system and implementation method
CN115982429B (en) Knowledge management method and system based on flow control
CN110019237B (en) System and method for analyzing criminal whereabouts based on map
CN113742498B (en) Knowledge graph construction and updating method
CN115481105A (en) Data management method, device, electronic equipment and storage medium
RU105492U1 (en) AUTOMATED SYSTEM FOR REALIZATION OF REQUESTS OF THE MANAGEMENT BODY TO SUBSIDIARY STRUCTURE ELEMENTS ON THE BASIS OF MODIFIED EXCEL TABLES
CN114036316A (en) Intelligent laboratory management system based on knowledge graph visualization
KR20140123647A (en) System for analyzing intellectual property
CN117520324A (en) Government affair data cleaning method and device, electronic equipment and storage medium
CN115965329A (en) Scientific and technological big data intelligent decision analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211119