WO2019196279A1 - Disease abnormal data detection method and apparatus, computer apparatus, and storage medium - Google Patents

Disease abnormal data detection method and apparatus, computer apparatus, and storage medium

Info

Publication number
WO2019196279A1
Authority
WO
WIPO (PCT)
Prior art keywords
disease
data
time point
abnormal value
abnormal
Prior art date
Application number
PCT/CN2018/099610
Other languages
English (en)
French (fr)
Inventor
阮晓雯
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to SG11202008183UA priority Critical patent/SG11202008183UA/en
Publication of WO2019196279A1 publication Critical patent/WO2019196279A1/zh

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/80 ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a disease abnormal data detecting method and apparatus, a computer apparatus, and a non-volatile readable storage medium.
  • a first aspect of the present application provides a method for detecting abnormality of a disease, the method comprising:
  • acquiring time series data X of disease monitoring, $X = [x_0, x_1, x_2, \ldots, x_t]$;
  • selecting a time window size w and, for each time point i from time point w to time point t, calculating the mean $\mu_i$ and the standard deviation $\sigma_i$ of the disease monitoring data in the time window corresponding to the time point i, the time point
  • determining whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold, and if so, taking the disease monitoring data $x_i$ corresponding to the time point i as an abnormal value.
  • a second aspect of the present application provides a disease abnormality data detecting apparatus, the apparatus comprising:
  • a calculating unit configured to select the time window size w and, for each time point i from time point w to time point t, calculate the mean $\mu_i$ and the standard deviation $\sigma_i$ of the disease monitoring data in the time window corresponding to the time point i
  • the calculating unit is further configured to calculate a moving-zscore value $mz_i$ of the disease monitoring data $x_i$ corresponding to the time point i according to the mean $\mu_i$ and the standard deviation $\sigma_i$ of the disease monitoring data in the time window corresponding to each time point i, to obtain the mz list $[mz_w, mz_{w+1}, mz_{w+2}, \ldots, mz_t]$, where
  • a determining unit configured to determine an abnormal value threshold of the time series data X according to the mz list
  • a judging unit configured to determine whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold; if it is greater than the abnormal value threshold, the disease monitoring data $x_i$ corresponding to the time point i is an abnormal value.
  • a third aspect of the present application provides a computer apparatus comprising a memory and a processor, the memory storing at least one computer readable instruction, the processor executing the at least one computer readable instruction to implement the disease abnormal data detection method.
  • a fourth aspect of the present application provides a non-volatile readable storage medium having stored thereon at least one computer readable instruction, wherein the at least one computer readable instruction implements the disease abnormality data detecting method when executed by a processor.
  • the application introduces a time window and uses the distribution of the neighboring historical data, which has the most reference value, as the reference, so the detection result is more reasonable. Because the amount of data in a time window is small, the data in the window can hardly satisfy a normal distribution, so the traditional zscore method cannot be used to detect abnormal values in the disease monitoring data.
  • the present application calculates the moving-zscore values of multiple disease monitoring data points, ensures that their number is large enough to meet the condition of a normal distribution, and then determines the abnormal value threshold based on all moving-zscore values, thereby detecting abnormal values in the disease monitoring data. Therefore, the present application has higher applicability for abnormality detection of disease monitoring data, and achieves efficient and accurate abnormality detection of disease monitoring data.
  • FIG. 1 is a flow chart of a method for detecting disease abnormality data provided in Embodiment 1 of the present application.
  • FIG. 2 is a structural diagram of a disease abnormality data detecting apparatus provided in Embodiment 2 of the present application.
  • FIG. 3 is a schematic diagram of a computer device according to Embodiment 4 of the present application.
  • the disease abnormality data detecting method of the present application is applied to one or more computer devices.
  • the computer device is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and its hardware includes but is not limited to a microprocessor, an application specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device.
  • FIG. 1 is a flow chart of a method for detecting disease abnormality data provided in Embodiment 1 of the present application.
  • the disease abnormality data detecting method is applied to a computer device.
  • the disease abnormal data detecting method detects abnormal values in the disease monitoring data, so that public health events such as disease epidemics or outbreaks can be identified early and corresponding control measures can be taken as early as possible, minimizing the losses caused by public health emergencies.
  • the method for detecting abnormality data includes the following steps:
  • the disease monitoring data may include monitoring data for diseases such as influenza, hand, foot and mouth disease, measles, and mumps.
  • a disease monitoring network composed of a plurality of monitoring points may be established in a preset area (eg, a province, a city, a region), and the disease monitoring data is acquired from the monitoring point.
  • Medical institutions, schools, child care institutions, pharmacies, etc. can be selected as monitoring points to conduct disease monitoring and data collection for the corresponding target population.
  • a place that meets the preset conditions can be selected as the monitoring point.
  • the preset condition may include a number of people, a scale, and the like. For example, schools and child care institutions whose number of people reaches a preset number are selected as monitoring points.
  • For another example, a pharmacy that has reached a preset size (for example, by daily turnover) is selected as a monitoring point.
  • For another example, a hospital that has reached a preset size (for example, by number of daily visits) is selected as a monitoring point.
  • Disease monitoring data at different times constitute time series data for disease surveillance.
  • disease monitoring data collected on a daily basis can be used to form time series data for disease surveillance.
  • the disease monitoring data collected on a weekly basis may constitute time series data for disease monitoring.
  • Medical institutions (mainly including hospitals) are the best place to capture early warning signs of disease and are the first choice for disease surveillance.
  • Disease surveillance data can be obtained based on patient visits.
  • the disease monitoring data can be obtained according to the drug sales of the pharmacy.
  • the medical institution, the school, the child care institution, and the pharmacy are mainly selected for the collection of disease monitoring data.
  • the above selection of data sources does not limit the addition or replacement of other focused populations or sites in other embodiments as a source of data for monitoring.
  • hotels can be included in the disease surveillance area to obtain disease surveillance data for hotel residents.
  • the disease monitoring data collected by any type of monitoring point can constitute time series data of disease monitoring.
  • the disease monitoring data collected by the hospital can be taken to constitute time series data of disease monitoring.
  • the disease monitoring data collected by the plurality of types of monitoring points can be combined to form time series data of disease monitoring.
  • the disease monitoring data collected by hospitals can serve as the primary source, supplemented by the disease monitoring data of pharmacies, to constitute the time series data of disease monitoring.
  • the disease monitoring data may include disease data such as the number of visits to the disease, the rate of visits, the number of cases, and the incidence rate.
  • For example, the number of daily visits for a disease (e.g., flu) can be obtained from a medical institution (e.g., a hospital), and the number of daily visits for the disease can be used as disease monitoring data.
  • For another example, the daily incidence of a disease (e.g., influenza) among students can be obtained from a school, and the daily incidence of the disease can be used as disease monitoring data.
  • time point 0 and time point t denote the start time and the end time of the time series data and impose no restriction on it; any point in time can be used as time point 0.
  • the time window size w is an adjustable parameter and can be adjusted according to actual conditions.
  • For example, if the disease monitoring data is daily disease monitoring data (e.g., the number of daily visits for the disease), the time window size w can be taken as 7 (i.e., one week).
  • For another example, if the disease monitoring data is weekly disease monitoring data (e.g., the number of weekly visits for the disease), the time window size w can be taken as 3 (i.e., three weeks).
  • Step 103: Calculate the moving-zscore value $mz_i$ of the disease monitoring data $x_i$ corresponding to the time point i according to the mean $\mu_i$ and the standard deviation $\sigma_i$ of the disease monitoring data in the time window corresponding to each time point i, to obtain the mz list $[mz_w, mz_{w+1}, mz_{w+2}, \ldots, mz_t]$, where
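  • The window statistics and the moving-zscore calculation can be sketched in Python as follows. The formula that follows "where" is not reproduced in this text, so the sketch assumes the conventional z-score form mz_i = (x_i - mu_i) / sigma_i. It also assumes that the window for time point i covers the w preceding points x_{i-w} to x_{i-1}, which is consistent with the mz list starting at time point w. The function name `moving_zscores`, the use of the population standard deviation, and the handling of a zero standard deviation are illustrative choices, not part of the application.

```python
from statistics import mean, pstdev


def moving_zscores(x, w):
    """Return the mz list [mz_w, ..., mz_t] for a series x = [x_0, ..., x_t].

    Assumptions (not confirmed by the source text):
      * the window for time point i is x[i-w:i], the w preceding points;
      * mz_i = (x_i - mu_i) / sigma_i, the conventional z-score form;
      * sigma_i is the population standard deviation of the window;
      * a flat window (sigma_i == 0) gives no scale, so 0.0 is assigned.
    """
    if w < 1:
        raise ValueError("window size w must be at least 1")
    mz = []
    for i in range(w, len(x)):
        window = x[i - w:i]
        mu = mean(window)
        sigma = pstdev(window)
        mz.append(0.0 if sigma == 0 else (x[i] - mu) / sigma)
    return mz
```

  • Element k of the returned list corresponds to time point w + k, so a caller that needs the time point of a score has to add the offset w.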
  • Step 104: Determine an abnormal value threshold of the time series data X according to the mz list.
  • the preset quantile or the preset percentile of the mz list may be calculated, and the preset quantile or the preset percentile is used as the abnormal value threshold. For example, calculate the 0.979th quantile of the mz list and the 0.99th quantile of the mz list as the outlier threshold. As another example, the 0.5th quantile (ie, the median) of the mz list is calculated, and the 0.5th quantile (ie, the median) of the mz list is used as the abnormal value threshold.
  • Alternatively, the mean of the mz list can be calculated and used as the abnormal value threshold.
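  • The two threshold options of Step 104 (a preset quantile of the mz list, or its mean) can be sketched as follows. The function names `quantile` and `outlier_threshold`, the default q = 0.99, and the linear interpolation between order statistics are assumptions made for illustration; the text does not say how the quantile is interpolated.

```python
from statistics import mean


def quantile(values, q):
    """q-quantile (0 <= q <= 1) using linear interpolation (an assumed convention)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def outlier_threshold(mz_list, method="quantile", q=0.99):
    """Derive the abnormal value threshold from the mz list."""
    if method == "quantile":
        return quantile(mz_list, q)
    if method == "mean":
        return mean(mz_list)
    raise ValueError("method must be 'quantile' or 'mean'")
```

  • A high quantile flags only the most extreme scores, while the mean or the median flags roughly the upper half of the list, so the choice of option strongly affects how many points are reported.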
  • Step 105: Determine whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold; if it is greater than the abnormal value threshold, the disease monitoring data $x_i$ corresponding to the time point i is an abnormal value.
  • For example, if the abnormal value threshold is 12 and the moving-zscore value of the disease monitoring data $x_9$ corresponding to time point 9 is 13, which is greater than the abnormal value threshold, then the disease monitoring data $x_9$ is an abnormal value.
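  • The comparison in Step 105 amounts to a filter over the mz list. The sketch below reuses the numbers from the text (threshold 12, moving-zscore value 13 at time point 9); the function name `flag_outliers`, the window size 7, and the other two scores in the usage example are hypothetical.

```python
def flag_outliers(mz_list, w, threshold):
    """Return the time points whose moving-zscore value exceeds the threshold.

    mz_list is assumed to be [mz_w, mz_{w+1}, ...], so element k belongs
    to time point w + k.
    """
    return [w + k for k, mz in enumerate(mz_list) if mz > threshold]


# Hypothetical mz list for time points 7, 8 and 9; only mz_9 = 13 is from the text.
flagged = flag_outliers([1.0, 2.0, 13.0], w=7, threshold=12)
```

  • With these inputs only time point 9 is reported, because the comparison is strictly "greater than" as the step describes.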
  • the disease surveillance data for 29 consecutive time points are:
  • the moving-zscore value $mz_i$ of the disease monitoring data $x_i$ is calculated to obtain the mz list $[mz_w, mz_{w+1}, mz_{w+2}, \ldots, mz_t]$; an abnormal value threshold of the time series data X is determined according to the mz list; and it is determined whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold, and if so, the disease monitoring data $x_i$ corresponding to the time point i is an abnormal value.
  • the first embodiment calculates the moving-zscore values of multiple disease monitoring data points, ensures that their number is large enough to satisfy the condition of a normal distribution, and then determines the abnormal value threshold according to all the moving-zscore values, thereby detecting abnormal values in the disease monitoring data. Therefore, the first embodiment has higher applicability for abnormality detection of disease monitoring data, and achieves efficient and accurate abnormality detection of disease monitoring data.
  • disease abnormality data detection may be performed in conjunction with other disease abnormality data detecting methods (other than the disease abnormal data detecting method of the first embodiment).
  • the method may further include the following steps: acquiring an abnormal value obtained by detecting the time series data with another disease abnormality data detecting method; comparing the abnormal value obtained by the disease abnormal data detecting method of the first embodiment with the abnormal value obtained by the other disease abnormal data detecting method; and obtaining the final abnormal value according to the comparison result.
  • Other methods for detecting abnormal data of diseases may include one type or multiple types. Each of the other disease abnormal data detection methods can obtain a corresponding set of outliers.
  • For example, the disease abnormal data detection method of the first embodiment is applied to the time series data of disease monitoring to obtain a first abnormal value, a second disease abnormal data detecting method different from that of the first embodiment is applied to the same time series data to obtain a second abnormal value, and the first abnormal value and the second abnormal value are compared to obtain a final abnormal value. If the first abnormal value and the second abnormal value are consistent, the first abnormal value (equivalently, the second abnormal value) is used as the final abnormal value; otherwise, if they are inconsistent, both the first abnormal value and the second abnormal value are discarded.
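  • One way to read the two-method comparison is per time point: a point is kept only when both methods flag it, which is a set intersection. This per-point reading, and the name `consensus_of_two`, are assumptions; the text only says that consistent values are kept and inconsistent ones discarded.

```python
def consensus_of_two(first, second):
    """Keep only the time points flagged by both detection methods."""
    return sorted(set(first) & set(second))


# Hypothetical flagged time points from two different detection methods.
final_outliers = consensus_of_two([3, 9], [9, 12])
```

  • Points 3 and 12 are each reported by only one method and are therefore dropped, leaving time point 9.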
  • For another example, the disease abnormal data detection method of the first embodiment is applied to the time series data of disease monitoring to obtain a first abnormal value, a second disease abnormal data detection method different from that of the first embodiment (for example, the Grubbs detection method) is applied to the same data to obtain a second abnormal value, a third disease abnormal data detection method different from that of the first embodiment (for example, the Bayesian detection method) is applied to the same data to obtain a third abnormal value, and the first, second, and third abnormal values are compared to obtain a final abnormal value. If at least two of the first abnormal value, the second abnormal value, and the third abnormal value are consistent, the consistent abnormal value is used as the final abnormal value.
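  • The "at least two of three agree" rule can be sketched as a vote count per time point. The function name `majority_outliers` and the `min_votes` parameter are hypothetical, and treating agreement per time point is an assumption about how "consistent" is meant.

```python
from collections import Counter


def majority_outliers(*detections, min_votes=2):
    """Keep time points flagged by at least min_votes of the given methods."""
    votes = Counter()
    for flagged in detections:
        votes.update(set(flagged))  # each method votes at most once per point
    return sorted(point for point, n in votes.items() if n >= min_votes)


# Hypothetical results from three different detection methods.
final_outliers = majority_outliers([9, 15], [9], [15, 20])
```

  • Time points 9 and 15 each receive two votes and are kept, while time point 20 is reported by a single method and is dropped.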
  • the disease abnormality data detecting method of the first embodiment can be applied separately to a region and to the hospitals of that region, to obtain a detection result for the region (i.e., the regional abnormal value) and a detection result for the regional hospital (i.e., the regional hospital abnormal value); the regional abnormal value and the regional hospital abnormal value are compared, and the final abnormal value is obtained according to the comparison result.
  • If the detection result for the region is consistent with the detection result for the regional hospital, that result is used as the final detection result; otherwise, if the two detection results are inconsistent, both are discarded.
  • Disease abnormal data detection for the region is performed on the time series data of disease monitoring of the region (for example, the disease monitoring data collected by the disease control department of the region from each hospital in the region) to obtain the regional abnormal value, and disease abnormal data detection for the regional hospital is performed on the time series data of disease monitoring of the hospital in the region to obtain the regional hospital abnormal value.
  • Specifically, abnormal values are found in the disease monitoring data (for example, the number of cases) of the region over a certain period of time, abnormal values are found in the disease monitoring data of the subordinate hospital of the region over the same period, and the intersection of the two dimensions is taken as the final detection result.
  • For example, the numbers of cases in the region on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 250, 170, and 180, respectively, and the numbers of cases in a subordinate hospital of the region (for example, the hospital with the largest cumulative number of visits for the disease in the region) on the same days are 130, 180, 125, and 140. Judging from the distribution of the number of cases in the region, an abnormal number of cases occurred on 2014-3-4, and the hospital also showed an abnormal number of cases on that day. It is therefore concluded that a disease abnormality occurred on that day.
  • For another example, the numbers of cases in the region on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 210, 170, and 180, respectively, and the numbers of cases in a subordinate hospital of the region (for example, the hospital with the largest cumulative number of visits for the disease in the region) on the same days are 130, 140, 125, and 140. Judging from the distribution of the number of cases in the region, an abnormal number of cases occurred on 2014-3-4, but no abnormal number of cases appeared at the hospital on that day. This is because the region has many hospitals, and the number of cases in the region is the sum of the numbers of cases in all of its subordinate hospitals. Some hospitals had a small increase on that day, which is not abnormal in the hospital dimension, but the sum of all the small increases is abnormal in the regional dimension.
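  • The two worked examples can be expressed as an intersection across dimensions. In the sketch below the flagged dates are copied from the text rather than computed, because four data points are too few to run the moving-zscore step; the function name `intersect_dimensions` and the dictionary layout are hypothetical.

```python
def intersect_dimensions(flags_by_dimension):
    """Return the dates flagged in every dimension (e.g. region and hospital)."""
    flagged_sets = [set(dates) for dates in flags_by_dimension.values()]
    if not flagged_sets:
        return []
    return sorted(set.intersection(*flagged_sets))


# First example: region and hospital both show an abnormal count on 2014-3-4.
example_1 = intersect_dimensions({
    "region": ["2014-3-4"],
    "hospital": ["2014-3-4"],
})

# Second example: only the regional series is abnormal on 2014-3-4.
example_2 = intersect_dimensions({
    "region": ["2014-3-4"],
    "hospital": [],
})
```

  • Under the intersection rule the first example reports 2014-3-4, while the second reports nothing, since the regional and hospital results are inconsistent and are discarded.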
  • the disease abnormal data detection method of the first embodiment is applied separately to the region and to the regional hospital, and the final detection result is obtained from the detection result for the region and the detection result for the regional hospital, so that a more accurate disease abnormal data detection result can be obtained.
  • FIG. 2 is a structural diagram of a disease abnormality data detecting apparatus according to Embodiment 2 of the present application.
  • the disease abnormality data detecting apparatus 10 may include an obtaining unit 201, a calculating unit 202, a determining unit 203, and a judging unit 204.
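  • As a rough software analogue, the four units 201 to 204 can be pictured as four methods of one class. Everything in the sketch is illustrative: the class and method names are hypothetical, the window x[i-w:i] and the form mz_i = (x_i - mu_i) / sigma_i are assumed because the formula is not reproduced in this text, and the quantile threshold with q = 0.99 is just one of the options the description mentions.

```python
from statistics import mean, pstdev


class DiseaseAnomalyDetector:
    """Hypothetical analogue of apparatus 10; not the applicant's implementation."""

    def __init__(self, w, q=0.99):
        self.w = w  # time window size
        self.q = q  # quantile used as the threshold (assumed choice)

    def obtain(self, source):
        # Unit 201: acquire the time series; here any iterable of numbers.
        return list(source)

    def calculate(self, x):
        # Unit 202: window mean, standard deviation, and moving-zscore values.
        # Assumes window x[i-w:i]; a flat window (sigma == 0) is scored 0.0.
        mz = []
        for i in range(self.w, len(x)):
            window = x[i - self.w:i]
            mu, sigma = mean(window), pstdev(window)
            mz.append(0.0 if sigma == 0 else (x[i] - mu) / sigma)
        return mz

    def determine_threshold(self, mz):
        # Unit 203: q-quantile of the mz list with linear interpolation.
        s = sorted(mz)
        pos = self.q * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (s[hi] - s[lo]) * (pos - lo)

    def judge(self, mz, threshold):
        # Unit 204: time points whose score exceeds the threshold.
        return [self.w + k for k, v in enumerate(mz) if v > threshold]

    def run(self, source):
        x = self.obtain(source)
        mz = self.calculate(x)
        if not mz:
            return []
        return self.judge(mz, self.determine_threshold(mz))


# Hypothetical series: a steady 1/3 pattern followed by a jump to 20.
detector = DiseaseAnomalyDetector(w=4, q=0.9)
flagged_points = detector.run([1, 3, 1, 3, 1, 3, 1, 3, 20])
```

  • Keeping each unit as a separate method mirrors the apparatus description and lets the threshold rule be swapped without touching the scoring step.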
  • the disease monitoring data may include monitoring data for diseases such as influenza, hand, foot and mouth disease, measles, and mumps.
  • a disease monitoring network composed of a plurality of monitoring points may be established in a preset area (for example, a province, a city, a region), and the disease monitoring data is acquired from the monitoring points to constitute the time series data of disease monitoring.
  • Medical institutions, schools, child care institutions, pharmacies, etc. can be selected as monitoring points to conduct disease monitoring and data collection for the corresponding target population.
  • a place that meets the preset conditions can be selected as the monitoring point.
  • the preset condition may include a number of people, a scale, and the like. For example, schools and child care institutions whose number of people reaches a preset number are selected as monitoring points. For another example, a pharmacy that has reached a preset size (for example, by daily turnover) is selected as a monitoring point. For another example, a hospital that has reached a preset size (for example, by number of daily visits) is selected as a monitoring point.
  • Disease monitoring data at different times constitute time series data for disease surveillance.
  • disease monitoring data collected on a daily basis can be used to form time series data for disease surveillance.
  • the disease monitoring data collected on a weekly basis may constitute time series data for disease monitoring.
  • Medical institutions (mainly including hospitals) are the best place to capture early warning signs of disease and are the first choice for disease surveillance.
  • Disease surveillance data can be obtained based on patient visits.
  • the disease monitoring data can be obtained according to the drug sales of the pharmacy.
  • the medical institution, the school, the child care institution, and the pharmacy are mainly selected for the collection of disease monitoring data.
  • the above selection of data sources does not limit the addition or replacement of other focused populations or sites in other embodiments as a source of data for monitoring.
  • hotels can be included in the disease surveillance area to obtain disease surveillance data for hotel residents.
  • the disease monitoring data collected by any type of monitoring point can constitute time series data of disease monitoring.
  • the disease monitoring data collected by the hospital can be taken to constitute time series data of disease monitoring.
  • the disease monitoring data collected by the plurality of types of monitoring points can be combined to form time series data of disease monitoring.
  • the disease monitoring data collected by hospitals can serve as the primary source, supplemented by the disease monitoring data of pharmacies, to constitute the time series data of disease monitoring.
  • the disease monitoring data may include disease data such as the number of visits to the disease, the rate of visits, the number of cases, and the incidence rate.
  • For example, the number of daily visits for a disease (e.g., flu) can be obtained from a medical institution (e.g., a hospital), and the number of daily visits for the disease can be used as disease monitoring data.
  • For another example, the daily incidence of a disease (e.g., influenza) among students can be obtained from a school, and the daily incidence of the disease can be used as disease monitoring data.
  • time point 0 and time point t denote the start time and the end time of the time series data and impose no restriction on it; any point in time can be used as time point 0.
  • the time window size w is an adjustable parameter and can be adjusted according to actual conditions.
  • For example, if the disease monitoring data is daily disease monitoring data (e.g., the number of daily visits for the disease), the time window size w can be taken as 7 (i.e., one week).
  • For another example, if the disease monitoring data is weekly disease monitoring data (e.g., the number of weekly visits for the disease), the time window size w can be taken as 3 (i.e., three weeks).
  • the calculating unit 202 is further configured to calculate a moving-zscore value $mz_i$ of the disease monitoring data $x_i$ corresponding to the time point i according to the mean $\mu_i$ and the standard deviation $\sigma_i$ of the disease monitoring data in the time window corresponding to each time point i, to obtain the mz list $[mz_w, mz_{w+1}, mz_{w+2}, \ldots, mz_t]$, where
  • the determining unit 203 is configured to determine an abnormal value threshold of the time series data X according to the mz list.
  • the preset quantile or the preset percentile of the mz list may be calculated, and the preset quantile or the preset percentile is used as the abnormal value threshold. For example, calculate the 0.979th quantile of the mz list and the 0.99th quantile of the mz list as the outlier threshold. As another example, the 0.5th quantile (ie, the median) of the mz list is calculated, and the 0.5th quantile (ie, the median) of the mz list is used as the abnormal value threshold.
  • Alternatively, the mean of the mz list can be calculated and used as the abnormal value threshold.
  • the judging unit 204 is configured to determine whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold; if it is greater than the abnormal value threshold, the disease monitoring data $x_i$ corresponding to the time point i is an abnormal value.
  • For example, if the abnormal value threshold is 12 and the moving-zscore value of the disease monitoring data $x_9$ corresponding to time point 9 is 13, which is greater than the abnormal value threshold, then the disease monitoring data $x_9$ is an abnormal value.
  • the disease surveillance data for 29 consecutive time points are:
  • the moving-zscore value $mz_i$ of the disease monitoring data $x_i$ is calculated to obtain the mz list $[mz_w, mz_{w+1}, mz_{w+2}, \ldots, mz_t]$; an abnormal value threshold of the time series data X is determined according to the mz list; and it is determined whether the moving-zscore value of the disease monitoring data $x_i$ corresponding to the time point i is greater than the abnormal value threshold, and if so, the disease monitoring data $x_i$ corresponding to the time point i is an abnormal value.
  • the second embodiment introduces a time window and uses the distribution of the neighboring historical data, which has the most reference value, as the reference, so the detection result is more reasonable. Because the amount of data in a time window is small, the data in the window can hardly satisfy a normal distribution, so the traditional zscore method cannot be used to detect abnormal values in the disease monitoring data.
  • the second embodiment calculates the moving-zscore values of multiple disease monitoring data points, ensures that their number is large enough to satisfy the condition of a normal distribution, and then determines the abnormal value threshold according to all the moving-zscore values, thereby detecting abnormal values in the disease monitoring data. Therefore, the second embodiment has higher applicability for abnormality detection of disease monitoring data, and achieves efficient and accurate abnormality detection of disease monitoring data.
  • disease abnormality data detection may be performed in conjunction with other disease abnormality data detecting means (other disease abnormality data detecting means adopts a disease abnormality data detecting method different from that of the first embodiment).
  • the disease abnormality data detecting apparatus of the second embodiment may further include: a detecting unit configured to acquire an abnormal value obtained by another disease abnormality data detecting device detecting the time series data; and a comparing unit configured to compare the abnormal value obtained by the disease abnormality data detecting device of the second embodiment with the abnormal value obtained by the other disease abnormal data detecting device, and to obtain the final abnormal value according to the comparison result.
  • For example, the disease abnormality data detecting device of the second embodiment performs disease abnormal data detection on the time series data of disease monitoring to obtain a first abnormal value, another disease abnormal data detecting device performs disease abnormal data detection on the same time series data to obtain a second abnormal value, and the first abnormal value and the second abnormal value are compared to obtain a final abnormal value. If the first abnormal value and the second abnormal value are consistent, the first abnormal value (equivalently, the second abnormal value) is used as the final abnormal value; otherwise, if they are inconsistent, both the first abnormal value and the second abnormal value are discarded.
  • For another example, the disease abnormality data detecting device of the second embodiment performs disease abnormal data detection on the time series data of disease monitoring to obtain a first abnormal value, a second disease abnormal data detecting device (using, for example, the Grubbs detection method) performs disease abnormal data detection on the same data to obtain a second abnormal value, a third disease abnormal data detecting device (using, for example, the Bayesian detection method) performs disease abnormal data detection on the same data to obtain a third abnormal value, and the first, second, and third abnormal values are compared to obtain a final abnormal value. If at least two of the first abnormal value, the second abnormal value, and the third abnormal value are consistent, the consistent abnormal value is used as the final abnormal value.
  • Combining the disease abnormality data detecting apparatus of the second embodiment with other disease abnormality data detecting apparatuses can yield more accurate disease abnormal data detection results.
  • In another embodiment, the apparatus of the second embodiment can be used to perform disease abnormal data detection separately for a region and for the region's subordinate hospitals, yielding a detection result for the region (the regional abnormal value) and a detection result for the subordinate hospitals (the subordinate-hospital abnormal value). The two are compared, and the final abnormal value is obtained from the comparison. If the result for the region and the result for the subordinate hospitals are consistent, either one is used as the final detection result; if they are inconsistent, both are discarded.
  • Here, detection for the region derives the regional abnormal value from the region's disease-monitoring time series data (for example, disease monitoring data that the region's disease control department collects from each subordinate hospital), and detection for the subordinate hospitals derives their abnormal values from the hospitals' own disease-monitoring time series data.
  • Specifically, the apparatus of the second embodiment finds abnormal values in the region's disease monitoring data (for example, case counts) for a given period, finds abnormal values in the disease monitoring data of the region's subordinate hospitals for the same period, and takes the intersection of the two dimensions as the final detection result.
  • For example, the region's case counts on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 250, 170, and 180, respectively, and the case counts of a subordinate hospital (for example, the hospital with the most cumulative visits for the disease in the region) on the same days are 130, 180, 125, and 140. The regional distribution shows an abnormal case count on 2014-3-4, and the subordinate hospital also shows an abnormal case count on that day, so the combined judgment is that abnormal disease visits occurred on that day.
  • As another example, the region's case counts on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 210, 170, and 180, respectively, and the case counts of a subordinate hospital (for example, the hospital with the most cumulative visits for the disease in the region) on the same days are 130, 140, 125, and 140. The regional distribution shows an abnormal case count on 2014-3-4, but the subordinate hospital shows none on that day. This happens because the region contains many hospitals and its case count for a day is the sum over all subordinate hospitals: some hospitals see a small increase that is not abnormal at the hospital level, but the sum of those small increases is abnormal at the regional level.
  • Using the apparatus of the second embodiment to perform detection separately for the region and for its subordinate hospitals, and deriving the final detection result from both, can yield more accurate disease abnormal data detection results.
  • This embodiment provides a non-volatile readable storage medium that stores computer readable instructions; when executed by a processor, the computer readable instructions implement the steps of the disease abnormal data detecting method described above.
  • Step 103: calculate the moving-zscore value mz_i of the disease monitoring data x_i corresponding to time point i according to the mean μ_i and the standard deviation σ_i of the disease monitoring data in the time window corresponding to each time point i, to obtain the mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t];
  • Step 104: determine an abnormal value threshold of the time series data X according to the mz list;
  • Step 105: determine whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal value threshold; if it is, the disease monitoring data x_i corresponding to time point i is an abnormal value.
  • Alternatively, the computer readable instructions, when executed by the processor, implement the functions of the modules/units of the apparatus embodiment described above, such as units 201-204 of FIG. 2:
  • the calculating unit 202 is further configured to calculate the moving-zscore value mz_i of the disease monitoring data x_i corresponding to time point i according to the mean μ_i and the standard deviation σ_i of the disease monitoring data in the time window corresponding to each time point i, to obtain the mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t];
  • the determining unit 203 is configured to determine an abnormal value threshold of the time series data X according to the mz list;
  • the judging unit 204 is configured to determine whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal value threshold; if it is, the disease monitoring data x_i corresponding to time point i is an abnormal value.
  • FIG. 3 is a schematic diagram of a computer apparatus according to Embodiment 4 of the present application.
  • the computer device 1 includes a memory 20, a processor 30, and computer readable instructions 40 stored in the memory 20 and executable on the processor 30, such as a disease abnormality data detection program.
  • the processor 30 executes the computer readable instructions 40 to implement the steps of the above-described embodiment of the disease abnormality data detecting method, such as steps 101-105 shown in FIG. 1.
  • Alternatively, the processor 30, when executing the computer readable instructions 40, implements the functions of the modules/units of the apparatus embodiment described above, such as units 201-204 of FIG. 2.
  • Illustratively, the computer readable instructions 40 may be partitioned into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to carry out the present application.
  • the one or more modules/units may be a series of computer readable instruction segments capable of performing particular functions; the segments describe the execution of the computer readable instructions 40 in the computer device 1.
  • For example, the computer readable instructions 40 may be divided into the obtaining unit 201, the calculating unit 202, the determining unit 203, and the judging unit 204 of FIG. 2; for the specific functions of each unit, see Embodiment 2.
  • the computer device 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. Those skilled in the art will understand that FIG. 3 is merely an example of the computer device 1 and does not limit it; the device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and the like.
  • the processor 30 may be a central processing unit (CPU), or may be other general-purpose processors, a digital signal processor (DSP), an application specific integrated circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor or the like. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 through various interfaces and lines.
  • the memory 20 can be used to store the computer readable instructions 40 and/or modules/units. The processor 30 implements the various functions of the computer device 1 by running or executing the computer readable instructions and/or modules/units stored in the memory 20 and by calling data stored in the memory 20.
  • the memory 20 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the applications required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data (such as audio data or a phone book) created according to the use of the computer device 1.
  • the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the modules/units integrated in the computer device 1, if implemented as software functional units and sold or used as a stand-alone product, can be stored in a computer readable storage medium. Based on this understanding, all or part of the processes of the foregoing method embodiments may also be carried out by computer readable instructions that direct the relevant hardware. The instructions may be stored in a non-volatile readable storage medium and, when executed by a processor, implement the steps of the method embodiments described above. The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, or some intermediate form.
  • the computer readable medium can include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard drive, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media.
  • each functional unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist physically separately, or two or more units may be integrated in the same unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software function modules.

Landscapes

  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

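A disease abnormal data detection method, comprising: acquiring disease monitoring data from time point 0 to time point t to form time series data (101); selecting a time window size w and, for each time point i from time point w to time point t, calculating the mean and standard deviation of the disease monitoring data within the time window corresponding to time point i (102); calculating, from the mean and standard deviation of the disease monitoring data within the time window corresponding to each time point i, the moving-zscore value of the disease monitoring data corresponding to time point i, to obtain an mz list (103); determining an abnormal value threshold for the time series data from the mz list (104); and, if the moving-zscore value of the disease monitoring data corresponding to time point i is greater than the abnormal value threshold, treating the disease monitoring data corresponding to time point i as an abnormal value (105). A disease abnormal data detection apparatus, a computer apparatus, and a readable storage medium are also provided. Efficient and accurate anomaly detection on disease monitoring data can be achieved.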

Description

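Disease abnormal data detection method and apparatus, computer apparatus, and storage medium
This application claims priority to Chinese patent application No. 201810321852.9, entitled "Disease abnormal data detection method and apparatus, computer apparatus, and storage medium" and filed with the Chinese Patent Office on April 11, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technologies, and in particular to a disease abnormal data detection method and apparatus, a computer apparatus, and a non-volatile readable storage medium.
Background
As global economic integration accelerates, economic and exchange activities increase and population movement becomes ever more frequent, creating favorable conditions for the spread and outbreak of disease; public health problems are growing more severe. At the same time, the social and natural environment is changing, and the growing number of events that affect public health, such as environmental pollution and natural disasters, also raises the likelihood of public health emergencies.
How to detect abnormal disease data, so that public health emergencies such as disease epidemics or outbreaks can be identified early, control measures taken as soon as possible, and the resulting losses minimized, has become an urgent problem.
Existing anomaly detection methods, such as the zscore method and the Grubbs method, require the data to follow a normal distribution, which in practice often does not hold. The traditional quartile method uses all of the data, yet data from the distant past has little reference value for current data, so its results are more prone to bias.
Summary
In view of the above, it is necessary to provide a disease abnormal data detection method and apparatus, a computer apparatus, and a non-volatile readable storage medium that can achieve efficient and accurate anomaly detection on disease monitoring data.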
本申请的第一方面提供一种疾病异常数据检测方法,所述方法包括:
获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];
选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;
根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算 所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t],其中
Figure PCTCN2018099610-appb-000001
根据所述mz列表确定所述时间序列数据X的异常值阈值;
判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
本申请的第二方面提供一种疾病异常数据检测装置,所述装置包括:
获取单元,用于获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];
计算单元,用于选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;
所述计算单元,还用于根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t],其中
Figure PCTCN2018099610-appb-000002
确定单元,用于根据所述mz列表确定所述时间序列数据X的异常值阈值;
判断单元,用于判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
本申请的第三方面提供一种计算机装置,所述计算机装置包括存储器和处理器,所述存储器存储至少一条计算机可读指令,所述处理器执行所述至少一条计算机可读指令以实现所述疾病异常数据检测方法。
本申请的第四方面提供一种非易失性可读存储介质,所述非易失性可读存储介质上存储有至少一条计算机可读指令,其特征在于,所述至少一条计算机可读指令被处理器执行时实现所述疾病异常数据检测方法。
本申请获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t];根据所述mz列表确定所述时间序列数据X的异常值阈值;判断时间点i对应的疾病监测数据x i的moving-zscore值是否大 于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
本申请引入了时间窗,将最具有参考价值的历史邻近数据分布来作为参考,检测结果更加合理。由于时间窗内数据量较少,导致时间窗内的数据很难满足正态分布,因而不能采用传统的zscore方法检测疾病监测数据中的异常值。本申请计算多个疾病监测数据的moving-zscore值,保证了数量足够大,能够满足正态分布的条件,然后再根据所有moving-zscore值确定异常值阈值,从而检测出疾病监测数据中的异常值。因此,本申请对于疾病监测数据的异常检测具有更高的可适用性,实现了高效准确的疾病监测数据异常检测。
附图说明
图1是本申请实施例一提供的疾病异常数据检测方法的流程图。
图2是本申请实施例二提供的疾病异常数据检测装置的结构图。
图3是本申请实施例四提供的计算机装置的示意图。
具体实施方式
为了能够更清楚地理解本申请的上述目的、特征和优点,下面结合附图和具体实施例对本申请进行详细描述。需要说明的是,在不冲突的情况下,本申请的实施例及实施例中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本申请,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施例的目的,不是旨在于限制本申请。
优选地,本申请的疾病异常数据检测方法应用在一个或者多个计算机装置中。所述计算机装置是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备,其硬件包括但不限于微处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程门阵列(Field-Programmable Gate Array,FPGA)、数字处理器(Digital Signal Processor,DSP)、嵌入式设备等。
所述计算机装置可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述计算机装置可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。
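Embodiment 1
FIG. 1 is a flowchart of the disease abnormal data detection method provided in Embodiment 1 of the present application. The method is applied to a computer apparatus. It detects abnormal values in disease monitoring data so that public health emergencies such as disease epidemics or outbreaks can be identified early, control measures taken as soon as possible, and the resulting losses minimized.
As shown in FIG. 1, the disease abnormal data detection method includes the following steps:
Step 101: acquire disease monitoring data from time point 0 to time point t to form time series data X, X = [x_0, x_1, x_2, …, x_t].
The disease monitoring data may include monitoring data for diseases such as influenza, hand-foot-and-mouth disease, measles, and mumps.
A disease monitoring network composed of multiple monitoring points may be established in a preset area (for example, a province, city, or region), and the disease monitoring data acquired from those monitoring points. Medical institutions, schools and childcare institutions, pharmacies, and the like may be selected as monitoring points, each performing disease monitoring and data collection for its target population. Sites that meet preset conditions may be selected as monitoring points; the preset conditions may include headcount, scale, and so on. For example, schools and childcare institutions whose enrollment reaches a preset number are selected as monitoring points. As another example, pharmacies whose scale (measured, for example, by daily turnover) reaches a preset size are selected. As yet another example, hospitals whose scale (measured, for example, by daily patient visits) reaches a preset size are selected.
Disease monitoring data from different times form the disease-monitoring time series data. For example, data collected daily may form the time series, or data collected weekly may form the time series.
Medical institutions (mainly hospitals) are the places best able to capture early signs of a disease outbreak and are the first choice for disease monitoring. Disease monitoring data can be obtained from patient visits.
Some patients buy medicine at pharmacies on their own to relieve early symptoms, so disease monitoring data can also be obtained from pharmacy drug sales.
Children and adolescents are a high-risk group for disease and an important link in disease transmission, so monitoring of this group should also be strengthened. Schools and childcare institutions are good places to monitor disease incidence among children and adolescents. Disease monitoring data can be obtained from the absence records of children and adolescents at schools and childcare institutions.
Therefore, this application mainly selects three types of sites for collecting disease monitoring data: medical institutions, schools and childcare institutions, and pharmacies. Of course, this choice of data sources does not preclude other embodiments from adding or substituting other populations or sites of interest as monitoring data sources. For example, hotels may be brought within the scope of disease monitoring to obtain disease monitoring data on hotel guests.
As needed, the disease monitoring data collected by any one type of monitoring point (for example, medical institutions) may form the time series data. For example, data collected by hospitals may be used. Alternatively, data collected by several types of monitoring points may be combined. For example, hospital data may serve as the primary source, supplemented by data collected by pharmacies.
Disease monitoring data may include morbidity data such as the number of visits, visit rate, number of cases, and incidence rate for a disease. For example, the daily number of visits for a disease (such as influenza) may be obtained from a medical institution (such as a hospital) and used as the disease monitoring data. As another example, the daily number of student cases of a disease (such as influenza) may be obtained from a school and used as the disease monitoring data.
Note that time point 0 and time point t denote the start and end of the time series data and do not restrict the time range of the data. Any time point may be taken as time point 0.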
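Step 102: select a time window size w and, for each time point i from time point w to time point t, calculate the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to time point i, where the time window corresponding to time point i has size w and i = w, w+1, …, t.
The time window is the period immediately preceding a given time point. For example, with a time window size w of 4, for the given time point t = 10 the time window is the period t = 6 to t = 9; for the given time point t = 11 the time window is the period t = 7 to t = 10; and so on.
The time window size w is a tunable parameter that can be adjusted to the actual situation.
In one embodiment, the disease monitoring data is daily data (for example, the daily number of visits for a disease), and the time window size w may be set to 7 (one week).
In another embodiment, the disease monitoring data is weekly data (for example, the weekly number of visits for a disease), and the time window size w may be set to 3 (three weeks).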
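The window definition in Step 102 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `window_stats` and the sample series are hypothetical, and the population standard deviation is used because it matches the value 0.829 quoted for the window [3, 3, 4, 5] in the worked example of this embodiment.

```python
from statistics import fmean, pstdev


def window_stats(series, i, w):
    """Return (mean, std) of the w observations immediately before index i.

    The window covers time points i-w .. i-1 and excludes point i itself.
    """
    if i < w:
        raise ValueError("time point i must be at least w")
    window = series[i - w:i]
    # Population standard deviation (an assumption inferred from the example).
    return fmean(window), pstdev(window)


# Hypothetical daily visit counts, indexed from time point 0.
daily_visits = [3, 3, 4, 5, 4, 6, 7, 6, 7, 8, 9, 8]

mu, sigma = window_stats(daily_visits, 4, 4)  # window is [3, 3, 4, 5]
print(round(mu, 3), round(sigma, 3))
```

With w = 4, calling `window_stats(daily_visits, 10, 4)` reads time points 6 to 9, which is the window described above for t = 10.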
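Step 103: according to the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to each time point i, calculate the moving-zscore value mz_i of the disease monitoring data x_i corresponding to time point i, to obtain the mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t], where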
Figure PCTCN2018099610-appb-000003
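Steps 102-103 start at time point w when calculating the mean and standard deviation of the disease monitoring data within each time window, and likewise when calculating the moving-zscore value of the disease monitoring data at each time point. This is because for t = 0 to t = w−1 the available time window is shorter than w, so there are not enough (w) data points to calculate the mean and standard deviation.
Step 104: determine the abnormal value threshold of the time series data X according to the mz list.
A preset quantile or preset percentile of the mz list may be calculated and used as the abnormal value threshold. For example, the 0.997 quantile of the mz list is calculated and used as the abnormal value threshold. As another example, the 0.5 quantile (the median) of the mz list is calculated and used as the abnormal value threshold.
Alternatively, the mean of the mz list may be calculated and used as the abnormal value threshold.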
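The two threshold options of Step 104 can be sketched as follows. The text does not say which quantile convention is used, so the linear interpolation between order statistics below is an assumption, and the names `linear_quantile` and `outlier_threshold` are hypothetical.

```python
from statistics import fmean


def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed convention)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def outlier_threshold(mz_list, method="quantile", q=0.997):
    """Derive the abnormal value threshold from the mz list."""
    if method == "quantile":
        return linear_quantile(mz_list, q)
    if method == "mean":
        return fmean(mz_list)
    raise ValueError(f"unknown method: {method}")


print(outlier_threshold([0.5, 1.0, 1.5, 9.0], method="mean"))
```

The choice matters in practice: a median or mean threshold flags roughly half of the points, while a high quantile such as 0.997 flags only the most extreme moving-zscore values.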
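Step 105: determine whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal value threshold; if it is, the disease monitoring data x_i corresponding to time point i is an abnormal value.
For example, if the abnormal value threshold is 12 and the moving-zscore value of the disease monitoring data x_9 at time point 9 is 13, which is greater than the threshold, then x_9 is an abnormal value.
A worked example:
The disease monitoring data at 29 consecutive time points are:
[3,3,4,5,4,6,7,6,7,8,9,8,17,8,7,9,10,11,12,12,13,12,14,14,15,16,16,17,16].
Take the time window size w = 4, so moving-zscore values are computed starting from the fifth number, 4. For the fifth number, the time window contains [3,3,4,5], with mean 3.75 and standard deviation 0.829, so the moving-zscore value is (4−3.75)/0.829 = 0.302 (rounded to three decimal places).
In the same way, the moving-zscore values of all numbers after the fifth are calculated in turn, giving the mz list:
[0.302,2.828,2.713,0.447,1.147,3.0,2.828,0.447,12.728,0.662,0.927,0.246,0.063,2.236,1.859,1.342,2.110,0.0,4.041,1.508,2.110,2.065,1.508,2.111,0.0].
The 0.995 quantile of the mz list is then taken as the abnormal value threshold, giving 11.685.
Only 12.728 in the list is greater than this threshold, and 12.728 corresponds to the disease monitoring datum 17, so 17 is the abnormal value.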
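The worked example can be reproduced with a short Python sketch of Steps 101-105. Several points are assumptions rather than statements from the text: the formula figure is not reproduced here, so the score is taken as |x_i − μ_i| / σ_i with the population standard deviation, which is what the listed values imply (for instance, the value 8 that follows 17 scores 0.662, not −0.662); the quantile uses linear interpolation; and the handling of a zero standard deviation is a hypothetical choice, since the text does not cover that case.

```python
from statistics import fmean, pstdev


def moving_zscores(series, w):
    """Score each point i >= w against the w points immediately before it."""
    scores = []
    for i in range(w, len(series)):
        window = series[i - w:i]
        mu, sigma = fmean(window), pstdev(window)
        if sigma == 0:
            # Assumption: a flat window is not covered by the source text.
            scores.append(0.0 if series[i] == mu else float("inf"))
        else:
            scores.append(abs(series[i] - mu) / sigma)
    return scores


def linear_quantile(values, q):
    """Quantile with linear interpolation (assumed convention)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def detect_outliers(series, w, q):
    """Return the mz list, the threshold, and the flagged (time point, value) pairs."""
    mz = moving_zscores(series, w)
    threshold = linear_quantile(mz, q)
    flagged = [(i, series[i])
               for i, score in zip(range(w, len(series)), mz)
               if score > threshold]
    return mz, threshold, flagged


data = [3, 3, 4, 5, 4, 6, 7, 6, 7, 8, 9, 8, 17, 8, 7, 9, 10, 11, 12,
        12, 13, 12, 14, 14, 15, 16, 16, 17, 16]
mz, threshold, flagged = detect_outliers(data, w=4, q=0.995)
print(len(mz), round(threshold, 3), flagged)
```

Under these assumptions the sketch yields 25 scores and flags only the value 17 at time point 12. Note that with an interpolated quantile below 1, the largest score always exceeds the threshold whenever it is unique, so at least one point is flagged per run; the quantile therefore controls how many points are reported rather than whether any are.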
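The disease abnormal data detection method of Embodiment 1 acquires disease monitoring data from time point 0 to time point t to form time series data X, X = [x_0, x_1, x_2, …, x_t]; selects a time window size w and, for each time point i from time point w to time point t, calculates the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to time point i, where the window has size w and i = w, w+1, …, t; calculates, from μ_i and σ_i, the moving-zscore value mz_i of the disease monitoring data x_i corresponding to time point i, to obtain the mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t]; determines the abnormal value threshold of the time series data X according to the mz list; and determines whether the moving-zscore value of x_i is greater than the abnormal value threshold, in which case x_i is an abnormal value.
Embodiment 1 introduces a time window and uses the distribution of the neighboring historical data, which has the greatest reference value, as the reference, so the detection results are more reasonable. Because a time window contains little data, the data within it can hardly satisfy a normal distribution, so the traditional zscore method cannot be applied to detect abnormal values in the disease monitoring data. Embodiment 1 calculates the moving-zscore values of many disease monitoring data points, which ensures a sample large enough to satisfy the normal distribution condition, and then determines the abnormal value threshold from all of the moving-zscore values, thereby detecting the abnormal values in the disease monitoring data. Embodiment 1 is therefore more widely applicable to anomaly detection on disease monitoring data and achieves efficient and accurate detection.
In another embodiment, detection may be combined with other disease abnormal data detection methods (different from the method of Embodiment 1). Specifically, the following steps may also be included: acquiring the abnormal values that the other methods obtain from the time series data; comparing the abnormal values obtained by the method of Embodiment 1 with those obtained by the other methods; and deriving the final abnormal values from the comparison result. There may be one other method or several, and each yields its own set of abnormal values.
For example, the method of Embodiment 1 is applied to the disease-monitoring time series data to obtain a first abnormal value, and a second detection method different from that of Embodiment 1 (for example, the Grubbs detection method) is applied to the same data to obtain a second abnormal value; the two are compared to obtain the final abnormal value. If the first and second abnormal values are consistent, either one is used as the final abnormal value; if they are inconsistent, both are discarded.
As another example, the method of Embodiment 1 is applied to the disease-monitoring time series data to obtain a first abnormal value, a second detection method different from that of Embodiment 1 (for example, the Grubbs detection method) is applied to obtain a second abnormal value, and a third detection method different from that of Embodiment 1 (for example, a Bayesian detection method) is applied to obtain a third abnormal value; the three are compared to obtain the final abnormal value. If at least two of the three abnormal values are consistent, the consistent value is used as the final abnormal value.
Combining the method of Embodiment 1 with other disease abnormal data detection methods can yield more accurate disease abnormal data detection results.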
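The comparison rule for several detectors can be sketched as a vote over flagged time points. This is one possible reading, not the text's definition: "consistent" is taken to mean that detectors flag the same time point, the function name `combine_by_vote` is hypothetical, and the three flagged sets are invented purely for illustration (no Grubbs or Bayesian detector is implemented here).

```python
from collections import Counter


def combine_by_vote(detections, min_votes=2):
    """Keep the time points flagged by at least min_votes detectors.

    detections: one collection of flagged time points per detector.
    """
    votes = Counter(point for flagged in detections for point in set(flagged))
    return {point for point, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three detectors on the same time series.
moving_zscore_flags = {12}
grubbs_flags = {12, 20}
bayesian_flags = {20, 25}

final = combine_by_vote([moving_zscore_flags, grubbs_flags, bayesian_flags])
print(sorted(final))
```

With two detectors and `min_votes=2` the same function reduces to the two-method rule above: a point survives only if both detectors flag it, and disagreements are discarded.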
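In another embodiment, the method of Embodiment 1 may be used to perform disease abnormal data detection separately for a region and for the region's subordinate hospitals, yielding a detection result for the region (the regional abnormal value) and a detection result for the subordinate hospitals (the subordinate-hospital abnormal value). The two are compared, and the final abnormal value is obtained from the comparison. If the result for the region and the result for the subordinate hospitals are consistent, either one is used as the final detection result; if they are inconsistent, both are discarded.
Here, detection for the region derives the regional abnormal value from the region's disease-monitoring time series data (for example, disease monitoring data that the region's disease control department collects from each subordinate hospital), and detection for the subordinate hospitals derives their abnormal values from the hospitals' own disease-monitoring time series data.
Specifically, following the method of Embodiment 1, abnormal values are found in the region's disease monitoring data (for example, case counts) for a given period and in the disease monitoring data of the region's subordinate hospitals for the same period, and the intersection of the two dimensions is taken as the final detection result.
For example, the region's case counts on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 250, 170, and 180, respectively, and the case counts of a subordinate hospital (for example, the hospital with the most cumulative visits for the disease in the region) on the same days are 130, 180, 125, and 140. The regional distribution shows an abnormal case count on 2014-3-4, and the subordinate hospital also shows an abnormal case count on that day, so the combined judgment is that abnormal disease visits occurred on that day.
As another example, the region's case counts on 2014-3-3, 2014-3-4, 2014-3-5, and 2014-3-6 are 160, 210, 170, and 180, respectively, and the case counts of a subordinate hospital (for example, the hospital with the most cumulative visits for the disease in the region) on the same days are 130, 140, 125, and 140. The regional distribution shows an abnormal case count on 2014-3-4, but the subordinate hospital shows none on that day. This happens because the region contains many hospitals and its case count for a day is the sum over all subordinate hospitals: some hospitals see a small increase that is not abnormal at the hospital level, but the sum of those small increases is abnormal at the regional level.
Using the method of Embodiment 1 to perform detection separately for the region and for its subordinate hospitals, and deriving the final detection result from both, can yield more accurate disease abnormal data detection results.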
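The region-versus-hospital comparison amounts to intersecting two sets of flagged dates. The sketch below assumes the flagged dates have already been produced by running the detector on each series; the function name `confirmed_outliers` is hypothetical, and the flag sets simply restate which days the two examples above describe as abnormal.

```python
def confirmed_outliers(region_flags, hospital_flags):
    """Dates flagged at both the regional and the subordinate-hospital level."""
    return sorted(set(region_flags) & set(hospital_flags))


# First example: region and hospital both abnormal on 2014-3-4.
case_1 = confirmed_outliers({"2014-3-4"}, {"2014-3-4"})

# Second example: only the regional series is abnormal on 2014-3-4.
case_2 = confirmed_outliers({"2014-3-4"}, set())

print(case_1, case_2)
```

Taking the intersection trades recall for precision: in the second example the regional anomaly caused by many small hospital-level increases is discarded, so this rule suits cases where confirmation at the hospital level is required before raising an alert.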
实施例二
图2为本申请实施例二提供的疾病异常数据检测装置的结构图。如图2所示,所述疾病异常数据检测装置10可以包括:获取单元201、计算单元202、确定单元203、判断单元204。
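The acquiring unit 201 is configured to acquire disease monitoring data from time point 0 to time point t to form time series data X, X = [x_0, x_1, x_2, …, x_t].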
所述疾病监测数据可以包括流感、手足口病、麻疹、流行性腮腺炎等疾病的监测数据。
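A disease monitoring network composed of multiple monitoring points may be established in a preset area (for example, a province, city, or region); the disease monitoring data is acquired from those monitoring points and forms the disease-monitoring time series data. Medical institutions, schools and childcare institutions, pharmacies, and the like may be selected as monitoring points, each performing disease monitoring and data collection for its target population. Sites that meet preset conditions may be selected as monitoring points; the preset conditions may include headcount, scale, and so on. For example, schools and childcare institutions whose enrollment reaches a preset number are selected as monitoring points. As another example, pharmacies whose scale (measured, for example, by daily turnover) reaches a preset size are selected. As yet another example, hospitals whose scale (measured, for example, by daily patient visits) reaches a preset size are selected.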
不同时间的疾病监测数据构成疾病监测的时间序列数据。例如,可以将以日为单位采集到的疾病监测数据构成疾病监测的时间序列数据。或者,可以将以周为单位采集到的疾病监测数据构成疾病监测的时间序列数据。
医疗机构(主要包括医院)是最能捕捉疾病早期暴发预兆的场所,是开展疾病监测的首选。可以根据病人就诊情况,获取疾病监测数据。
一部分疾病人会自行去药店购药来缓解早期症状,因此,可以根据药店的药品销售情况,获取疾病监测数据。
儿童和青少年是疾病的高危人群以及疾病传播过程中的重要环节,也应该加强对该人群的监测。学校和幼托机构是监测儿童和青少年疾病发病情况的较佳场所。可以根据学校和幼托机构的儿童和青少年的请假情况,获得疾病监测数据。
因此,本申请中主要选择医疗机构、学校和幼托机构、药店这三类场所进行疾病监测数据的采集。当然,上述对数据源的选择,并不能限制在另外的实施方案中增加或替换其他重点关注人群或场所作为监测的数据源。例如,可以将宾馆纳入疾病监测范围,获取宾馆入住人员的疾病监测数据。
根据需要,可以取任意一类监控点(例如医疗机构)采集的疾病监测数据构成疾病监测的时间序列数据。例如,可以取医院采集的疾病监测数据构成疾病监测的时间序列数据。或者,可以结合多类监控点采集的疾病监测数据构成疾病监测的时间序列数据。例如,可以以医院采集的疾病监测数据为主,以药店参加的疾病监测数据作为补充,构成疾病监测的时间序列数据。
疾病监测数据可以包括疾病的就诊数、就诊率、发病数、发病率等患病数据。例如,可以从医疗机构(例如医院)获取疾病(例如流感)的每日就诊数,将疾病(例如流感)的每日就诊数作为疾病监测数据。又如,可以从学校获取学生的疾病(例如流感)的每日发病数,将疾病(例如流感)的每日发病数作为疾病监测数据。
需要说明的是,时间点0和时间点t表示时间序列数据的起始时间和终止时间,并非对时间序列数据时间上的限定。可以将任意时间点作为时间点0。
计算单元202,用于选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t。
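The time window is the period immediately preceding a given time point. For example, with a time window size w of 4, for the given time point t = 10 the time window is the period t = 6 to t = 9; for the given time point t = 11 the time window is the period t = 7 to t = 10; and so on.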
时间窗大小w是可调参数,可以根据实际情况进行调节。
在一实施例中,所述疾病监测数据为每日的疾病监测数据(例如疾病的每日就诊数),所述时间窗大小w可以取为7(即一周)。
在另一实施例中,所述疾病监测数据为每周的疾病监测数据(例如疾病的每周就诊数),所述时间窗大小w可以取为3(即三周)。
计算单元202,还用于根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t],其中
Figure PCTCN2018099610-appb-000004
计算单元202从时间点w开始计算各个时间点对应的时间窗内疾病监测数据的均值和标准差,以及从时间点w开始计算各个时间点对应的疾病监测数据的moving-zscore值。这是因为从t=0到t=w-1这段时间内得到的时间窗的长度小于w,没有足够的w个疾病监测数据来计算疾病监测数据的均值和标准差。
确定单元203,用于根据所述mz列表确定所述时间序列数据X的异常值阈值。
可以计算所述mz列表的预设分位数或预设百分位数,将所述预设分位数或预设百分位数作为所述异常值阈值。例如,计算mz列表的第0.997分位数,将mz列表的第0.997分位数作为异常值阈值。又如,计算mz列表的第0.5分位数(即中位数),将mz列表的第0.5分位数(即中位数)作为异常值阈值。
或者,可以计算mz列表的平均数,将所述平均数作为异常值阈值。
判断单元204,用于判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
例如,异常值阈值为12,时间点9对应的疾病监测数据x 9的moving-zscore值为13,大于所述异常值阈值,则疾病监测数据x 9为异常值。
举例说明:
29个连续时间点的疾病监测数据为:
[3,3,4,5,4,6,7,6,7,8,9,8,17,8,7,9,10,11,12,12,13,12,14,14,15,16,16,17,16]。
取时间窗大小w=4,也就是从第5个数4开始计算moving-zscore值,对于第5个数,其时间窗内的数为[3,3,4,5],均值为3.75,标准差为0.829,则moving-zscore值为(4-3.75)/0.829=0.302(四舍五入保留小数点后三位)。
按照同样的方法,依次计算出第5个数后面所有的moving-zscore值,得到mz列表:
[0.302,2.828,2.713,0.447,1.147,3.0,2.828,0.447,12.728,0.662,0.927,0.246,0.063,2.236,1.859,1.342,2.110,0.0,4.041,1.508,2.110,2.065,1.508,2.111,0.0]。
再取mz列表的0.995分位数作为异常值阈值,得到11.685。
列表中只有12.728是大于这个异常值阈值的,12.728对应的疾病监测数据17,所以17就是异常值。
实施例二的疾病异常数据检测装置获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t];根据所述mz列表确定所述时间序列数据X的异常值阈值;判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
实施例二引入了时间窗,将最具有参考价值的历史邻近数据分布来作为参考,检测结果更加合理。由于时间窗内数据量较少,导致时间窗内的数据很难满足正态分布,因而不能采用传统的zscore方法检测疾病监测数据中的异常值。实施例二计算多个疾病监测数据的moving-zscore值,保证了数量足够大,能够满足正态分布的条件,然后再根据所有moving-zscore值确定异常值阈值,从而检测出疾病监测数据中的异常值。因此,实施例二对于疾病监测数据的异常检测具有更高的可适用性,实现了高效准确的疾病监测数据异常检测。
在另一实施例中,可以结合其他的疾病异常数据检测装置(其他的疾病异常数据检测装置采用不同于实施例一的疾病异常数据检测方法)进行疾病异常数据检测。具体地,实施例二的疾病异常数据检测装置还可以包括:检测单元,用于获取其他的疾病异常数据检测装置对所述时间序列数据进行检测得到的异常值;比较单元,用于将实施例二的疾病异常数据检测装置得到 的异常值与所述其他的疾病异常数据检测装置得到的异常值进行比较,根据实施例二的疾病异常数据检测装置得到的异常值与所述其他的疾病异常数据检测装置得到的异常值的比较结果得到最终的异常值。
例如,采用实施例二的疾病异常数据检测装置对疾病监测的时间序列数据进行疾病异常数据检测得到第一异常值,采用其他的疾病异常数据检测装置对所述疾病监测的时间序列数据进行疾病异常数据检测得到第二异常值,比较所述第一异常值和所述到第二异常值得到最终的异常值。若所述第一异常值和所述到第二异常值一致,则以所述第一异常值/到第二异常值作为最终的异常值;否则,若所述第一异常值和所述到第二异常值不一致,则舍弃所述第一异常值与所述第二异常值。
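As another example, the apparatus of Embodiment 2 performs disease abnormal data detection on the disease-monitoring time series data to obtain a first abnormal value, a second disease abnormal data detection apparatus (using, for example, the Grubbs detection method) performs detection on the same data to obtain a second abnormal value, and a third disease abnormal data detection apparatus (using, for example, a Bayesian detection method) performs detection on the same data to obtain a third abnormal value; the three are compared to obtain the final abnormal value. If at least two of the three abnormal values are consistent, the consistent value is used as the final abnormal value.
Combining the disease abnormal data detection apparatus of Embodiment 2 with other disease abnormal data detection apparatuses can yield more accurate disease abnormal data detection results.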
在另一实施例中,可以采用实施例二的疾病异常数据检测装置分别对地区和地区下属医院进行疾病异常数据检测,得到针对地区的检测结果(即地区异常值)和针对地区下属医院的检测结果(即地区下属医院异常值),比较所述地区异常值和所述地区下属医院异常值,根据所述地区异常值和所述地区下属医院异常值的比较结果得到最终的异常值。若针对地区的检测结果和针对地区下属医院的检测结果一致,则以针对地区的检测结果/针对地区下属医院的检测结果作为最终的检测结果;否则,若针对地区的检测结果和针对地区下属医院的检测结果不一致,则舍弃所述针对地区的检测结果和针对地区下属医院的检测结果。
其中,对地区进行疾病异常数据检测是根据所述地区的疾病监测的时间序列数据(例如地区的疾控部门从各个地区下属医院收集到的疾病监测数据)得到地区异常值,对地区下属医院进行疾病异常数据检测是根据所述地区下属医院的疾病监测的时间序列数据得到所述地区下属医院的异常值。
具体来说,实施例二的疾病异常数据检测装置从地区某时间段内的疾病 监测数据(例如病例数)中找出异常值,从该地区下属医院同样时间段内的疾病监测数据中找出异常值,取两个维度的交集来作为最终的检测结果。
例如,地区在2014-3-3,2014-3-4,2014-3-5,2014-3-6病例数分别为160,250,170,180,该地区下属医院(例如该地区下该疾病累计就诊次数最多的医院)在这段时间的病例数分别为130,180,125,140。从地区病例数分布来看,在2014-3-4这天出现了异常病例数,同时下属医院在这天也出现了异常病例数,所以综合判定在这天出现了疾病就诊异常的现象。
又如,地区在2014-3-3,2014-3-4,2014-3-5,2014-3-6病例数分别为160,210,170,180,该地区下属医院(例如该地区下该疾病累计就诊次数最多的医院)在这段时间病例数分别为130,140,125,140。从地区病例数分布来看,在2014-3-4这天出现了异常病例数,但是下属医院在这天并未出现异常病例数。这是由于地区下有很多家医院,地区在这一天的就诊病例数是由下属所有的医院的病例数的总和,部分医院在这一天有一个小幅度的增长,在医院维度上并不是异常,但是所有的小幅度增长后的总和在地区维度上就表现为异常了。
采用实施例二的疾病异常数据检测装置分别对地区和地区下属医院进行疾病异常数据检测,根据针对地区的检测结果和针对地区下属医院的检测结果得到最终的检测结果,可以获得更加准确的疾病异常数据检测结果。
实施例三
本实施例提供一种非易失性可读存储介质,该非易失性可读存储介质上存储有计算机可读指令,该计算机可读指令被处理器执行时实现上述疾病异常数据检测方法实施例中的步骤,例如图1所示的步骤101-105:
步骤101,获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];
步骤102,选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;
步骤103,根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t];
步骤104,根据所述mz列表确定所述时间序列数据X的异常值阈值;
步骤105,判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
或者,该计算机可读指令被处理器执行时实现上述装置实施例中各模块/单元的功能,例如图2中的单元201-204:
获取单元201,用于获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];
计算单元202,用于选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;
所述计算单元202,还用于根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t],其中
Figure PCTCN2018099610-appb-000005
确定单元203,用于根据所述mz列表确定所述时间序列数据X的异常值阈值;
判断单元204,用于判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
实施例四
图3为本申请实施例四提供的计算机装置的示意图。所述计算机装置1包括存储器20、处理器30以及存储在所述存储器20中并可在所述处理器30上运行的计算机可读指令40,例如疾病异常数据检测程序。所述处理器30执行所述计算机可读指令40时实现上述疾病异常数据检测方法实施例中的步骤,例如图1所示的步骤101-105。或者,所述处理器30执行所述计算机可读指令40时实现上述装置实施例中各模块/单元的功能,例如图2中的单元201-204。
示例性的,所述计算机可读指令40可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器20中,并由所述处理器30执行,以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令段,该指令段用于描述所述计算机可读指令40在所述计算机装置1中的执行过程。例如,所述计算机可读指令40可以被分割成图2中的获取单元201、计算单元202、确定单元203、判断单元204,各单元具体功能参见实施例二。
所述计算机装置1可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。本领域技术人员可以理解,所述示意图3仅仅是计算机装置1的示例,并不构成对计算机装置1的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述计算机装置1还可以包括输入输出设备、网络接入设备、总线等。
所称处理器30可以是中央处理单元(Central Processing Unit,CPU),还 可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器30也可以是任何常规的处理器等,所述处理器30是所述计算机装置1的控制中心,利用各种接口和线路连接整个计算机装置1的各个部分。
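The memory 20 may be used to store the computer readable instructions 40 and/or modules/units; the processor 30 implements the various functions of the computer apparatus 1 by running or executing the computer readable instructions and/or modules/units stored in the memory 20 and by calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the applications required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the computer apparatus 1 (such as audio data or a phone book). In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.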
所述计算机装置1集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性可读存储介质中,该计算机可读指令在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机可读指令包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。
在本申请所提供的几个实施例中,应该理解到,所揭露的计算机装置和方法,可以通过其它的方式实现。例如,以上所描述的计算机装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中, 也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。计算机装置权利要求中陈述的多个单元或计算机装置也可以由同一个单元或计算机装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (20)

  1. 一种疾病异常数据检测方法,其特征在于,所述方法包括:
    获取时间点0到时间点t的疾病监测数据,构成时间序列数据X,X=[x 0,x 1,x 2,…,x t];
    选取时间窗大小w,针对时间点w至时间点t中的每个时间点i,计算所述时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,所述时间点i对应的时间窗的大小为w,i=w,w+1,…,t;
    根据每个时间点i对应的时间窗内疾病监测数据的均值μ i和标准差σ i,计算所述时间点i对应的疾病监测数据x i的moving-zscore值mz i,得到mz列表[mz w,mz w+1,mz w+2,…,mz t],其中
    Figure PCTCN2018099610-appb-100001
    根据所述mz列表确定所述时间序列数据X的异常值阈值;
    判断时间点i对应的疾病监测数据x i的moving-zscore值是否大于所述异常值阈值,若时间点i对应的疾病监测数据x i的moving-zscore值大于所述异常值阈值,则所述时间点i对应的疾病监测数据x i为异常值。
  2. 如权利要求1所述的方法,其特征在于,所述根据所述mz列表确定所述时间序列数据X的异常值阈值包括:
    计算所述mz列表的预设分位数或预设百分位数,将所述预设分位数或预设百分位数作为所述异常值阈值;或者
    计算所述mz列表的平均数,将所述平均数作为所述异常值阈值。
  3. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    获取其他的疾病异常数据检测方法对所述时间序列数据进行检测得到的异常值;
    将所述疾病异常数据检测方法得到的异常值与所述其他的疾病异常数据检测方法得到的异常值进行比较;
    根据所述疾病异常数据检测方法得到的异常值与所述其他的疾病异常数据检测方法得到的异常值的比较结果得到最终的异常值。
  4. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    分别对地区和地区下属医院进行疾病异常数据检测,得到针对地区的地区异常值和针对地区下属医院的地区下属医院异常值;
    比较所述地区异常值和所述地区下属医院异常值,根据所述地区异常值 和所述地区下属医院异常值的比较结果得到最终的异常值。
  5. 如权利要求1-4中任一项所述的方法,其特征在于,所述疾病监测数据包括疾病的就诊数、就诊率、发病数、发病率。
  6. 如权利要求1-4中任一项所述的方法,其特征在于,所述获取时间点0到时间点t的疾病监测数据包括:
    在预设区域建立由多个监测点组成的疾病监测网络,从所述监测点获取所述疾病监测数据。
  7. 如权利要求6所述的方法,其特征在于,所述监测点包括满足预设人数或规模的医疗机构、学校和幼托机构、药店。
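  8. A disease abnormal data detection apparatus, characterized in that the apparatus comprises:
    an acquisition unit, configured to acquire disease monitoring data from time point 0 to time point t to form time-series data X, X = [x_0, x_1, x_2, …, x_t];
    a calculation unit, configured to select a time window size w and, for each time point i from time point w to time point t, calculate the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to the time point i, wherein the size of the time window corresponding to the time point i is w, and i = w, w+1, …, t;
    the calculation unit being further configured to calculate, according to the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to each time point i, the moving-zscore value mz_i of the disease monitoring data x_i corresponding to the time point i, to obtain an mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t], wherein
    Figure PCTCN2018099610-appb-100002
    a determination unit, configured to determine an abnormal-value threshold of the time-series data X according to the mz list;
    a judgment unit, configured to determine whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, and, if the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, to determine that the disease monitoring data x_i corresponding to the time point i is an abnormal value.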
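  9. A computer apparatus, characterized in that the computer apparatus comprises a memory and a processor, the memory stores at least one computer-readable instruction, and the processor executes the at least one computer-readable instruction to implement the following steps:
    acquiring disease monitoring data from time point 0 to time point t to form time-series data X, X = [x_0, x_1, x_2, …, x_t];
    selecting a time window size w and, for each time point i from time point w to time point t, calculating the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to the time point i, wherein the size of the time window corresponding to the time point i is w, and i = w, w+1, …, t;
    calculating, according to the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to each time point i, the moving-zscore value mz_i of the disease monitoring data x_i corresponding to the time point i, to obtain an mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t], wherein
    Figure PCTCN2018099610-appb-100003
    determining an abnormal-value threshold of the time-series data X according to the mz list;
    determining whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, and, if the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, determining that the disease monitoring data x_i corresponding to the time point i is an abnormal value.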
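  10. The computer apparatus according to claim 9, characterized in that determining the abnormal-value threshold of the time-series data X according to the mz list comprises:
    calculating a preset quantile or a preset percentile of the mz list, and using the preset quantile or preset percentile as the abnormal-value threshold; or
    calculating the mean of the mz list, and using the mean as the abnormal-value threshold.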
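  11. The computer apparatus according to claim 9, characterized in that the processor is further configured to execute the at least one computer-readable instruction to implement the following steps:
    acquiring abnormal values obtained by detecting the time-series data with another disease abnormal data detection method;
    comparing the abnormal values obtained by the disease abnormal data detection method with the abnormal values obtained by the other disease abnormal data detection method;
    obtaining final abnormal values according to the result of comparing the abnormal values obtained by the disease abnormal data detection method with the abnormal values obtained by the other disease abnormal data detection method.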
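  12. The computer apparatus according to claim 9, characterized in that the processor is further configured to execute the at least one computer-readable instruction to implement the following steps:
    performing disease abnormal data detection separately for a region and for the hospitals subordinate to the region, to obtain regional abnormal values for the region and subordinate-hospital abnormal values for the hospitals subordinate to the region;
    comparing the regional abnormal values with the subordinate-hospital abnormal values, and obtaining final abnormal values according to the result of comparing the regional abnormal values with the subordinate-hospital abnormal values.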
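  13. The computer apparatus according to any one of claims 9-12, characterized in that the disease monitoring data comprises the number of medical visits, the visit rate, the number of cases, and the incidence rate of a disease.
  14. The computer apparatus according to any one of claims 9-12, characterized in that acquiring the disease monitoring data from time point 0 to time point t comprises:
    establishing, in a preset area, a disease monitoring network composed of a plurality of monitoring points, and acquiring the disease monitoring data from the monitoring points.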
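  15. A non-volatile readable storage medium storing at least one computer-readable instruction, characterized in that the at least one computer-readable instruction, when executed by a processor, implements the following steps:
    acquiring disease monitoring data from time point 0 to time point t to form time-series data X, X = [x_0, x_1, x_2, …, x_t];
    selecting a time window size w and, for each time point i from time point w to time point t, calculating the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to the time point i, wherein the size of the time window corresponding to the time point i is w, and i = w, w+1, …, t;
    calculating, according to the mean μ_i and the standard deviation σ_i of the disease monitoring data within the time window corresponding to each time point i, the moving-zscore value mz_i of the disease monitoring data x_i corresponding to the time point i, to obtain an mz list [mz_w, mz_{w+1}, mz_{w+2}, …, mz_t], wherein
    Figure PCTCN2018099610-appb-100004
    determining an abnormal-value threshold of the time-series data X according to the mz list;
    determining whether the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, and, if the moving-zscore value of the disease monitoring data x_i corresponding to time point i is greater than the abnormal-value threshold, determining that the disease monitoring data x_i corresponding to the time point i is an abnormal value.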
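  16. The storage medium according to claim 15, characterized in that determining the abnormal-value threshold of the time-series data X according to the mz list comprises:
    calculating a preset quantile or a preset percentile of the mz list, and using the preset quantile or preset percentile as the abnormal-value threshold; or
    calculating the mean of the mz list, and using the mean as the abnormal-value threshold.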
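  17. The storage medium according to claim 15, characterized in that the at least one computer-readable instruction, when executed by the processor, further implements the following steps:
    acquiring abnormal values obtained by detecting the time-series data with another disease abnormal data detection method;
    comparing the abnormal values obtained by the disease abnormal data detection method with the abnormal values obtained by the other disease abnormal data detection method;
    obtaining final abnormal values according to the result of comparing the abnormal values obtained by the disease abnormal data detection method with the abnormal values obtained by the other disease abnormal data detection method.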
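  18. The storage medium according to claim 15, characterized in that the at least one computer-readable instruction, when executed by the processor, further implements the following steps:
    performing disease abnormal data detection separately for a region and for the hospitals subordinate to the region, to obtain regional abnormal values for the region and subordinate-hospital abnormal values for the hospitals subordinate to the region;
    comparing the regional abnormal values with the subordinate-hospital abnormal values, and obtaining final abnormal values according to the result of comparing the regional abnormal values with the subordinate-hospital abnormal values.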
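  19. The storage medium according to any one of claims 15-18, characterized in that the disease monitoring data comprises the number of medical visits, the visit rate, the number of cases, and the incidence rate of a disease.
  20. The storage medium according to any one of claims 15-18, characterized in that acquiring the disease monitoring data from time point 0 to time point t comprises:
    establishing, in a preset area, a disease monitoring network composed of a plurality of monitoring points, and acquiring the disease monitoring data from the monitoring points.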
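
The detection steps recited in claims 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The claimed formula is available only as a figure reference in this text, so the sketch assumes the conventional z-score form mz_i = (x_i - μ_i) / σ_i. It further assumes that the window for time point i consists of the w observations immediately preceding i, that σ_i is the population standard deviation, that a window with zero variance yields a score of 0, and that the percentile is taken by the nearest-rank rule. The function names `moving_zscores`, `outlier_threshold`, and `detect_abnormal` are hypothetical.

```python
import math
from statistics import mean, pstdev


def moving_zscores(x, w):
    """Return {i: mz_i} for i = w..t (t = len(x) - 1)."""
    mz = {}
    for i in range(w, len(x)):
        window = x[i - w:i]      # assumption: the w points preceding time point i
        mu = mean(window)
        sigma = pstdev(window)   # assumption: population standard deviation
        # assumption: a flat window (sigma == 0) is scored 0.0 to avoid division by zero
        mz[i] = (x[i] - mu) / sigma if sigma > 0 else 0.0
    return mz


def outlier_threshold(mz_values, percentile=None):
    """Threshold from the mz list: its mean, or a nearest-rank percentile."""
    values = sorted(mz_values)
    if percentile is None:
        return mean(values)
    rank = max(1, math.ceil(percentile * len(values) / 100))
    return values[rank - 1]


def detect_abnormal(x, w, percentile=None):
    """Return the time points i whose moving z-score exceeds the threshold."""
    mz = moving_zscores(x, w)
    threshold = outlier_threshold(mz.values(), percentile)
    return sorted(i for i, score in mz.items() if score > threshold)
```

With the made-up series `x = [10, 12, 11, 13, 12, 11, 40, 12, 13, 11]` and `w = 3`, `detect_abnormal(x, 3)` flags only time point 6 (the value 40) when the mean of the mz list serves as the threshold. Because the comparison is strictly "greater than", a percentile threshold equal to the largest score flags nothing, so the preset percentile has to sit below the scores that should be reported.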
PCT/CN2018/099610 2018-04-11 2018-08-09 Disease abnormal data detection method and apparatus, computer apparatus, and storage medium WO2019196279A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SG11202008183UA SG11202008183UA (en) 2018-04-11 2018-08-09 Disease abnormal data detection method, device, computing device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810321852.9 2018-04-11
CN201810321852.9A CN108597616A (zh) 2018-04-11 2018-04-11 Disease abnormal data detection method and apparatus, computer apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2019196279A1 (zh) 2019-10-17

Family

ID=63621736

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/099610 WO2019196279A1 (zh) 2018-04-11 2018-08-09 Disease abnormal data detection method and apparatus, computer apparatus, and storage medium

Country Status (3)

Country Link
CN (1) CN108597616A (zh)
SG (1) SG11202008183UA (zh)
WO (1) WO2019196279A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
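CN110299208A (zh) * 2019-05-22 2019-10-01 平安科技(深圳)有限公司 Disease monitoring data anomaly detection method, system, device, and storage medium
CN112152833B (zh) * 2019-06-29 2023-06-06 北京金山云网络技术有限公司 Network anomaly alarm method, apparatus, and electronic device
CN112152834B (zh) * 2019-06-29 2023-06-06 北京金山云网络技术有限公司 Network anomaly alarm method, apparatus, and electronic device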

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
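CN102805620A (zh) * 2012-08-16 2012-12-05 深圳市邦健电子有限公司 Atrial fibrillation detection method and apparatus
CN104027100A (zh) * 2014-06-12 2014-09-10 山东中弘信息科技有限公司 Method for processing abnormal blood pressure data based on most recent historical values
CN104523265A (zh) * 2014-12-31 2015-04-22 广州视源电子科技股份有限公司 Method and apparatus for identifying abnormal ECG data
CN106371939A (zh) * 2016-09-12 2017-02-01 山东大学 Time-series data anomaly detection method and system thereof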

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
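CN102130798A (zh) * 2011-03-25 2011-07-20 中国电子科技集团公司第三十研究所 Method and apparatus for multi-dimensional traffic anomaly detection in distributed networks
CN106685750B (zh) * 2015-11-11 2019-12-24 华为技术有限公司 System anomaly detection method and apparatus
CN106055885B (zh) * 2016-05-26 2018-12-11 哈尔滨工业大学 UAV flight data anomaly detection method based on oversampling projection approximation basis pursuit
CN106296435B (zh) * 2016-08-18 2018-02-02 西安科技大学 Method for identifying abnormal data in mine gas monitoring
CN106657038B (zh) * 2016-12-08 2019-12-27 西安交通大学 Network traffic anomaly detection and localization method based on symmetry-degree Sketch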

Also Published As

Publication number Publication date
SG11202008183UA (en) 2020-09-29
CN108597616A (zh) 2018-09-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914373

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 27.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18914373

Country of ref document: EP

Kind code of ref document: A1