CN117851836B

CN117851836B - Intelligent data analysis method for pension information service system

Info

Publication number: CN117851836B
Application number: CN202410245134.3A
Authority: CN
Inventors: 倪佳斌; 应必善; 蔡修成; 朱小龙; 李川
Original assignee: Suzhou Pukang Intelligent Old Age Industry Technology Co ltd; Zhejiang Pukang Intelligent Elderly Care Industry Technology Co ltd
Current assignee: Suzhou Pukang Intelligent Old Age Industry Technology Co ltd; Zhejiang Pukang Intelligent Elderly Care Industry Technology Co ltd
Priority date: 2024-03-05
Filing date: 2024-03-05
Publication date: 2024-05-28
Anticipated expiration: 2044-03-05
Also published as: CN117851836A

Abstract

The invention relates to the technical field of physiological data processing, and in particular to an intelligent data analysis method for a pension information service system. The method comprises: obtaining, in each dimension, the data structure similarity between samples and the degree of data abnormality of each sample on each day; combining these with the relative distances between the data sequences of the corresponding samples on each day to obtain a distance metric between samples; obtaining the local range density of each sample; obtaining initial sample clusters in the course of CURE clustering of all samples; taking any preset number of samples in each initial sample cluster as a group of representative points, screening out the optimal representative point group from the representative degree of each group, and clustering the initial sample clusters to obtain a clustering result according to which the elderly are served. By obtaining suitable representative points within each sample cluster, the invention improves the clustering effect and provides personalized service for the elderly.

Description

Intelligent data analysis method for pension information service system

Technical Field

The invention relates to the technical field of physiological data processing, and in particular to an intelligent data analysis method for a pension information service system.

Background

Intelligent data analysis methods for pension information service systems have developed rapidly in the industry and have become one of the key means of improving the quality and efficiency of elderly care services; such a system platform can analyze users' health data, daily activity information and social interactions to raise the level of care, and plays an important role in promoting the health and social engagement of the elderly.

By monitoring the health data of the elderly and applying clustering, elderly people in different physical states can be grouped, and more personalized and accurate services can be provided to each group according to the clustering result. In the prior art, the CURE clustering algorithm is used to cluster the physiological parameter data of different elderly people. In the traditional CURE algorithm, the distance between clusters is measured from the positions of the representative points in each cluster and a shrinkage factor; however, differences in the shrinkage factors of the selected representative points lead to large differences in the resulting inter-cluster distances. Because suitable representative points within the clusters are not obtained, the inter-cluster distance measurement is inaccurate, the clustering effect is poor, and personalized service cannot be provided.

Disclosure of Invention

In order to solve the technical problem that suitable representative points within clusters are not obtained and the clustering effect is therefore poor, the invention aims to provide an intelligent data analysis method for a pension information service system. The adopted technical scheme is as follows:

The invention provides a data intelligent analysis method for a pension information service system, which comprises the following steps:

taking each elderly person as one sample, acquiring multidimensional physiological parameter data of each sample at each moment, and acquiring a multidimensional data sequence of each sample at each hour in each day;

Obtaining the data structure similarity among the samples according to the correlation coefficient of the data sequence among each sample in each day in one dimension; obtaining the data abnormality degree of each sample in each day according to the correlation coefficient of the data sequence of each sample in each day and other days in the time neighborhood range;

Obtaining a distance measurement between each sample according to the data structure similarity between each sample in each dimension, the data abnormality degree of the corresponding sample in each day and the relative distance between data sequences; obtaining the local range density of each sample according to the distance measurement between each sample and each other sample in the sample neighborhood range;

In the course of CURE clustering of all samples, obtaining initial sample clusters according to the distance metric between samples; in each initial sample cluster, obtaining the representativeness of the center sample of each sample neighborhood range according to the distance metric and the density difference characteristics between the samples in that range; taking a preset number of samples in each initial sample cluster as a group of representative points, and obtaining the representative degree of each group according to the representativeness of its representative points and the corresponding distance metrics between them; screening out the optimal representative point group according to the representative degree of each group in each initial sample cluster, and clustering the initial sample clusters to obtain a clustering result;

And serving the old people according to the clustering result.

Further, the method for acquiring the data structure similarity comprises: in each dimension, combining the data sequences of each sample at all hours of each day into an overall data sequence of that sample for that day; calculating the correlation coefficient between the overall data sequences of every two samples on each day; and averaging the correlation coefficients over all days to obtain the data structure similarity between the samples.

Further, the method for acquiring the data anomaly degree comprises the following steps:

obtaining the degree of data abnormality according to its acquisition formula (Formula 1), wherein $Y_{i,l}^{r}$ represents the degree of data abnormality of the i-th sample in the l-th dimension on the r-th day; $\omega$ represents the number of other days on either side of the r-th day within the time neighborhood range centered on the r-th day; $x_{i,l}^{r,t}$ represents the data sequence of the i-th sample in the l-th dimension at the t-th hour of the r-th day; $x_{i,l}^{r+u,t}$ represents the data sequence of the i-th sample in the l-th dimension at the t-th hour of the (r+u)-th day; $\operatorname{corr}\left(x_{i,l}^{r,t}, x_{i,l}^{r+u,t}\right)$ represents the correlation coefficient between the data sequences at the t-th hour of the r-th day and of the (r+u)-th day for the i-th sample in the l-th dimension; and $\min$ represents the minimum function.

Further, the method for obtaining the distance measurement comprises the following steps:

Obtaining the distance metric according to its acquisition formula (Formula 2), wherein $D_{i,j}$ represents the distance metric between the i-th sample and the j-th sample; $M$ represents the number of dimensions of the physiological parameter data; $S_{i,j}^{l}$ represents the data structure similarity between the i-th sample and the j-th sample in the l-th dimension; $N$ represents the number of days over which the physiological parameter data is acquired; $Y_{i,l}^{v}$ represents the degree of data abnormality of the i-th sample in the l-th dimension on the v-th day; $Y_{j,l}^{v}$ represents the degree of data abnormality of the j-th sample in the l-th dimension on the v-th day; $d_{i,j}^{l,v}$ represents the relative distance between the data sequences of the i-th sample and the j-th sample in the l-th dimension on the v-th day; and $\exp(\cdot)$ represents the exponential function with the natural constant as its base.

Further, the method for obtaining the local range density comprises the following steps:

Performing negative correlation mapping on the distance measurement between each sample and each other sample in a sample neighborhood range, and accumulating all negative correlation mapping results to obtain a first accumulated value; and calculating the ratio of the first accumulated value to the preset distance threshold value to obtain the local range density of each sample.

Further, the method for obtaining the representativeness comprises:

Obtaining, for each sample in each sample neighborhood range of each initial sample cluster, the other sample in that range with the largest distance metric from it, and taking it as the comparison sample; calculating the density difference between each sample in the sample neighborhood range and its comparison sample, and averaging all density differences to obtain the representativeness of the center sample of the sample neighborhood range.

Further, the representative degree obtaining method includes:

for each group of representative points in each initial sample cluster, calculating the mean of the representativeness of every two representative points as a first representative mean value; calculating the product of the first representative mean value of every two representative points and the corresponding distance metric between them to obtain a first product value;

and calculating the mean of the first product values over all pairs of representative points to obtain the representative degree of the group.

Further, the method for obtaining the optimal representative point group comprises the following steps:

and selecting a group of representative points with the maximum representative degree corresponding to all groups of representative points in each initial sample cluster as an optimal representative point group.
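The representative degree and the selection of the optimal group described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `representative_degree` and `best_representative_group`, the list `rep` of per-sample representativeness values and the matrix `dist` of distance metrics are hypothetical, and enumerating every combination of a preset size is only one way to read "any preset number of samples".

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def representative_degree(group: Sequence[int], rep: Sequence[float],
                          dist: Sequence[Sequence[float]]) -> float:
    """Mean over all pairs of (mean representativeness of the pair) * (distance metric of the pair)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        raise ValueError("a group needs at least two representative points")
    products = [(rep[a] + rep[b]) / 2.0 * dist[a][b] for a, b in pairs]
    return sum(products) / len(products)


def best_representative_group(cluster: Sequence[int], rep: Sequence[float],
                              dist: Sequence[Sequence[float]], size: int) -> Tuple[int, ...]:
    """Return the group of `size` samples with the largest representative degree."""
    return max(combinations(cluster, size),
               key=lambda group: representative_degree(group, rep, dist))


# Illustrative values only: four samples of one initial sample cluster.
dist: List[List[float]] = [
    [0.0, 1.0, 2.0, 3.0],
    [1.0, 0.0, 1.5, 2.5],
    [2.0, 1.5, 0.0, 1.0],
    [3.0, 2.5, 1.0, 0.0],
]
rep = [0.5, 0.1, 0.1, 0.5]
best = best_representative_group([0, 1, 2, 3], rep, dist, size=2)
```

The number of candidate groups grows combinatorially with the cluster size, so a practical system would probably restrict or sample the candidates; the text does not say how.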

Further, the method for acquiring the clustering result comprises the following steps:

calculating, for every two initial sample clusters, the mean of the distance metrics between all representative points of their optimal representative point groups as the average distance between the two clusters; and clustering all initial sample clusters by CURE according to the average distance to obtain new initial sample clusters, until the preset number of clusters is reached, thereby obtaining the clustering result.
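The average distance between two initial sample clusters can be sketched as below. The function names, the one-dimensional toy positions and the use of an absolute difference as the distance metric are assumptions made only for illustration; the sketch shows a single "find the closest pair" step rather than the full CURE merge loop.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def average_group_distance(reps_a: Sequence[int], reps_b: Sequence[int],
                           dist: Sequence[Sequence[float]]) -> float:
    """Mean distance metric over all cross pairs of representative points of two clusters."""
    total = sum(dist[a][b] for a in reps_a for b in reps_b)
    return total / (len(reps_a) * len(reps_b))


def closest_cluster_pair(rep_groups: Sequence[Sequence[int]],
                         dist: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Indices of the two clusters whose optimal representative groups are closest on average."""
    return min(combinations(range(len(rep_groups)), 2),
               key=lambda pair: average_group_distance(rep_groups[pair[0]],
                                                       rep_groups[pair[1]], dist))


# Toy data: six samples on a line; the distance metric is replaced by |x_a - x_b|.
positions = [0.0, 1.0, 10.0, 11.0, 30.0, 31.0]
dist: List[List[float]] = [[abs(a - b) for b in positions] for a in positions]
rep_groups = [(0, 1), (2, 3), (4, 5)]  # one optimal representative group per cluster
pair = closest_cluster_pair(rep_groups, dist)
```

In a full procedure the closest pair would be merged, the optimal representative point group of the merged cluster would be obtained again, and the step repeated until the preset number of clusters remains.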

Further, the correlation coefficient is a pearson correlation coefficient.

The invention has the following beneficial effects:

To determine whether the daily trends of the physiological parameters of two samples are similar, the data structure similarity between samples is obtained, in one dimension, from the correlation coefficient between the data sequences of the samples on each day. The degree of data abnormality of each sample on each day is obtained from the correlation coefficients between that sample's data sequence on that day and on the other days within the time neighborhood range, which allows the trend of the physiological condition of the elderly over different periods to be analyzed. The distance metric between samples is obtained from the data structure similarity in each dimension, the degree of data abnormality of the corresponding samples on each day, and the relative distance between the data sequences; this captures fine differences between samples and improves the accuracy of the distance metric. The local range density of each sample is obtained from the distance metric between that sample and every other sample within its sample neighborhood range; this takes the differences in life patterns among the elderly into account, analyzes the spatial relationship between samples, and better captures the distribution of the data. In the course of CURE clustering of all samples, initial sample clusters are obtained from the distance metric between samples. In each initial sample cluster, the representativeness of the center sample of each sample neighborhood range is obtained from the distance metric and the density difference characteristics between the samples in that range, which helps identify and describe the structure and characteristics of each cluster and evaluates how representative each sample is within a local range. A preset number of samples in each initial sample cluster is taken as a group of representative points, and the representative degree of each group is obtained from the representativeness of its representative points and the corresponding distance metrics, so that more representative samples are identified and the structure and distribution of the clusters are better understood. The optimal representative point group is screened out according to the representative degree of each group in each initial sample cluster, and the initial sample clusters are clustered to obtain a clustering result, so that the real cluster structure of the data can be found more accurately and quickly. By obtaining suitable representative points within each sample cluster, the invention improves the clustering effect and provides personalized service for the elderly.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a data intelligent analysis method for a pension information service system according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of specific implementation, structure, characteristics and effects of the data intelligent analysis method for the pension information service system according to the invention, which is provided by the invention, with reference to the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of the data intelligent analysis method for the pension information service system provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a method for intelligently analyzing data of an information service system for aged people according to an embodiment of the present invention is shown, where the method specifically includes:

Step S1: taking each elderly person as one sample, acquiring multidimensional physiological parameter data of each sample at each moment, and acquiring multidimensional data sequences of each sample at each hour in each day.

In one embodiment of the present invention, in order to improve the level of care services, the health data of the elderly is continuously monitored so that more personalized and accurate services can be provided. First, the physiological parameter data of the elderly is monitored in real time through wearable devices such as smart watches and health bracelets; each elderly person is taken as one sample, and multidimensional physiological parameter data of each sample is acquired at each moment. The dimensions include heart rate, blood pressure, blood oxygen saturation, body temperature, blood glucose level, respiratory rate, and so on. The multidimensional data sequence of each sample at each hour of each day is then obtained, and the physiological parameter data of different elderly people is analyzed so as to provide them with more personalized and accurate services.

It should be noted that, due to external factors such as equipment errors, noise may exist in the data, which interferes with subsequent data analysis; in one embodiment of the invention, in order to facilitate the processing of subsequent data, the obtained physiological parameter data is subjected to denoising pretreatment, so that the influence of noise can be eliminated, and the quality of the data is improved. The specific denoising algorithm is a technical means well known to those skilled in the art, and will not be described in detail herein.

It should be noted that, in this embodiment of the present invention, the time interval for acquiring data is 1 min; in other embodiments of the present invention, the sampling frequency may be set according to the specific situation, which is not limited or described further herein.
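As a rough illustration of how the hourly data sequences of Step S1 could be formed from 1 min readings, the sketch below splits one day of one dimension into 24 sequences of 60 values. The function name `split_day_into_hours` and the synthetic readings are hypothetical; the text does not prescribe a data layout.

```python
from typing import List

MINUTES_PER_HOUR = 60  # follows the 1 min acquisition interval of this embodiment
HOURS_PER_DAY = 24


def split_day_into_hours(day_readings: List[float]) -> List[List[float]]:
    """Split one day's minute-level readings of one dimension into 24 hourly sequences."""
    expected = MINUTES_PER_HOUR * HOURS_PER_DAY
    if len(day_readings) != expected:
        raise ValueError(f"expected {expected} readings, got {len(day_readings)}")
    return [
        day_readings[h * MINUTES_PER_HOUR:(h + 1) * MINUTES_PER_HOUR]
        for h in range(HOURS_PER_DAY)
    ]


# Synthetic heart-rate-like values for one day (illustrative only).
day = [60.0 + (minute % 60) * 0.1 for minute in range(1440)]
hourly = split_day_into_hours(day)
```

A different acquisition interval would only change `MINUTES_PER_HOUR` to the number of readings per hour.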

Step S2: obtaining the data structure similarity among the samples according to the correlation coefficient of the data sequence among each sample in each day in one dimension; and obtaining the data abnormality degree of each sample in each day according to the correlation coefficient of the data sequence of each sample in each day and other days in the time neighborhood range.

Because different elderly people have different living habits, different activities produce different physiological parameter data; for example, body temperature and heart rate rise to some extent during exercise. The correlation coefficient quantifies the linear relationship between the data sequences of two samples and shows whether the trends of their physiological parameters on each day are similar; comparing these similarities reveals common characteristics and patterns among samples, so that the physiological condition of the elderly can be analyzed more accurately. Therefore, in one dimension, the data structure similarity between samples is obtained from the correlation coefficient between the data sequences of the samples on each day.

Preferably, in one embodiment of the present invention, the method for acquiring data structure similarity includes:

In each dimension, the data sequences of each sample at all hours of each day are combined into an overall data sequence of that sample for that day; the correlation coefficient between the overall data sequences of every two samples on each day is calculated, and the correlation coefficients over all days are averaged to obtain the data structure similarity between the samples. In one embodiment of the invention, the formula for data structure similarity (Formula 3) is expressed as:

$$S_{i,j}^{l} = \frac{1}{N}\sum_{r=1}^{N} \operatorname{corr}\left(X_{i,l}^{r},\, X_{j,l}^{r}\right)$$

wherein $S_{i,j}^{l}$ represents the data structure similarity between the i-th sample and the j-th sample in the l-th dimension; $X_{i,l}^{r}$ represents the overall data sequence of the i-th sample in the l-th dimension on the r-th day; $X_{j,l}^{r}$ represents the overall data sequence of the j-th sample in the l-th dimension on the r-th day; $\operatorname{corr}\left(X_{i,l}^{r}, X_{j,l}^{r}\right)$ represents the correlation coefficient between the overall data sequences of the i-th sample and the j-th sample in the l-th dimension on the r-th day; and $N$ represents the number of days over which the physiological parameter data is acquired.

In the formula for data structure similarity, the larger the correlation coefficient between the overall data sequences of two samples in the l-th dimension on each day, the smaller the data difference and the higher the data structure similarity, that is, the closer the life patterns of the two elderly people.

It should be noted that, in one embodiment of the present invention, the correlation coefficient is a pearson correlation coefficient; the specific pearson correlation coefficient is a technical means well known to those skilled in the art, and will not be described herein.
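The data structure similarity described above, the mean over all days of the Pearson correlation between two samples' overall daily sequences, can be sketched as follows. The function names are hypothetical, and returning 0.0 for a constant sequence is an assumption, since the text does not say how an undefined correlation is handled.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # assumption: a constant sequence is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def structure_similarity(days_i: Sequence[Sequence[float]],
                         days_j: Sequence[Sequence[float]]) -> float:
    """Mean over days of the correlation between two samples' overall daily sequences (one dimension)."""
    if len(days_i) != len(days_j) or not days_i:
        raise ValueError("both samples need the same, non-zero number of days")
    return sum(pearson(a, b) for a, b in zip(days_i, days_j)) / len(days_i)


# Two days of toy data: perfectly correlated on day 1, perfectly anti-correlated on day 2.
days_i = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
days_j = [[2.0, 4.0, 6.0, 8.0], [8.0, 6.0, 4.0, 2.0]]
similarity = structure_similarity(days_i, days_j)
```

In the toy data the two days cancel out, which shows how averaging over days smooths a single atypical day.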

The daily life of the same elderly person is not constant: different schedules and activities occur on different days and produce different physiological parameter data. By analyzing the data sequences of each sample across different days, the trend of the physiological condition of the elderly over different periods can be analyzed. If the data of a sample on one day is highly correlated with its data on the other days within the neighborhood range, the data on that day is more likely to be normal; conversely, if the correlation coefficient between the data on one day and the data on the other days within the neighborhood range is small, the data on that day is more likely to be abnormal. Therefore, the degree of data abnormality of each sample on each day is obtained from the correlation coefficients between the data sequences of that sample on that day and on the other days within the time neighborhood range.

Preferably, in one embodiment of the present invention, the method for acquiring the degree of abnormality of data includes:

obtaining the degree of data abnormality according to its acquisition formula (Formula 1), wherein $Y_{i,l}^{r}$ represents the degree of data abnormality of the i-th sample in the l-th dimension on the r-th day; $\omega$ represents the number of other days on either side of the r-th day within the time neighborhood range centered on the r-th day; $x_{i,l}^{r,t}$ represents the data sequence of the i-th sample in the l-th dimension at the t-th hour of the r-th day; $x_{i,l}^{r+u,t}$ represents the data sequence of the i-th sample in the l-th dimension at the t-th hour of the (r+u)-th day; $\operatorname{corr}\left(x_{i,l}^{r,t}, x_{i,l}^{r+u,t}\right)$ represents the correlation coefficient between the data sequences at the t-th hour of the r-th day and of the (r+u)-th day for the i-th sample in the l-th dimension; and $\min$ represents the minimum function.

In the acquisition formula of the degree of data abnormality, $\min_{t} \operatorname{corr}\left(x_{i,l}^{r,t}, x_{i,l}^{r+u,t}\right)$ represents the minimum hourly correlation coefficient between the r-th day and the (r+u)-th day for the i-th sample in the l-th dimension. This minimum is normalized; the smaller it is, the larger the difference between the sample's data sequences at the corresponding hour on the two days, the greater the degree of data abnormality, and the greater the difference in physiological activity. The smaller the hourly correlation coefficients between a given day and the other days within the sample's time neighborhood range, the more likely that day is abnormal and the greater its degree of data abnormality.

It should be noted that, in one embodiment of the present invention, the time neighborhood range is a range of 7 days centered on each day, that is, ω takes an empirical value of 3; in other embodiments of the present invention, the size of the time neighborhood range may be set according to the specific situation, which is not limited or described further herein.
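The idea behind the degree of data abnormality can be sketched as below. Formula 1 itself is not reproduced in this text, so the sketch is only one possible reading: for each neighboring day it takes the minimum hourly Pearson correlation, maps it to [0, 1] with `(1 - r) / 2` (an assumed normalization), and averages over the neighboring days that exist. The function names and the handling of days near the ends of the record are likewise assumptions.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant sequence (assumption)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def abnormality_degree(days: Sequence[Sequence[Sequence[float]]], r: int, omega: int = 3) -> float:
    """days[d][t] is the hourly sequence at hour t of day d for one sample and one dimension."""
    scores = []
    for u in range(-omega, omega + 1):
        d = r + u
        if u == 0 or d < 0 or d >= len(days):
            continue  # skip the day itself and days outside the record (assumption)
        min_corr = min(pearson(a, b) for a, b in zip(days[r], days[d]))
        scores.append((1.0 - min_corr) / 2.0)  # assumed normalization: low correlation -> high score
    if not scores:
        raise ValueError("no neighboring days available")
    return sum(scores) / len(scores)


# Toy data with two "hours" per day; in odd_days the middle day reverses its first hour.
normal_days = [[[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]] for _ in range(3)]
odd_days = [
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
    [[3.0, 2.0, 1.0], [3.0, 2.0, 1.0]],
    [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
]
normal_score = abnormality_degree(normal_days, r=1, omega=1)
odd_score = abnormality_degree(odd_days, r=1, omega=1)
```

Any other monotonically decreasing normalization of the minimum correlation would preserve the stated behavior that a smaller minimum gives a larger degree of abnormality.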

Step S3: obtaining a distance metric between samples according to the data structure similarity between samples in each dimension, the degree of data abnormality of the corresponding samples on each day, and the relative distance between data sequences; and obtaining the local range density of each sample according to the distance metric between that sample and every other sample within the sample neighborhood range.

The traditional distance between samples only considers the data difference at corresponding moments of the two samples and ignores the fact that sample data changes differently under different activities. In order to understand the internal structure and relationships of the data more comprehensively, the data structure similarity, the degree of data abnormality on each day and the relative distance are analyzed together. The data structure similarity reflects the data distribution or pattern of samples in a given dimension: the higher the similarity, the smaller the distance metric between samples. The degree of data abnormality on each day focuses on the local characteristics of the data: the greater the degree of data abnormality, the poorer the accuracy of the distance analysis between samples. The relative distance provides global information about the relationship between samples: the greater the relative distance, the greater the distance metric between samples. The distance metric between samples is therefore obtained from the data structure similarity between samples in each dimension, the degree of data abnormality of the corresponding samples on each day, and the relative distance between the data sequences.

Preferably, in one embodiment of the present invention, the method for obtaining the distance metric includes:

Obtaining the distance metric according to its acquisition formula (Formula 2), wherein $D_{i,j}$ represents the distance metric between the i-th sample and the j-th sample; $M$ represents the number of dimensions of the physiological parameter data; $S_{i,j}^{l}$ represents the data structure similarity between the i-th sample and the j-th sample in the l-th dimension; $N$ represents the number of days over which the physiological parameter data is acquired; $Y_{i,l}^{v}$ represents the degree of data abnormality of the i-th sample in the l-th dimension on the v-th day; $Y_{j,l}^{v}$ represents the degree of data abnormality of the j-th sample in the l-th dimension on the v-th day; $d_{i,j}^{l,v}$ represents the relative distance between the data sequences of the i-th sample and the j-th sample in the l-th dimension on the v-th day; and $\exp(\cdot)$ represents the exponential function with the natural constant as its base.

In the acquisition formula of the distance metric, the exponential function with the natural constant as its base applies a negative-correlation mapping to $S_{i,j}^{l}$: the greater the data structure similarity in each dimension, the smaller the difference between the data and the smaller the distance metric between the samples. $\left(Y_{i,l}^{v} + Y_{j,l}^{v}\right)/2$ represents the average degree of data abnormality of the i-th sample and the j-th sample in the l-th dimension on the v-th day; the larger this average, the higher the degree of data abnormality on that day and the lower the accuracy of the data analysis, so the more the relative distance needs to be adjusted to avoid the influence of irrelevant factors. The larger the relative distance, the larger the distance metric between the samples.

In one embodiment of the present invention, the method for obtaining the relative distance includes: in each dimension, combining the data sequences of each sample at all hours of each day into an overall data sequence of that sample for that day, and calculating the Euclidean distance between the overall data sequences of every two samples on each day as the relative distance. The Euclidean distance itself is a technical means well known to those skilled in the art and is not described further herein.
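The relative distance described above can be sketched as follows: the hourly sequences of a day are concatenated into one overall sequence per sample, and the Euclidean distance between the two overall sequences is taken. The function names are hypothetical; `math.dist` is the standard-library Euclidean distance available from Python 3.8.

```python
import math
from typing import List, Sequence


def whole_day_sequence(hourly: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the hourly sequences of one day into one overall data sequence."""
    return [value for hour in hourly for value in hour]


def relative_distance(hourly_i: Sequence[Sequence[float]],
                      hourly_j: Sequence[Sequence[float]]) -> float:
    """Euclidean distance between two samples' overall sequences for one day and one dimension."""
    seq_i = whole_day_sequence(hourly_i)
    seq_j = whole_day_sequence(hourly_j)
    if len(seq_i) != len(seq_j):
        raise ValueError("both samples need the same number of readings")
    return math.dist(seq_i, seq_j)


# Toy day with two "hours" of two readings each.
d = relative_distance([[0.0, 0.0], [0.0, 0.0]], [[3.0, 0.0], [0.0, 4.0]])
```

Because the Euclidean distance depends on the scale of each dimension, heart rate and body temperature would contribute very differently unless the data is scaled first; the text does not state whether such scaling is applied.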

Because the number of samples is large, directly processing the relationships between all samples would be very time-consuming and computationally intensive. Elderly people with similar life patterns are therefore selected by means of the distance metric between samples within the neighborhood range of each sample; using reference samples that are similar in some key characteristics but not identical greatly reduces the amount of data to be processed without losing much information, and improves computational efficiency and accuracy. By calculating the distance metric between each sample and every other sample within its sample neighborhood range, the density around the sample can be estimated: the larger the distance metric, the less the samples are gathered and the smaller the local range density of the sample. The local range density of each sample is obtained from the distance metric between that sample and every other sample within its sample neighborhood range, together with the preset distance threshold.

Preferably, in one embodiment of the present invention, the method for obtaining the local area density includes:

A negative-correlation mapping is applied to the distance metric between each sample and every other sample within its sample neighborhood range, and all mapping results are accumulated to obtain a first accumulated value; the ratio of the first accumulated value to the preset distance threshold is calculated to obtain the local range density of each sample. In one embodiment of the invention, the formula for the local range density (Formula 4) is expressed as:

$$\rho_{i} = \frac{1}{L}\sum_{k=1}^{K_{i}} \exp\left(-D_{i,k}\right)$$

wherein $\rho_{i}$ represents the local range density of the i-th sample; $D_{i,k}$ represents the distance metric between the i-th sample and the k-th other sample within its sample neighborhood range; $L$ represents the preset distance threshold; $K_{i}$ represents the number of other samples within the neighborhood range of the i-th sample; and $\exp(\cdot)$ represents the exponential function with the natural constant as its base.

In the formula for the local range density, the exponential function with the natural constant as its base applies a negative-correlation mapping to $D_{i,k}$: the larger the distance metric between samples, the more dispersed the samples are. $\frac{1}{L}\sum_{k=1}^{K_{i}} \exp\left(-D_{i,k}\right)$ is the ratio of the first accumulated value to the preset distance threshold and reflects the distribution and similarity of the samples in space; the larger the ratio, the more tightly the samples are distributed and the higher the density of the sample.

It should be noted that, in one embodiment of the present invention, the sample neighborhood range of a sample is the range formed by that sample as the center together with the other samples whose distance metric from it is smaller than the preset distance threshold, and the preset distance threshold is 20; in other embodiments of the present invention, the size of the sample neighborhood range may be set according to the specific situation, which is not limited or described further herein. In other embodiments of the present invention, the positive and negative correlation relationships may also be constructed by other basic mathematical operations; the specific means are well known to those skilled in the art and are not described further herein.
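The local range density can be sketched directly from the description: distances to the neighbors below the threshold are mapped with `exp(-d)`, summed, and divided by the threshold. The function name and the row-based layout of the distance metrics are hypothetical.

```python
import math
from typing import Sequence


def local_range_density(dist_row: Sequence[float], i: int, threshold: float = 20.0) -> float:
    """Sum of exp(-distance) over the neighbors of sample i, divided by the distance threshold.

    dist_row[k] is the distance metric between sample i and sample k; a neighbor is any
    other sample whose distance metric is smaller than the preset threshold.
    """
    neighbours = [d for k, d in enumerate(dist_row) if k != i and d < threshold]
    return sum(math.exp(-d) for d in neighbours) / threshold


# Sample 0 has two neighbors (distances 1 and 2); the sample at distance 25 is outside the range.
density = local_range_density([0.0, 1.0, 2.0, 25.0], i=0)
```

A sample with no neighbor inside the threshold gets a density of 0 in this sketch.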

Step S4: in the course of CURE clustering of all samples, obtaining initial sample clusters according to the distance metric between samples; in each initial sample cluster, obtaining the representativeness of the center sample of each sample neighborhood range according to the distance metric and the density difference characteristics between the samples in that range; taking a preset number of samples in each initial sample cluster as a group of representative points, and obtaining the representative degree of each group according to the representativeness of its representative points and the corresponding distance metrics between them; and screening out the optimal representative point group according to the representative degree of each group in each initial sample cluster, and clustering the initial sample clusters to obtain a clustering result.

In the clustering process, samples are assigned to different clusters according to their feature similarity, so that samples within each cluster are as similar as possible and samples in different clusters are as different as possible; to improve the quality and effect of clustering, initial sample clusters are obtained according to the distance metric between the samples in the process of performing CURE clustering on all samples.

It should be noted that the CURE clustering algorithm itself is a technical means well known to those skilled in the art and is not described herein.

To better identify and describe the structure and characteristics of each cluster, the representativeness of each sample within its local range is evaluated by jointly considering the density difference characteristics and the distance metric. The distance metric is an important indicator of the similarity or difference between samples, and the density difference describes how the samples are distributed in space: the larger the distance metric and the greater the density difference, the more likely the sample is an outlier or edge point and the more representative it is. Therefore, in each initial sample cluster, the representativeness of the center sample of each sample neighborhood range is obtained according to the distance metrics and density difference characteristics between the samples in that neighborhood range.

Preferably, in one embodiment of the present invention, the method for obtaining the representativeness includes:

In each initial sample cluster, for every sample within each sample neighborhood range, the other sample in that neighborhood range with the largest distance metric to it is obtained and taken as its comparison sample; the density difference between each sample in the sample neighborhood range and its comparison sample is calculated, and all density differences are averaged to obtain the representativeness of the center sample of the sample neighborhood range. In one embodiment of the invention, the representativeness formula is expressed as:

$$Q_p = \frac{1}{N_p}\sum_{\tau=1}^{N_p}\left|\rho_{\tau} - \rho_{\tau'}\right|$$

wherein $Q_p$ represents the representativeness of the p-th sample in each initial sample cluster; $N_p$ represents the number of samples in the neighborhood range of the p-th sample in each initial sample cluster; $\rho_{\tau}$ represents the local range density of the τ-th sample within the neighborhood range of the p-th sample in each initial sample cluster; $\rho_{\tau'}$ represents the local range density of the comparison sample $\tau'$, that is, the sample with the greatest distance metric from the τ-th sample within the neighborhood range of the p-th sample in each initial sample cluster.

In the representativeness formula, $\left|\rho_{\tau} - \rho_{\tau'}\right|$ represents the density difference between the τ-th sample and its comparison sample $\tau'$ within the neighborhood range of the p-th sample in each initial sample cluster; the larger the difference, the larger the density variation of the samples in different directions within the neighborhood range, the more likely the center sample is an edge point or outlier of the initial sample cluster, and the greater its representativeness.
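The representativeness calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `representativeness`, the toy matrix `D`, and the `density` values are hypothetical, the density difference is assumed to be taken as an absolute value, and a neighborhood with fewer than two members is assumed to score 0.

```python
def representativeness(members, dist, density):
    """Mean density difference between each sample of a neighborhood range
    and its comparison sample (the member farthest from it)."""
    if len(members) < 2:
        return 0.0  # assumption: no comparison sample exists, so score 0
    total = 0.0
    for t in members:
        # Comparison sample: the other member with the largest distance metric to t.
        far = max((q for q in members if q != t), key=lambda q: dist[t][q])
        # Assumption: the density difference is an absolute difference.
        total += abs(density[t] - density[far])
    return total / len(members)

# Toy inputs (made-up values): distance-metric matrix and local range densities.
D = [
    [0.0, 5.0, 12.0, 40.0],
    [5.0, 0.0, 8.0, 35.0],
    [12.0, 8.0, 0.0, 25.0],
    [40.0, 35.0, 25.0, 0.0],
]
density = [0.9, 0.8, 0.5, 0.1]

# Representativeness of a center sample whose neighborhood range holds samples 0, 1, 2.
score = representativeness([0, 1, 2], D, density)
print(round(score, 4))
```

Whether the center sample itself counts as a member of its neighborhood range is left to the caller here, since the list of members is passed in explicitly.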

To simplify calculation and improve clustering efficiency, samples that can represent the characteristics of an initial sample cluster are selected for analysis: any preset number of samples in each initial sample cluster is taken as a group of representative points. Comparing the distance metrics between the representative points together with their representativeness gives a better understanding of the structure and distribution of the cluster and helps identify the more representative samples; the larger the distance metric between the representative points, the more dispersed the representative points are, the more clearly they capture the features of the initial sample cluster, and the higher the representative degree. The representative degree of each group of representative points is therefore obtained according to the representativeness of each representative point in the group and the corresponding distance metrics.

Preferably, in one embodiment of the present invention, the method for obtaining the representative degree includes:

For each group of representative points in each initial sample cluster, the mean of the representativeness of every two representative points is calculated as a first representative mean value; the product of the first representative mean value of the two representative points and the corresponding distance metric is calculated to obtain a first product value; and the average of the first product values over all pairs of representative points is calculated to obtain the representative degree of the group of representative points. In one embodiment of the invention, the formula for the representative degree is expressed as:

$$R = \frac{1}{n(n-1)}\sum_{a=1}^{n}\sum_{\substack{b=1 \\ b\neq a}}^{n}\frac{Q_a + Q_b}{2}\,D_{a,b}$$

wherein R represents the representative degree of each group of representative points in each initial sample cluster; $Q_a$ represents the representativeness of the a-th representative point in each group of representative points in each initial sample cluster; $Q_b$ represents the representativeness of the b-th representative point in each group of representative points in each initial sample cluster; $D_{a,b}$ represents the distance metric between the a-th representative point and the b-th representative point in each group of representative points; n represents the preset number of representative points in each group in each initial sample cluster.

In the formula for the representative degree, $\frac{Q_a + Q_b}{2}$ represents the first representative mean value of the a-th and b-th representative points of each group in each initial sample cluster; the larger the first representative mean value, the better the representative points capture the important edge or abnormal regions of the initial sample cluster, and the larger the distance metric between the representative points, the more dispersed the representative points are and the better they describe the shape of the corresponding initial sample cluster.
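The three steps of the representative degree (pairwise mean of representativeness, product with the distance metric, average of the products) can be sketched as below. The function name `representative_degree` and all numeric values are hypothetical; averaging over unordered pairs is an assumption about how "the average of the first product values" is normalized.

```python
from itertools import combinations

def representative_degree(group, rep, dist):
    """Average over all pairs of representative points of
    (mean representativeness of the pair) * (distance metric of the pair)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # assumption: a single point has no pairwise spread
    products = [(rep[a] + rep[b]) / 2 * dist[a][b] for a, b in pairs]
    return sum(products) / len(products)

# Toy inputs (made-up values).
D = [
    [0.0, 5.0, 12.0, 40.0],
    [5.0, 0.0, 8.0, 35.0],
    [12.0, 8.0, 0.0, 25.0],
    [40.0, 35.0, 25.0, 0.0],
]
rep = [0.4, 0.2, 0.6, 0.1]  # representativeness of each sample

degree = representative_degree([0, 1, 2], rep, D)
print(round(degree, 4))
```

Because the distance metric is symmetric, averaging over unordered pairs gives the same value as averaging over all ordered pairs of distinct points.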

It should be noted that, in one embodiment of the present invention, the preset number of representative points in each group may be set by the implementer according to the specific situation, which is not described herein.

The representative degree measures how suitable a group of samples is as the representatives of a sample cluster; the higher the representative degree, the more accurately the optimal representative points reflect the characteristics of the cluster in which they are located, so that the true cluster structure of the data can be found more accurately and quickly, the clustering result is easier to interpret, and the clustering precision is improved. The optimal representative point group is screened out according to the representative degree of each group of representative points in each initial sample cluster, and the initial sample clusters are clustered to obtain the clustering result.
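One plausible reading of the screening step is sketched below, under two assumptions not stated in the text at this point: the candidate groups are all combinations of the preset size drawn from the cluster, and the optimal group is the one with the highest representative degree. The names `best_group` and `representative_degree` and the numeric values are hypothetical.

```python
from itertools import combinations

def representative_degree(group, rep, dist):
    """Average of (pairwise mean representativeness) * (pairwise distance metric)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0
    return sum((rep[a] + rep[b]) / 2 * dist[a][b] for a, b in pairs) / len(pairs)

def best_group(cluster, size, rep, dist):
    """Assumed screening rule: return the candidate group of `size` samples
    with the largest representative degree."""
    return max(
        combinations(cluster, size),
        key=lambda g: representative_degree(g, rep, dist),
    )

# Toy inputs (made-up values).
D = [
    [0.0, 5.0, 12.0, 40.0],
    [5.0, 0.0, 8.0, 35.0],
    [12.0, 8.0, 0.0, 25.0],
    [40.0, 35.0, 25.0, 0.0],
]
rep = [0.4, 0.2, 0.6, 0.1]

print(best_group([0, 1, 2, 3], 2, rep, D))  # (0, 3)
```

Exhaustive enumeration grows combinatorially with cluster size, so a practical system would likely sample candidate groups instead; that choice is outside what the text specifies.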

Preferably, in one embodiment of the present invention, the method for acquiring the optimal representative point group includes:

Preferably, in one embodiment of the present invention, the method for obtaining the clustering result includes:

For every two initial sample clusters, the mean of the distance metrics between all representative points of their optimal representative point groups is calculated as the average distance between the two initial sample clusters; all initial sample clusters are then clustered by CURE according to the average distance to obtain new initial sample clusters, until the preset number of clusters is reached, and the clustering result is obtained.

It should be noted that, in one embodiment of the present invention, the preset number of clusters is 5; in other embodiments of the present invention, the preset number of clusters may be set by the implementer according to the specific situation, which is not limited herein.
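The merging loop described above can be sketched as follows. This is a simplified illustration rather than a full CURE implementation: the names `average_distance` and `merge_until` are hypothetical, the merged cluster simply keeps the representative points of both parents instead of re-screening an optimal group, and the shrinking of representative points toward the centroid used in standard CURE is omitted.

```python
def average_distance(reps_a, reps_b, dist):
    """Mean distance metric between the representative points of two clusters."""
    total = sum(dist[a][b] for a in reps_a for b in reps_b)
    return total / (len(reps_a) * len(reps_b))

def merge_until(clusters, reps, dist, k):
    """Repeatedly merge the two clusters with the smallest average distance
    until only k clusters remain."""
    clusters = [list(c) for c in clusters]
    reps = [list(r) for r in reps]
    while len(clusters) > k:
        index_pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = min(
            index_pairs,
            key=lambda ij: average_distance(reps[ij[0]], reps[ij[1]], dist),
        )
        clusters[i] += clusters[j]
        reps[i] += reps[j]  # simplification: keep both parents' representative points
        del clusters[j]
        del reps[j]
    return clusters

# Toy inputs (made-up values): four single-sample clusters merged down to two.
D = [
    [0.0, 5.0, 12.0, 40.0],
    [5.0, 0.0, 8.0, 35.0],
    [12.0, 8.0, 0.0, 25.0],
    [40.0, 35.0, 25.0, 0.0],
]
start = [[0], [1], [2], [3]]
print(merge_until(start, start, D, k=2))  # [[0, 1, 2], [3]]
```

The toy call stops at two clusters only because the example has four samples; the embodiment stops at five.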

Step S5: serving the elderly according to the clustering result.

The elderly are divided into different groups according to the clustering result, each group having similar characteristics and needs. Based on the clustering result, resources can be allocated more reasonably and customized services can be provided for each group, including customized health management plans, recommendations of specific social activities, and personalized entertainment suggestions, so as to meet the distinct needs of different groups of elderly people.

In summary, in each dimension, the invention obtains the data structure similarity between samples according to the correlation coefficient of the data sequences of the samples within each day, and obtains the data abnormality degree of each sample within each day according to the correlation coefficient between the data sequence of that sample on that day and its data sequences on the other days in the time neighborhood range. These are combined with the relative distances between the data sequences of the corresponding samples within each day to obtain the distance metric between samples, from which the local range density of each sample is obtained. In the process of performing CURE clustering on all samples, initial sample clusters are obtained; any preset number of samples in each initial sample cluster is taken as a group of representative points, the representative degree of each group is obtained according to the representativeness of each representative point in the group and the corresponding distance metrics, the optimal representative point group is screened out, the initial sample clusters are clustered to obtain the clustering result, and the elderly are served accordingly. By obtaining suitable representative points in each sample cluster, the invention improves the clustering effect and provides personalized service for the elderly.

It should be noted that the order of the embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, the embodiments are described in a progressive manner; for parts that are identical or similar, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments.

Claims

1. A method for intelligent analysis of data for a pension information service system, the method comprising:

serving the elderly according to the clustering result;

The method for acquiring the data structure similarity comprises the following steps:

Acquiring, in each dimension, the data sequences of each sample at all hours of each day to form the overall data sequence of each sample for that day; calculating the correlation coefficient between the overall data sequences of every two samples on each day, and averaging the correlation coefficients over all days to obtain the data structure similarity between the samples;

The method for acquiring the data abnormality degree comprises the following steps:

obtaining the data abnormality degree according to an obtaining formula of the data abnormality degree, wherein the obtaining formula of the data abnormality degree is as follows:

wherein the terms of the formula denote, respectively: the degree of data abnormality of the i-th sample in the l-th dimension on the r-th day; ω, the number of other days in the time neighborhood range centered on the r-th day; the data sequence of the i-th sample in the l-th dimension at the t-th hour on the r-th day; the data sequence of the i-th sample in the l-th dimension at the t-th hour on the (r+u)-th day; the correlation coefficient between the data sequences of the i-th sample in the l-th dimension at the t-th hour on the r-th day and on the (r+u)-th day; and min(), the minimum function;

the distance measurement acquisition method comprises the following steps:

Obtaining a distance measure according to an obtaining formula of the distance measure, wherein the obtaining formula of the distance measure is as follows:

wherein $D_{i,j}$ represents the distance metric between the i-th sample and the j-th sample; m represents the number of dimensions of the physiological parameter data; n represents the number of acquisition days of the physiological parameter data; exp() represents the exponential function with the natural constant as its base; and the remaining terms of the formula denote, respectively, the data structure similarity between the i-th sample and the j-th sample in the l-th dimension, the degree of data abnormality of the i-th sample in the l-th dimension on the v-th day, the degree of data abnormality of the j-th sample in the l-th dimension on the v-th day, and the relative distance between the data sequences of the i-th sample and the j-th sample in the l-th dimension on the v-th day;

The representative acquisition method comprises the following steps:

2. The intelligent analysis method for data of a pension information service system according to claim 1, wherein the obtaining method for the local range density includes:

3. The intelligent analysis method for data of a pension information service system according to claim 1, wherein the representative degree obtaining method comprises:

4. The intelligent data analysis method for a pension information service system according to claim 1, wherein the method for acquiring the optimal representative point group includes:

5. The intelligent analysis method for data of a pension information service system according to claim 1, wherein the method for obtaining the clustering result comprises:

6. The intelligent data analysis method for a pension information service system according to claim 1, wherein the correlation coefficient is the Pearson correlation coefficient.