CN117312255B - Electronic document splitting optimization management method and system - Google Patents

Electronic document splitting optimization management method and system

Info

Publication number
CN117312255B
Authority
CN
China
Prior art keywords
data
water
analyzed
segmentation point
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311605670.1A
Other languages
Chinese (zh)
Other versions
CN117312255A (en
Inventor
李洪波
石文博
米杰
毛伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Zhongsi Information Technology Co ltd
Original Assignee
Hunan Zhongsi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Zhongsi Information Technology Co ltd filed Critical Hunan Zhongsi Information Technology Co ltd
Priority to CN202311605670.1A priority Critical patent/CN117312255B/en
Publication of CN117312255A publication Critical patent/CN117312255A/en
Application granted granted Critical
Publication of CN117312255B publication Critical patent/CN117312255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/174Redundancy elimination performed by the file system
    • G06F16/1744Redundancy elimination performed by the file system using compression, e.g. sparse files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to an electronic document splitting optimization management method and system. The method comprises: collecting, in real time, actual water use data of in-factory water consumption at different moments within a preset time period, and collecting, from historical water consumption, historical water use data at different moments within a preset time period, the actual and historical water use data together being the water data to be analyzed; screening the water data to be analyzed according to the distribution of the data within the neighborhood of each datum to determine initial segmentation points; obtaining optimal segmentation points according to the segmentation results produced by the initial segmentation points found in the historical water use data and by those found in the actual water use data; and performing segmented compression on the water data to be analyzed using the optimal segmentation points, and storing the compressed data to obtain split water consumption electronic document data. The invention achieves a better segmented compression effect on the water data to be analyzed.

Description

Electronic document splitting optimization management method and system
Technical Field
The invention relates to the technical field of data processing, in particular to an electronic document splitting optimization management method and system.
Background
In-factory water consumption data refers to real-time data on the factory's water consumption at each moment of the day. To monitor water consumption in the factory in real time, the water consumption collected in real time must be stored and uploaded to a management system. However, in-factory water consumption fluctuates frequently but within a small range, so much of the data collected each day is repeated, and the data is highly redundant when compressed and stored. Existing methods, when splitting and compressing the water consumption data collected in real time, consider only runs of consecutive repeated values in the collected data and split and compress it directly, without considering the redundancy between the real-time data and the historical water consumption data. As a result, existing methods for splitting and compressing water consumption data perform poorly.
Disclosure of Invention
In order to solve the technical problem that the existing method for splitting and compressing water data has poor processing effect, the invention aims to provide an electronic document splitting and optimizing management method, which adopts the following technical scheme:
collecting, in real time, actual water use data of in-factory water consumption at different moments within a preset time period, and collecting, from historical water consumption, historical water use data at different moments within a preset time period; the actual water use data and the historical water use data are water use data to be analyzed;
obtaining probability indexes of each piece of water data to be analyzed as data segmentation points according to the distribution condition of the water data to be analyzed in the water data neighborhood range to be analyzed; screening the water data to be analyzed according to the probability index to determine initial segmentation points;
obtaining an optimal segmentation point according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data;
and carrying out sectional compression processing on the water data to be analyzed by utilizing the optimal sectional point, and storing the data subjected to sectional compression to obtain split water consumption electronic document data.
Preferably, the obtaining the optimal segmentation point according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data specifically includes:
obtaining an effect evaluation index of the initial segmentation point according to the segmentation result of the water data to be analyzed of the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed of the initial segmentation point in the actual water data;
and screening the initial segmentation points according to the effect evaluation index to determine the optimal segmentation points.
Preferably, the obtaining the effect evaluation index of the initial segmentation point according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data specifically includes:
recording any initial segmentation point in the actual water use data as a first target segmentation point, recording the next initial segmentation point adjacent to the first target segmentation point as a second target segmentation point, and recording the initial segmentation points in the historical water use data that have the same position serial numbers as the first and second target segmentation points in the actual water use data as a first matching segmentation point and a second matching segmentation point respectively;
acquiring actual water consumption data between a first target segmentation point and a second target segmentation point to form a target actual data sequence, and acquiring historical water consumption data between the first target segmentation point and the second target segmentation point to form a target historical data sequence; acquiring actual water consumption data between a first matching segmentation point and a second matching segmentation point to form a matching actual data sequence, and acquiring historical water consumption data between the first matching segmentation point and the second matching segmentation point to form a matching historical data sequence;
and obtaining the effect evaluation indexes of the second target segmentation point and the second matching segmentation point according to the target actual data sequence, the target historical data sequence, the matching actual data sequence and the data distribution conditions in the matching historical data sequence.
Preferably, the calculation formula of the effect evaluation index is specifically:
(The formula and its symbols are not reproduced in this text.) The formula yields the effect evaluation index of the second target segmentation point and the second matching segmentation point, where r+1 indicates the (r+1)-th initial segmentation point. Its terms are defined over four sequences:
the matching actual data sequence: the number of different data values it contains, the number of occurrences of the x-th value in it, and the frequency of occurrence of the x-th value in it;
the matching historical data sequence: the number of different data values it contains, the number of occurrences of the x-th value in it, and the frequency of occurrence of the x-th value in it;
the target actual data sequence: the number of different data values it contains, the number of occurrences of the x-th value in it, and the frequency of occurrence of the x-th value in it;
the target historical data sequence: the number of different data values it contains, the number of occurrences of the x-th value in it, and the frequency of occurrence of the x-th value in it.
Preferably, the screening the initial segmentation point according to the effect evaluation index, and determining an optimal segmentation point specifically includes:
when the effect evaluation index of the second target segmentation point and the second matching segmentation point is larger than a preset value, the second matching segmentation point is the optimal segmentation point in the water data to be analyzed; and when the effect evaluation indexes of the second target segmentation point and the second matching segmentation point are smaller than a preset value, the second target segmentation point is the optimal segmentation point in the water data to be analyzed.
Preferably, the obtaining the probability indicator that each water data to be analyzed is a data segment point according to the distribution condition of the water data to be analyzed in the water data neighborhood range to be analyzed specifically includes:
recording any one water data to be analyzed as selected water data, and forming a left neighborhood data sequence of the selected water data from a preset number of adjacent water data to be analyzed before the selected water data; forming a right neighborhood data sequence of the selected water data from the preset number of adjacent water data to be analyzed after the selected water data;
and obtaining probability indexes of the selected water data as data segmentation points according to the matching relation of the left neighborhood data sequence and the right neighborhood data sequence.
Preferably, the obtaining the probability indicator that the selected water data is the data segmentation point according to the matching relationship between the left neighborhood data sequence and the right neighborhood data sequence specifically includes:
obtaining, as a first quantity, the number of data in the left neighborhood data sequence whose values also appear in the right neighborhood data sequence, and obtaining, as a second quantity, the number of data in the right neighborhood data sequence whose values also appear in the left neighborhood data sequence; and obtaining the probability index of the selected water data being a data segmentation point according to the maximum of the first quantity and the second quantity, wherein the maximum and the probability index are negatively correlated.
Preferably, the screening the water data to be analyzed according to the probability index to determine an initial segmentation point specifically includes:
and recording the water data to be analyzed corresponding to the probability index being greater than or equal to a preset probability threshold as an initial segmentation point.
Preferably, the step of performing the segment compression processing on the water data to be analyzed by using the optimal segment point specifically includes:
and respectively carrying out segmentation processing on actual water use data and historical water use data in the water use data to be analyzed by utilizing the optimal segmentation points, and carrying out compression processing on the segmented data by utilizing a Huffman coding algorithm.
The invention also provides an electronic document splitting optimization management system, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the steps of an electronic document splitting optimization management method when being executed by the processor.
The embodiment of the invention has at least the following beneficial effects:
according to the invention, the actual water consumption data and the historical water consumption data are collected at first, so that the two water consumption data are subjected to joint analysis later, and the data processing effect is better. Then, analyzing the distribution condition of the water data to be analyzed in the neighborhood range of the water data to be analyzed to obtain probability indexes of the water data to be analyzed as data segmentation points, namely reflecting the probability of the water data to be analyzed as the data segmentation points, so as to screen the water data to be analyzed to determine initial segmentation points, wherein the initial segmentation points only represent the data distribution condition of the corresponding water data in the neighborhood range of the water data to be analyzed. Further, according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data, an optimal segmentation point is obtained, namely, the segmentation conditions of the initial segmentation point in the two data are respectively analyzed, so that the data segmentation points with better segmentation conditions in the historical water data and the actual water data are screened out. Finally, the optimal segmentation point is utilized to perform segmented compression treatment on the water consumption data to be analyzed, and the obtained water consumption electronic document data is beneficial to management of the water consumption data of the factory.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for electronic document splitting optimization management according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following detailed description refers to specific implementation, structure, characteristics and effects of the method and system for electronic document splitting and optimizing management according to the invention, which are provided by the invention, with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a specific scheme of an electronic document splitting optimization management method and a system, which are specifically described below with reference to the accompanying drawings.
An embodiment of an electronic document splitting optimization management method comprises the following steps:
referring to fig. 1, a flowchart of a method for electronic document splitting optimization management according to an embodiment of the present invention is shown, and the method includes the following steps:
step one, acquiring actual water use data of the water consumption in a factory at different moments in a preset time period in real time, and acquiring historical water use data of the water consumption in the historical time period at different moments in the preset time period; the actual water use data and the historical water use data are water use data to be analyzed.
First, actual water use data of in-factory water consumption at different moments within a preset time period is collected from the factory's management system, together with historical water use data at different moments within a preset time period. In this embodiment, the preset time period is one day (24 hours) and the interval between two adjacent moments is 10 minutes; the implementer may set these values according to the specific implementation scenario.
Specifically, the water consumption collected in the factory each day must be split and stored. Because historical daily water consumption is similar to the water consumption collected in real time, and fluctuates in a similar way, water use data covering the same length of time and the same moments is also collected from the historical water consumption. That is, in this embodiment, actual water use data at each moment of the current day is collected in real time, and historical water use data at each moment of the preceding day is collected from the historical water consumption. To facilitate the subsequent analysis, both the actual water use data and the historical water use data are treated as water data to be analyzed; the water data to be analyzed in this embodiment therefore covers two preset time periods.
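The sampling layout of this embodiment can be sketched as follows. This is a minimal illustration, not part of the patent: the names `sample_times` and `data_to_analyze` are hypothetical, and the dictionary layout is an assumption made only for the example.

```python
from datetime import datetime, timedelta

SAMPLE_INTERVAL = timedelta(minutes=10)  # interval between adjacent moments in this embodiment
PERIOD = timedelta(hours=24)             # preset time period in this embodiment


def sample_times(day_start: datetime) -> list[datetime]:
    """Return the sampling moments of one preset time period (hypothetical helper)."""
    count = PERIOD // SAMPLE_INTERVAL    # 24 h / 10 min = 144 moments
    return [day_start + k * SAMPLE_INTERVAL for k in range(count)]


def data_to_analyze(actual: list[float], historical: list[float]) -> dict[str, list[float]]:
    """Bundle both series as the water data to be analyzed (assumed layout).

    Both series are expected to cover the same number of moments.
    """
    if len(actual) != len(historical):
        raise ValueError("actual and historical series must have equal length")
    return {"actual": list(actual), "historical": list(historical)}
```

With the embodiment's settings, each of the two series holds 144 values, one per 10-minute moment.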
Step two, obtaining probability indexes of each piece of water data to be analyzed as data segmentation points according to the distribution condition of the water data to be analyzed in the water data neighborhood range to be analyzed; and screening the water data to be analyzed according to the probability index to determine an initial segmentation point.
Because in-factory water data fluctuates frequently with fluctuation amplitudes that are close to one another, many of the values collected in real time also occur in the historical data, and therefore the historical data and the real-time data are split and compressed together. Although the historical water use data and the actual water use data contain the same amount of data, their data segmentation points may differ. Splitting both with the same data segmentation points can therefore lower the overall compression efficiency and degrade the split-compression result, so the data segmentation points must be analyzed to determine those that give the best split-compression effect.
First, the water data to be analyzed is analyzed to determine where the data segmentation points lie; that is, a probability index of each water datum being a data segmentation point is obtained from the distribution of the data within its neighborhood. The actual water use data and the historical water use data each have their own data segmentation points within their preset time period; this embodiment describes the procedure using any one water datum as an example.
Specifically, any water data to be analyzed is recorded as selected water data, and a left neighborhood data sequence of the selected water data is formed by a preset number of adjacent water data to be analyzed before the selected water data; and constructing a right neighborhood data sequence of the selected water data by using the adjacent preset number of water data to be analyzed after the selected water data.
In this embodiment, the preset number is 20, and the i-th actual water use datum among all the actual water use data is taken as the selected water data for explanation. Its left neighborhood data sequence consists of the (i-20)-th through (i-1)-th actual water use data, and its right neighborhood data sequence consists of the (i+1)-th through (i+20)-th actual water use data.
When the number of data on the left or right side of the selected water data is smaller than the preset number, the analysis and judgment operation of whether the data can be used as the segmentation point is not performed, namely, the data analysis is performed from the 21 st data in the actual water data until the n-20 th data in the actual water data is finished, wherein n is the total number of the actual water data.
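The neighborhood construction and the boundary rule can be sketched as below. The function name `neighborhoods` is hypothetical, and the index `i` is 0-based here, so the 21st datum of the text corresponds to `i = 20` and the (n-20)-th datum to `i = n - 21`.

```python
from typing import Optional, Sequence, Tuple


def neighborhoods(data: Sequence[float], i: int, size: int = 20
                  ) -> Optional[Tuple[list, list]]:
    """Return (left, right) neighborhood sequences of data[i].

    Returns None when fewer than `size` data exist on either side,
    in which case the datum is not evaluated as a segmentation point.
    """
    if i < size or i + size >= len(data):
        return None
    left = list(data[i - size:i])            # the `size` data before data[i]
    right = list(data[i + 1:i + size + 1])   # the `size` data after data[i]
    return left, right
```

For a series of 100 values with the default size of 20, only indices 20 through 79 yield neighborhoods.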
A probability index of the selected water data being a data segmentation point is obtained from the matching relation between the left and right neighborhood data sequences. The number of data in the left neighborhood data sequence whose values also appear in the right neighborhood data sequence is recorded as the first quantity, and the number of data in the right neighborhood data sequence whose values also appear in the left neighborhood data sequence is recorded as the second quantity.
For example, assume that the selected water data is 1, the preset number is 5, the left neighborhood data sequence is {1,4,5,3,1}, and the right neighborhood data sequence is {1,1,1,1,4}. The data in the left neighborhood data sequence whose values also appear in the right neighborhood data sequence are 1, 4, 1, so the first quantity is 3. The data in the right neighborhood data sequence whose values also appear in the left neighborhood data sequence are 1, 1, 1, 1, 4, so the second quantity is 5.
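The counting in this example can be reproduced with a short sketch. The function name `match_counts` is hypothetical, and values are compared by exact equality, which suits quantized meter readings but is an assumption for floating-point data.

```python
from typing import Sequence, Tuple


def match_counts(left: Sequence[float], right: Sequence[float]) -> Tuple[int, int]:
    """Return (first quantity, second quantity) for two neighborhood sequences."""
    left_values, right_values = set(left), set(right)
    # Data of the left sequence whose value also appears in the right sequence.
    first = sum(1 for v in left if v in right_values)
    # Data of the right sequence whose value also appears in the left sequence.
    second = sum(1 for v in right if v in left_values)
    return first, second


print(match_counts([1, 4, 5, 3, 1], [1, 1, 1, 1, 4]))  # (3, 5)
```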
And obtaining a probability index of selecting the water data as the data segmentation point according to the maximum value in the first quantity and the second quantity, wherein the maximum value and the probability index are in a negative correlation. Taking the ith actual water data in all the actual water data as the selected water data for illustration, the calculation formula of the probability index of the ith actual water data as the data segmentation point can be expressed as follows:
(The formula is not reproduced in this text.) Its terms are: the probability index of the i-th actual water use datum being a data segmentation point; the first quantity; the second quantity; and the amount of data contained in either the left or the right neighborhood data sequence.
The first quantity characterizes how many data in the neighborhood before the selected water data repeat in the neighborhood after it, and the second quantity characterizes how many data in the neighborhood after the selected water data repeat in the neighborhood before it. The ratio term in the formula represents the share of repeated data in whichever of the two parts has more repetition. The larger this ratio, the higher the data repetition across the left and right neighborhoods of the selected water data, the worse the selected water data performs as a data segmentation point, and hence the smaller the probability index and the lower the probability that the selected water data serves as a data segmentation point.
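The text states only that the probability index is negatively correlated with the maximum of the first and second quantities, taken as a share of the neighborhood size. One simple form consistent with that description is 1 - max(first, second) / N. The sketch below uses that form purely as an assumption; it is not confirmed to be the patent's formula, and `probability_index` is a hypothetical name.

```python
from typing import Sequence


def probability_index(left: Sequence[float], right: Sequence[float]) -> float:
    """Assumed form: 1 - max(first, second) / N, with N the neighborhood size."""
    if not left or len(left) != len(right):
        raise ValueError("neighborhood sequences must be non-empty and equally long")
    left_values, right_values = set(left), set(right)
    first = sum(1 for v in left if v in right_values)    # first quantity
    second = sum(1 for v in right if v in left_values)   # second quantity
    return 1.0 - max(first, second) / len(left)
```

Under this assumed form, fully repeated neighborhoods give an index of 0 and fully disjoint neighborhoods give an index of 1, which matches the stated direction of the relationship.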
Note that when the Huffman coding algorithm is used to compress the water data to be analyzed, the compression effect depends on the distribution of data values: the more frequently a value occurs among all the water data to be analyzed, the higher the corresponding compression efficiency. Therefore, when the data in the neighborhoods on the left and right sides of a water datum are highly repetitive, segmenting at that datum is less necessary, and the probability that it serves as a data segmentation point is lower.
On this basis, the water data to be analyzed is screened according to the probability index to determine initial segmentation points. The probability index characterizes how likely a water datum is to serve as a data segmentation point. When the probability index is greater than or equal to a preset probability threshold, that likelihood is high, and the corresponding water datum is recorded as an initial segmentation point. When the probability index is smaller than the preset probability threshold, the likelihood is low, and the water datum is not used as a segmentation point in the subsequent analysis.
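The dependence of Huffman compression on value distribution can be illustrated by computing code lengths for two short series. This is a generic textbook Huffman construction built on Python's `heapq`, offered as a sketch; the function names are hypothetical and nothing here is taken from the patent's own implementation.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Sequence


def huffman_code_lengths(data: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Return the Huffman code length, in bits, of each distinct value."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): 1}   # a lone symbol still needs one bit
    # Heap entries: (weight, unique tie-breaker, {value: depth so far}).
    heap = [(w, k, {s: 0}) for k, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained value one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encoded_bits(data: Sequence[Hashable]) -> int:
    """Total number of bits needed to Huffman-encode the series."""
    lengths = huffman_code_lengths(data)
    return sum(lengths[v] for v in data)


skewed = [5] * 12 + [6] * 2 + [7] + [8]   # one value dominates
uniform = [5, 6, 7, 8] * 4                # same length, evenly spread values
print(encoded_bits(skewed), encoded_bits(uniform))  # 22 32
```

Both series hold 16 values drawn from the same four symbols, yet the series dominated by one value needs fewer bits, which is the effect the paragraph relies on.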
In this embodiment, the probability threshold has a value of 0.4, and the implementer may set according to a specific implementation scenario. According to the method, all initial segmentation points in all actual water use data can be obtained respectively, and all initial segmentation points in all historical water use data can be obtained simultaneously.
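The screening step can be sketched end to end as follows. The threshold of 0.4 and the neighborhood size of 20 come from this embodiment; the form of the probability index, 1 - max(first, second) / N, is an assumption made for illustration, and `initial_segmentation_points` is a hypothetical name. Indices are 0-based.

```python
from typing import List, Sequence


def initial_segmentation_points(data: Sequence[float], size: int = 20,
                                threshold: float = 0.4) -> List[int]:
    """Return 0-based indices whose probability index reaches the threshold."""
    points = []
    # Data with fewer than `size` neighbors on either side are skipped.
    for i in range(size, len(data) - size):
        left = data[i - size:i]
        right = data[i + 1:i + size + 1]
        left_values, right_values = set(left), set(right)
        first = sum(1 for v in left if v in right_values)
        second = sum(1 for v in right if v in left_values)
        index = 1.0 - max(first, second) / size   # assumed form of the index
        if index >= threshold:
            points.append(i)
    return points
```

The same function would be applied once to the actual water use data and once to the historical water use data, giving two separate lists of initial segmentation points.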
Step three, obtaining an optimal segmentation point according to the result of segmenting the water data to be analyzed by the initial segmentation points in the historical water use data and the result of segmenting the water data to be analyzed by the initial segmentation points in the actual water use data.
Corresponding data segmentation points exist in the actual water use data over all moments of a day, and likewise in the historical water use data. However, the initial segmentation points in the actual water use data are obtained from the numerical distribution of the actual water use data, whereas those in the historical water use data are obtained from the numerical distribution of the historical water use data. Therefore, when the initial segmentation points in the actual water use data are used to divide the historical water use data over the same moments, the splitting effect may be poor; similarly, when the initial segmentation points in the historical water use data are used to divide the actual water use data over the same moments, the splitting effect may also be poor. The different segmentation results therefore need to be analyzed together to determine the data segmentation points with the best splitting effect.
Based on this analysis, the effect evaluation index of an initial segmentation point is obtained according to the result of segmenting the water data to be analyzed by the initial segmentation points in the historical water use data and the result of segmenting the water data to be analyzed by the initial segmentation points in the actual water use data.
Specifically, any initial segmentation point in the actual water use data is marked as a first target segmentation point, the next initial segmentation point adjacent to the first target segmentation point is marked as a second target segmentation point, and initial segmentation points in the historical water use data, which have the same position serial numbers as the first and second target segmentation points in the actual water use data, are respectively marked as a first matching segmentation point and a second matching segmentation point.
In this embodiment, the r-th initial segmentation point in the actual water use data is taken as the first target segmentation point and the (r+1)-th initial segmentation point is taken as the second target segmentation point; similarly, the r-th initial segmentation point in the historical water use data is taken as the first matching segmentation point and the (r+1)-th initial segmentation point is taken as the second matching segmentation point.
The actual water use data between the first target segmentation point and the second target segmentation point are acquired to form a target actual data sequence, denoted here as $T_a$; the historical water use data between the first target segmentation point and the second target segmentation point are acquired to form a target historical data sequence, denoted here as $T_h$; the actual water use data between the first matching segmentation point and the second matching segmentation point are acquired to form a matching actual data sequence, denoted here as $M_a$; and the historical water use data between the first matching segmentation point and the second matching segmentation point are acquired to form a matching historical data sequence, denoted here as $M_h$.
The effect evaluation index of the second target segmentation point and the second matching segmentation point is obtained according to the data distribution in the target actual data sequence, the target historical data sequence, the matching actual data sequence and the matching historical data sequence. That is, in the present embodiment, the calculation formula of the effect evaluation index of the (r+1)-th initial segmentation point in the actual water use data and the historical water use data can be expressed as follows (the notation is reconstructed from the symbol definitions, and the two sums in the numerator and in the denominator are assumed to be added):

$$E_{r+1} = \frac{\sum_{x=1}^{N(M_a)} n_x(M_a)\, p_x(M_a) + \sum_{x=1}^{N(M_h)} n_x(M_h)\, p_x(M_h)}{\sum_{x=1}^{N(T_a)} n_x(T_a)\, p_x(T_a) + \sum_{x=1}^{N(T_h)} n_x(T_h)\, p_x(T_h)}$$

wherein $E_{r+1}$ denotes the effect evaluation index of the second target segmentation point and the second matching segmentation point, and r+1 denotes the (r+1)-th initial segmentation point; $M_a$ denotes the matching actual data sequence, $N(M_a)$ denotes the number of different data values contained in the matching actual data sequence, $n_x(M_a)$ denotes the number of occurrences of the x-th value in the matching actual data sequence, and $p_x(M_a)$ denotes the frequency of occurrence of the x-th value in the matching actual data sequence; $M_h$ denotes the matching historical data sequence, $N(M_h)$ denotes the number of different data values contained in the matching historical data sequence, $n_x(M_h)$ denotes the number of occurrences of the x-th value in the matching historical data sequence, and $p_x(M_h)$ denotes the frequency of occurrence of the x-th value in the matching historical data sequence; $T_a$ denotes the target actual data sequence, $N(T_a)$ denotes the number of different data values contained in the target actual data sequence, $n_x(T_a)$ denotes the number of occurrences of the x-th value in the target actual data sequence, and $p_x(T_a)$ denotes the frequency of occurrence of the x-th value in the target actual data sequence; $T_h$ denotes the target historical data sequence, $N(T_h)$ denotes the number of different data values contained in the target historical data sequence, $n_x(T_h)$ denotes the number of occurrences of the x-th value in the target historical data sequence, and $p_x(T_h)$ denotes the frequency of occurrence of the x-th value in the target historical data sequence.
In each data sequence, the frequency of occurrence of each different value reflects the degree of repetition of the water use data in that sequence. Taking the matching actual data sequence as an example, the frequency of occurrence of the x-th value reflects the probability that the x-th value repeats in the matching actual data sequence. Multiplying this frequency by the number of occurrences of the value, used as a coefficient, gives a term that reflects the repetition degree of the x-th value in the matching actual data sequence. The larger this term, the higher the repetition degree of the data, the better the current division of the matching actual data sequence, and the better the effect of splitting and compressing with this division result.
Similarly, by the same analysis, the numerator of the formula characterizes the degree of repetition within the data sequences obtained after the historical water use data and the actual water use data are each split by the r-th and (r+1)-th initial segmentation points in the historical water use data. The higher this repetition degree, the better the effect of splitting the water data to be analyzed with the r-th and (r+1)-th initial segmentation points in the historical water use data.
The denominator of the formula characterizes the degree of repetition within the data sequences obtained after the historical water use data and the actual water use data are each split by the r-th and (r+1)-th initial segmentation points in the actual water use data. The higher this repetition degree, the better the effect of splitting the water data to be analyzed with the r-th and (r+1)-th initial segmentation points in the actual water use data.
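The numerator and denominator described above can be sketched in code. This is a minimal illustration under stated assumptions: the frequency of a value is taken to be its count divided by the sequence length, the two per-sequence sums on each side of the ratio are assumed to be added, and a sequence is assumed to include both of its bounding segmentation points. The names `repetition` and `effect_index` and the sample readings are hypothetical.

```python
from collections import Counter

def repetition(seq):
    """Sum over distinct values of count * frequency (frequency = count / len(seq))."""
    if not seq:
        return 0.0
    n = len(seq)
    return sum(c * (c / n) for c in Counter(seq).values())

def effect_index(actual, history, target, matching):
    """Ratio of repetition under the matching points to repetition under the target points.

    `target` and `matching` are (start, end) index pairs, end inclusive (assumed).
    """
    ts, te = target
    ms, me = matching
    numerator = repetition(actual[ms:me + 1]) + repetition(history[ms:me + 1])
    denominator = repetition(actual[ts:te + 1]) + repetition(history[ts:te + 1])
    return numerator / denominator

actual = [2, 2, 2, 2, 5, 5, 5, 5]
history = [2, 2, 2, 5, 5, 5, 5, 5]
index = effect_index(actual, history, target=(0, 3), matching=(0, 2))
print(index)
```

In this example the index is below 1, so the segmentation point taken from the actual water use data would be preferred. The sketch does not guard against an empty target sequence, which would make the denominator zero.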
Based on this, in the present embodiment, the preset value is set to 1. When the effect evaluation index of the second target segmentation point and the second matching segmentation point is greater than the preset value, the numerator of the formula is greater than the denominator. This means that the data repetition degree obtained by splitting the water data to be analyzed with the r-th and (r+1)-th initial segmentation points in the historical water use data is greater than that obtained with the r-th and (r+1)-th initial segmentation points in the actual water use data, so the initial segmentation points in the historical water use data divide the data better. The (r+1)-th initial segmentation point in the historical water use data is therefore taken as the optimal segmentation point; that is, the second matching segmentation point is the optimal segmentation point in the water data to be analyzed.
When the effect evaluation index of the second target segmentation point and the second matching segmentation point is smaller than the preset value, the numerator of the formula is smaller than the denominator. This means that the data repetition degree obtained by splitting the water data to be analyzed with the r-th and (r+1)-th initial segmentation points in the historical water use data is smaller than that obtained with the r-th and (r+1)-th initial segmentation points in the actual water use data, so the initial segmentation points in the actual water use data divide the data better. The (r+1)-th initial segmentation point in the actual water use data is therefore taken as the optimal segmentation point; that is, the second target segmentation point is the optimal segmentation point in the water data to be analyzed.
When the effect evaluation index of the second target segmentation point and the second matching segmentation point is equal to the preset value, the data repetition degrees of the two dividing modes are equal, and either dividing mode may be used. In the initial calculation, r = 0: the first actual water use data in a day is used as the starting data of the data sequence, and the first initial segmentation point is used as the cut-off data of the data sequence. If the first initial segmentation point is judged to be the first optimal segmentation point, the first optimal segmentation point is taken as the first target segmentation point and the next adjacent initial segmentation point is taken as the second target segmentation point; the corresponding first and second matching segmentation points are then obtained, and the optimal segmentation point is acquired and analyzed in the same way. The process continues by analogy and stops when all actual or historical water use data have been traversed. In this way a plurality of optimal segmentation points are obtained, and dividing the actual and historical water use data at these points gives a good splitting effect.
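The traversal can be sketched as a loop over paired candidate points. This is a simplified illustration, not the patent's exact procedure: both candidate sequences are assumed to start right after the previously chosen optimal point, the repetition measure assumes frequency = count / length with additive per-sequence sums, and ties keep the point from the actual water use data (the text allows either). All function and parameter names are hypothetical.

```python
from collections import Counter

def repetition(seq):
    """Sum over distinct values of count * frequency (frequency = count / len(seq))."""
    n = len(seq)
    return sum(c * c / n for c in Counter(seq).values()) if n else 0.0

def optimal_points(actual, history, actual_pts, history_pts, preset=1.0):
    """Pick, pair by pair, the candidate point that yields the higher repetition."""
    chosen, start = [], 0
    for a_end, h_end in zip(actual_pts, history_pts):
        if a_end < start or h_end < start:
            continue  # candidate lies before the previous optimal point
        num = repetition(actual[start:h_end + 1]) + repetition(history[start:h_end + 1])
        den = repetition(actual[start:a_end + 1]) + repetition(history[start:a_end + 1])
        # index > preset is written as num > preset * den to avoid dividing by zero
        best = h_end if num > preset * den else a_end
        chosen.append(best)
        start = best + 1
    return chosen

actual = [2, 2, 2, 2, 5, 5, 5, 5]
history = [2, 2, 2, 5, 5, 5, 5, 5]
print(optimal_points(actual, history, actual_pts=[3, 7], history_pts=[2, 7]))
```

With the sample data the first pair keeps the point from the actual water use data (index below 1), and the second pair is a tie.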
Step four, performing segmented compression on the water data to be analyzed using the optimal segmentation points, and storing the segment-compressed data to obtain the split water consumption electronic document data.
The actual water use data and the historical water use data in the water data to be analyzed are each segmented using the optimal segmentation points, and the segmented data are compressed using the Huffman coding algorithm to obtain segment-compressed data. The segment-compressed data are then stored to obtain the split water consumption electronic document data.
It should be noted that an obtained optimal segmentation point may be a piece of data in the historical water use data or a piece of data in the actual water use data. Because the historical and actual water use data correspond to each other at each moment of a day, the water use data at the corresponding moment can be found in the other data set for any optimal segmentation point, and the historical or actual water use data can then be split at that corresponding data, so that the split data compress better.
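Step four can be sketched as splitting the data at the optimal segmentation points and Huffman-encoding each segment separately. This is a minimal illustration with hypothetical names; it assumes each segmentation point is the last item of the segment it closes, and it returns the per-segment code table together with the encoded bit string rather than writing any file format.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(values):
    """Map each distinct value to a prefix-free bit string."""
    counts = Counter(values)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(c, next(tie), {v: ""}) for v, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {v: "0" + code for v, code in left.items()}
        merged.update({v: "1" + code for v, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next(tie), merged))
    return heap[0][2]

def split_at(data, points):
    """Cut `data` after each point index; the point closes its segment (assumed)."""
    segments, start = [], 0
    for p in sorted(points):
        if p >= start:
            segments.append(data[start:p + 1])
            start = p + 1
    if start < len(data):
        segments.append(data[start:])
    return segments

def compress_segments(data, points):
    """Return a list of (code table, encoded bit string), one entry per segment."""
    out = []
    for seg in split_at(data, points):
        codes = huffman_codes(seg)
        out.append((codes, "".join(codes[v] for v in seg)))
    return out

readings = [1, 1, 2, 2, 3, 3, 4, 4]
split_bits = sum(len(bits) for _, bits in compress_segments(readings, [3]))
whole_bits = sum(len(bits) for _, bits in compress_segments(readings, []))
print(split_bits, whole_bits)
```

In this example each segment contains only two distinct values, so its items need one bit each, while the unsplit sequence needs two bits per item. The comparison ignores the size of the code tables, which also have to be stored.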
An electronic document splitting optimization management system embodiment:
This embodiment provides an electronic document splitting optimization management system, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the electronic document splitting optimization management method. Since the embodiment of the electronic document splitting optimization management method has been described in detail above, it is not repeated here.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (6)

1. An electronic document splitting optimization management method is characterized by comprising the following steps:
collecting in real time actual water use data of factory water consumption at different moments within a preset time period, and collecting historical water use data of factory water consumption at different moments within the preset time period in the past; the actual water use data and the historical water use data being the water data to be analyzed;
obtaining, according to the distribution of the water data to be analyzed within the neighborhood range of each piece of water data to be analyzed, a probability index that each piece of water data to be analyzed is a data segmentation point; screening the water data to be analyzed according to the probability index to determine initial segmentation points;
obtaining an optimal segmentation point according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data;
carrying out sectional compression processing on water data to be analyzed by utilizing an optimal sectional point, and storing the data subjected to sectional compression to obtain split water consumption electronic document data;
wherein obtaining the optimal segmentation point according to the segmentation result of the water data to be analyzed by the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed by the initial segmentation point in the actual water data specifically comprises:
obtaining an effect evaluation index of the initial segmentation point according to the segmentation result of the water data to be analyzed of the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed of the initial segmentation point in the actual water data;
screening the initial segmentation points according to the effect evaluation index to determine optimal segmentation points;
wherein obtaining the effect evaluation index of the initial segmentation point according to the segmentation result of the water data to be analyzed of the initial segmentation point in the historical water data and the segmentation result of the water data to be analyzed of the initial segmentation point in the actual water data specifically comprises:
recording any initial segmentation point in the actual water use data as a first target segmentation point, recording the next initial segmentation point adjacent to the first target segmentation point as a second target segmentation point, and recording the initial segmentation points in the historical water use data that have the same position serial numbers as the first and second target segmentation points in the actual water use data as a first matching segmentation point and a second matching segmentation point respectively;
acquiring actual water consumption data between a first target segmentation point and a second target segmentation point to form a target actual data sequence, and acquiring historical water consumption data between the first target segmentation point and the second target segmentation point to form a target historical data sequence; acquiring actual water consumption data between a first matching segmentation point and a second matching segmentation point to form a matching actual data sequence, and acquiring historical water consumption data between the first matching segmentation point and the second matching segmentation point to form a matching historical data sequence;
obtaining an effect evaluation index of a second target segmentation point and a second matching segmentation point according to the target actual data sequence, the target historical data sequence, the matching actual data sequence and the data distribution condition in the matching historical data sequence;
the calculation formula of the effect evaluation index is specifically as follows (the notation is reconstructed from the symbol definitions, and the two sums in the numerator and in the denominator are assumed to be added):

$$E_{r+1} = \frac{\sum_{x=1}^{N(M_a)} n_x(M_a)\, p_x(M_a) + \sum_{x=1}^{N(M_h)} n_x(M_h)\, p_x(M_h)}{\sum_{x=1}^{N(T_a)} n_x(T_a)\, p_x(T_a) + \sum_{x=1}^{N(T_h)} n_x(T_h)\, p_x(T_h)}$$

wherein $E_{r+1}$ denotes the effect evaluation index of the second target segmentation point and the second matching segmentation point, and r+1 denotes the (r+1)-th initial segmentation point; $M_a$ denotes the matching actual data sequence, $N(M_a)$ denotes the number of different data values contained in the matching actual data sequence, $n_x(M_a)$ denotes the number of occurrences of the x-th value in the matching actual data sequence, and $p_x(M_a)$ denotes the frequency of occurrence of the x-th value in the matching actual data sequence; $M_h$ denotes the matching historical data sequence, $N(M_h)$ denotes the number of different data values contained in the matching historical data sequence, $n_x(M_h)$ denotes the number of occurrences of the x-th value in the matching historical data sequence, and $p_x(M_h)$ denotes the frequency of occurrence of the x-th value in the matching historical data sequence; $T_a$ denotes the target actual data sequence, $N(T_a)$ denotes the number of different data values contained in the target actual data sequence, $n_x(T_a)$ denotes the number of occurrences of the x-th value in the target actual data sequence, and $p_x(T_a)$ denotes the frequency of occurrence of the x-th value in the target actual data sequence; $T_h$ denotes the target historical data sequence, $N(T_h)$ denotes the number of different data values contained in the target historical data sequence, $n_x(T_h)$ denotes the number of occurrences of the x-th value in the target historical data sequence, and $p_x(T_h)$ denotes the frequency of occurrence of the x-th value in the target historical data sequence;
wherein obtaining, according to the distribution of the water data to be analyzed within the neighborhood range of each piece of water data to be analyzed, the probability index that each piece of water data to be analyzed is a data segmentation point specifically comprises:
recording any piece of water data to be analyzed as selected water data, and forming a left neighborhood data sequence of the selected water data from a preset number of adjacent water data to be analyzed before the selected water data; forming a right neighborhood data sequence of the selected water data from the preset number of adjacent water data to be analyzed after the selected water data;
and obtaining probability indexes of the selected water data as data segmentation points according to the matching relation of the left neighborhood data sequence and the right neighborhood data sequence.
2. The method for optimizing and managing splitting electronic documents according to claim 1, wherein the step of screening the initial segmentation points according to the effect evaluation index to determine the optimal segmentation points comprises the following steps:
when the effect evaluation index of the second target segmentation point and the second matching segmentation point is larger than a preset value, the second matching segmentation point is the optimal segmentation point in the water data to be analyzed; and when the effect evaluation indexes of the second target segmentation point and the second matching segmentation point are smaller than a preset value, the second target segmentation point is the optimal segmentation point in the water data to be analyzed.
3. The method for optimizing and managing the splitting of the electronic document according to claim 1, wherein the obtaining the probability index of the selected water consumption data as the data segmentation point according to the matching relation between the left neighborhood data sequence and the right neighborhood data sequence specifically comprises:
obtaining, as a first number, the number of data in the left neighborhood data sequence that also appear in the right neighborhood data sequence, and obtaining, as a second number, the number of data in the right neighborhood data sequence that also appear in the left neighborhood data sequence; and obtaining the probability index of the selected water data as a data segmentation point according to the maximum of the first number and the second number, wherein the maximum and the probability index are negatively correlated.
4. The method for optimizing and managing the splitting of the electronic document according to claim 1, wherein the screening the water data to be analyzed according to the probability index to determine the initial segmentation point specifically comprises:
and recording the water data to be analyzed corresponding to the probability index being greater than or equal to a preset probability threshold as an initial segmentation point.
5. The method for optimizing and managing splitting electronic documents according to claim 1, wherein the step of performing the segment compression processing on the water data to be analyzed by using the optimal segment point comprises the following steps:
and respectively carrying out segmentation processing on actual water use data and historical water use data in the water use data to be analyzed by utilizing the optimal segmentation points, and carrying out compression processing on the segmented data by utilizing a Huffman coding algorithm.
6. An electronic document splitting optimisation management system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program when executed by the processor implements the steps of an electronic document splitting optimisation management method as claimed in any one of claims 1 to 5.
CN202311605670.1A 2023-11-29 2023-11-29 Electronic document splitting optimization management method and system Active CN117312255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311605670.1A CN117312255B (en) 2023-11-29 2023-11-29 Electronic document splitting optimization management method and system

Publications (2)

Publication Number Publication Date
CN117312255A CN117312255A (en) 2023-12-29
CN117312255B true CN117312255B (en) 2024-02-20

Family

ID=89285034

Country Status (1)

Country Link
CN (1) CN117312255B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114117878A (en) * 2021-11-29 2022-03-01 中国人民解放军国防科技大学 Target motion trajectory segmented compression method based on improved particle swarm optimization
CN116485445A (en) * 2023-03-10 2023-07-25 华能昌邑风力发电有限公司 New energy power spot transaction auxiliary system based on data automatic acquisition and processing
WO2023207039A1 (en) * 2022-04-28 2023-11-02 北京百度网讯科技有限公司 Data processing method and apparatus, and device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241241B (en) * 2020-01-08 2024-05-31 平安科技(深圳)有限公司 Case retrieval method, device, equipment and storage medium based on knowledge graph

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
基于分段线性表示k最近邻的水质预测方法;王保良;范昊;冀海峰;黄志尧;李海青;;环境工程学报(02);全文 *
海量数据库的查询优化研究及实现;周建鸿;;西南民族大学学报(自然科学版)(04);全文 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant