CN115495499B - Integrated statistical method based on contaminated site same-medium multi-batch mass data - Google Patents

Info

Publication number
CN115495499B
Authority
CN
China
Prior art keywords
data
index
module
excel
sheet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211169793.0A
Other languages
Chinese (zh)
Other versions
CN115495499A (en
Inventor
李旭伟
邓绍坡
孔令雅
豆叶枝
谢文逸
刘国强
王梦杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Environmental Sciences MEE
Original Assignee
Nanjing Institute of Environmental Sciences MEE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Environmental Sciences MEE
Priority to CN202211169793.0A
Publication of CN115495499A
Priority to JP2023052527A (patent JP7360000B1)
Application granted
Publication of CN115495499B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an integrated statistical method for multi-batch mass data of the same medium at contaminated sites and belongs to the technical field of contaminated site data processing. The method comprises the following steps: S1, acquiring contaminated site data; S2, identifying and updating characteristic indexes; S3, consolidating, integrating and statistically analyzing the data; S4, outputting the contaminated site data after integration and statistics. The invention addresses the problem that traditional methods for acquiring, merging, cleaning and mining site environmental data struggle to meet the demands of mass data processing; it greatly accelerates data processing and saves time and cost.

Description

Integrated statistical method based on contaminated site same-medium multi-batch mass data
Technical Field
The invention relates to the technical field of contaminated site data processing, and in particular to an integrated statistical method for multi-batch mass data of the same medium at contaminated sites.
Background
In recent years, with urban development and the accelerated restructuring and relocation of industry, a large number of industrial enterprises have been shut down or moved, leaving behind many contaminated sites. Many of these sites are now entering the investigation, remediation, risk control and long-term monitoring stages, all of which rely on sampling points as the source of data for revealing the spatial distribution of contamination, its spatio-temporal evolution, and the pollution characteristics of different environmental media. In particular, as the management of contaminated plots in China is strengthened, the remediation, control and long-term monitoring stages require multiple batches of data collected as long time series. The distribution and trends of the sampling-point data are analyzed from different angles to identify pollution causes and influencing factors, and to reveal the local variation and spatial dispersion of pollution indexes in the same medium, as well as local features caused by point-source pollution.
With the development of spatial information technology, trend analysis theory has been widely applied to contaminated sites, and the demand for mining and analyzing mass contaminated site data has grown accordingly. Traditional methods for acquiring, merging, cleaning and mining site environmental data struggle to meet this demand.
The invention improves on the traditional approach to statistical analysis of contaminated site information in China, greatly reduces the cost of acquiring that information, and ensures its comprehensiveness. It also improves the efficiency of integrating contaminated site data and the analytical capacity of the organizations undertaking the work, saving time and cost.
Disclosure of Invention
The technical problem solved by the invention is that traditional methods for acquiring, merging, cleaning and mining site environmental data struggle to meet the processing requirements of mass data.
To solve this problem, the technical scheme of the invention is as follows:
An integrated statistical method for multi-batch mass data of the same medium at contaminated sites comprises the following steps:
S1, acquiring mass contaminated site data
Dividing the contaminated site data into characteristic indexes and the index data corresponding to them, redistributing the contaminated site data, converting index data that are numeric but stored as strings into numeric types, and finally storing the contaminated site data so that each characteristic index is paired with its index data;
S2, identifying and updating characteristic indexes
Determining the characteristic indexes to be output after integration and statistics, and using these output characteristic indexes to identify and update the characteristic indexes in the stored contaminated site data;
S3, consolidating, integrating and statistically analyzing the contaminated site data
Selecting positioning conditions through a search bar, consolidating and integrating the contaminated site data according to those conditions, and passing the contaminated site data as a stream to the math functions to complete batch statistical analysis of the consolidated data, thereby obtaining the integrated and statistically analyzed contaminated site data;
S4, outputting the contaminated site data after integration and statistics.
In the above method, the redistribution includes row-to-column transposition, row merging and column merging.
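The row-to-column redistribution can be illustrated with a minimal Java sketch. It is not the patent's implementation: the class name `Redistribute`, the representation of a sheet as a list of string rows, and the padding of short rows with empty strings are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a sheet is modeled as rows of cell strings.
class Redistribute {
    // Row-to-column transposition. Rows shorter than the widest row are
    // padded with empty strings (an assumption of this sketch).
    static List<List<String>> transpose(List<List<String>> rows) {
        int width = 0;
        for (List<String> row : rows) {
            width = Math.max(width, row.size());
        }
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            List<String> column = new ArrayList<>();
            for (List<String> row : rows) {
                column.add(c < row.size() ? row.get(c) : "");
            }
            columns.add(column);
        }
        return columns;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("benzene", "0.12", "0.30"),
                Arrays.asList("toluene", "0.05", "0.07"));
        // Each original column becomes one row of the result.
        System.out.println(transpose(rows));
    }
}
```

Row merging and column merging could be handled in the same style by concatenating the row lists or the column lists before further processing.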
Further, the multi-batch mass data is on the order of millions of records.
Further, the characteristic indexes include the detection index, the detection limit and the unit.
Further, in step S1, the format of the contaminated site data is Excel, and in step S4, the format of the contaminated site data after integration and statistics is Excel.
Further, step S1 further includes:
downloading a contaminated site data input template, writing the contaminated site data into the template, reading the Excel-format contaminated site data through a file input stream, checking the format, and, once the format requirements are met, storing all the data in an HSSFWorkbook for further processing.
Further, step S1 further includes:
creating two data storage lists, areaList and fulllList, wherein areaList holds the characteristic indexes of each sheet in the Excel file and fulllList holds all index data of each sheet, and then creating a sheet list for storing the basic data of each sheet, namely the sheet name, the first row of the sheet and the data in that first row,
the contaminated site data are stored as follows:
traversing the Excel-format contaminated site data with a two-level nested loop, reading every item in each sheet in sheet order, storing the characteristic index names of each sheet in areaList after removing empty values and duplicates, and storing all index data of each sheet in fulllList.
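The storage pass described above can be sketched in plain Java. This is an illustrative sketch only: the patent reads cells from an Excel workbook, whereas here each sheet is modeled as a list of string rows so the example has no external dependencies. The class name `SheetStore`, the element types of the two lists, and the assumption that column 0 holds the index name are hypothetical; `areaList` and `fulllList` follow the identifiers used in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical container for the two storage lists named in the text.
class SheetStore {
    // Index names from all sheets, with blanks and duplicates removed.
    final List<String> areaList = new ArrayList<>();
    // Every non-empty data row of every sheet, in sheet order.
    final List<List<String>> fulllList = new ArrayList<>();

    // Assumption: column 0 of each row holds the index name.
    void load(List<List<List<String>>> sheets) {
        // LinkedHashSet drops duplicates while keeping first-seen order.
        Set<String> seen = new LinkedHashSet<>();
        for (List<List<String>> sheet : sheets) {   // outer loop: sheets
            for (List<String> row : sheet) {        // inner loop: rows
                if (row.isEmpty()) {
                    continue;                       // skip rows with no cells
                }
                String name = row.get(0) == null ? "" : row.get(0).trim();
                if (!name.isEmpty()) {
                    seen.add(name);                 // blank names are not indexed
                }
                fulllList.add(row);
            }
        }
        areaList.addAll(seen);
    }

    public static void main(String[] args) {
        SheetStore store = new SheetStore();
        store.load(Arrays.asList(
                Arrays.asList(Arrays.asList("benzene", "0.12"), Arrays.asList("toluene", "0.05")),
                Arrays.asList(Arrays.asList("benzene", "0.30"))));
        System.out.println(store.areaList);
        System.out.println(store.fulllList.size());
    }
}
```

Using an insertion-ordered set keeps the index names in the order the sheets present them, which matches the sheet-order traversal described in the text.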
Still further, the positioning conditions include the limiting conditions selected in the search bar and the characteristic index positions built into the contaminated site data input template.
Preferably, in step S3, passing the contaminated site data as a stream to the math functions includes the following:
the formula for the average: list.stream().mapToDouble(BigDecimal::doubleValue).average().getAsDouble(); the input parameter is a data set list, the data are mapped with the stream's mapToDouble method, the average() interface is then called to compute the mean directly, and the getAsDouble method finally returns the result as a double,
the formula for the maximum and minimum: stream computation is used, calling the reduce method to take the maximum and minimum of a group of numbers,
the calculation of the variance and standard deviation: the sum is computed first, then the average, and finally the variance and standard deviation.
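The three calculations above can be sketched with the standard Java stream API. The averaging expression is the one quoted in the text; everything else is an assumption for illustration: the class name `SiteStats`, the use of `BigDecimal::max` and `BigDecimal::min` as the reduce operators, and the choice of population variance (dividing by n), since the text does not say whether n or n-1 is used.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; inputs are assumed to be non-empty lists.
class SiteStats {
    // Average, using the expression quoted in the text.
    static double mean(List<BigDecimal> list) {
        return list.stream().mapToDouble(BigDecimal::doubleValue).average().getAsDouble();
    }

    // Maximum via reduce: keeps the larger value of each pair.
    static BigDecimal max(List<BigDecimal> list) {
        return list.stream().reduce(BigDecimal::max).get();
    }

    // Minimum via reduce: keeps the smaller value of each pair.
    static BigDecimal min(List<BigDecimal> list) {
        return list.stream().reduce(BigDecimal::min).get();
    }

    // Sum first, then average, then variance (population form, an assumption).
    static double variance(List<BigDecimal> list) {
        double sum = list.stream().mapToDouble(BigDecimal::doubleValue).sum();
        double avg = sum / list.size();
        return list.stream()
                .mapToDouble(BigDecimal::doubleValue)
                .map(v -> (v - avg) * (v - avg))
                .sum() / list.size();
    }

    static double stdDev(List<BigDecimal> list) {
        return Math.sqrt(variance(list));
    }

    public static void main(String[] args) {
        List<BigDecimal> values = Arrays.asList(
                new BigDecimal("0.12"), new BigDecimal("0.30"), new BigDecimal("0.07"));
        System.out.println(mean(values));
        System.out.println(max(values) + " " + min(values));
        System.out.println(variance(values) + " " + stdDev(values));
    }
}
```

Both `getAsDouble()` and `Optional.get()` throw `NoSuchElementException` on an empty input, so a caller would need to check for an empty list before invoking these methods.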
Compared with the traditional math-function approach to finding the maximum, minimum and average, which yields inelegant, inefficient code and can handle only small amounts of data, this approach combines the functions with stream computation and can process multiple data items at once. Testing showed 100% accuracy; calculation speed is greatly improved, the method suits mass data processing, and its efficiency is more than 4 times that of the traditional math functions. The code is also concise, which makes later functional extension easier.
Preferably, in step S3, the batch statistical analysis includes: determining whether a string is numeric, and calculating the average, the maximum and minimum, the variance and the standard deviation.
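The numeric check that precedes the statistics can be sketched as follows. The text does not say how the check is performed; this sketch assumes that a string counts as numeric when `BigDecimal` can parse it, and the class name `NumericCells` and both method names are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for screening cell strings before statistics.
class NumericCells {
    // A cell is treated as numeric if BigDecimal accepts it after trimming.
    static boolean isNumeric(String s) {
        if (s == null || s.trim().isEmpty()) {
            return false;
        }
        try {
            new BigDecimal(s.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Converts the numeric cells to BigDecimal and skips everything else,
    // for example a not-detected marker such as "ND" (an assumed example).
    static List<BigDecimal> toNumbers(List<String> cells) {
        List<BigDecimal> out = new ArrayList<>();
        for (String cell : cells) {
            if (isNumeric(cell)) {
                out.add(new BigDecimal(cell.trim()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toNumbers(Arrays.asList("0.12", "ND", " 3 ", "")));
    }
}
```

Skipping non-numeric cells is one possible policy; a real workflow might instead substitute the detection limit for such cells, which the text does not specify.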
A data processing device for executing the integrated statistical method for multi-batch mass data of the same medium at contaminated sites comprises:
a template downloading module for providing a contaminated site data input template, the template being in Excel format,
an importing module for importing the contaminated site data, the data input format of the importing module being Excel,
a characteristic index management module for determining the characteristic indexes to be output after integration and statistics,
a data arrangement module for identifying and updating the characteristic indexes handled by the data storage module according to the output characteristic indexes determined by the characteristic index management module,
a data storage module for storing the contaminated site data, once imported, so that each characteristic index is paired with its index data, wherein the data storage module creates two data storage lists, areaList and fulllList,
wherein areaList holds the characteristic indexes of each sheet in the Excel file and fulllList holds all index data of each sheet,
a data display module for displaying all contaminated site data after the data arrangement module has identified and updated the characteristic indexes, the data display module also providing a search bar for selecting positioning conditions,
a data calculation module for completing the consolidation, integration and batch statistical analysis of the contaminated site data according to the positioning conditions, the data calculation module passing the contaminated site data as a stream to the math functions and optimizing the calculation of the average, maximum, minimum, variance and standard deviation,
and an export module for exporting the processed contaminated site data, the data output format of the export module being Excel.
At present, contaminated site data are usually processed either manually in Excel or with a program written in Python. Manual processing in Excel takes 2 to 3 days and is inefficient; processing with Python requires the Python environment to be installed in advance, creating a dependency on that environment. The data processing device provides lightweight data processing, requires no installation by the user, is suitable for any system environment, and is simple and quick to operate.
The data processing device is developed independently using JFrame as its architecture technology. It does not use an existing Java framework but is built directly on JFrame, so that it requires no installation by the user and is suitable for any system environment.
The beneficial effects of the invention are as follows:
(1) The integrated statistical method provided by the invention handles mass data on the order of millions of records, with a single run taking no more than 5 seconds. Compared with the 2 to 3 days of time and labor required in this field to process contaminated site data manually in Excel, processing speed is greatly improved, saving substantial labor, money and time across the whole contaminated site data arrangement and processing project;
(2) The data processing device provided by the invention is developed independently with JFrame as its architecture technology and optimizes the math functions with Java stream computation to complete multi-dimensional calculation of batch data. It provides lightweight data processing, requires no installation by the user, is suitable for any system environment, and is simple and quick to operate;
(3) Using the concept of structured data, the invention converts collected data whose volume exceeds what can be processed manually by hundreds or even tens of thousands of times in a single, unified, batched manner. Integrating scattered data according to format requirements greatly reduces the time spent on preliminary data processing. The result provides the basic structure for subsequent mapping tables used to analyze pollution characterization and trends, so that the spatial and temporal distribution trends of pollutants can be identified and the overall trend of pollutants at the site and the causes of pollution can be determined and revealed.
Drawings
FIG. 1 is a flow chart of the integrated statistical method for multi-batch mass data of the same medium at contaminated sites in Example 1;
FIG. 2 is a block diagram of the data processing device of Example 2;
In the figures: 101, template downloading module; 102, importing module; 103, characteristic index management module; 104, data arrangement module; 105, data storage module; 106, data display module; 107, data calculation module; 108, export module.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plurality" generally means at least two.
Example 1
This example is an integrated statistical method for multi-batch mass data of the same medium at contaminated sites, where the mass data is on the order of millions of records. As shown in FIG. 1, the method comprises the following steps:
S1, acquiring mass contaminated site data
Downloading a contaminated site data input template, writing the contaminated site data into the template, reading the Excel-format contaminated site data through a file input stream, checking the format, and, once the format requirements are met, storing all the data in an HSSFWorkbook for further processing,
dividing the contaminated site data into characteristic indexes and the index data corresponding to them, redistributing the contaminated site data, converting index data that are numeric but stored as strings into numeric types, and storing the contaminated site data so that each characteristic index is paired with its index data,
creating two data storage lists, areaList and fulllList, wherein areaList holds the characteristic indexes of each sheet in the Excel file and fulllList holds all index data of each sheet, and then creating a sheet list for storing the basic data of each sheet, namely the sheet name, the first row of the sheet and the data in that first row,
the contaminated site data are stored as follows:
traversing the Excel-format contaminated site data with a two-level nested loop, reading every item in each sheet in sheet order, storing the characteristic index names of each sheet in areaList after removing duplicates, and storing all index data of each sheet in fulllList,
the characteristic indexes include: the detection index, the detection limit and the unit,
the index data include: acetone, 2-propanol, vinyl acetate, 2-butanone, ethyl acetate, 4-methyl-2-pentanone, 2-hexanone, carbon disulfide, benzene, toluene, ethylbenzene, p/m-xylene, styrene, o-xylene, cumene, m-ethyltoluene, 1,3,5-trimethylbenzene, o-ethyltoluene, 1,2,4-trimethylbenzene, 1,2,3-trimethylbenzene, m-diethylbenzene, chlorodifluoromethane;
S2, identifying and updating characteristic indexes
Determining the characteristic indexes to be output after integration and statistics, and using these output characteristic indexes to identify and update the characteristic indexes in the stored contaminated site data;
S3, consolidating, integrating and statistically analyzing the contaminated site data
Selecting positioning conditions through a search bar and consolidating and integrating the contaminated site data according to those conditions, wherein the positioning conditions include the limiting conditions selected in the search bar and the characteristic index positions built into the contaminated site data input template; then passing the contaminated site data as a stream to the math functions to complete batch statistical analysis of the integrated data, the batch statistical analysis comprising calculation of the average, the maximum and minimum, the variance and the standard deviation, thereby obtaining the integrated and statistically analyzed contaminated site data. Passing the contaminated site data as a stream to the math functions includes the following:
the formula for the average: list.stream().mapToDouble(BigDecimal::doubleValue).average().getAsDouble(); the input parameter is a data set list, the data are mapped with the stream's mapToDouble method, the average() interface is then called to compute the mean directly, and the getAsDouble method finally returns the result as a double,
where list.stream() wraps the collection as a stream, mapToDouble() maps the data, and average() computes the mean,
the formula for the maximum and minimum: stream computation is used, calling the reduce method to take the maximum and minimum of a group of numbers,
where list.stream() wraps the collection as a stream, and reduce() applies the comparison function that selects the maximum or minimum,
the calculation of the variance and standard deviation: the sum is computed first, then the average, and finally the variance and standard deviation;
S4, outputting the contaminated site data after integration and statistics, which includes the following:
creating a new workbook file for exporting the data analyzed in step S3, and passing in the parameters Titles, lists, usetnameList and areaList in turn within a loop, where Titles holds the titles of the exported file, lists holds the stored fulllList data, usetnameList holds the stored list of sheet names for quick filtering, and areaList is the set of index names,
traversing the analyzed data and writing it into the workbook: sheets are created in a loop, data rows are created in a loop within each sheet, the detection limit and unit are added, and the corresponding data are written in order; once all sheets have been written, the file tool class is called, the export parameters are set, the data stream is closed, the Excel file is exported, and the user is notified of the processing result.
Example 2
This example is a data processing device for executing the integrated statistical method of Example 1 for multi-batch mass data of the same medium at contaminated sites. As shown in FIG. 2, it comprises:
a template downloading module 101 for providing a contaminated site data input template, the template provided by the template downloading module 101 being in Excel format,
an importing module 102 for importing contaminated site data, the data input format of the importing module 102 being Excel,
a characteristic index management module 103 for determining the characteristic indexes to be output after integration and statistics,
a data arrangement module 104 for identifying and updating the characteristic indexes handled by the data storage module 105 according to the output characteristic indexes determined by the characteristic index management module 103,
a data storage module 105 for storing the contaminated site data, once imported, so that each characteristic index is paired with its index data, wherein the data storage module 105 creates two data storage lists, areaList and fulllList,
wherein areaList holds the characteristic indexes of each sheet in the Excel file and fulllList holds all index data of each sheet,
a data display module 106 for displaying all contaminated site data after the data arrangement module 104 has identified and updated the characteristic indexes, the data display module 106 also providing a search bar for selecting positioning conditions,
a data calculation module 107 for completing the consolidation, integration and batch statistical analysis of the contaminated site data according to the positioning conditions, the data calculation module 107 passing the contaminated site data as a stream to the math functions and optimizing the calculation of the average, maximum, minimum, variance and standard deviation,
and an export module 108 for exporting the processed contaminated site data, the data output format of the export module 108 being Excel.

Claims (7)

1. An integrated statistical method for multi-batch mass data of the same medium at contaminated sites, characterized by comprising the following steps:
S1, acquiring mass contaminated site data
Dividing the contaminated site data into characteristic indexes and the index data corresponding to them, redistributing the contaminated site data, converting index data that are numeric but stored as strings into numeric types, storing the contaminated site data so that each characteristic index is paired with its index data, downloading a contaminated site data input template, writing the contaminated site data into the template, reading the Excel-format contaminated site data through a file input stream, checking the format, and, once the format requirements are met, storing all the data in an HSSFWorkbook for further processing,
creating two data storage lists, areaList and fulllList, wherein areaList holds the characteristic indexes of each sheet in the Excel file and fulllList holds all index data of each sheet, and then creating a sheet list for storing the basic data of each sheet, namely the sheet name, the first row of the sheet and the data in that first row,
the contaminated site data are stored as follows:
traversing the Excel-format contaminated site data with a two-level nested loop, reading every item in each sheet in sheet order, storing the characteristic index names of each sheet in areaList after removing empty values and duplicates, and storing all index data of each sheet in fulllList;
S2, identifying and updating characteristic indexes
Determining the characteristic indexes to be output after integration and statistics, and using these output characteristic indexes to identify and update the characteristic indexes in the stored contaminated site data;
S3, consolidating, integrating and statistically analyzing the contaminated site data
Selecting positioning conditions through a search bar, consolidating and integrating the contaminated site data according to those conditions, and passing the contaminated site data as a stream to the math functions to complete batch statistical analysis of the consolidated data, thereby obtaining the integrated and statistically analyzed contaminated site data;
S4, outputting the contaminated site data after integration and statistics.
2. The integrated statistical method based on contaminated site same-medium multi-batch mass data according to claim 1, wherein in step S1 the format of the contaminated site data is Excel, and in step S3 the format of the integrated and statistically analyzed contaminated site data is Excel.
3. The integrated statistical method based on contaminated site same-medium multi-batch mass data according to claim 1, wherein the characteristic indexes comprise detection indexes, detection limits, and units.
4. The integrated statistical method based on contaminated site same-medium multi-batch mass data according to claim 1, wherein in step S3 the positioning conditions comprise: the limiting conditions entered or selected in the search bar, and the characteristic index positions built into the contaminated site data input template.
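As a rough illustration of how the two kinds of positioning conditions in claim 4 might work together, the sketch below filters stored rows by a set of index names (standing in for the search bar selection) and reads the values from fixed column positions (standing in for the positions built into the input template). The class SiteDataLocator, the method consolidate, the column constants, and the policy of skipping non-numeric values are all assumptions made for this example; the patent text does not specify them.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SiteDataLocator {
    // Hypothetical template layout: fixed column positions of the index name and the measured value.
    static final int INDEX_COL = 0;
    static final int VALUE_COL = 3;

    // Keeps rows whose index name was selected and groups their numeric values per index,
    // so that values from different batches (sheets) end up in one list.
    static Map<String, List<BigDecimal>> consolidate(List<List<String>> fullList, Set<String> selected) {
        Map<String, List<BigDecimal>> grouped = new LinkedHashMap<>();
        for (List<String> row : fullList) {
            if (row.size() <= VALUE_COL) {
                continue; // row does not follow the assumed template layout
            }
            String name = row.get(INDEX_COL);
            String raw = row.get(VALUE_COL);
            if (name == null || raw == null || !selected.contains(name)) {
                continue;
            }
            BigDecimal value;
            try {
                value = new BigDecimal(raw.trim());
            } catch (NumberFormatException e) {
                continue; // assumed policy: skip non-numeric entries such as "ND"
            }
            grouped.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return grouped;
    }
}
```

Each resulting list can then be handed to the statistical functions of step S3 as a single data set.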
5. The integrated statistical method based on contaminated site same-medium multi-batch mass data according to claim 1, wherein in step S3, passing the contaminated site data in the form of a stream as the input parameter of the math functions comprises:
the calculation formula of the average value in the math functions: list.stream().mapToDouble(BigDecimal::doubleValue).average().getAsDouble(), in which the input parameter is a data set list, the data are mapped with the mapToDouble method of the stream, the average interface is then called to compute the mean directly, and the result is finally converted to the double type by the getAsDouble method,
the calculation formula of the maximum value and the minimum value in the math functions: stream computation is used, and the stream's reduce method is called to take the maximum value and the minimum value of a group of numbers,
the calculation of the variance and the standard deviation in the math functions: the sum is calculated first, then the average value, and finally the variance and the standard deviation.
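The stream-based calculations of claim 5 can be sketched as follows. The mean uses the expression quoted in the claim. The maximum and minimum use reduce with BigDecimal::max and BigDecimal::min, which is one plausible reading of "calls the reduce method". The variance is computed here as the population variance (dividing by n); the claim does not say whether n or n - 1 is used, so that choice, like the class name SiteStats, is an assumption of this sketch.

```java
import java.math.BigDecimal;
import java.util.List;

final class SiteStats {
    // Mean, using the stream expression given in the claim.
    // Throws NoSuchElementException if the list is empty.
    static double mean(List<BigDecimal> list) {
        return list.stream().mapToDouble(BigDecimal::doubleValue).average().getAsDouble();
    }

    // Maximum via reduce; throws NoSuchElementException if the list is empty.
    static BigDecimal max(List<BigDecimal> list) {
        return list.stream().reduce(BigDecimal::max).get();
    }

    // Minimum via reduce; throws NoSuchElementException if the list is empty.
    static BigDecimal min(List<BigDecimal> list) {
        return list.stream().reduce(BigDecimal::min).get();
    }

    // Sum first, then the average, then the variance (assumed: population variance, divide by n).
    static double variance(List<BigDecimal> list) {
        double sum = list.stream().mapToDouble(BigDecimal::doubleValue).sum();
        double avg = sum / list.size();
        double squared = list.stream()
                .mapToDouble(v -> Math.pow(v.doubleValue() - avg, 2))
                .sum();
        return squared / list.size();
    }

    // Standard deviation as the square root of the variance above.
    static double stdDev(List<BigDecimal> list) {
        return Math.sqrt(variance(list));
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this gives a mean of 5, a variance of 4, and a standard deviation of 2. Converting to double trades exactness for speed, so results for very large or very precise BigDecimal inputs are approximate.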
6. The integrated statistical method based on contaminated site same-medium multi-batch mass data according to claim 1, wherein in step S3 the batch statistical analysis comprises calculating the average value, the maximum and minimum values, the variance, and the standard deviation.
7. A data processing apparatus for performing the integrated statistical method based on contaminated site same-medium multi-batch mass data according to any one of claims 1 to 6, comprising:
a template downloading module (101) for providing a contaminated site data input template, the input template provided by the template downloading module (101) being in Excel format,
an importing module (102) for importing contaminated site data, the data input format of the importing module (102) being Excel,
a characteristic index management module (103) for determining the characteristic indexes to be output after integrated statistics,
a data storage module (105) for storing the contaminated site data, once imported, so that each characteristic index stays associated with its index data, wherein the data storage module (105) creates two data storage lists, one with the data structure areaList and the other with the data structure fullList,
a data arrangement module (104) for identifying and updating the characteristic indexes processed by the data storage module (105) according to the output characteristic indexes determined by the characteristic index management module (103),
wherein, in the data storage module (105), the areaList corresponds to the characteristic indexes of each sheet in the Excel file and the fullList corresponds to all index data of each sheet in the Excel file,
a data display module (106) for displaying all contaminated site data after the data arrangement module (104) has identified and updated the characteristic indexes, the data display module (106) also providing a search bar for selecting positioning conditions,
a data calculation module (107) for completing the consolidation and batch statistical analysis of the contaminated site data according to the positioning conditions, wherein the data calculation module (107) passes the contaminated site data in the form of a stream as the input parameter of the math functions and optimizes the calculation of the average value, the maximum value, the minimum value, the variance, and the standard deviation in the math functions,
and an export module (108) for exporting the processed contaminated site data, the data output format of the export module (108) being Excel.
CN202211169793.0A 2022-09-22 2022-09-22 Integrated statistical method based on contaminated site same-medium multi-batch mass data Active CN115495499B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211169793.0A CN115495499B (en) 2022-09-22 2022-09-22 Integrated statistical method based on contaminated site same-medium multi-batch mass data
JP2023052527A JP7360000B1 (en) 2022-09-22 2023-03-29 Integrated statistical system and method based on batch data of the same medium at contaminated sites

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211169793.0A CN115495499B (en) 2022-09-22 2022-09-22 Integrated statistical method based on contaminated site same-medium multi-batch mass data

Publications (2)

Publication Number Publication Date
CN115495499A (en) 2022-12-20
CN115495499B (en) 2023-05-30

Family

ID=84471096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211169793.0A Active CN115495499B (en) 2022-09-22 2022-09-22 Integrated statistical method based on contaminated site same-medium multi-batch mass data

Country Status (2)

Country Link
JP (1) JP7360000B1 (en)
CN (1) CN115495499B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512336A (en) * 2015-12-29 2016-04-20 中国建设银行股份有限公司 Method and device for mass data processing based on Hadoop
CN111367911A (en) * 2020-03-02 2020-07-03 上海市岩土地质研究院有限公司 Site environment data analysis method and system
CN112164136A (en) * 2020-09-14 2021-01-01 浙江省环境科技有限公司 Underground water intelligent supervision platform based on three-dimensional geological model
CN114356864A (en) * 2021-12-09 2022-04-15 浪潮云信息技术股份公司 Method and system for importing excel files in batch in domestic environment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11224837A (en) * 1998-02-04 1999-08-17 Sharp Corp Contamination managing system for processing device
CN106407714A (en) * 2016-10-14 2017-02-15 珠海富鸿科技有限公司 Air pollution assessment method and device based on CALPUFF system
CN107577909B (en) * 2017-07-31 2020-09-01 武汉工程大学 Optimization method for environmental air quality monitoring big data statistics
CN107525907B (en) * 2017-10-16 2019-12-31 中国环境科学研究院 Multi-objective optimization method for underground water pollution monitoring network
EP3712735A1 (en) * 2019-03-22 2020-09-23 L'air Liquide, Societe Anonyme Pour L'etude Et L'exploitation Des Procedes Georges Claude Method for detecting anomalies in a water treatment facility
CN112085241B (en) * 2019-06-12 2024-03-22 江苏汇环环保科技有限公司 Environmental big data analysis and decision platform based on machine learning
CN110297921A (en) * 2019-06-24 2019-10-01 南京邮电大学 A kind of atmosphere pollution unmanned plane traceability system and method based on big data technology
CN111651432B (en) * 2020-06-11 2024-04-23 中科山水(北京)科技信息有限公司 Space-time information identification method for suspected contaminated sites
CN112163724A (en) * 2020-08-05 2021-01-01 宁夏无线互通信息技术有限公司 Environment information data resource integration system
CN112347155B (en) * 2020-10-29 2023-11-21 南京大学 Site pollution characteristic factor identification and monitoring index optimization method based on data mining
CN114996410A (en) * 2022-06-30 2022-09-02 济南市环境研究院(济南市黄河流域生态保护促进中心) Method for automatically integrating and sharing environment data resources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
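Statistical characteristics of soil polycyclic aromatic hydrocarbon pollution data at a coking site; 刘庚; 毕如田; 王世杰; 魏文侠; 李发生; 郭观林; 应用生态学报 (Chinese Journal of Applied Ecology) (06); full text *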

Also Published As

Publication number Publication date
CN115495499A (en) 2022-12-20
JP2024046580A (en) 2024-04-03
JP7360000B1 (en) 2023-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant