CN117973343B - Intelligent processing method and system for urban rail engineering investment estimation indexes - Google Patents

Intelligent processing method and system for urban rail engineering investment estimation indexes

Info

Publication number
CN117973343B
Authority
CN
China
Prior art keywords
index
data
sample data
estimation
sample
Legal status
Active
Application number
CN202410385002.0A
Other languages
Chinese (zh)
Other versions
CN117973343A (en)
Inventor
赵永超
刘大同
张春雷
赵彬
张建芳
朱占国
郭剑勇
王敏
朱红军
徐梦熊
王正松
张振东
杨炳晔
王辉
刘云亮
段晓霞
王明昇
胡健
薛嘉成
李哲
Current Assignee
Standard And Quota Research Institute Of Ministry Of Housing And Urban Rural Development
China Railway Design Corp
Original Assignee
Standard And Quota Research Institute Of Ministry Of Housing And Urban Rural Development
China Railway Design Corp
Application filed by Standard And Quota Research Institute Of Ministry Of Housing And Urban Rural Development and China Railway Design Corp
Priority to CN202410385002.0A
Publication of CN117973343A
Application granted
Publication of CN117973343B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00: Adapting or protecting infrastructure or their operation
    • Y02A30/60: Planning or developing urban green infrastructure

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of engineering cost and discloses an intelligent processing method and system for urban rail engineering investment estimation indexes. The method comprises the following steps: S1, creating a preset template; S2, uploading an estimation index file, sequentially traversing the index data in the file and inserting it into the corresponding positions in the preset template to construct an estimation index table; S3, training a sample data set with an improved clustering algorithm to construct an index comparison model; S4, inputting the index data into the index comparison model, screening out abnormal data and flagging it; S5, outputting the processed estimation index table. Training the sample data with the improved clustering algorithm strengthens the clustering (aggregation) of the data, which in turn improves the accuracy of abnormal-data screening and the timeliness and accuracy of estimating the various cost indexes.

Description

Intelligent processing method and system for urban rail engineering investment estimation indexes
Technical Field
The invention relates to the technical field of engineering cost, in particular to an intelligent processing method and system for urban rail engineering investment estimation indexes.
Background
With the development of urban mass transit, urban rail transit has become an important direction of urban development. Estimating the investment in urban rail transit engineering requires comprehensive consideration of many factors, including engineering scale, technical difficulty and operating cost. Accurate and efficient estimation of this investment is therefore of great significance for the construction and operation of urban rail transit projects.
At present, investment estimation systems for urban rail transit engineering are widely used. These systems typically hold index data for several aspects, such as material costs, equipment costs and operating costs, from which the cost of an urban rail transit project can be estimated relatively accurately. In conventional practice, however, investment estimation often relies on expert experience and on manual calculation of part of the data, which makes it highly subjective and imprecise. In addition, engineering cost lists are usually entered into the system by hand, with the data in each form typed into the investment estimation system one item at a time. With large data volumes, manual entry is inefficient and error-prone, and ultimately makes the estimation results inaccurate.
There is therefore a need for an intelligent processing method and system for urban rail engineering investment estimation indexes that improves the accuracy of abnormal-data screening and the timeliness and accuracy of estimating the various cost indexes.
Disclosure of Invention
To solve the above technical problems, the invention provides an intelligent processing method and an intelligent processing system for urban rail engineering investment estimation indexes, which improve the accuracy of abnormal-data screening and the timeliness and accuracy of estimating the various cost indexes.
The invention provides an intelligent processing method for urban rail engineering investment estimation indexes, which comprises the following steps:
S1, creating a preset template;
S2, uploading an estimation index file, sequentially traversing the index data in the estimation index file and inserting it into the corresponding positions in the preset template, and constructing an estimation index table;
S3, training a sample data set with an improved clustering algorithm and constructing an index comparison model, wherein the sample data set consists of historical benchmark prices;
S3 specifically comprises:
S31, initializing the n samples in the sample data set with the Knuth algorithm so that each of the n samples is selected with probability 1/n;
S32, selecting k initial centroids according to the max-min distance principle and obtaining the corresponding clusters;
S33, iterating over all samples with the K-means algorithm until the k computed centroids are consistent with the selected k centroids (that is, the centroids no longer change) or the maximum number of iterations is reached, which completes the training of the sample data set; the final k centroids and k clusters serve as the index comparison model;
S4, inputting the index data into the index comparison model, screening out abnormal data and flagging it;
S5, outputting the processed estimation index table.
Further, S32, selecting k initial centroids according to the max-min distance principle and obtaining the corresponding clusters, comprises:
S321, randomly selecting one sample in the sample data set as the first initial centroid;
S322, selecting the sample farthest from the first initial centroid as the second initial centroid;
S323, calculating the straight-line distances between all remaining samples and each centroid as a distance set;
S324, selecting the maximum value from the distance set and taking the corresponding sample as a new initial centroid;
S325, repeating S323-S324 until the number of selected initial centroids reaches the preset number k, then proceeding to S326;
S326, dividing the sample data set into k clusters and assigning each remaining sample to the corresponding cluster according to its distances to the k initial centroids.
Further, S326, dividing the sample data set into k clusters and assigning the remaining samples to the corresponding clusters according to their distances to the k initial centroids, comprises:
S3261, calculating the distances between each remaining sample and the k initial centroids, the distance formula being as follows:
where x_i denotes the i-th data sample, i = 1, 2, ..., n; x_j denotes the j-th data sample, j = 1, 2, ..., k; and μ_j denotes the j-th centroid, j = 1, 2, ..., k;
S3262, assigning each remaining sample to the corresponding cluster according to its k distance results, the classification formula being as follows:
where C_j denotes the j-th cluster, j = 1, 2, ..., k.
Further, in S1, the preset template created includes the titles and description information of the various index data tables; the description information includes table names and dates, and the titles of the index data tables correspond one-to-one to the names of the sheets (page tables) in the estimation index file.
Further, S2, uploading the estimation index file, sequentially traversing the index data in the estimation index file and inserting it into the corresponding positions in the preset template, and constructing the estimation index table, comprises:
S21, uploading the estimation index file and performing a formal review of it; if null values and/or missing values exist, prompting the error positions and requesting that the estimation index file be uploaded again; if the formal review passes, proceeding to S22;
S22, reading the estimation index file and locating, according to the name of each sheet in the estimation index file, the position in the preset template where that sheet's index data belongs;
S23, sequentially traversing the index data and inserting it into the corresponding positions in the preset template to construct the estimation index table.
Further, S23, sequentially traversing the index data, inserting it into the corresponding positions in the preset template and constructing the estimation index table, comprises:
when inserting data, determining the data type of the index data; if the index data is of type double and/or float, first converting it into a character string and then converting the string into a BigDecimal object with the BigDecimal method.
Further, S4, inputting the index data into the index comparison model, screening out abnormal data and flagging it, comprises:
judging through the index comparison model whether the inserted index data lies within the normal interval; if it does, no processing is performed; if it does not, the index data is flagged.
The invention also provides an intelligent processing system for urban rail engineering investment estimation indexes, configured to execute the above intelligent processing method and comprising the following modules:
a table construction module, configured to create a preset template, upload an estimation index file, and sequentially traverse the index data in the estimation index file and insert it into the corresponding positions in the preset template to construct an estimation index table;
an index comparison model construction module, connected to the table construction module and configured to train a sample data set with an improved clustering algorithm to construct an index comparison model, wherein the sample data set consists of historical benchmark prices; specifically: initializing the n samples in the sample data set with the Knuth algorithm so that each of the n samples is selected with probability 1/n; selecting k initial centroids according to the max-min distance principle and obtaining the corresponding clusters; iterating over all samples with the K-means algorithm until the k computed centroids are consistent with the selected k centroids or the maximum number of iterations is reached, which completes the training of the sample data set, and taking the final k centroids and k clusters as the index comparison model;
an abnormal data screening module, connected to the index comparison model construction module and configured to input the index data into the index comparison model, screen out abnormal data and flag it;
an output module, connected to the abnormal data screening module and configured to output the processed estimation index table.
The embodiments of the invention have the following technical effects:
1. The cost index data of different regions and historical periods are compiled and integrated into the basic data set for constructing the comparison model; the data set can be updated promptly whenever required to track changes in the various index data, which greatly improves the timeliness and accuracy of estimating the various cost indexes;
2. The comparison model is built on the classical K-Means clustering algorithm, and the improvements to K-Means strengthen the clustering of the data, which improves the accuracy of screening out index data that falls outside the normal interval;
3. To overcome the drawbacks of entering large amounts of data by hand, the scheme inserts large volumes of data directly into the summary report, greatly improving the efficiency and accuracy of data entry;
4. The exported summary report reflects the investment estimation results for urban rail transit engineering accurately and intuitively, making final decisions more scientific and timely.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of the intelligent processing method for urban rail engineering investment estimation indexes according to an embodiment of the invention;
FIG. 2 is a flowchart of training the sample data set with the improved clustering algorithm according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of the intelligent processing system for urban rail engineering investment estimation indexes according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the invention, are within the scope of the invention.
Fig. 1 is a flowchart of the intelligent processing method for urban rail engineering investment estimation indexes according to an embodiment of the invention. Referring to Fig. 1, the method specifically includes:
S1, creating a preset template.
Specifically, the preset template includes the titles and description information of the various index data tables; the description information includes table names and dates, and the titles of the index data tables correspond one-to-one to the names of the sheets in the estimation index file. Illustratively, a .docx document can be generated as the preset template using XDocReport and FreeMarker. A .docx template file containing variables is first created in Microsoft Word, with placeholders marking in advance the positions where data will be inserted. FreeMarker then loads the template and fills its variables, pre-filling the fixed descriptive content (title, table names and date); finally, XDocReport fills the template file and the configuration is saved to the server. FreeMarker is a template engine that fills a document with data as required, combining static files with dynamic content; XDocReport is a Java API for document population and document format conversion.
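FreeMarker and XDocReport are Java-side tools. As a language-neutral illustration of the idea only (fixed descriptive content pre-filled, data positions left open), the sketch below uses Python's standard `string.Template`. The function `build_preset_template`, the layout, and the `{{DATA:<table name>}}` slot marker are hypothetical and are not FreeMarker syntax.

```python
from string import Template

# Hypothetical template text: ${...} placeholders receive the fixed
# descriptive content (title, date, table names) when the template is built.
TEMPLATE_TEXT = "${title}\nDate: ${date}\n${tables}"


def build_preset_template(title, date, table_names):
    """Pre-fill title, date and table names, and leave one reserved
    data slot (marked {{DATA:<table name>}}) per index table."""
    # In an f-string, "{{" and "}}" emit literal braces, so each slot
    # becomes e.g. "{{DATA:Materials}}".
    tables = "".join(
        f"Table: {name}\n{{{{DATA:{name}}}}}\n" for name in table_names
    )
    return Template(TEMPLATE_TEXT).substitute(title=title, date=date, tables=tables)
```

Substituted values are not re-parsed by `Template`, so the slot markers survive untouched until the data-insertion step fills them.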
S2, uploading an estimation index file, sequentially traversing the index data in the estimation index file and inserting it into the corresponding positions in the preset template, and constructing an estimation index table.
Specifically, the uploaded estimation index file is generally an Excel workbook, which can be uploaded by dragging it in or by browsing the file system. The estimation index file spans many categories and contains a large volume of data, including indexes such as material prices, labor prices, equipment lease costs, transportation costs and construction periods. The accuracy and completeness of these cost estimates are therefore of great significance for cost optimization and control and for sound decision-making in urban rail transit engineering.
S21, uploading the estimation index file and performing a formal review of it.
Specifically, if null values and/or missing values exist, the error positions can be shown in a pop-up window and the user asked to upload the estimation index file again; if the formal review passes, the process proceeds to S22.
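A minimal sketch of the formal review in S21, assuming the uploaded workbook has already been read into a mapping from sheet name to rows of cell values (for instance with OpenPyXL). The function name `formal_review` and the treatment of blank strings as missing values are assumptions; positions are reported 1-based, Excel-style, so they could be shown directly in the error pop-up.

```python
def formal_review(workbook):
    """Return (sheet, row, column) positions of empty or missing cells.

    `workbook` maps sheet names to lists of rows (header row included).
    An empty result means the file passes the formal review."""
    errors = []
    for sheet_name, rows in workbook.items():
        if not rows:
            # The whole sheet is empty: report it without a cell position.
            errors.append((sheet_name, None, None))
            continue
        width = max(len(r) for r in rows)
        for r_idx, row in enumerate(rows, start=1):
            # Pad short rows so missing trailing cells are reported too.
            padded = list(row) + [None] * (width - len(row))
            for c_idx, value in enumerate(padded, start=1):
                if value is None or (isinstance(value, str) and not value.strip()):
                    errors.append((sheet_name, r_idx, c_idx))
    return errors
```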
S22, reading the estimation index file and locating, according to the name of each sheet in the estimation index file, the position in the preset template where that sheet's index data belongs.
Specifically, the content of the uploaded estimation index file is read using regular-expression keyword matching to obtain the name of each sheet (page table) in the file. The sheet names in the uploaded file are then traversed in a loop against the table names pre-filled in the preset template; once a keyword match is found, a blank position is reserved in the preset template so that the corresponding index data can later be inserted there.
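The keyword matching in S22 could look roughly like the following. The normalization (lower-casing and ignoring whitespace) and the function name `locate_sheets` are assumptions; the text only states that sheet names are matched against the template's pre-filled table names by keyword regular matching.

```python
import re


def locate_sheets(sheet_names, template_titles):
    """Match each template table title to an uploaded sheet name.

    Returns {template_title: sheet_name} for every title that matched;
    unmatched titles are simply absent from the result."""
    def normalize(s):
        # Ignore whitespace and case so "Labor  Prices" matches "labor prices".
        return re.sub(r"\s+", "", s).lower()

    slots = {}
    for title in template_titles:
        pattern = re.compile(re.escape(normalize(title)))
        for sheet in sheet_names:
            if pattern.search(normalize(sheet)):
                slots[title] = sheet  # reserve this template slot for the sheet
                break
    return slots
```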
S23, sequentially traversing the index data and inserting it into the corresponding positions in the preset template to construct the estimation index table.
Specifically, using OpenPyXL, the index data is inserted into the template document sheet by sheet, keyed by sheet name; the newly inserted index data is then laid out using python-docx, which is also used later to specially flag abnormal data. OpenPyXL is a third-party Python library for reading, writing and manipulating Excel files; python-docx is a third-party Python library for creating and editing Word documents. python-docx is object-oriented: paragraphs, text, fonts and so on are all treated as objects, and operating on these objects edits the content of the Word document.
Further, when inserting data, the data type of each index value is checked; if it is of type double and/or float, it is first converted into a character string, and the string is then converted into a BigDecimal object. Going through the string avoids carrying the binary rounding error of the floating-point value into the BigDecimal.
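In Java this corresponds to `new BigDecimal(Double.toString(x))` rather than `new BigDecimal(x)`. Python's standard `decimal` module has the same pitfall, so the sketch below uses it as an analogue; the helper name `to_exact_decimal` is hypothetical.

```python
from decimal import Decimal


def to_exact_decimal(value):
    """Convert an index value to Decimal, routing floats through str().

    Decimal(0.1) captures the binary approximation of 0.1
    (0.1000000000000000055511151231257827...), whereas
    Decimal(str(0.1)) is exactly Decimal('0.1')."""
    if isinstance(value, float):
        return Decimal(str(value))
    # Ints and numeric strings convert exactly as they are.
    return Decimal(value)
```

With this conversion, sums such as 0.1 + 0.2 come out as exactly 0.3, which matters when many cost items are added up in the summary table.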
Furthermore, after the data of each sheet has been inserted, it is laid out according to its format in the original table file: borders are added around each index by identifying spaces and line-break characters, and the results are finally assembled into a table.
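The insertion and bordered layout can be approximated in plain Python as below. This is a hypothetical stand-in: it writes plain-text tables with ASCII borders instead of using OpenPyXL and python-docx, and it assumes reserved slots are marked `{{DATA:<table name>}}` and that `slots` maps each template table name to its matched sheet name.

```python
def render_bordered_table(rows):
    """Render rows of cell values as a text table with a border around every cell."""
    cells = [[str(c) for c in row] for row in rows]
    n_cols = max(len(r) for r in cells)
    cells = [r + [""] * (n_cols - len(r)) for r in cells]  # pad short rows
    widths = [max(len(r[c]) for r in cells) for c in range(n_cols)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [rule]
    for row in cells:
        lines.append("| " + " | ".join(v.ljust(w) for v, w in zip(row, widths)) + " |")
        lines.append(rule)
    return "\n".join(lines)


def fill_template(template, slots, workbook):
    """Replace each reserved {{DATA:<title>}} slot with the rendered table
    of the sheet matched to that title; unmatched slots are left as-is."""
    for title, sheet in slots.items():
        marker = "{{DATA:%s}}" % title
        template = template.replace(marker, render_bordered_table(workbook[sheet]))
    return template
```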
S3, training the sample data set with the improved clustering algorithm and constructing the index comparison model.
The sample data set consists of historical benchmark prices.
Fig. 2 is a flowchart of training the sample data set with the improved clustering algorithm according to an embodiment of the invention. Referring to Fig. 2, S3 specifically includes:
S31, initializing the n samples in the sample data set with the Knuth algorithm so that each of the n samples is selected with probability 1/n.
Specifically, the choice of initial centroids is a key step in any clustering algorithm, and the conventional K-Means algorithm has a weakness here: its initial centroids are selected manually, which can trap the result in a local optimum. To make the random selection of initial centroids more balanced, the n samples of the given data set are initialized with the Knuth algorithm so that each sample is selected with probability 1/n, which is fairer than the traditional random() method.
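Assuming "Knuth algorithm" here refers to the Knuth (Fisher-Yates) shuffle, S31 can be sketched as follows. Every one of the n! orderings is equally likely, so any given sample lands in the first position (the first candidate centroid) with probability 1/n. The name `knuth_shuffle` and the optional `rng` parameter are illustrative.

```python
import random


def knuth_shuffle(samples, rng=None):
    """Return a uniformly shuffled copy of `samples` (Knuth / Fisher-Yates)."""
    rng = rng or random.Random()
    items = list(samples)  # copy so the caller's data set is not modified
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive on both ends: 0 <= j <= i
        items[i], items[j] = items[j], items[i]
    return items
```

The common bug of drawing `j` from the whole range `0..n-1` at every step biases the result; restricting it to `0..i` is what makes the shuffle uniform.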
S32, selecting k initial centroids according to the max-min distance principle and obtaining the corresponding clusters.
Specifically, although the Knuth algorithm solves the problem of balanced random selection in K-Means, the clustering process and its result remain random to some extent. The initial centroids are therefore selected according to the max-min distance principle, which reduces the number of iterations and improves the efficiency and accuracy of cluster division. The max-min distance principle is expressed by the following formula:
where x_i denotes the i-th data sample, i = 1, 2, ..., n; μ_j denotes the j-th centroid, j = 1, 2, ..., k; and μ_{n+1} denotes the (n+1)-th centroid.
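The formula itself does not survive in this text. Assuming it is the usual max-min distance rule, the next initial centroid (written here as μ_{j+1}; the text labels it μ_{n+1}) is the sample whose distance to its nearest already-chosen centroid is largest. For j = 1 this reduces to S322, the sample farthest from the first centroid.

```latex
\mu_{j+1} = \arg\max_{x_i,\; 1 \le i \le n} \; \min_{1 \le m \le j} \lVert x_i - \mu_m \rVert ,
\qquad j = 1, 2, \dots, k-1
```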
S321, randomly selecting one sample data in the sample data set as a first initial centroid.
S322, selecting sample data with the largest distance from the first initial centroid as a second initial centroid.
S323, calculating straight line distances between all the rest sample data and each centroid as a distance set.
S324, selecting a maximum value from the distance set, and taking sample data corresponding to the maximum value as a new initial centroid.
S325, repeating S323-S324 until the number of the selected initial centroids reaches the preset number k, and entering S326.
For example, the straight-line distances between all remaining samples and the first and second initial centroids are calculated as a distance set, the maximum value is selected from it, and the corresponding sample becomes the third initial centroid. Then, excluding the first three initial centroids, the straight-line distances between all remaining samples and the first, second and third initial centroids are calculated as a distance set, the maximum value is selected, and the corresponding sample becomes the fourth initial centroid; and so on, until the number of selected initial centroids reaches the preset number k.
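The selection loop S321-S325 can be sketched as follows. This assumes the usual reading of the max-min principle, in which each remaining sample's distance set is reduced to its distance to the *nearest* chosen centroid before the maximum over samples is taken. Samples are assumed to be numeric feature vectors, and `max_min_centroids` and its parameters are hypothetical names.

```python
import math
import random


def euclidean(a, b):
    """Straight-line (Euclidean) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centroids(samples, k, rng=None):
    """Pick k initial centroids with the max-min distance rule.

    S321: a random sample becomes the first centroid.
    S322-S325: each further centroid is the remaining sample whose distance
    to its nearest chosen centroid is largest."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    rng = rng or random.Random()
    chosen = [rng.randrange(len(samples))]
    # nearest[i] = distance from sample i to its closest chosen centroid
    nearest = [euclidean(s, samples[chosen[0]]) for s in samples]
    while len(chosen) < k:
        candidates = [i for i in range(len(samples)) if i not in chosen]
        nxt = max(candidates, key=lambda i: nearest[i])
        chosen.append(nxt)
        # Only the new centroid can shrink each sample's nearest distance.
        nearest = [min(d, euclidean(s, samples[nxt])) for d, s in zip(nearest, samples)]
    return [samples[i] for i in chosen]
```

Keeping a running `nearest` list means each round costs one distance per sample instead of recomputing distances to every chosen centroid.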
S326, dividing the sample data set into k groups, and classifying the rest sample data into corresponding groups according to the distances between the rest sample data and k initial centroids.
Specifically, the sample data set is divided into k clusters, each with one initial centroid. The distances between each remaining sample and the k initial centroids are calculated, and the sample is assigned to the nearest cluster among those whose distance is smaller than a preset boundary value. The preset boundary value is set according to actual conditions.
S3261, respectively calculating the distances between the rest sample data and k initial centroids, wherein the distance calculation formula is as follows:
where x_i denotes the i-th data sample, i = 1, 2, ..., n; x_j denotes the j-th data sample, j = 1, 2, ..., k; and μ_j denotes the j-th centroid, j = 1, 2, ..., k.
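The distance formula is not reproduced here. Given that the text describes straight-line distances, it is presumably the Euclidean distance; with each sample described by p numeric features (p and the feature index t are our notation, not the patent's):

```latex
d(x_i, \mu_j) = \lVert x_i - \mu_j \rVert = \sqrt{\sum_{t=1}^{p} \left( x_{i,t} - \mu_{j,t} \right)^2},
\qquad i = 1, \dots, n,\quad j = 1, \dots, k
```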
S3262, assigning each remaining sample to the corresponding cluster according to its k distance results, the classification formula being as follows:
where C_j denotes the j-th cluster, j = 1, 2, ..., k.
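The classification formula is likewise missing. Combining nearest-centroid assignment with the preset boundary value described for S326 (written T here, our notation), and writing d(x_i, μ_j) for the straight-line distance between sample x_i and centroid μ_j, the rule can plausibly be written as follows. A sample that is not closer than T to any centroid belongs to no cluster.

```latex
C_j = \Bigl\{ x_i \;:\; d(x_i, \mu_j) = \min_{1 \le l \le k} d(x_i, \mu_l)
\ \text{and}\ d(x_i, \mu_j) < T \Bigr\}, \qquad j = 1, \dots, k
```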
For example, with k = 5, if some of a sample's 5 distance results are smaller than the preset boundary value and, among those, the sample is closest to the 2nd cluster C_2, the sample is assigned to C_2.
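The assignment rule with a boundary value can be sketched like this. Samples and centroids are assumed to be numeric tuples, and `assign` and `boundary` are hypothetical names; indices are 0-based, so the 2nd cluster C_2 is index 1.

```python
import math


def assign(sample, centroids, boundary):
    """Return the index of the nearest centroid closer than `boundary`,
    or None if the sample is at least `boundary` away from every centroid
    (such a sample is not assigned to any cluster)."""
    best, best_dist = None, math.inf
    for j, mu in enumerate(centroids):
        d = math.dist(sample, mu)  # Euclidean distance (Python 3.8+)
        if d < boundary and d < best_dist:
            best, best_dist = j, d
    return best
```

With centroids at (0, 0), (10, 0), (20, 0), (30, 0), (40, 0) and a boundary of 10, the sample (11, 1) is within the boundary of the 2nd and 3rd centroids and closest to the 2nd, so `assign` returns 1.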
S33, iterating over all samples with the K-means algorithm until the k computed centroids are consistent with the selected k centroids (that is, the centroids no longer change) or the maximum number of iterations is reached. This completes the training of the sample data set, and the final k centroids and k clusters serve as the index comparison model.
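A sketch of S33 as standard K-means (Lloyd) iterations, starting from the centroids chosen by the max-min step. It reads "consistent with the selected centroids" as "the centroids no longer change", and it optionally sets aside samples that are not closer than a boundary value to any centroid, following the outlier rule stated for S4. The names `kmeans`, `partition`, `nearest_within`, `boundary` and `max_iter` are hypothetical.

```python
import math


def nearest_within(sample, centroids, boundary):
    """Index of the closest centroid if it is closer than `boundary`, else None."""
    j = min(range(len(centroids)), key=lambda c: math.dist(sample, centroids[c]))
    return j if math.dist(sample, centroids[j]) < boundary else None


def partition(samples, centroids, boundary):
    """Split samples into per-centroid clusters plus a list of outliers."""
    clusters = [[] for _ in centroids]
    outliers = []
    for s in samples:
        j = nearest_within(s, centroids, boundary)
        (outliers if j is None else clusters[j]).append(s)
    return clusters, outliers


def kmeans(samples, centroids, boundary=math.inf, max_iter=100):
    """Move each centroid to the mean of its members until nothing changes
    or `max_iter` rounds have run. Returns (centroids, clusters, outliers)."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    for _ in range(max_iter):
        clusters, _ = partition(samples, centroids, boundary)
        updated = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Final assignment so clusters always match the returned centroids.
    clusters, outliers = partition(samples, centroids, boundary)
    return centroids, clusters, outliers
```

With the default `boundary=math.inf` every sample is assigned and the loop is plain K-means; a finite boundary keeps far-away samples from dragging centroids toward them.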
S4, inputting the index data into the index comparison model, screening out abnormal data and flagging it.
If the distance between a sample and every cluster is greater than the preset boundary value, the sample is abnormal data and is not assigned to any cluster.
Specifically, the index comparison model judges whether the inserted index data lies within the normal interval: if it does, no processing is performed; if it does not, the index data is flagged. In other words, the index data is input into the index comparison model, which judges whether it can be placed into one of the k clusters; if it can, no processing is performed, and if its distance to every cluster exceeds the preset boundary value, it is not placed and is flagged instead.
For example, flagging can be implemented by directly changing the font color style, so that staff can later count and refine the index data that does not conform to the normal index data more efficiently.
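A minimal sketch of the S4 screening, assuming each index row has already been turned into a numeric feature vector comparable with the trained centroids (the text does not specify the feature representation). The function `flag_indexes` and the red colour code are hypothetical; in a python-docx document the equivalent mark would typically be applied by setting a run's `font.color.rgb` to `RGBColor(0xFF, 0x00, 0x00)`.

```python
import math

RED = "FF0000"  # hypothetical flag colour for abnormal index data


def flag_indexes(index_rows, centroids, boundary):
    """Screen (name, feature_vector) rows against the comparison model.

    A row is normal if it is closer than `boundary` to at least one
    centroid, i.e. it can be placed into one of the k clusters; otherwise
    it is flagged. Returns (name, vector, colour) triples, with colour
    None for normal rows."""
    result = []
    for name, vec in index_rows:
        normal = any(math.dist(vec, mu) < boundary for mu in centroids)
        result.append((name, vec, None if normal else RED))
    return result
```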
S5, outputting the processed estimation index table.
Specifically, after all index data has been filled into the preset template and screened and flagged, the completed template is stored on a back-end server and can be exported in different file formats, such as PDF or Word, according to user operation. Direct Word export can be implemented with FreeMarker on the server side; export of the template document as PDF is implemented with jsPDF and, to separate the front end from the back end and reduce server load, may run on the client, for example in a browser. jsPDF is an open-source JavaScript library for generating PDF documents for a wide range of purposes in the browser.
Further, after the processed estimation index table is output, staff can judge manually whether the flagged abnormal data should be classified separately.
In the embodiment of the invention, the cost index data of different regions and historical periods are compiled and integrated into the basic data set for constructing the comparison model; the data set can be updated promptly whenever required to track changes in the various index data, which greatly improves the timeliness and accuracy of estimating the various cost indexes;
the comparison model is built on the classical K-Means clustering algorithm, and the improvements to K-Means strengthen the clustering of the data, which improves the accuracy of screening out index data that falls outside the normal interval;
to overcome the drawbacks of entering large amounts of data by hand, the intelligent method inserts large volumes of data directly into the summary report, greatly improving the efficiency and accuracy of data entry;
the exported summary report reflects the investment estimation results for urban rail transit engineering accurately and intuitively, making final decisions more scientific and timely.
Fig. 3 is a schematic structural diagram of the intelligent processing system for urban rail engineering investment estimation indexes according to an embodiment of the invention. Referring to Fig. 3, the invention provides an intelligent processing system for urban rail engineering investment estimation indexes, configured to execute the intelligent processing method of any of the foregoing embodiments and comprising the following modules:
a table construction module, configured to create a preset template, upload an estimation index file, and sequentially traverse the index data in the estimation index file and insert it into the corresponding positions in the preset template to construct an estimation index table;
an index comparison model construction module, connected to the table construction module and configured to train a sample data set with an improved clustering algorithm to construct an index comparison model, wherein the sample data set consists of historical benchmark prices; specifically: initializing the n samples in the sample data set with the Knuth algorithm so that each of the n samples is selected with probability 1/n; selecting k initial centroids according to the max-min distance principle and obtaining the corresponding clusters; iterating over all samples with the K-means algorithm until the k computed centroids are consistent with the selected k centroids or the maximum number of iterations is reached, which completes the training of the sample data set, and taking the final k centroids and k clusters as the index comparison model;
an abnormal data screening module, connected to the index comparison model construction module and configured to input the index data into the index comparison model, screen out abnormal data and flag it;
an output module, connected to the abnormal data screening module and configured to output the processed estimation index table.
In the embodiment of the invention, the cost index data of different regions and historical periods are compiled and integrated into the basic data set for constructing the comparison model; the improved K-Means algorithm processes these data to build the index comparison model, which strengthens the clustering of the data, improves the accuracy of screening out index data outside the normal interval, and improves the efficiency and accuracy of data entry.
It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used in this specification, the singular forms "a," "an," and "the" are to be construed as covering both the singular and the plural, unless the context clearly dictates otherwise. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, or apparatus that includes the element.
It should also be noted that the orientations or positional relationships indicated by the terms "center," "upper," "lower," "left," "right," "vertical," "horizontal," "inner," "outer," etc. are based on the orientations or positional relationships shown in the drawings. They are used merely for convenience and simplicity in describing the present invention, do not indicate or imply that the apparatus or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. Unless otherwise expressly specified or limited, the terms "mounted," "connected," and the like are to be construed broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. An intelligent processing method for urban rail engineering investment estimation indexes is characterized by comprising the following steps:
S1, creating a preset template;
S2, uploading an estimation index file, traversing the index data in the estimation index file in sequence, and inserting each item into the corresponding position in the preset template to construct an estimation index table;
S3, training on a sample data set with an improved clustering algorithm to construct an index comparison model, wherein the sample data set consists of historical benchmark prices;
S3 specifically comprises:
S31, initializing the n samples in the sample data set with a Knuth algorithm so that each of the n samples is selected with probability 1/n;
S32, selecting k initial centroids by the maximum-minimum distance principle and obtaining the corresponding groups;
S33, iterating over all samples with the K-means algorithm until the k computed centroids are consistent with the k selected initial centroids or the maximum number of iterations is reached, thereby completing training on the sample data set and obtaining the final k centroids and k groups as the index comparison model;
S4, inputting the index data into the index comparison model, screening out abnormal data, and marking the abnormal data;
S5, outputting the processed estimation index table.
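Step S33 can be sketched as a standard Lloyd-style K-means loop that starts from the centroids chosen in S32. This is a minimal sketch under two assumptions: samples are tuples of floats compared by Euclidean distance, and the claim's stopping condition (computed centroids consistent with the selected ones) is read as "an iteration leaves the centroids unchanged". The names `kmeans`, `assign_to_groups`, and `max_iter` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign_to_groups(samples: Sequence[Point], centroids: Sequence[Point]) -> List[List[Point]]:
    """Put every sample into the group of its nearest centroid (Euclidean distance)."""
    groups: List[List[Point]] = [[] for _ in centroids]
    for x in samples:
        nearest = min(range(len(centroids)), key=lambda j: math.dist(x, centroids[j]))
        groups[nearest].append(x)
    return groups


def kmeans(samples: Sequence[Point], initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[List[Point]]]:
    """Iterate K-means from the given initial centroids.

    Stops when an iteration leaves every centroid unchanged or after
    `max_iter` iterations; returns the final centroids and their groups.
    """
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        groups = assign_to_groups(samples, centroids)
        # Move each centroid to the mean of its group; an empty group keeps its old centroid.
        new_centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break  # converged: the recomputed centroids equal the current ones
        centroids = new_centroids
    # Regroup against the final centroids so the returned groups always match them.
    return centroids, assign_to_groups(samples, centroids)
```

In this sketch the returned `(centroids, groups)` pair plays the role of the index comparison model. Exact equality is a safe convergence test here because, once the assignments stop changing, the means are recomputed from the same values in the same order.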
2. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 1, wherein step S32 of selecting k initial centroids by the maximum-minimum distance principle and obtaining the corresponding groups comprises:
S321, randomly selecting one sample from the sample data set as the first initial centroid;
S322, selecting the sample farthest from the first initial centroid as the second initial centroid;
S323, calculating the straight-line distances between each remaining sample and each centroid as a distance set;
S324, selecting the maximum value from the distance set and taking the sample corresponding to that maximum as a new initial centroid;
S325, repeating S323 to S324 until the number of selected initial centroids reaches the preset number k, and then proceeding to S326;
S326, dividing the sample data set into k groups by assigning each remaining sample to the corresponding group according to its distances to the k initial centroids.
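Steps S321 to S325 can be sketched as follows. The sketch assumes the usual reading of the maximum-minimum distance principle: each remaining sample is scored by its distance to the nearest centroid already chosen, and the highest-scoring sample becomes the next centroid. With a single centroid this reduces to S322. The function name and the tuple-of-floats sample format are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def max_min_initial_centroids(samples: Sequence[Point], k: int,
                              rng: Optional[random.Random] = None) -> List[Point]:
    """Choose k initial centroids by the maximum-minimum distance principle."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    rng = random.Random() if rng is None else rng
    remaining = list(samples)
    # S321: a randomly chosen sample becomes the first centroid.
    centroids = [remaining.pop(rng.randrange(len(remaining)))]
    # S325: keep adding centroids until k have been chosen.
    while len(centroids) < k:
        # S323: score each remaining sample by its straight-line distance to the
        # nearest chosen centroid (with one centroid this is just S322).
        scores = [min(math.dist(x, c) for c in centroids) for x in remaining]
        # S324: the sample with the largest score becomes the next centroid.
        best = max(range(len(remaining)), key=scores.__getitem__)
        centroids.append(remaining.pop(best))
    return centroids
```

Scoring by the nearest centroid, rather than by any centroid, keeps a new centroid from landing right next to one that was already chosen, which is what spreads the k starting points across the data.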
3. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 2, wherein step S326 of dividing the sample data set into k groups and classifying the remaining samples into the corresponding groups according to their distances to the k initial centroids comprises:
S3261, calculating the distances between each remaining sample and the k initial centroids, the distance being calculated as follows:
$d(x_i, \mu_j) = \lVert x_i - \mu_j \rVert_2$
where $x_i$ denotes the i-th data sample, i = 1, 2, ..., n; $x_j$ denotes the j-th data sample, j = 1, 2, ..., k; and $\mu_j$ denotes the j-th centroid, j = 1, 2, ..., k;
S3262, assigning each remaining sample to the corresponding group according to its k distance results, the assignment rule being as follows:
$C_j = \{\, x_i \mid d(x_i, \mu_j) \le d(x_i, \mu_l),\ l = 1, 2, \ldots, k \,\}$
where $C_j$ denotes the j-th group (cluster), j = 1, 2, ..., k.
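Steps S3261 and S3262 amount to building a sample-to-centroid distance table and assigning each sample to its nearest centroid. A minimal sketch, assuming Euclidean distance as suggested by the straight-line distances of S323, and breaking ties in favour of the lower-numbered group; `distance_matrix` and `classify` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance_matrix(samples: Sequence[Point], centroids: Sequence[Point]) -> List[List[float]]:
    """S3261: row i holds the Euclidean distances d(x_i, mu_j) for j = 1..k."""
    return [[math.dist(x, mu) for mu in centroids] for x in samples]


def classify(samples: Sequence[Point], centroids: Sequence[Point]) -> List[List[Point]]:
    """S3262: group C_j collects the samples whose nearest centroid is mu_j.

    Ties go to the lower-numbered group, because min() returns the first minimum.
    """
    groups: List[List[Point]] = [[] for _ in centroids]
    for x, row in zip(samples, distance_matrix(samples, centroids)):
        j = min(range(len(row)), key=row.__getitem__)
        groups[j].append(x)
    return groups
```

For instance, with centroids 100 and 150, a sample of 120 is 20 from the first and 30 from the second, so it joins the first group.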
4. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 1, wherein the preset template created in step S1 comprises the titles and description information of the various index data tables; the description information comprises table names and dates, and the titles of the index data tables correspond one-to-one with the table names of the page tables in the estimation index file.
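The preset template of claim 4 can be modelled as a small data structure. In this sketch, which is an assumption rather than the patent's layout, each section holds a title, its description information (table name and date), and the rows filled in during S2, and `matches` checks the one-to-one correspondence between titles and the page-table names of an uploaded file. All class, field, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Sequence


@dataclass
class IndexTableSection:
    """One index data table in the preset template (hypothetical layout)."""
    title: str        # must equal a page-table name in the uploaded file
    table_name: str   # description information: table name
    table_date: date  # description information: date
    rows: List[list] = field(default_factory=list)  # filled in during step S2


@dataclass
class PresetTemplate:
    sections: Dict[str, IndexTableSection] = field(default_factory=dict)

    def add_section(self, title: str, table_name: str, table_date: date) -> None:
        # Titles are unique keys, so each title can map to exactly one page table.
        if title in self.sections:
            raise ValueError(f"duplicate title {title!r}")
        self.sections[title] = IndexTableSection(title, table_name, table_date)

    def matches(self, page_table_names: Sequence[str]) -> bool:
        """True when template titles and the file's page-table names correspond one-to-one."""
        names = list(page_table_names)
        return len(names) == len(set(names)) and set(names) == set(self.sections)
```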
5. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 4, wherein step S2 of uploading an estimation index file, traversing the index data in the estimation index file in sequence, inserting it into the corresponding positions in the preset template, and constructing an estimation index table comprises:
S21, uploading the estimation index file and performing a formal review of it; if null values and/or missing values are found, indicating the error positions and requesting that the estimation index file be re-uploaded; if the formal review passes, proceeding to S22;
S22, reading the estimation index file and, according to the table name of each page table in the estimation index file, locating the position of that page table's index data in the preset template;
S23, traversing the index data in sequence and inserting it into the corresponding positions in the preset template to construct the estimation index table.
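Steps S21 to S23 can be sketched with the uploaded file represented as a mapping from page-table name to rows of cells. This representation is an assumption made for illustration; a real system would first read a spreadsheet or similar file. The formal review here treats `None` and blank strings as null or missing values, and the function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Hypothetical in-memory form of an uploaded file: page-table name -> rows of cells.
Workbook = Dict[str, List[List[Optional[Any]]]]


def formal_review(workbook: Workbook) -> List[Tuple[str, int, int]]:
    """S21: return (page table, row, column) for every null or blank cell, 1-based."""
    errors = []
    for sheet, rows in workbook.items():
        for r, row in enumerate(rows, start=1):
            for c, cell in enumerate(row, start=1):
                if cell is None or (isinstance(cell, str) and not cell.strip()):
                    errors.append((sheet, r, c))
    return errors


def build_estimation_table(workbook: Workbook,
                           template_titles: Sequence[str]) -> Dict[str, List[list]]:
    """S21-S23: review the file, then copy each page table's rows under the matching title."""
    errors = formal_review(workbook)
    if errors:
        # A real system would show these positions and ask for a re-upload.
        raise ValueError(f"formal review failed at {errors}; please re-upload the file")
    table: Dict[str, List[list]] = {title: [] for title in template_titles}
    for sheet, rows in workbook.items():
        # S22: the page-table name locates the target position in the template.
        if sheet not in table:
            raise KeyError(f"no template table titled {sheet!r}")
        # S23: traverse the rows in order and insert them.
        for row in rows:
            table[sheet].append(list(row))
    return table
```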
6. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 5, wherein step S23 of traversing the index data in sequence and inserting it into the corresponding positions in the preset template to construct the estimation index table comprises:
when the data are inserted, determining the data type of the index data; if the index data are of type double and/or float, converting them into a string and then converting the string into a BigDecimal object using BigDecimal.
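Claim 6 converts double and float values to a string before building a `BigDecimal`, which avoids carrying the binary rounding error of the floating-point value into the decimal result (in Java, `new BigDecimal(String.valueOf(value))` follows this pattern). The sketch below shows the same idea with Python's `decimal.Decimal` as a stand-in for `BigDecimal`; the function name is hypothetical.

```python
from decimal import Decimal
from typing import Any


def to_decimal_via_string(value: Any) -> Any:
    """Convert a binary floating-point index value to Decimal through its string form.

    str(0.1) is "0.1", so Decimal(str(0.1)) is exactly 0.1, whereas Decimal(0.1)
    would keep the full binary expansion of the float. Non-float values are
    returned unchanged.
    """
    if isinstance(value, float):  # Python's float is a C double
        return Decimal(str(value))
    return value
```

Summing converted values, e.g. `to_decimal_via_string(0.1) + to_decimal_via_string(0.2)`, gives exactly `Decimal('0.3')`, whereas the float sum `0.1 + 0.2` does not equal `0.3`.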
7. The intelligent processing method for urban rail engineering investment estimation indexes according to claim 1, wherein step S4 of inputting the index data into the index comparison model, screening out abnormal data, and marking the abnormal data comprises:
determining, by means of the index comparison model, whether the inserted index data fall within the normal interval; if they do, performing no processing; if they do not, marking the index data.
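The claims do not say how the index comparison model defines the normal interval. One possible rule, assumed here purely for illustration, takes the interval of group j as μ_j ± tolerance × (largest distance between μ_j and a member of the group) and checks each new value against the interval of its nearest centroid. The sketch treats index values as scalars (for example unit prices); `normal_intervals`, `screen`, and `tolerance` are hypothetical.

```python
from typing import List, Sequence, Tuple


def normal_intervals(centroids: Sequence[float], groups: Sequence[Sequence[float]],
                     tolerance: float = 1.0) -> List[Tuple[float, float]]:
    """Hypothetical rule: mu_j +/- tolerance * (largest |x - mu_j| within group j)."""
    intervals = []
    for mu, group in zip(centroids, groups):
        radius = tolerance * max((abs(x - mu) for x in group), default=0.0)
        intervals.append((mu - radius, mu + radius))
    return intervals


def screen(values: Sequence[float], centroids: Sequence[float],
           groups: Sequence[Sequence[float]], tolerance: float = 1.0) -> List[Tuple[float, bool]]:
    """Return (value, flagged) pairs; flagged is True when the value lies outside
    the normal interval of its nearest centroid and should be marked."""
    intervals = normal_intervals(centroids, groups, tolerance)
    results = []
    for v in values:
        j = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        low, high = intervals[j]
        results.append((v, not low <= v <= high))
    return results
```

Raising `tolerance` above 1.0 widens every interval and flags fewer values; the right setting would depend on how much spread the historical benchmark prices show.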
8. An intelligent processing system for urban rail engineering investment estimation indexes, used for executing the intelligent processing method for urban rail engineering investment estimation indexes according to any one of claims 1-7, characterized by comprising the following modules:
The table construction module is used to create a preset template, upload an estimation index file, traverse the index data in the estimation index file in sequence, and insert each item into the corresponding position in the preset template to construct an estimation index table;
The index comparison model construction module is connected to the table construction module and is used to train on a sample data set with an improved clustering algorithm to construct an index comparison model, wherein the sample data set consists of historical benchmark prices. Specifically: the n samples in the sample data set are initialized with a Knuth algorithm so that each of the n samples is selected with probability 1/n; k initial centroids are selected by the maximum-minimum distance principle and the corresponding groups are obtained; and all samples are then iterated with the K-means algorithm until the k computed centroids are consistent with the k selected initial centroids or the maximum number of iterations is reached, which completes training on the sample data set and yields the final k centroids and k groups as the index comparison model;
The abnormal data screening module is connected to the index comparison model construction module and is used to input the index data into the index comparison model, screen out abnormal data, and mark the abnormal data;
and the output module is connected to the abnormal data screening module and is used to output the processed estimation index table.
CN202410385002.0A 2024-04-01 2024-04-01 Intelligent processing method and system for urban rail engineering investment estimation indexes Active CN117973343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410385002.0A CN117973343B (en) 2024-04-01 2024-04-01 Intelligent processing method and system for urban rail engineering investment estimation indexes

Publications (2)

Publication Number Publication Date
CN117973343A 2024-05-03
CN117973343B 2024-06-07

Family

ID=90864930

Country Status (1)

Country Link
CN (1) CN117973343B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6314204B1 (en) * 1998-11-03 2001-11-06 Compaq Computer Corporation Multiple mode probability density estimation with application to multiple hypothesis tracking
CN104077665A (en) * 2014-07-10 2014-10-01 国家电网公司 Power grid project cost analysis data collection system and method
CN105225076A (en) * 2015-11-12 2016-01-06 国网宁夏电力公司经济技术研究院 Price analysis data collection and processing system
CN106127398A (en) * 2016-06-30 2016-11-16 国网山东省电力公司经济技术研究院 Construction cost calculation system applicable to power transmission and transformation projects
CN115795079A (en) * 2022-12-13 2023-03-14 中国人民解放军军事科学院国防工程研究院 Engineering cost analysis data acquisition and processing method and system
CN115936513A (en) * 2022-12-12 2023-04-07 和元达信息科技有限公司 Engineering project investment estimation method and system based on dynamic indexes
CN117330125A (en) * 2023-09-20 2024-01-02 中国铁路设计集团有限公司 Optical fiber monitoring device and data processing method for existing high-speed railway tunnel in shield crossing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
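"Research on Abnormal Behavior in Massive SMS Data" (海量短信数据中异常行为的研究); Zhan Ran; China Master's Theses Full-text Database, Information Science and Technology Series; 2018-10-15; full text *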

Also Published As

Publication number Publication date
CN117973343A (en) 2024-05-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant