Method and system for automatic coding of construction project labor, material and machinery data
Technical Field
The invention belongs to the field of construction industry data analysis, and in particular relates to a method and system that use natural language processing technology to automatically identify construction project labor, material and machinery data (that is, data on labor, materials and mechanical equipment) and to code the data automatically according to the identification result.
Background Art
Investment estimation, design budget estimation, tender price control, construction budgeting, final accounting of completed projects, centralized purchasing and other stages of a construction project all require large amounts of labor, material and machinery data. These data come in many varieties, are named inconsistently and follow no uniform specification, which makes it difficult to identify, classify, compare and analyze them automatically. At present, the application and management of these data rely mainly on human experience and judgment, so work efficiency is low, results are produced slowly and enterprise costs are high, which hampers the investment analysis and whole-process cost management of construction projects.
Summary of the Invention
To address the problems in the prior art that labor, material and machinery data are difficult to identify and analyze, that work efficiency is low and that enterprise costs are high, the present invention proposes an automatic coding method for construction project labor, material and machinery data.
The automatic coding method for construction project labor, material and machinery data proposed by the present invention mainly includes the following steps:
A1. Normalizing the labor, material and machinery data described in natural language according to the industry standard, replacing non-standard characters with standard characters;
A2. Obtaining name keywords from the normalized data, and performing matching analysis on the name keywords in a standard name library to determine the name of the data;
A3. Arbitrating the classification to which the data belong according to the name of the data and the unit information in the data;
A4. Obtaining the characteristic values of the data from the data according to the classification;
A5. Encoding based on the name of the data, the classification and the characteristic values.
In a further preferred scheme of the present invention, step A2 specifically includes:
A21. Performing word segmentation on the name information and specification information of the normalized labor, material and machinery data to obtain name keywords;
A22. If only one name keyword is obtained, performing matching analysis between that name keyword and the standard name library; if multiple name keywords are obtained, combining the name keywords with one another and then performing matching analysis between each combination and the standard name library;
A23. Determining the name of the data according to the highest matching degree.
In a further preferred scheme of the present invention, arbitrating the classification to which the labor, material and machinery data belong in step A3 means arbitrating the classification number to which the data belong in the national standard classification, and may specifically mean arbitrating the classification number to which the data belong in GB/T 50851-2013, "Data Standard for Construction Project Labor, Materials, Equipment and Machinery". If the classification number obtained by arbitration is not unique, a secondary arbitration is performed using the specification information in the data, so as to obtain a unique classification number.
In a further preferred scheme of the present invention, step A4 specifically includes: performing characteristic rule analysis according to the characteristic item descriptions of the classification number to which the labor, material and machinery data belong in the national standard classification, and obtaining the data value of each characteristic item.
In a further preferred scheme of the present invention, step A5 specifically includes:
A51. Taking the classification number to which the labor, material and machinery data belong in the national standard classification as the classification coding segment, and assigning a name coding segment and a characteristic value coding segment, each with a preset number of digits, based on the name and the characteristic values of the data respectively;
A52. Combining the classification coding segment, the name coding segment and the characteristic value coding segment in sequence to form the code of the data.
Correspondingly, the present invention also proposes an automatic coding system for construction project labor, material and machinery data, which mainly includes a normalization module, a matching analysis module, an arbitration module, a characteristic value acquisition module and a coding module;
The normalization module is used to normalize the labor, material and machinery data described in natural language according to the industry standard, replacing non-standard characters with standard characters;
The matching analysis module is used to obtain name keywords from the normalized data and to perform matching analysis on the name keywords in the standard name library to determine the name of the data;
The arbitration module is used to arbitrate the classification to which the data belong according to the name of the data and the unit information in the data;
The characteristic value acquisition module is used to obtain the characteristic values of the data from the data according to the classification;
The coding module is used to encode based on the name of the data, the classification and the characteristic values.
In a further preferred scheme of the present invention, the system further includes a labor, material and machinery character comparison library for storing standard characters; the normalization module replaces the non-standard characters in the labor, material and machinery data with the corresponding standard characters from the character comparison library.
In a further preferred scheme of the present invention, the system further includes a labor, material and machinery lexicon for storing labor, material and machinery keywords; the matching analysis module uses the lexicon to perform word segmentation on the name information and specification information of the data, so as to obtain the name keywords in the data.
In a further preferred scheme of the present invention, the system further includes a labor, material and machinery characteristic rule library, which holds the characteristic item descriptions for the corresponding classification numbers in the national standard classification; the characteristic value acquisition module performs characteristic rule analysis on the labor, material and machinery data according to the characteristic rule library, so as to obtain the data value of each characteristic item.
In a further preferred scheme of the present invention, the system further includes a labor, material and machinery name coding library and a labor, material and machinery characteristic value coding library; the name coding library stores name coding segments, and the characteristic value coding library stores characteristic value coding segments; the coding module takes the classification number to which the data belong in the national standard classification as the classification coding segment, matches the name of the data in the name coding library to obtain the name coding segment, matches the characteristic values in the characteristic value coding library to obtain the characteristic value coding segment, and combines the classification coding segment, the name coding segment and the characteristic value coding segment in sequence into the code of the data.
The present invention has at least the following beneficial effects:
1. Each item of labor, material and machinery data is assigned a unique code, so that the data can be used in applications such as identification, conversion, analysis, classification and management.
2. With its unique code, each item of labor, material and machinery data can be identified, converted, analyzed, classified and managed intelligently without manual operation, which helps to improve work efficiency, produce results quickly, reduce enterprise costs, and advance the investment analysis and whole-process cost management of construction projects.
3. During coding, the name, unit information, specification information and so on of the labor, material and machinery data can be identified intelligently, forming a standard name (aggregation) and completing the characterization of the data; key characteristics can also be marked to form a segmented code, which facilitates the further application and management of the data.
Brief Description of the Drawings
Fig. 1 is a flow diagram of the automatic coding method for construction project labor, material and machinery data proposed in Embodiment 1.
Fig. 2 is a structural diagram of the automatic coding system for construction project labor, material and machinery data proposed in Embodiment 2.
Detailed Description of the Embodiments
To help those skilled in the art understand the invention, the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
Take as an example one non-standard item of labor, material and machinery data described in natural language, and assume that it includes information such as name, specification and unit, as follows:
Name: Power cable
Specification: 0。6/1KV 1.5mm2 VV single-core
Unit: KM
Referring to Fig. 1, the automatic coding method for construction project labor, material and machinery data proposed in Embodiment 1 automatically codes the above non-standard data; the main process includes the following steps S100 to S500:
S100. Normalizing the labor, material and machinery data described in natural language according to the industry standard, replacing non-standard characters with standard characters.
The normalization in step S100 mainly replaces non-standard characters with standard characters. For example, the specification information "0。6" in the data contains a non-standard character and can be replaced with "0.6", and the unit information "KM" can be replaced with "km". These are only illustrations; if "∮" or a similar character appears, it can likewise be replaced with the standard "Φ".
Further, the standard characters may be stored in advance in a labor, material and machinery character comparison library, which is used to store standard characters; when non-standard characters are recognized in the data, they can be replaced with the corresponding standard characters from the character comparison library.
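The replacement described in step S100 can be sketched as a simple table lookup. This is a minimal illustration, not the patented implementation: the dictionary names `CHAR_LIBRARY` and `UNIT_LIBRARY` and the two helper functions are hypothetical, and only the three replacements mentioned above are included.

```python
# Hypothetical character comparison library: non-standard form -> standard form.
# Only the replacements named in the text are listed here.
CHAR_LIBRARY = {
    "。": ".",  # ideographic full stop typed in place of a decimal point
    "∮": "Φ",  # contour-integral sign used in place of the diameter symbol
}

# Hypothetical unit table, applied to the unit field only.
UNIT_LIBRARY = {"KM": "km"}


def normalize_text(text: str) -> str:
    """Replace every non-standard character found in the comparison library."""
    for non_standard, standard in CHAR_LIBRARY.items():
        text = text.replace(non_standard, standard)
    return text


def normalize_unit(unit: str) -> str:
    """Map a non-standard unit spelling to its standard form, if one is known."""
    unit = unit.strip()
    return UNIT_LIBRARY.get(unit, unit)


# Specification string modeled on the example record of this embodiment.
print(normalize_text("0。6/1KV 1.5mm2 VV single-core"))
print(normalize_unit("KM"))
```

Keeping the unit table separate from the character table avoids rewriting letters such as "KM" when they happen to occur inside a name or a model designation.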
S200. Obtaining name keywords from the normalized labor, material and machinery data, and performing matching analysis on the name keywords in the standard name library to determine the name of the data.
To provide a preferred implementation, step S200 can be refined into the following steps S210 to S230:
S210. Performing word segmentation on the name information and specification information of the normalized data to obtain name keywords.
S220. If only one name keyword is obtained, performing matching analysis between that name keyword and the standard name library. If multiple name keywords are obtained, combining the name keywords with one another and then performing matching analysis between each combination and the standard name library.
S230. Determining the name of the data according to the highest matching degree.
Further, labor, material and machinery keywords can be stored in advance in a labor, material and machinery lexicon; then, in step S200 (specifically step S210), the lexicon is used to perform word segmentation on the name information and specification information of the data, so as to obtain the name keywords in the data.
For example, word segmentation in step S210 yields name keywords such as "power cable", "KV", "mm", "VV" and "single-core". Because there are multiple name keywords, they are combined and then matched against the standard name library in steps S220 and S230, and the name with the highest matching degree is taken as the standard name of the above data. In this embodiment, the combination of the name keywords "power cable" and "VV" has the highest matching degree in the standard name library, so on the basis of this combination step S230 matches the name "VV copper-core polyvinyl chloride insulated, polyvinyl chloride sheathed power cable".
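The combine-and-match procedure of steps S220 and S230 can be sketched as follows. The text does not say how the matching degree is computed, so this sketch assumes Jaccard similarity between a keyword combination and the keywords attached to each standard name; the library contents, `matching_degree` and `match_name` are all hypothetical.

```python
from itertools import combinations

# Hypothetical standard name library: standard name -> keywords describing it.
STANDARD_NAMES = {
    "VV copper-core PVC-insulated PVC-sheathed power cable": {"power cable", "VV"},
    "YJV copper-core XLPE-insulated PVC-sheathed power cable": {"power cable", "YJV"},
    "Control cable": {"control cable"},
}


def matching_degree(combo: set, name_keywords: set) -> float:
    """Jaccard similarity, assumed here as a stand-in for the matching degree."""
    return len(combo & name_keywords) / len(combo | name_keywords)


def match_name(keywords: list) -> tuple:
    """Try every keyword combination and keep the best-scoring standard name."""
    best_name, best_score = "", 0.0
    for size in range(1, len(keywords) + 1):
        for combo in combinations(keywords, size):
            for name, name_keywords in STANDARD_NAMES.items():
                score = matching_degree(set(combo), name_keywords)
                if score > best_score:
                    best_name, best_score = name, score
    return best_name, best_score


name, score = match_name(["power cable", "KV", "mm", "VV", "single-core"])
print(name, score)
```

Enumerating all combinations grows exponentially with the number of keywords; that is acceptable for the handful of keywords a single record produces, but a real system would likely prune combinations.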
S300. Arbitrating the classification to which the labor, material and machinery data belong according to the name of the data and the unit information in the data.
Arbitrating the classification in step S300 may specifically mean arbitrating the classification number to which the data belong in the national standard classification (see GB/T 50851-2013, "Data Standard for Construction Project Labor, Materials, Equipment and Machinery"). If the classification number obtained by arbitration is not unique, a secondary arbitration is performed using the specification information in the data, so as to obtain a unique classification number.
For example, the classification of the above data is arbitrated according to the name "VV copper-core polyvinyl chloride insulated, polyvinyl chloride sheathed power cable" and the unit information "km" in the data, which yields the classification number "2811" in the national standard classification ("2811" corresponds to "power cable" in GB/T 50851-2013).
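A minimal sketch of the two-stage arbitration in step S300 follows, assuming that each standard name maps to one or more candidate classifications that carry the units they accept and, for the secondary arbitration, specification keywords. The `Candidate` structure, the table contents and the "demo pipe" entry with its placeholder class numbers are invented for illustration; only the class number "2811" comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    class_number: str
    units: frozenset                     # units accepted for this classification
    spec_hints: frozenset = frozenset()  # keywords used for secondary arbitration


VV_CABLE = "VV copper-core PVC-insulated PVC-sheathed power cable"

# Hypothetical lookup table: standard name -> candidate classifications.
CANDIDATES = {
    VV_CABLE: [Candidate("2811", frozenset({"km", "m"}))],
    # Invented entry with placeholder class numbers, only to show the second stage.
    "demo pipe": [
        Candidate("0001", frozenset({"m"}), frozenset({"plain"})),
        Candidate("0002", frozenset({"m"}), frozenset({"galvanized"})),
    ],
}


def arbitrate(name: str, unit: str, spec: str = "") -> str:
    """Pick a classification by name and unit; fall back on the specification."""
    candidates = [c for c in CANDIDATES.get(name, []) if unit in c.units]
    if len(candidates) > 1:
        # Secondary arbitration: keep candidates whose hints occur in the spec.
        candidates = [c for c in candidates if any(h in spec for h in c.spec_hints)]
    if len(candidates) != 1:
        raise LookupError(f"cannot arbitrate a unique classification for {name!r}")
    return candidates[0].class_number


print(arbitrate(VV_CABLE, "km"))
print(arbitrate("demo pipe", "m", "DN20 galvanized"))
```

Raising an error when no unique classification remains keeps ambiguous records out of the coding step, where they could be reviewed manually.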
S400. Obtaining the characteristic values of the labor, material and machinery data from the data according to the classification.
Step S400 specifically includes: performing characteristic rule analysis according to the characteristic item descriptions of the classification number to which the data belong in the national standard classification, and obtaining the data value of each characteristic item.
Further, a labor, material and machinery characteristic rule library can be set up in advance, holding the characteristic item descriptions for the corresponding classification numbers in the national standard classification; step S400 performs characteristic rule analysis on the data according to the characteristic rule library, so as to obtain the data value of each characteristic item.
For example, the characteristic values obtained for the above data are: "kind: VV; nominal cross-section (mm2): 1.5; number of cores: 1; rated voltage (kV): 0.6/1", where "kind", "nominal cross-section (mm2)", "number of cores" and "rated voltage (kV)" are characteristic items, and "VV", "1.5", "1" and "0.6/1" are the data values of the respective characteristic items. Take "nominal cross-section (mm2)" as an example of the acquisition process. "Nominal cross-section (mm2)" is a characteristic item of classification number "2811" in GB/T 50851-2013. "mm2" is obtained from the data; because "mm2" is close to "mm²", the common unit of "nominal cross-section", "mm2" is identified as the unit of "nominal cross-section". According to the standard written form of "nominal cross-section", the number immediately before the unit is its data value, so the data value "1.5" can be extracted. The value range of the extracted data value can also be checked; passing the check indicates that the data value is valid.
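The rule analysis of step S400 can be sketched with regular expressions, one rule per characteristic item. Everything specific here is an assumption made for illustration: the patterns, the numeric range used for validation, the word-to-number table for core counts, and the normalized form of the specification string.

```python
import re

# Hypothetical mapping from core-count words to digits.
CORE_WORDS = {"single": "1", "two": "2", "three": "3"}

# Hypothetical characteristic rules for classification number "2811":
# (characteristic item, regex with one capture group, optional valid range).
RULES_2811 = [
    ("kind", r"\b(VV|YJV)\b", None),
    ("nominal cross-section (mm2)", r"(\d+(?:\.\d+)?)\s*mm2", (0.5, 1000.0)),
    ("number of cores", r"\b(single|\d+)-core\b", None),
    ("rated voltage (kV)", r"(\d+(?:\.\d+)?/\d+(?:\.\d+)?)\s*kV", None),
]


def extract_features(spec: str) -> dict:
    """Apply each rule to the specification and collect the data values found."""
    features = {}
    for item, pattern, value_range in RULES_2811:
        found = re.search(pattern, spec, flags=re.IGNORECASE)
        if not found:
            continue
        value = found.group(1)
        value = CORE_WORDS.get(value.lower(), value)
        # Range check: discard a numeric value that falls outside the valid range.
        if value_range and not value_range[0] <= float(value) <= value_range[1]:
            continue
        features[item] = value
    return features


print(extract_features("0.6/1KV 1.5mm2 VV single-core"))
```

The cross-section rule mirrors the reasoning in the text: it locates the unit "mm2" and takes the number immediately before it as the data value.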
S500. Encoding based on the name of the labor, material and machinery data, the classification and the characteristic values.
To provide a preferred implementation, step S500 can be refined into the following steps S510 to S520:
S510. Taking the classification number to which the data belong in the national standard classification as the classification coding segment, and assigning a name coding segment and a characteristic value coding segment, each with a preset number of digits, based on the name and the characteristic values of the data respectively.
S520. Combining the classification coding segment, the name coding segment and the characteristic value coding segment in sequence to form the code of the data.
Further, a labor, material and machinery name coding library and a labor, material and machinery characteristic value coding library can be set up in advance; the name coding library stores name coding segments, and the characteristic value coding library stores characteristic value coding segments. Step S510 matches the name of the data in the name coding library to obtain the name coding segment, and matches the characteristic values in the characteristic value coding library to obtain the characteristic value coding segment. Step S520 combines the classification coding segment, the name coding segment and the characteristic value coding segment in sequence, which forms the code of the data.
In this embodiment, analysis of the characteristics of labor, material and machinery data determined that at most three characteristic items are used. The characteristic value coding segment formed from these three characteristic items represents the differences between materials of the same class, and the code formed from the classification coding segment, the name coding segment and the characteristic value coding segment represents the differences between materials of different classes, which ensures the uniqueness of the code.
For example, in step S510, taking the classification number to which the data belong in the national standard classification as the classification coding segment gives "2811", and the name coding segment obtained from the name of the above data is "2011", so the classification coding segment followed by the name coding segment is "28112011".
As for the characteristic value coding, in this embodiment "VV copper-core polyvinyl chloride insulated, polyvinyl chloride sheathed power cable" has four characteristic values, "kind: VV; nominal cross-section (mm2): 1.5; number of cores: 1; rated voltage (kV): 0.6/1", of which "kind", "nominal cross-section (mm2)" and "number of cores" are key characteristic items. (The data of each classification can be told apart by three or fewer characteristic items; these are called key characteristic items, so a code composed of the values of the key characteristic items is also unique. Here "rated voltage (kV): 0.6/1" is not a key characteristic item.) The code corresponding to the value "VV" of "kind" is "025", the code corresponding to the value "1.5" of "nominal cross-section (mm2)" is "004", and the code corresponding to the value "1" of "number of cores" is "008", which combine into the characteristic value coding segment of "VV copper-core polyvinyl chloride insulated, polyvinyl chloride sheathed power cable": "025004008".
Therefore, the final code of the above data is formed by combining the classification coding segment, the name coding segment and the characteristic value coding segment in sequence, namely "28112011025004008".
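The assembly of the code in steps S510 and S520 can be sketched as three lookups followed by a concatenation. The table names and the `encode` function are hypothetical; the segment values are the ones given in this embodiment.

```python
VV_CABLE = "VV copper-core PVC-insulated PVC-sheathed power cable"

# Hypothetical name coding library: standard name -> 4-digit name coding segment.
NAME_CODES = {VV_CABLE: "2011"}

# Hypothetical characteristic value coding library:
# (characteristic item, data value) -> 3-digit code.
FEATURE_CODES = {
    ("kind", "VV"): "025",
    ("nominal cross-section (mm2)", "1.5"): "004",
    ("number of cores", "1"): "008",
}

# Key characteristic items per classification number, at most three, in order.
KEY_FEATURES = {
    "2811": ["kind", "nominal cross-section (mm2)", "number of cores"],
}


def encode(class_number: str, name: str, features: dict) -> str:
    """Concatenate classification, name and characteristic value segments."""
    name_segment = NAME_CODES[name]
    feature_segment = "".join(
        FEATURE_CODES[(item, features[item])] for item in KEY_FEATURES[class_number]
    )
    return class_number + name_segment + feature_segment


features = {
    "kind": "VV",
    "nominal cross-section (mm2)": "1.5",
    "number of cores": "1",
    "rated voltage (kV)": "0.6/1",  # not a key characteristic item, so not coded
}
print(encode("2811", VV_CABLE, features))
```

Because every segment has a fixed number of digits, the resulting code can be split back into its classification, name and characteristic parts by position alone.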
Embodiment 2
Referring to Fig. 2, Embodiment 2 is an automatic coding system for construction project labor, material and machinery data corresponding to Embodiment 1, which mainly includes a normalization module 10, a matching analysis module 30, an arbitration module 40, a characteristic value acquisition module 50 and a coding module 60.
The normalization module 10 is used to normalize the labor, material and machinery data described in natural language according to the industry standard, replacing non-standard characters with standard characters.
The matching analysis module 30 is used to obtain name keywords from the normalized data and to perform matching analysis on the name keywords in the standard name library (the standard name library 21 in Fig. 2) to determine the name of the data.
The arbitration module 40 is used to arbitrate the classification to which the data belong according to the name of the data and the unit information in the data.
The characteristic value acquisition module 50 is used to obtain the characteristic values of the data from the data according to the classification.
The coding module 60 is used to encode based on the name of the data, the classification and the characteristic values.
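One way to picture how the five modules cooperate is a small pipeline object that calls them in order. This is only a structural sketch: the `CodingSystem` class and its field names are hypothetical, and the lambdas are trivial stubs that return fixed values rather than real module logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CodingSystem:
    normalize: Callable[[dict], dict]          # normalization module 10
    match_name: Callable[[dict], str]          # matching analysis module 30
    arbitrate: Callable[[str, dict], str]      # arbitration module 40
    get_features: Callable[[str, dict], dict]  # characteristic value acquisition module 50
    encode: Callable[[str, str, dict], str]    # coding module 60

    def run(self, record: dict) -> str:
        """Pass one record through the modules in the order of Fig. 2."""
        record = self.normalize(record)
        name = self.match_name(record)
        class_number = self.arbitrate(name, record)
        features = self.get_features(class_number, record)
        return self.encode(class_number, name, features)


# Stub modules with hard-coded results, used only to show the data flow.
demo = CodingSystem(
    normalize=lambda r: {**r, "unit": r["unit"].lower()},
    match_name=lambda r: "VV power cable",
    arbitrate=lambda name, r: "2811",
    get_features=lambda class_number, r: {"kind": "VV"},
    encode=lambda class_number, name, features: class_number + "2011" + "025",
)
print(demo.run({"name": "Power cable", "unit": "KM"}))
```

Passing the modules in as callables keeps each one replaceable, which matches the way the optional libraries are attached to individual modules in the preferred schemes.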
To better achieve the purpose of Embodiment 2, Embodiment 2 can be further optimized as follows:
In a first preferred scheme, Embodiment 2 may further include a labor, material and machinery character comparison library 22 for storing standard characters; the normalization module 10 replaces the non-standard characters in the labor, material and machinery data with the corresponding standard characters from the character comparison library 22.
In a second preferred scheme, Embodiment 2 may further include a labor, material and machinery lexicon 23 for storing labor, material and machinery keywords; the matching analysis module 30 uses the lexicon 23 to perform word segmentation on the name information and specification information of the data, so as to obtain the name keywords in the data.
In a third preferred scheme, Embodiment 2 may further include a labor, material and machinery characteristic rule library 24, which holds the characteristic item descriptions for the corresponding classification numbers in the national standard classification; the characteristic value acquisition module 50 performs characteristic rule analysis on the data according to the characteristic rule library 24, so as to obtain the data value of each characteristic item.
In a fourth preferred scheme, Embodiment 2 may further include a labor, material and machinery name coding library 25 and a labor, material and machinery characteristic value coding library 26; the name coding library 25 stores name coding segments, and the characteristic value coding library 26 stores characteristic value coding segments; the coding module 60 takes the classification number to which the data belong in the national standard classification as the classification coding segment, matches the name of the data in the name coding library 25 to obtain the name coding segment, matches the characteristic values in the characteristic value coding library 26 to obtain the characteristic value coding segment, and combines the classification coding segment, the name coding segment and the characteristic value coding segment in sequence into the code of the data.
The technical principle and beneficial effects of Embodiment 2 correspond to those of Embodiment 1 and are not repeated here.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they are not therefore to be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.