CN107562701A - A kind of data analysis method and its system of steel trade industry stock resource - Google Patents

Info

Publication number
CN107562701A
Authority
CN
China
Prior art keywords
server
unit
resolved
data
source material
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710722845.5A
Other languages
Chinese (zh)
Inventor
张家卫
李剑
袁刚
马志鑫
朱成军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Looking For Steel Network Information Polytron Technologies Inc
Original Assignee
Shanghai Looking For Steel Network Information Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Looking For Steel Network Information Polytron Technologies Inc
Priority to CN201710722845.5A
Publication of CN107562701A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data parsing method and system for inventory resources in the steel trade industry. The method comprises the following steps: (1) a called side sends a resource list parsing request to a first server; (2) the first server asynchronously generates a task request to be resolved and stores it on a second server; (3) a resolution server periodically retrieves the task request to be resolved from the second server and obtains the corresponding source material list from a file server; (4) the resolution server uses a built-in resolution rules module to parse the source material list into a data file in a standard format and stores the data file on the file server; (5) the resolution server sends a parsing result to the called side and updates the flag of the corresponding task request to be resolved on the second server.

Description

A data parsing method and system for inventory resources in the steel trade industry
Technical field
The present invention relates to the field of data parsing technology, and in particular to a data parsing method and system for inventory resources in the steel trade industry.
Background art
As in other industries, a user who visits any steel trade website can search it by entering keywords. In the steel trade industry, the scope within which target information is searched is called the "resource pool". A steel trade resource pool generally contains the following information: category, material, specification, steel mill, warehouse, price, quantity, size, and so on. In addition, the data in the resource pool is usually updated continuously, daily or even hourly.
The resource pool data of a modern steel trade website is generally gathered by having different steel suppliers and platform operators upload source material lists containing inventory and supply information to the website, which then aggregates them into its database. These source material lists are uploaded in common document formats such as Word, Excel, or txt.
The resource lists uploaded by different suppliers and platform operators differ in format, and some are plain natural language with no format at all. The data finally stored in the database, however, must be a regular table whose rows and columns strictly follow the website's requirements. The data in these disorderly, variously formatted resource lists of many versions must therefore be extracted, sorted, filtered, and organized into valid data in a unified format.
Existing data parsing platforms use code developed with .NET technology to provide multiple services, each parsing resource lists of a different format. Because the non-standard resource lists provided by suppliers and platform operators vary by region, and each format requires its own rule configuration, data parsing becomes more complex. In addition, existing platforms run multiple services simultaneously on a single machine, which easily leads to resource shortages and degraded system performance during parsing. Furthermore, the services used by existing platforms cannot treat large resource lists differently from small ones, so a service suited to small resource lists may hang or otherwise fail when it processes a large one, which in turn affects the parsing performed by other services. Finally, existing platforms support only one fixed called side during parsing, so they scale poorly and are hard to maintain.
In view of this, a new data parsing method and system are needed to solve the above problems.
Summary of the invention
An object of the present invention is to provide a data parsing method for inventory resources in the steel trade industry. The method obtains the resource lists to be parsed through a unified interface and uses a built-in resolution rules module to convert each source material list into a data file in a standard format for use by multiple called sides. Parsing is thereby separated per called side, so that the default resolution rules of the called sides do not affect one another, and resource lists in multiple formats as well as very large files are supported. The method can also be maintained in real time and is accurate, efficient, and highly extensible.
To solve the above problems, the invention provides a data parsing method for inventory resources in the steel trade industry, comprising the following steps: (1) a called side sends a resource list parsing request to a first server; (2) the first server asynchronously generates a task request to be resolved and stores it on a second server; (3) a resolution server periodically retrieves the task request to be resolved from the second server and obtains the corresponding source material list from a file server; (4) the resolution server uses a built-in resolution rules module to parse the source material list into a data file in a standard format and stores the data file on the file server; (5) the resolution server sends a parsing result to the called side and updates the flag of the corresponding task request to be resolved on the second server.
In one embodiment of the present invention, in step (2), the generated task request to be resolved includes a unique task identification number, which distinguishes different task requests to be resolved. After the resolution server sends the parsing result to the called side, the called side can read the task identification number in the parsing result and, using the address information in the parsing result, retrieve the data file corresponding to that task identification number from the corresponding file server.
In one embodiment of the present invention, in step (2), the generated task request to be resolved also includes a source information identifier, which distinguishes different called sides.
In one embodiment of the present invention, step (4) further comprises: (41) the resolution server pre-processes the source material list; (42) after pre-processing, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure; (43) the resolution server performs a formatting operation on the pending data structure; (44) the resolution server outputs the data file in the standard format.
In one embodiment of the present invention, step (41) further comprises: (411) the resolution server uses the default resolution rules loaded into memory to determine whether the source material list is a Word document; if so, step (412) is performed; if not, step (42) is performed; (412) the Word parsing program is loaded, the Word document is converted into a text document, and the flow proceeds to step (42).
In one embodiment of the present invention, step (42) further comprises: (421) the resolution server uses the default resolution rules loaded into memory to determine whether the source material list is an Excel document; if so, step (422) is performed; if not, step (423) is performed directly; (422) the Excel parsing program is loaded, the Excel document is read via POI and converted into a pending data structure, and the flow proceeds to step (43); (423) the resolution server uses the default resolution rules loaded into memory to determine whether the source material list is a text document; if so, step (424) is performed; if not, the resolution server sends a parsing exception message to the second server; (424) the text parsing program is loaded, the text document is read line by line and converted into a pending data structure using regular expressions and an exhaustive dictionary, and the flow proceeds to step (43).
In one embodiment of the present invention, step (43) further comprises: (431) the resolution server performs a completion operation on the information items in the pending data structure; (432) the resolution server performs a cleaning operation on duplicate information items in the pending data structure; (433) the resolution server performs a splitting operation on the information items in the pending data structure.
In addition, the present invention provides a data parsing system for inventory resources in the steel trade industry, comprising: a resource list parsing request module, used by a user to send a resource list parsing request to a first server; a to-be-resolved task request generation module, connected to the resource list parsing request module and used to make the first server asynchronously generate a task request to be resolved and store it on a second server; a source material list acquisition module, connected to the to-be-resolved task request generation module and used to make a resolution server periodically retrieve the task request to be resolved from the second server and obtain the corresponding source material list from a file server; a reference data file generation module, connected to the source material list acquisition module and used to make the resolution server parse the source material list into a data file in a standard format through a built-in resolution rules module and store the data file on the file server; and a parsing result generation module, connected to the reference data file generation module and used to make the resolution server send a parsing result to the called side and update the flag of the corresponding task request to be resolved on the second server.
In one embodiment of the present invention, the to-be-resolved task request generation module is further used to include a unique task identification number in the generated task request to be resolved. The task identification number distinguishes different task requests to be resolved, so that after the resolution server sends the parsing result to the called side, the called side can read the task identification number in the parsing result and, using the address information in the parsing result, retrieve the data file corresponding to that task identification number from the corresponding file server.
In one embodiment of the present invention, the to-be-resolved task request generation module is further used to include a source information identifier in the generated task request to be resolved; the source information identifier distinguishes different called sides.
In one embodiment of the present invention, the reference data file generation module further comprises: a source material list pre-processing unit, used to make the resolution server pre-process the source material list; a pending data structure acquisition unit, connected to the source material list pre-processing unit and used, after pre-processing, to make the resolution server perform a data extraction operation on the source material list to obtain a standard pending data structure; a data structure formatting unit, connected to the pending data structure acquisition unit and used to make the resolution server perform a formatting operation on the pending data structure; and a data file output unit, connected to the data structure formatting unit and used to make the resolution server output the data file in the standard format.
In one embodiment of the present invention, the source material list pre-processing unit further comprises: a Word document judgment subunit, used to make the resolution server determine, using the default resolution rules loaded into memory, whether the source material list is a Word document; and a Word document parsing subunit, connected to the Word document judgment subunit and used, when the source material list is determined to be a Word document, to load the Word parsing program, convert the Word document into a text document, and call the pending data structure acquisition unit.
In one embodiment of the present invention, the pending data structure acquisition unit further comprises: an Excel document judgment subunit, used to make the resolution server determine, using the default resolution rules loaded into memory, whether the source material list is an Excel document; an Excel document parsing subunit, connected to the Excel document judgment subunit and used, when the source material list is determined to be an Excel document, to load the Excel parsing program, read the Excel document via POI, convert it into a pending data structure, and call the data structure formatting unit; a text document judgment subunit, connected to the Excel document judgment subunit and used to make the resolution server determine, using the default resolution rules loaded into memory, whether the source material list is a text document; a text document parsing subunit, connected to the text document judgment subunit and used, when the source material list is determined to be a text document, to load the text parsing program, read the text document line by line, convert it into a pending data structure using regular expressions and an exhaustive dictionary, and call the data structure formatting unit; and a parsing exception message sending subunit, connected to the text document judgment subunit and used, when the source material list is determined not to be a text document, to make the resolution server send a parsing exception message to the second server.
In one embodiment of the present invention, the reference data file generation module further comprises: an information item completion unit, used to make the resolution server perform a completion operation on the information items in the pending data structure; an information item cleaning unit, connected to the information item completion unit and used to make the resolution server perform a cleaning operation on duplicate information items in the pending data structure; and an information item splitting unit, connected to the information item cleaning unit and used to make the resolution server perform a splitting operation on the information items in the pending data structure.
An advantage of the present invention is that the data parsing method of the embodiments obtains the resource lists to be parsed through a unified interface and uses a built-in resolution rules module to convert each source material list into a data file in a standard format for use by multiple called sides. Parsing is thereby separated per called side, so that the default resolution rules of the called sides do not affect one another, and resource lists in multiple formats as well as very large files are supported. The method can also be maintained in real time and is accurate, efficient, and highly extensible.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the data parsing method for steel trade industry inventory resources according to one embodiment of the invention.
Fig. 2 is a flow chart of the sub-steps of step S140 in the data parsing method of that embodiment.
Fig. 3 is a flow chart of sub-steps S141 and S142 in the data parsing method of that embodiment.
Fig. 4 is a flow chart of sub-step S143 in the data parsing method of that embodiment.
Fig. 5 is a frame diagram of the data parsing system for steel trade industry inventory resources according to another embodiment of the invention.
Fig. 6 is a frame diagram of the reference data file generation module in the data parsing system of that other embodiment.
Fig. 7 is a frame diagram of the source material list pre-processing unit in the data parsing system of that other embodiment.
Fig. 8 is a frame diagram of the pending data structure acquisition unit in the data parsing system of that other embodiment.
Fig. 9 is a frame diagram of the data structure formatting unit in the data parsing system of that other embodiment.
Detailed description of the embodiments
Specific embodiments of the data parsing method and system for steel trade industry inventory resources provided by the invention are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, one embodiment of the invention provides a data parsing method for steel trade industry inventory resources, comprising the following steps:
Step S110: The called side sends a resource list parsing request to the first server.
In this step, the called side (that is, the party that invokes the parsing service) may be a user or a member of an operator platform's staff, or may be a device or apparatus. Users or operator platform staff upload source material lists to the resource list parsing platform through a website that cooperates with steel suppliers or through an operator platform (for example an ERP back end). In this embodiment there may be multiple called sides, including both websites that cooperate with steel suppliers and operator platforms. The default resolution rules available to each called side may differ (see the description below). In an embodiment of the invention, the resource list parsing platform includes a first server for receiving resource list parsing requests, a file server (for example an FTP file server) for storing source material lists, a second server for recording task requests to be resolved, and multiple resolution servers.
In addition, the called side uploads the source material list to the file server of the resource list parsing platform. In this embodiment, a file server corresponding to each called side can be configured in advance, including the file server's address and path, which are stored in a routing table. Because different called sides can be assigned different file servers, subsequent parsing operations are kept separate from one another, and the default resolution rules of each called side are unaffected.
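The routing table described above can be pictured as a simple lookup from called side to file server location. The following is a minimal sketch only; the caller identifiers, host names, and paths are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileServerRoute:
    address: str  # hypothetical FTP address of the file server
    path: str     # hypothetical directory that holds uploaded resource lists


# Hypothetical routing table: one entry per called side.
ROUTING_TABLE = {
    "supplier_site": FileServerRoute("ftp://files-a.example", "/uploads/supplier"),
    "operator_erp": FileServerRoute("ftp://files-b.example", "/uploads/erp"),
}


def resolve_upload_location(caller_id: str, filename: str) -> str:
    """Return the location where this called side's resource list is stored."""
    route = ROUTING_TABLE[caller_id]  # raises KeyError for an unconfigured caller
    return f"{route.address}{route.path}/{filename}"
```

Keeping one entry per called side is what lets later parsing steps stay separated: the resolution server only needs the caller identifier to find the right file server.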
Step S120: The first server asynchronously generates a task request to be resolved and stores it on the second server.
In this step, the task request to be resolved is generated asynchronously, so the first server can continue with other operations instead of waiting, as it would in synchronous mode, for the corresponding response before doing other work.
In addition, in this step, the generated task request to be resolved includes a unique task identification number, which distinguishes different task requests to be resolved. After the resolution server sends the parsing result to the called side, the called side can read the task identification number in the parsing result and, using the address information in the parsing result, retrieve the data file corresponding to that task identification number from the corresponding file server. That is, because the first server can issue multiple task requests to be resolved at a time and generates them asynchronously, when it receives a parsing result from the resolution server while performing other operations it could not, without a task identification number, tell which task request the result responds to. By assigning a task identification number, the first server can match each parsing result to its task.
In addition, in this step, the generated task request to be resolved also includes a source information identifier, which distinguishes different called sides. Once the called sides are distinguished, the resolution server can call the default resolution rules in the resolution rules module that correspond to each called side. Parsing is thereby separated, and the default resolution rules of the called sides do not affect one another. Moreover, because the called sides can be distinguished, each can customize its own default resolution rules.
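As a rough illustration of step S120, the sketch below creates a task request that carries a unique task identification number and a source identifier, stores it, and returns immediately without waiting for parsing. The class and field names are hypothetical, and the in-memory `TaskStore` is only a stand-in for the second server.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ParseTask:
    source_id: str   # identifies the called side (hypothetical field name)
    file_name: str   # name of the uploaded source material list
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"


class TaskStore:
    """In-memory stand-in for the second server that records task requests."""

    def __init__(self):
        self._tasks = {}

    def save(self, task):
        self._tasks[task.task_id] = task

    def pending(self):
        return [t for t in self._tasks.values() if t.status == "pending"]

    def mark(self, task_id, status):
        self._tasks[task_id].status = status


def accept_request(store, source_id, file_name):
    """First server: record the task and return its id without waiting for parsing."""
    task = ParseTask(source_id=source_id, file_name=file_name)
    store.save(task)
    return task.task_id
```

Because the identifier is returned at once, the caller can later match an incoming parsing result to the request that produced it.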
Step S130: The resolution server periodically retrieves the task request to be resolved from the second server and obtains the corresponding source material list from the file server.
In this step, the resolution server periodically retrieves the task request to be resolved from the second server and, according to the preset FTP address, obtains the source material list corresponding to that task request from the corresponding file server.
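The periodic retrieval in step S130 amounts to a polling loop. The sketch below is one possible shape for it, not the patent's implementation: `fetch_pending` and `handle` are hypothetical callbacks standing in for "read pending tasks from the second server" and "download and parse the list", and the interval is an assumed parameter.

```python
import time


def poll_tasks(fetch_pending, handle, interval_seconds=60.0, max_rounds=None, sleep=time.sleep):
    """Call fetch_pending() at a fixed interval and pass every returned task to handle().

    max_rounds limits the number of polling rounds (None means run forever).
    Returns the number of tasks handled.
    """
    rounds = 0
    handled = 0
    while max_rounds is None or rounds < max_rounds:
        for task in fetch_pending():
            handle(task)
            handled += 1
        rounds += 1
        # Wait before the next round, but not after the final one.
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
    return handled
```

Passing `sleep` as a parameter is only a convenience for exercising the loop without real delays.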
Step S140: The resolution server uses a built-in resolution rules module to parse the source material list into a data file in a standard format and stores the data file on the file server.
Referring to Fig. 2, in this embodiment step S140 further comprises the following sub-steps:
Step S141: The resolution server pre-processes the source material list.
In this embodiment, step S141 further comprises:
Referring to Fig. 3, step S1411: The resolution server uses the default resolution rules loaded into memory to determine whether the source material list is a Word document; if so, step S1412 is performed; if not, step S142 is performed.
Because the default resolution rules in the resolution rules module are loaded into memory, including the Word parsing program, the Excel and text parsing programs used later, and the rules for the completion, cleaning, and splitting operations, they can be read faster and more efficiently. In addition, the configuration of the default resolution rules can be updated by refreshing the memory (or cache).
Step S1412: The Word parsing program is loaded, the Word document is converted into a text document, and the flow proceeds to step S142.
In step S1412, the Word parsing program is loaded and the Word document is read as a character string; subsequent parsing is the same as for a txt text document.
In addition to converting Word documents into text documents, pre-processing may further include: checking the file size of the source material list and skipping parsing if it exceeds 20 MB; and obtaining information about the source material list (such as file size, file type, and file MD5) and storing it in the resolution server's database.
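The size check and metadata collection mentioned above could look roughly like the following sketch. It assumes that "20M" means 20 MiB and that the file type is taken from the file extension; both are assumptions, and the function and key names are hypothetical.

```python
import hashlib
import os

MAX_BYTES = 20 * 1024 * 1024  # assumption: the 20M limit is read as 20 MiB


def describe_file(path):
    """Collect size, type, and MD5 of a source material list and decide whether to parse it."""
    size = os.path.getsize(path)
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that very large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    file_type = os.path.splitext(path)[1].lower().lstrip(".")
    return {
        "size": size,
        "type": file_type,
        "md5": digest.hexdigest(),
        "parse": size <= MAX_BYTES,  # files above the limit are not parsed
    }
```

The returned dictionary is the kind of record that would then be stored in the resolution server's database.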
Step S142: After pre-processing, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure.
In this embodiment, step S142 further comprises:
Step S1421: The resolution server uses the default resolution rules loaded into memory to determine whether the source material list is an Excel document; if so, step S1422 is performed; if not, step S1423 is performed directly.
Step S1422: The Excel parsing program is loaded, the Excel document is read via POI and converted into a pending data structure, and the flow proceeds to step S143.
In this step, the Excel document is read using POI (Apache POI, whose name stands for "Poor Obfuscation Implementation"; it is a Java library for reading and writing Microsoft Office files). Specifically, all table areas are identified and the Excel document is read one table area at a time to obtain the source material set, that is, the pending data structure, which contains multiple information items.
Here, an information item is one of: category, material, specification, steel mill, warehouse, price, thickness, width, length, and so on. Information items are configured in the database.
An information item alias is another way an information item may be written. Except for size items, an alias must contain two or more characters. For example, "product name" is an alias of "category" and "sale price" is an alias of "price". Information item aliases are configured in the database.
An information item enumeration is a value that an information item can take; for example, "middle storage Golconda storehouse" and "Xiang Yuku" are enumerations of the warehouse item. Information item enumerations are configured in the database.
Information items, information item aliases, and information item enumerations are three separate configuration tables in the resolution server's database. When a parsing operation is performed, these three configuration tables can be loaded into memory (or cache) for the default resolution rules to call.
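One way to picture the three configuration tables held in memory is a small cache object that can be refreshed when the configuration changes. This is a sketch under assumptions: `loader` is a hypothetical function that stands in for reading the three tables from the database, and the sample values in the usage note are illustrative.

```python
class RuleCache:
    """Holds information items, aliases, and enumerations in memory."""

    def __init__(self, loader):
        # loader() is assumed to return (items, aliases, enumerations).
        self._loader = loader
        self.refresh()

    def refresh(self):
        """Reload the three tables, which is how a configuration change takes effect."""
        items, aliases, enumerations = self._loader()
        self.items = set(items)
        self.aliases = dict(aliases)
        self.enumerations = {item: set(values) for item, values in enumerations.items()}

    def canonical_item(self, header):
        """Map a header or alias to its information item, or None if unknown."""
        name = header.strip()
        if name in self.items:
            return name
        return self.aliases.get(name)

    def item_for_value(self, value):
        """Find which information item lists this value as an enumeration."""
        for item, values in self.enumerations.items():
            if value in values:
                return item
        return None
```

For instance, with an alias table `{"sale price": "price"}`, `canonical_item("sale price")` would return `"price"`.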
A table area is determined as follows. The Excel parser reads the information item aliases from the configuration table in the resolution server's database and traverses the Excel sheet. Specification is used as the main header; if there is no specification column, thickness is used as the main header instead. Scanning downward from the header until the next header is reached defines the upper and lower boundaries of a table area. Scanning left and right from the main header until the first column or the next main header is reached defines the left and right boundaries. Once a table area is determined, its first row is the title row, and each row from the second row down to the lower boundary is one valid data record.
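The boundary rules for a table area can be sketched on a plain grid of strings (a list of rows), leaving the actual reading of the Excel file aside. The alias sets below are hypothetical, the grid is assumed to be rectangular, and the treatment of side-by-side tables is only one literal reading of the description.

```python
SPEC_ALIASES = {"specification", "spec"}  # hypothetical aliases of the main header
THICKNESS_ALIASES = {"thickness"}         # fallback main header when no specification exists


def find_table_areas(grid):
    """Locate table areas in a rectangular grid of strings using the main header."""

    def headers(aliases):
        return [(r, c) for r, row in enumerate(grid)
                for c, cell in enumerate(row) if cell.strip().lower() in aliases]

    main = headers(SPEC_ALIASES) or headers(THICKNESS_ALIASES)
    main_set = set(main)
    areas = []
    for r, c in main:
        # Lower boundary: the row just above the next header row, or the last row.
        below = [hr for hr, _ in main if hr > r]
        bottom = min(below) - 1 if below else len(grid) - 1
        # Left and right boundaries: extend until the grid edge or another main header.
        left = c
        while left > 0 and (r, left - 1) not in main_set:
            left -= 1
        right = c
        while right < len(grid[r]) - 1 and (r, right + 1) not in main_set:
            right += 1
        areas.append({"top": r, "bottom": bottom, "left": left, "right": right})
    return areas


def rows_of(grid, area):
    """Return each non-empty data row of an area as a dict keyed by the title row."""
    title = grid[area["top"]][area["left"]:area["right"] + 1]
    records = []
    for r in range(area["top"] + 1, area["bottom"] + 1):
        cells = grid[r][area["left"]:area["right"] + 1]
        if any(cell.strip() for cell in cells):
            records.append(dict(zip(title, cells)))
    return records
```

Each dictionary returned by `rows_of` corresponds to one valid data record in the sense used above.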
Source material set: a collection that holds multiple information items such as category, material, specification, steel mill, warehouse, price, thickness, width, and length.
In addition, in the present invention the resource list parsing platform serves steel suppliers (called sides) as well as platform operators (called sides), and the source material lists provided by steel suppliers are Excel documents. The resolution server therefore supports not only Word and text source material lists, as in the prior art, but also Excel source material lists and very large documents.
Step S1423: The resolution server uses the default resolution rules loaded into memory to determine whether the source material list is a text document; if so, step S1424 is performed; if not, the resolution server sends a parsing exception message to the second server.
If the resolution server determines that the source material list is not a Word document, a text document, or an Excel document, it sends a parsing exception message to the second server to indicate that it does not support this type of source material list and therefore cannot perform the data extraction operation.
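The branching across steps S1411 to S1423 can be summarized as a small dispatch routine. The sketch below assumes the document type is recognized by file extension, which the patent does not state; the stage names and the `report_exception` callback (standing in for the message sent to the second server) are hypothetical.

```python
import os


class UnsupportedResourceList(Exception):
    """Raised when a source material list is not Word, Excel, or text."""


WORD_EXTENSIONS = {".doc", ".docx"}
EXCEL_EXTENSIONS = {".xls", ".xlsx"}
TEXT_EXTENSIONS = {".txt"}


def choose_parsers(filename):
    """Return the ordered parsing stages for a file, based on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in WORD_EXTENSIONS:
        # A Word document is first converted to text, then parsed as text.
        return ["word_to_text", "text"]
    if ext in EXCEL_EXTENSIONS:
        return ["excel"]
    if ext in TEXT_EXTENSIONS:
        return ["text"]
    raise UnsupportedResourceList(filename)


def dispatch(filename, report_exception):
    """Choose the stages, or report a parsing exception and return None."""
    try:
        return choose_parsers(filename)
    except UnsupportedResourceList as exc:
        report_exception(f"unsupported resource list: {exc}")
        return None
```

An unsupported file therefore never reaches the extraction stage; it only produces an exception message.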
Step S1424: A text parsing program is loaded, the text document is read line by line and converted into a pending data structure by means of regular expressions and an exhaustive dictionary, and the flow proceeds to step S143.
In this step, the resolution server parses the text document line by line and splits each line into text blocks at spaces. For each text block, a regular expression is used to determine which information item the block belongs to. If the regular expression cannot determine this, the block is matched against the enumerated values of the information items in the exhaustive dictionary. Once every text block has been assigned an information item, the blocks are combined around a main information item, such as the specification, to form the source material set.
Because word documents and txt text documents are parsed by first matching with regular expressions and then falling back to the exhaustive dictionary, higher parsing accuracy and better efficiency can be achieved.
In addition, in this embodiment, the resolution server parses the text document in units of text blocks. Unlike the prior art (which parses by line and can extract only one group of valid data per line, so that a line expressing several groups of valid data cannot be fully parsed), it is not limited to one record per line and can parse several groups of valid data from a single line. For example, if a line of the text document reads Q235B 2.5*1250 2.7*1250, the prior art can parse only one group of valid data, Q235B 2.7*1250, whereas the present invention can parse two groups, Q235B 2.5*1250 and Q235B 2.7*1250.
In addition, in this embodiment, suppose the source material list being parsed is a text document and one of its lines reads Q235B 2.5*1250=1500. The prior art can only parse material: Q235B and specification: 2.5*1250=1500, and cannot recognize that 1500 represents the price, whereas the present invention parses material: Q235B, specification: 2.5*1250, and price: 1500. The parsing accuracy of the present invention is therefore higher.
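The text-block approach and the two example lines above can be sketched as follows. The regular expression, the tiny material dictionary, and the field names are assumptions chosen for illustration; the patent does not disclose its actual expressions or dictionary contents.

```python
import re

# Hypothetical exhaustive dictionary of material grades.
MATERIALS = {"Q235B", "Q345B"}

# Hypothetical pattern: numbers joined by "*", optionally followed by "=price".
SPEC_RE = re.compile(r"^(\d+(?:\.\d+)?(?:\*\d+(?:\.\d+)?)+)(?:=(\d+(?:\.\d+)?))?$")


def parse_line(line: str) -> list[dict]:
    """Split one line into text blocks and build one record per specification."""
    material = None
    specs = []
    for block in line.split():
        match = SPEC_RE.match(block)
        if match:
            # Regular expression first: the block is a specification,
            # possibly carrying a price after "=".
            specs.append((match.group(1), match.group(2)))
        elif block in MATERIALS:
            # Dictionary fallback: the block is an enumerated material.
            material = block
    records = []
    for spec, price in specs:
        # The specification is the main information item of each record.
        record = {"material": material, "specification": spec}
        if price is not None:
            record["price"] = float(price)
        records.append(record)
    return records
```

With these assumptions, `parse_line("Q235B 2.5*1250 2.7*1250")` returns two records sharing the material Q235B, and `parse_line("Q235B 2.5*1250=1500")` returns one record whose specification is 2.5*1250 and whose price is 1500.0.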
In addition, the resolution rules of the prior art support only the specification as the main node, whereas the resolution rules of the present invention support the specification as the main node and the thickness as the secondary main node. That is, when the specification is empty, the resolution rules of the present invention assemble the specification as thickness * width * length.
In addition, it should be noted that the pre-parsing operation performed in step S142 is only responsible for reading the source material list; it does not include completion, cleaning (or deduplication), or splitting.
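A minimal sketch of the fallback from specification to thickness, assuming records are plain dictionaries with hypothetical keys `specification`, `thickness`, `width`, and `length`:

```python
def assemble_specification(record: dict) -> dict:
    """Fill an empty specification from thickness * width * length."""
    out = dict(record)  # do not mutate the caller's record
    if not out.get("specification"):
        dims = [out.get(key) for key in ("thickness", "width", "length")]
        # Assemble only when all three dimensions are present.
        if all(dim not in (None, "") for dim in dims):
            out["specification"] = "*".join(str(dim) for dim in dims)
    return out
```

A record that already has a specification is returned unchanged, so the specification stays the main node and the thickness acts only as the secondary one.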
Step S143: The resolution server performs a formatting operation on the pending data structure.
As shown in Fig. 4, step S143 further comprises:
Step S1431: The resolution server performs a completion operation on the information items in the pending data structure.
In this step, missing information items are completed according to the resolution rules. First, completion is attempted from the same row: if the row contains a field that matches an enumerated value of the missing information item, the missing item is filled with that enumerated value. Second, if the same row cannot supply the value, completion is attempted from the comment line above the table area of the excel document, using logic similar to same-row completion. Finally, if the item still cannot be completed, it is inherited from the previous row.
Another resolution rule for completing missing information items is: if the specification is empty and the thickness, width, and length are not empty, the specification is assembled from the thickness, width, and length.
In addition, in this embodiment, the completion operation covers the category, material, specification, steel mill, and warehouse.
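The three-level completion order (same row, then comment line, then previous row) can be sketched as below. The enumerated values, the `extras` key holding unclassified cell text, and the function names are all hypothetical; the patent does not specify the data layout.

```python
COMPLETABLE = ("category", "material", "specification", "steel_mill", "warehouse")

# Hypothetical enumerations of known values per information item.
ENUMS = {
    "category": {"hot-rolled coil", "plate"},
    "material": {"Q235B", "Q345B"},
    "steel_mill": {"MillA", "MillB"},
    "warehouse": {"WarehouseX", "WarehouseY"},
}


def _match_enum(field, texts):
    """Return the first enumerated value of `field` found in any text."""
    for text in texts:
        for value in sorted(ENUMS.get(field, ())):
            if value in text:
                return value
    return None


def complete_rows(rows, comment_line=""):
    """Complete missing items: same row first, then comment line, then previous row."""
    completed = []
    previous = {}
    for row in rows:
        out = dict(row)
        extras = out.pop("extras", [])  # unclassified text from the same row
        for field in COMPLETABLE:
            if out.get(field):
                continue
            value = (
                _match_enum(field, extras)
                or _match_enum(field, [comment_line])
                or previous.get(field)
            )
            if value:
                out[field] = value
        completed.append(out)
        previous = out
    return completed
```

In this sketch the specification has no enumeration, so a missing specification can only be inherited from the previous row.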
Step S1432: The resolution server performs a cleaning operation on duplicate information items in the pending data structure.
There are four resolution rules for deduplication (or cleaning). One of them deduplicates on four dimensions: category, material, specification, and steel mill. If these four fields are all equal for several records in the source material set, the records are regarded as duplicates and only one of them is retained.
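The four-dimension deduplication rule might look like the sketch below. Keeping the first occurrence is an assumption; the patent only says that one record is retained. The field names are hypothetical.

```python
DEDUP_KEYS = ("category", "material", "specification", "steel_mill")


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep one record per (category, material, specification, steel mill)."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(name) for name in DEDUP_KEYS)
        if key not in seen:
            seen.add(key)
            unique.append(record)  # assumption: the first occurrence wins
    return unique
```

Records that differ only in fields outside the four dimensions, such as price or warehouse, are treated as duplicates under this rule.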
Step S1433: The resolution server performs a splitting operation on the information items in the pending data structure.
Through the settings of a configuration table in the database of the resolution server, category splitting, steel mill splitting, warehouse splitting, and material splitting can be implemented in addition to specification splitting. The order and combination of the splits can likewise be set through the configuration table in the database of the resolution server.
Step S144: The resolution server outputs the data file in the standard format.
Step S150: The resolution server sends an analysis result to the called side and updates the mark of the corresponding task request to be resolved in the second server.
In this step, when the resolution server completes the parsing operation, it sends the analysis result to the called side and, at the same time, updates the mark of the corresponding task request to be resolved in the second server to indicate that the task request has been parsed. In addition, when the resolution server cannot notify the called side, it records a notification mark in the second server and, according to the state of that notification mark, periodically connects to the second server in order to resend the analysis result to the called side.
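The mark update and the resend mechanism can be sketched as follows, with an in-memory dictionary standing in for the second server. The field names (`status`, `notified`, `result`) and the use of `ConnectionError` to signal an unreachable called side are assumptions for illustration.

```python
def _try_notify(notify, result) -> bool:
    """Call the notifier; report False if the called side is unreachable."""
    try:
        notify(result)
        return True
    except ConnectionError:
        return False


def finish_task(tasks: dict, task_id: str, result: dict, notify) -> None:
    """Mark the task as parsed and record whether the called side was notified."""
    task = tasks[task_id]
    task["status"] = "parsed"
    task["result"] = result
    task["notified"] = _try_notify(notify, result)


def resend_pending(tasks: dict, notify) -> list[str]:
    """Periodic job: resend results whose notification mark is still unset."""
    resent = []
    for task_id, task in tasks.items():
        if task.get("status") == "parsed" and not task.get("notified"):
            if _try_notify(notify, task["result"]):
                task["notified"] = True
                resent.append(task_id)
    return resent
```

A scheduler would call `resend_pending` at a fixed interval; the interval and the retry limit are not stated in the source and are left out here.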
In addition, in this step the resolution server sends the data file to a preset file server, which may be the same server as the file server previously used to store the source material list or a different one. When the called side requires it, the resolution server can send the data file to the file server as an excel document, for use by the (external) called side.
In addition, another embodiment of the present invention provides a data analysis system for steel trade industry stock resources, as shown in Fig. 5.
The system comprises:
a resource list analysis request module 510, configured to cause the called side to send a resource list analysis request to the first server;
a to-be-resolved task request generation module 520, connected to the resource list analysis request module 510 and configured to cause the first server to generate a task request to be resolved in an asynchronous manner and to store the task request to be resolved in the second server;
a source material list acquisition module 530, connected to the to-be-resolved task request generation module 520 and configured to cause the resolution server to periodically retrieve the task request to be resolved from the second server and to obtain, from the file server, the source material list corresponding to the task request to be resolved;
a standard data file generation module 540, connected to the source material list acquisition module 530 and configured to cause the resolution server to parse the source material list into a data file in a standard format by means of a built-in resolution rules module and to store the data file in the file server;
an analysis result generation module 550, connected to the standard data file generation module 540 and configured to cause the resolution server to send an analysis result to the called side and to update the mark of the corresponding task request to be resolved in the second server.
In another embodiment, the to-be-resolved task request generation module 520 is further configured so that the generated task request to be resolved includes a unique task identification number, which is used to distinguish different task requests to be resolved. After the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
In addition, the to-be-resolved task request generation module 520 is further configured so that the generated task request to be resolved also includes a source-information identifier, which is used to distinguish different called sides.
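One way to picture a task request carrying a unique task identification number and a source-information identifier is the sketch below. The class, its fields, the use of UUIDs, and the shape of the analysis result are all assumptions; the patent does not specify how the identifiers are generated or how the address information is encoded.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ParseTask:
    source: str            # source-information identifier: which called side sent it
    raw_list_address: str  # where the source material list is stored
    # Unique task identification number (assumed here to be a random UUID).
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: str = "pending"  # mark updated once parsing completes


def fetch_data_file(result: dict, file_servers: dict):
    """Called-side lookup: use the address information in the analysis result."""
    server = file_servers[result["server"]]  # pick the corresponding file server
    return result["task_id"], server[result["path"]]
```

Here `file_servers` is a plain dictionary of dictionaries standing in for real file servers, which keeps the example self-contained.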
Referring to Fig. 6, in another embodiment, the standard data file generation module 540 further comprises:
a source material list pretreatment unit 541, configured to cause the resolution server to perform a pretreatment operation on the source material list;
a pending data structure acquiring unit 542, connected to the source material list pretreatment unit 541 and configured to cause the resolution server, after the pretreatment operation, to perform a data extraction operation on the source material list so as to obtain a standard pending data structure;
a data structure format unit 543, connected to the pending data structure acquiring unit 542 and configured to cause the resolution server to perform a formatting operation on the pending data structure;
a data file output unit 544, connected to the data structure format unit 543 and configured to cause the resolution server to output the data file in the standard format.
Referring to Fig. 7 and Fig. 8, the source material list pretreatment unit 541 further comprises:
a word document judgment subunit 5411, configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is a word document;
a word document parsing subunit 5412, connected to the word document judgment subunit 5411 and configured, when the source material list is judged to be a word document, to load a word parsing program, convert the word document into a text document, and call the pending data structure acquiring unit 542.
The pending data structure acquiring unit 542 further comprises:
an excel document judgment subunit 5421, configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is an excel document;
an excel document parsing subunit 5422, connected to the excel document judgment subunit 5421 and configured, when the source material list is judged to be an excel document, to load an excel parsing program, read the excel document via POI, convert it into a pending data structure, and call the data structure format unit 543;
a text document judgment subunit 5423, connected to the excel document judgment subunit 5421 and configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is a text document;
a text document parsing subunit 5424, connected to the text document judgment subunit 5423 and configured, when the source material list is judged to be a text document, to load a text parsing program, read the text document line by line, convert it into a pending data structure by means of regular expressions and an exhaustive dictionary, and call the data structure format unit 543;
a parsing exception message sending subunit 5425, connected to the text document judgment subunit 5423 and configured, when the source material list is judged not to be a text document, to cause the resolution server to send a parsing exception message to the second server.
As shown in Fig. 9, the standard data file generation module 540 further comprises:
an information item completion unit 5431, configured to cause the resolution server to perform a completion operation on the information items in the pending data structure;
an information item cleaning unit 5432, connected to the information item completion unit 5431 and configured to cause the resolution server to perform a cleaning operation on duplicate information items in the pending data structure;
an information item splitting unit 5433, connected to the information item cleaning unit 5432 and configured to cause the resolution server to perform a splitting operation on the information items in the pending data structure.
The data analysis method for steel trade industry stock resources according to the embodiments of the present invention obtains the resource lists to be parsed through a unified interface and uses a built-in resolution rules module to parse each source material list into a data file in a standard format for use by multiple called sides. Parsing is thereby decoupled, so that the resolution rules of the individual called sides do not affect one another, and resource lists in multiple formats as well as very large files are supported. The method also features real-time maintenance, high accuracy, good efficiency, and strong extensibility.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. A data analysis method for steel trade industry stock resources, characterized by comprising the following steps:
(1) a called side sends a resource list analysis request to a first server;
(2) the first server generates a task request to be resolved in an asynchronous manner and stores the task request to be resolved in a second server;
(3) a resolution server periodically retrieves the task request to be resolved from the second server and obtains, from a file server, the source material list corresponding to the task request to be resolved;
(4) the resolution server parses the source material list into a data file in a standard format by means of a built-in resolution rules module and stores the data file in the file server;
(5) the resolution server sends an analysis result to the called side and updates the mark of the corresponding task request to be resolved in the second server.
2. The data analysis method according to claim 1, characterized in that, in step (2), the generated task request to be resolved includes a unique task identification number used to distinguish different task requests to be resolved, so that, after the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
3. The data analysis method according to claim 1, characterized in that, in step (2), the generated task request to be resolved also includes a source-information identifier used to distinguish different called sides.
4. The data analysis method according to claim 1, characterized in that step (4) further comprises:
(41) the resolution server performs a pretreatment operation on the source material list;
(42) after the pretreatment operation, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure;
(43) the resolution server performs a formatting operation on the pending data structure;
(44) the resolution server outputs the data file in the standard format.
5. The data analysis method according to claim 4, characterized in that step (41) further comprises:
(411) the resolution server judges, by means of the preset resolution rules loaded in memory, whether the source material list is a word document; if so, step (412) is performed; if not, step (42) is performed;
(412) a word parsing program is loaded, the word document is converted into a text document, and the flow proceeds to step (42).
6. The data analysis method according to claim 4, characterized in that step (42) further comprises:
(421) the resolution server judges, by means of the preset resolution rules loaded in memory, whether the source material list is an excel document; if so, step (422) is performed; if not, step (423) is performed directly;
(422) an excel parsing program is loaded, the excel document is read via POI and converted into a pending data structure, and the flow proceeds to step (43);
(423) the resolution server judges, by means of the preset resolution rules loaded in memory, whether the source material list is a text document; if so, step (424) is performed; if not, the resolution server sends a parsing exception message to the second server;
(424) a text parsing program is loaded, the text document is read line by line and converted into a pending data structure by means of regular expressions and an exhaustive dictionary, and the flow proceeds to step (43).
7. The data analysis method according to claim 4, characterized in that step (43) further comprises:
(431) the resolution server performs a completion operation on the information items in the pending data structure;
(432) the resolution server performs a cleaning operation on duplicate information items in the pending data structure;
(433) the resolution server performs a splitting operation on the information items in the pending data structure.
8. A data analysis system for steel trade industry stock resources, characterized by comprising:
a resource list analysis request module, configured to cause a called side to send a resource list analysis request to a first server;
a to-be-resolved task request generation module, connected to the resource list analysis request module and configured to cause the first server to generate a task request to be resolved in an asynchronous manner and to store the task request to be resolved in a second server;
a source material list acquisition module, connected to the to-be-resolved task request generation module and configured to cause a resolution server to periodically retrieve the task request to be resolved from the second server and to obtain, from a file server, the source material list corresponding to the task request to be resolved;
a standard data file generation module, connected to the source material list acquisition module and configured to cause the resolution server to parse the source material list into a data file in a standard format by means of a built-in resolution rules module and to store the data file in the file server;
an analysis result generation module, connected to the standard data file generation module and configured to cause the resolution server to send an analysis result to the called side and to update the mark of the corresponding task request to be resolved in the second server.
9. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the to-be-resolved task request generation module is further configured so that the generated task request to be resolved includes a unique task identification number used to distinguish different task requests to be resolved, so that, after the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
10. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the to-be-resolved task request generation module is further configured so that the generated task request to be resolved also includes a source-information identifier used to distinguish different called sides.
11. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the standard data file generation module further comprises:
a source material list pretreatment unit, configured to cause the resolution server to perform a pretreatment operation on the source material list;
a pending data structure acquiring unit, connected to the source material list pretreatment unit and configured to cause the resolution server, after the pretreatment operation, to perform a data extraction operation on the source material list so as to obtain a standard pending data structure;
a data structure format unit, connected to the pending data structure acquiring unit and configured to cause the resolution server to perform a formatting operation on the pending data structure;
a data file output unit, connected to the data structure format unit and configured to cause the resolution server to output the data file in the standard format.
12. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the source material list pretreatment unit further comprises:
a word document judgment subunit, configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is a word document;
a word document parsing subunit, connected to the word document judgment subunit and configured, when the source material list is judged to be a word document, to load a word parsing program, convert the word document into a text document, and call the pending data structure acquiring unit.
13. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the pending data structure acquiring unit further comprises:
an excel document judgment subunit, configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is an excel document;
an excel document parsing subunit, connected to the excel document judgment subunit and configured, when the source material list is judged to be an excel document, to load an excel parsing program, read the excel document via POI, convert it into a pending data structure, and call the data structure format unit;
a text document judgment subunit, connected to the excel document judgment subunit and configured to cause the resolution server to judge, by means of the preset resolution rules loaded in memory, whether the source material list is a text document;
a text document parsing subunit, connected to the text document judgment subunit and configured, when the source material list is judged to be a text document, to load a text parsing program, read the text document line by line, convert it into a pending data structure by means of regular expressions and an exhaustive dictionary, and call the data structure format unit;
a parsing exception message sending subunit, connected to the text document judgment subunit and configured, when the source material list is judged not to be a text document, to cause the resolution server to send a parsing exception message to the second server.
14. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the standard data file generation module further comprises:
an information item completion unit, configured to cause the resolution server to perform a completion operation on the information items in the pending data structure;
an information item cleaning unit, connected to the information item completion unit and configured to cause the resolution server to perform a cleaning operation on duplicate information items in the pending data structure;
an information item splitting unit, connected to the information item cleaning unit and configured to cause the resolution server to perform a splitting operation on the information items in the pending data structure.
CN201710722845.5A 2017-08-22 2017-08-22 A kind of data analysis method and its system of steel trade industry stock resource Pending CN107562701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710722845.5A CN107562701A (en) 2017-08-22 2017-08-22 A kind of data analysis method and its system of steel trade industry stock resource

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710722845.5A CN107562701A (en) 2017-08-22 2017-08-22 A kind of data analysis method and its system of steel trade industry stock resource

Publications (1)

Publication Number Publication Date
CN107562701A true CN107562701A (en) 2018-01-09

Family

ID=60976626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710722845.5A Pending CN107562701A (en) 2017-08-22 2017-08-22 A kind of data analysis method and its system of steel trade industry stock resource

Country Status (1)

Country Link
CN (1) CN107562701A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119472A (en) * 2019-05-22 2019-08-13 欧冶云商股份有限公司 Steel product search method and system applied to the network platform
CN112800049A (en) * 2021-04-06 2021-05-14 航天神舟智慧***技术有限公司 EXCEL data source cleaning method and system based on big data, electronic device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572898A (en) * 2014-12-22 2015-04-29 上海钢富电子商务有限公司 Data analysis method and data analysis system for steel trade industry spot commodity resource
CN104679819A (en) * 2014-12-22 2015-06-03 上海钢富电子商务有限公司 Data analysis method and system of spot resources for steel trading industry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180109