CN107562701A - Data analysis method and system for steel trade industry stock resources - Google Patents
- Publication number
- CN107562701A CN201710722845.5A
- Authority
- CN
- China
- Prior art keywords
- server
- unit
- resolved
- data
- source material
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data analysis method and system for steel trade industry stock resources. The method comprises the following steps: (1) a called side sends a resource list analysis request to a first server; (2) the first server generates a task request to be resolved in an asynchronous manner and stores the task request to be resolved in a second server; (3) a resolution server periodically retrieves the task request to be resolved from the second server and obtains the source material list corresponding to the task request to be resolved from a file server; (4) the resolution server parses the source material list into a data file in a standard format by means of a built-in resolution rules module and stores the data file in the file server; (5) the resolution server sends an analysis result to the called side and updates the mark of the corresponding task request to be resolved in the second server.
Description
Technical field
The present invention relates to the field of data analysis technology, and more particularly to a data analysis method for steel trade industry stock resources and a system thereof.
Background art
As in other industries, "search" in the steel trade industry means that a user visits any steel trade website and searches by entering keywords. In the steel trade industry, the defined scope within which target information is searched is called the "resource pool". The "resource pool" of the steel trade industry generally contains the following information: category, material, specification, steel mill, warehouse, price, quantity, size, and so on. In addition, the data in the "resource pool" is generally updated continuously, daily or even hourly.
The "resource pool" data of a modern steel trade website is typically obtained as follows: different steel suppliers and platform operators upload source material lists containing stock and supply information to the website, and the data is then aggregated into the website's database. These source material lists are uploaded in common document formats, such as Word, Excel, or txt documents.
The formats of the resource lists uploaded by the suppliers or platform operators vary, and some have no format at all and are pure natural language, whereas the data finally stored in the database must be a regular table that strictly follows the rows and columns required by the website. The data in these disorganized resource lists of many formats and versions must therefore be extracted, sorted, screened, and organized into valid data in a unified format.
Existing data analyzing platforms use code developed with .NET technology to provide multiple services that parse resource lists of different formats. Because the non-standard resource lists provided by each supplier or platform operator differ by region, and each resource list format requires its own rule configuration, the complexity of data parsing increases. In addition, existing data analyzing platforms run multiple services simultaneously on a single machine, which easily leads to resource shortages and degraded system performance during parsing. Furthermore, the services used by existing data analyzing platforms cannot handle large and small resource lists differently, so a service suited to small resource lists may hang or otherwise fail when processing a large one, which in turn affects the parsing performed by other services. Finally, during parsing, existing data analyzing platforms support only one fixed called side, so they scale poorly and are difficult to maintain.
In view of this, a new data analysis method and system are needed to solve the above problems.
Summary of the invention
It is an object of the present invention to provide a data analysis method for steel trade industry stock resources. The method obtains resource lists to be resolved through a unified interface and uses a built-in resolution rules module to parse and convert each original resource list into a data file in a standard format for use by multiple called sides. Parsing is thereby separated, so that the preset resolution rules of the called sides do not affect one another, and both the parsing of resource lists in multiple formats and the processing of very large files are supported. The method also features real-time maintenance, high accuracy, good efficiency, and strong extensibility.
In order to solve the above problems, the invention provides a data analysis method for steel trade industry stock resources, comprising the following steps: (1) a called side sends a resource list analysis request to a first server; (2) the first server generates a task request to be resolved in an asynchronous manner and stores the task request to be resolved in a second server; (3) a resolution server periodically retrieves the task request to be resolved from the second server and obtains the source material list corresponding to the task request to be resolved from a file server; (4) the resolution server parses the source material list into a data file in a standard format by means of a built-in resolution rules module and stores the data file in the file server; (5) the resolution server sends an analysis result to the called side and updates the mark of the corresponding task request to be resolved in the second server.
In one embodiment of the invention, in step (2), the generated task request to be resolved includes a unique task identification number, which is used to distinguish different task requests to be resolved. After the resolution server sends an analysis result to the called side, the called side can thus identify the task identification number in the analysis result and, according to the address information in the analysis result, retrieve the data file corresponding to that task identification number from the corresponding file server.
In one embodiment of the invention, in step (2), the generated task request to be resolved also includes a source information identifier, which is used to distinguish different called sides.
In one embodiment of the invention, step (4) further comprises: (41) the resolution server performs a preprocessing operation on the source material list; (42) after the preprocessing operation, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure; (43) the resolution server performs a formatting operation on the pending data structure; (44) the resolution server outputs a data file in the standard format.
In one embodiment of the invention, step (41) further comprises: (411) the resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is a Word document; if so, step (412) is performed; if not, step (42) is performed; (412) the Word analysis program is loaded, the Word document is converted into a text document, and the method proceeds to step (42).
In one embodiment of the invention, step (42) further comprises: (421) the resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is an Excel document; if so, step (422) is performed; if not, step (423) is performed directly; (422) the Excel analysis program is loaded, the Excel document is read by means of POI and converted into a pending data structure, and the method proceeds to step (43); (423) the resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is a text document; if so, step (424) is performed; if not, the resolution server sends a parsing exception message to the second server; (424) the text analysis program is loaded, the text document is read line by line and converted into a pending data structure by means of regular expressions and an exhaustive dictionary, and the method proceeds to step (43).
In one embodiment of the invention, step (43) further comprises: (431) the resolution server performs a completion operation on the information items in the pending data structure; (432) the resolution server performs a cleaning operation on the duplicate information items in the pending data structure; (433) the resolution server performs a splitting operation on the information items in the pending data structure.
In addition, the present invention provides a data analyzing system for steel trade industry stock resources, comprising: a resource list analysis request module, used for a user to send a resource list analysis request to a first server; a to-be-resolved task request generation module, connected to the resource list analysis request module and used to make the first server generate a task request to be resolved in an asynchronous manner and store the task request to be resolved in a second server; a source material list acquisition module, connected to the to-be-resolved task request generation module and used to make a resolution server periodically retrieve the task request to be resolved from the second server and obtain the source material list corresponding to the task request to be resolved from a file server; a Reference data file generation module, connected to the source material list acquisition module and used to make the resolution server parse the source material list into a data file in a standard format by means of a built-in resolution rules module and store the data file in the file server; and an analysis result generation module, connected to the Reference data file generation module and used to make the resolution server send an analysis result to the called side and update the mark of the corresponding task request to be resolved in the second server.
In one embodiment of the invention, the to-be-resolved task request generation module is further used to make the generated task request to be resolved include a unique task identification number, which is used to distinguish different task requests to be resolved. After the resolution server sends an analysis result to the called side, the called side can thus identify the task identification number in the analysis result and, according to the address information in the analysis result, retrieve the data file corresponding to that task identification number from the corresponding file server.
In one embodiment of the invention, the to-be-resolved task request generation module is further used to make the generated task request to be resolved also include a source information identifier, which is used to distinguish different called sides.
In one embodiment of the invention, the Reference data file generation module further comprises: a source material list pretreatment unit, used to make the resolution server perform a preprocessing operation on the source material list; a pending data structure acquiring unit, connected to the source material list pretreatment unit and used, after the preprocessing operation, to make the resolution server perform a data extraction operation on the source material list to obtain a standard pending data structure; a data structure format unit, connected to the pending data structure acquiring unit and used to make the resolution server perform a formatting operation on the pending data structure; and a data file output unit, connected to the data structure format unit and used to make the resolution server output a data file in the standard format.
In one embodiment of the invention, the source material list pretreatment unit further comprises: a Word document judgment sub-unit, used to make the resolution server use the preset resolution rules loaded into memory to judge whether the source material list is a Word document; and a Word document parsing sub-unit, connected to the Word document judgment sub-unit and used, when the source material list is judged to be a Word document, to load the Word analysis program, convert the Word document into a text document, and call the pending data structure acquiring unit.
In one embodiment of the invention, the pending data structure acquiring unit further comprises: an Excel document judgment sub-unit, used to make the resolution server use the preset resolution rules loaded into memory to judge whether the source material list is an Excel document; an Excel document parsing sub-unit, connected to the Excel document judgment sub-unit and used, when the source material list is judged to be an Excel document, to load the Excel analysis program, read the Excel document by means of POI, convert it into a pending data structure, and call the data structure format unit; a text document judgment sub-unit, connected to the Excel document judgment sub-unit and used to make the resolution server use the preset resolution rules loaded into memory to judge whether the source material list is a text document; a text document parsing sub-unit, connected to the text document judgment sub-unit and used, when the source material list is judged to be a text document, to load the text analysis program, read the text document line by line, convert it into a pending data structure by means of regular expressions and an exhaustive dictionary, and call the data structure format unit; and a parsing exception message sending sub-unit, connected to the text document judgment sub-unit and used, when the source material list is judged not to be a text document, to make the resolution server send a parsing exception message to the second server.
In one embodiment of the invention, the Reference data file generation module further comprises: an information item completion unit, used to make the resolution server perform a completion operation on the information items in the pending data structure; an information item cleaning unit, connected to the information item completion unit and used to make the resolution server perform a cleaning operation on the duplicate information items in the pending data structure; and an information item splitting unit, connected to the information item cleaning unit and used to make the resolution server perform a splitting operation on the information items in the pending data structure.
An advantage of the present invention is that the data analysis method for steel trade industry stock resources according to the embodiments obtains resource lists to be resolved through a unified interface and uses a built-in resolution rules module to parse and convert each original resource list into a data file in a standard format for use by multiple called sides. Parsing is thereby separated, so that the preset resolution rules of the called sides do not affect one another, and both the parsing of resource lists in multiple formats and the processing of very large files are supported. The method also features real-time maintenance, high accuracy, good efficiency, and strong extensibility.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the data analysis method for steel trade industry stock resources according to one embodiment of the invention.
Fig. 2 is a flow chart of the sub-steps of step S140 in the data analysis method of that embodiment.
Fig. 3 is a flow chart of sub-steps S141 and S142 in the data analysis method of that embodiment.
Fig. 4 is a flow chart of sub-step S143 in the data analysis method of that embodiment.
Fig. 5 is a framework diagram of the data analyzing system for steel trade industry stock resources according to another embodiment of the invention.
Fig. 6 is a framework diagram of the Reference data file generation module in the data analyzing system of that other embodiment.
Fig. 7 is a framework diagram of the source material list pretreatment unit in the data analyzing system of that other embodiment.
Fig. 8 is a framework diagram of the pending data structure acquiring unit in the data analyzing system of that other embodiment.
Fig. 9 is a framework diagram of the data structure format unit in the data analyzing system of that other embodiment.
Detailed description of the embodiments
Specific embodiments of the data analysis method and system for steel trade industry stock resources provided by the invention are described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, one embodiment of the invention provides a data analysis method for steel trade industry stock resources, comprising the following steps:
Step S110: The called side sends a resource list analysis request to the first server.
In this step, the called side may be a user or a member of an operator platform's staff, or it may be a piece of equipment or a device. A user or operator platform staff member uploads the source material list to the resource list analyzing platform through a website that cooperates with steel suppliers or through an operator platform (such as an ERP back end). In this embodiment there may be multiple called sides, including both websites that cooperate with steel suppliers and operator platforms. In addition, the preset resolution rules available to each called side may differ (see the description below). In an embodiment of the invention, the resource list analyzing platform comprises a first server for receiving resource list analysis requests, a file server (such as an ftp file server) for storing source material lists, a second server for recording task requests to be resolved, and multiple resolution servers.
In addition, the called side uploads the source material list to the file server in the resource list analyzing platform. In this embodiment, the file server corresponding to each called side can be preset, including the address and path of that file server, where the address and path are configured in a routing table. Because different called sides can be assigned different file servers, subsequent data parsing operations are kept separate from one another, and the preset resolution rules of each called side are unaffected.
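The routing table described above could be pictured as a simple lookup from called side to file server. The following is only a sketch: the patent states that an address and a path are configured per called side, while the field names, identifiers, and example hosts here are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileServerRoute:
    # Field names are hypothetical; the patent only mentions "address" and "path".
    address: str  # e.g. an ftp host
    path: str     # directory holding that called side's source material lists


# Illustrative routing table keyed by a made-up called-side identifier.
ROUTING_TABLE = {
    "supplier-site": FileServerRoute("ftp://files-a.example", "/uploads/supplier"),
    "operator-erp": FileServerRoute("ftp://files-b.example", "/uploads/operator"),
}


def resolve_route(called_side: str) -> FileServerRoute:
    """Return the file server configured for a called side (KeyError if unknown)."""
    return ROUTING_TABLE[called_side]
```

Keeping one entry per called side is what lets uploads and later parsing stay separated between called sides.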
Step S120: The first server generates a task request to be resolved in an asynchronous manner and stores the task request to be resolved in the second server.
In this step, the task request to be resolved is generated asynchronously, which ensures that the first server can continue with other operations unaffected; unlike the synchronous manner, it does not have to wait for the corresponding response before doing other work.
In addition, in this step, the generated task request to be resolved includes a unique task identification number, which is used to distinguish different task requests to be resolved. After the resolution server sends an analysis result to the called side, the called side can thus identify the task identification number in the analysis result and, according to the address information in the analysis result, retrieve the data file corresponding to that task identification number from the corresponding file server. That is, because the first server can send multiple task requests to be resolved at once and generates them asynchronously, if the first server receives an analysis result from the resolution server while performing other operations and there is no task identification number, it cannot determine which task request to be resolved the analysis result responds to. By setting a task identification number, the first server can match each analysis result to its task request according to the task identification number.
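As a rough illustration of asynchronous task generation with a unique task identification number, the sketch below hands the storage work to a background thread and returns the identifier immediately. The in-memory dictionary standing in for the second server, the field names, and the use of a UUID as the identifier are all assumptions; the patent does not specify them.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

task_table = {}  # stand-in for the second server's task records (assumption)
_executor = ThreadPoolExecutor(max_workers=2)


def _store_task(task_id, source_id, file_name):
    # Hypothetical record layout; "pending" plays the role of the task's mark.
    task_table[task_id] = {"source": source_id, "file": file_name, "status": "pending"}


def submit_parse_request(source_id, file_name):
    """Return a unique task ID at once; the task is stored in the background."""
    task_id = uuid.uuid4().hex
    future = _executor.submit(_store_task, task_id, source_id, file_name)
    return task_id, future


def match_result(result):
    """Find the stored task that an analysis result answers, by its task ID."""
    return task_table[result["task_id"]]
```

Because each result carries the task identification number, results that arrive out of order can still be paired with the request that produced them.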
In addition, in this step, the generated task request to be resolved also includes a source information identifier, which is used to distinguish different called sides. Once the called sides are distinguished, the resolution server can call the preset resolution rules corresponding to each called side in the resolution rules module, thereby separating parsing so that the preset resolution rules of the called sides do not affect one another. Moreover, because different called sides can be distinguished, each can have its own customized preset resolution rules.
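One way to picture per-called-side rules is a default rule set that each source information identifier may override. This is a minimal sketch; the rule keys, values, and identifiers are invented for illustration and are not taken from the patent.

```python
# Hypothetical rule sets; keys and values are illustrative only.
DEFAULT_RULES = {"main_item": "specification", "aliases": {"sale price": "price"}}

RULES_BY_SOURCE = {
    "operator-erp": {"aliases": {"unit price": "price"}},
}


def rules_for(source_id):
    """Return the preset rules for one called side without touching the defaults."""
    rules = dict(DEFAULT_RULES)                       # copy, so defaults stay intact
    rules.update(RULES_BY_SOURCE.get(source_id, {}))  # apply that side's overrides
    return rules
```

Since every lookup works on a copy, customizing the rules of one called side leaves the rules seen by the others unchanged.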
Step S130: The resolution server periodically retrieves the task request to be resolved from the second server and obtains the source material list corresponding to the task request to be resolved from the file server.
In this step, the resolution server periodically retrieves the task request to be resolved from the second server and, according to the preset ftp address, obtains the source material list corresponding to the task request to be resolved from the corresponding file server.
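The periodic retrieval in step S130 might be sketched as a polling loop. The task record layout, the status values, the five-second interval, and the `fetch_file` and `handle` callbacks below are assumptions for illustration; the patent only states that tasks are retrieved on a timer and that the source material list is fetched from the file server.

```python
import time


def poll_once(task_store, fetch_file):
    """Claim every pending task and fetch its source material list."""
    claimed = []
    for task_id, task in task_store.items():
        if task["status"] == "pending":
            task["status"] = "processing"  # mark it so it is not picked up twice
            claimed.append((task_id, fetch_file(task["ftp_path"])))
    return claimed


def poll_forever(task_store, fetch_file, handle, interval_s=5.0):
    """Run poll_once on a fixed interval; the interval value is an assumption."""
    while True:
        for task_id, content in poll_once(task_store, fetch_file):
            handle(task_id, content)
        time.sleep(interval_s)
```

Splitting the single polling pass from the endless loop keeps the retrieval logic easy to exercise on its own.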
Step S140: The resolution server parses the source material list into a data file in a standard format by means of a built-in resolution rules module and stores the data file in the file server.
Referring to Fig. 2, in this embodiment step S140 further comprises the following sub-steps:
Step S141: The resolution server performs a preprocessing operation on the source material list.
In this embodiment, step S141 further comprises:
Referring to Fig. 3, step S1411: The resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is a Word document; if so, step S1412 is performed; if not, step S142 is performed.
Because the preset resolution rules in the resolution rules module are loaded into memory, including the Word analysis program, the Excel analysis program and text analysis program used later, and the resolution rules for the completion, cleaning, and splitting operations, reading is faster and more efficient. In addition, the configuration of the preset resolution rules can be updated by refreshing the memory (or cache).
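The idea of keeping the rules in memory and refreshing them on demand can be sketched as a small cache. The class name and the `loader` callback are hypothetical; in the patent the rules come from the resolution server's database, which a plain dictionary stands in for in the usage below.

```python
class RuleCache:
    """Keep preset resolution rules in memory; reload them only on refresh()."""

    def __init__(self, loader):
        self._loader = loader   # callable that reads the rules from storage
        self._rules = loader()  # loaded once, then served from memory

    def get(self, name):
        return self._rules[name]

    def refresh(self):
        # Re-read the configuration so that updated rules take effect.
        self._rules = self._loader()
```

For example, with `db = {"max_size_mb": 20}` and `cache = RuleCache(lambda: dict(db))`, a later change to `db` is not visible through `cache.get` until `cache.refresh()` is called.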
Step S1412: Load the Word analysis program, convert the Word document into a text document, and proceed to step S142.
In step S1412, the Word analysis program is loaded and the Word document is read as a character string; subsequent parsing is the same as for a txt text document.
In addition to converting Word documents into text documents as described above, preprocessing may further include: checking the file size of the source material list and, if the file is larger than 20M, performing no parsing operation; and obtaining information about the source material list (such as file size, file type, and file md5) and storing it in the database of the resolution server.
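The size check and metadata collection could look roughly like the sketch below. It reads "20M" as 20 MiB and takes the file type from the file extension; both are assumptions, as are the function and field names. Storing the result in the database is left out.

```python
import hashlib
from pathlib import PurePath

MAX_SIZE_BYTES = 20 * 1024 * 1024  # the "20M" limit, read here as 20 MiB (assumption)


def describe_source_list(file_name, data):
    """Return size/type/md5 metadata, or None if the file is too large to parse."""
    if len(data) > MAX_SIZE_BYTES:
        return None  # oversized: no parsing operation is performed
    return {
        "size": len(data),
        "type": PurePath(file_name).suffix.lstrip(".").lower(),
        "md5": hashlib.md5(data).hexdigest(),
    }
```

A caller would skip parsing when the function returns `None` and otherwise persist the returned record alongside the task.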
Step S142: After the preprocessing operation, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure.
In this embodiment, step S142 further comprises:
Step S1421: The resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is an Excel document; if so, step S1422 is performed; if not, step S1423 is performed directly.
Step S1422: Load the Excel analysis program, read the Excel document by means of POI, convert it into a pending data structure, and proceed to step S143.
In this step, the Excel document is read by means of POI (Apache POI, whose name originally stood for "Poor Obfuscation Implementation", a Java library for reading and writing Microsoft Office files). Specifically, all table areas are identified and the Excel document is read one table area at a time, yielding a source material set, that is, the pending data structure, where the source material set (pending data structure) contains multiple information items.
Here, an information item refers to: category, material, specification, steel mill, warehouse, price, thickness, width, length, and so on. Information items are configured in the database.
An information item alias refers to another spelling of an information item that may appear; except for size items, an alias is required to be a string of more than 2 characters. For example, "product name" is an alias of "category", and "sale price" is an alias of "price". Information item aliases are configured in the database.
An information item enumeration refers to the values that an information item can take, such as "middle storage Golconda storehouse" and "Xiang Yuku" for the warehouse item. Information item enumerations are configured in the database.
The information items, information item aliases, and information item enumerations described above are three different configuration tables set in the database of the resolution server. When a parsing operation is performed, these three configuration tables can be loaded into memory (or cache) for the preset resolution rules to call.
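The three configuration tables can be pictured as plain in-memory lookups. This is only a sketch: the table contents are sample values (the enumeration entries are made-up placeholders), and the two helper functions are hypothetical.

```python
# In-memory stand-ins for the three configuration tables (sample data only).
INFO_ITEMS = {"category", "material", "specification", "steel mill",
              "warehouse", "price", "thickness", "width", "length"}
ALIASES = {"product name": "category", "sale price": "price"}
ENUMERATIONS = {"warehouse": {"DepotA", "DepotB"}, "material": {"Q235B"}}


def canonical_item(header):
    """Map a column header to its information item, via aliases if needed."""
    key = header.strip().lower()
    if key in INFO_ITEMS:
        return key
    return ALIASES.get(key)  # None when the header is unknown


def item_for_value(value):
    """Find which information item lists this value in its enumeration."""
    for item, values in ENUMERATIONS.items():
        if value in values:
            return item
    return None
```

Headers are resolved through the item and alias tables, while cell values are resolved through the enumeration table.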
In addition, the table area mentioned above is determined as follows: the information item aliases are read from the configuration table set in the database of the resolution server, and the Excel document is traversed. For example, the specification is used as the main header, or the thickness is used as the main header if there is no specification. Scanning downward from the main header until the next header is encountered defines the upper and lower boundaries of a table area; spreading left and right from the main header until the first column or the next main header is reached determines the left and right boundaries of the table area. Once a table area has been determined in this way, its first row is the title row, and each row from the second row down to the lower boundary is one valid data record.
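The boundary search described above can be sketched on a grid of cell strings. The sketch assumes a rectangular grid (every row has the same number of cells) and, as an added simplification not stated in the patent, also stops the left and right scan at an empty cell so that side-by-side tables separated by a blank column do not overlap.

```python
def find_table_areas(grid, main_headers=("specification", "thickness")):
    """Return inclusive (top, bottom, left, right) bounds for each table area.

    `grid` is a rectangular list of rows of cell strings. The first entry of
    `main_headers` that occurs anywhere in the grid is used as the main header.
    """
    cells = {cell for row in grid for cell in row}
    main = next((h for h in main_headers if h in cells), None)
    if main is None:
        return []
    areas = []
    for top, row in enumerate(grid):
        for col, cell in enumerate(row):
            if cell != main:
                continue
            bottom = top
            # Scan down until the next main header in this column or the last row.
            while bottom + 1 < len(grid) and grid[bottom + 1][col] != main:
                bottom += 1
            left = col
            # Spread left until the first column, an empty cell, or a main header.
            while left > 0 and row[left - 1] not in ("", main):
                left -= 1
            right = col
            # Spread right until the last column, an empty cell, or a main header.
            while right + 1 < len(row) and row[right + 1] not in ("", main):
                right += 1
            areas.append((top, bottom, left, right))
    return areas
```

Within each returned area, the row at `top` is the title row and the rows below it down to `bottom` are the data records.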
Source material set: a collection that holds multiple information items such as category, material, specification, steel mill, warehouse, price, thickness, width, and length.
In addition, in the present invention the resource list analyzing platform is connected not only to platform operators (called sides) but also to steel suppliers (called sides), and the source material lists provided by steel suppliers are Excel documents. The resolution server therefore supports not only source material lists that are Word documents or text documents, as in the prior art, but also source material lists that are Excel documents, and it supports the parsing of very large documents.
Step S1423: The resolution server uses the preset resolution rules loaded into memory to judge whether the source material list is a text document; if so, step S1424 is performed; if not, the resolution server sends a parsing exception message to the second server.
If the resolution server judges that the source material list is neither a Word document, a text document, nor an Excel document, the resolution server sends a parsing exception message to the second server to indicate that it does not support this type of source material list and therefore cannot perform the related data extraction operation.
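The branching in steps S1411, S1421, and S1423 amounts to a dispatch on document type. The patent does not say how the type is detected; the sketch below assumes the file extension is used, and the exception class stands in for the parsing exception message sent to the second server.

```python
from pathlib import PurePath


class UnsupportedSourceList(Exception):
    """Stands in for the parsing exception message sent to the second server."""


def choose_parser(file_name):
    """Pick a parsing route by file extension (an assumed detection method)."""
    ext = PurePath(file_name).suffix.lower()
    if ext in (".doc", ".docx"):
        return "word-to-text"  # S1412: convert, then handle as a text document
    if ext in (".xls", ".xlsx"):
        return "excel"         # S1422: read table areas
    if ext == ".txt":
        return "text"          # S1424: read line by line
    raise UnsupportedSourceList(file_name)
```

Any other type falls through to the exception, mirroring the case where the resolution server reports that it does not support the source material list.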
Step S1424: Load the text analysis program, read the text document line by line, convert it into a pending data structure by means of regular expressions and an exhaustive dictionary, and proceed to step S143.
In this step, the resolution server parses the text document line by line, splitting each line into text blocks at spaces. For each text block, regular expressions are used to judge which information item it belongs to. If this cannot be determined by regular expressions, the block is matched against the information item enumerations in the exhaustive dictionary to judge which information item it belongs to. After the information items of all text blocks have been determined, the blocks are combined around a main information item, such as the specification, to form the source material set.
Because the parsing of Word documents and txt text documents first matches with regular expressions and only then falls back to exhaustive lookup in the exhaustive dictionary, parsing achieves higher accuracy and greater efficiency.
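The regex-first, dictionary-second classification of text blocks might be sketched as follows. The two patterns and the dictionary entries are illustrative assumptions; the patent does not disclose its actual expressions or dictionary contents.

```python
import re

# Illustrative patterns only; the patent does not disclose its actual expressions.
PATTERNS = [
    ("specification", re.compile(r"\d+(?:\.\d+)?(?:\*\d+(?:\.\d+)?)+")),
    ("material", re.compile(r"Q\d{3}[A-E]?")),
]

# Hypothetical exhaustive dictionary: enumeration value -> information item.
DICTIONARY = {"DepotA": "warehouse", "MillX": "steel mill"}


def classify_block(block):
    """Try the regular expressions first, then fall back to the dictionary."""
    for item, pattern in PATTERNS:
        if pattern.fullmatch(block):
            return item
    return DICTIONARY.get(block)  # None if the block cannot be classified


def classify_line(line):
    """Split a line into text blocks at whitespace and classify each block."""
    return [(block, classify_block(block)) for block in line.split()]
```

Blocks with a recognizable shape are settled by a pattern match, and only the remainder needs a dictionary lookup.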
In addition, in this embodiment the resolution server parses text documents by text block. Unlike the prior art, which parses by line and can extract only one valid data record per line (so that a line expressing several groups of valid data cannot be fully parsed), the invention is not limited to one record per line and can parse multiple groups of valid data in a single line. For example, if a line in the text document reads Q235B 2.5*1250 2.7*1250, the prior art can only parse one group of valid data, Q235B 2.7*1250, whereas the present invention can parse two: Q235B 2.5*1250 and Q235B 2.7*1250.
In addition, in the present embodiment, suppose the source material list being parsed is a text document and one of its rows reads Q235B 2.5*1250=1500. The prior art can only parse material: Q235B, specification: 2.5*1250=1500, and cannot recognize that 1500 represents the price, whereas the present invention can parse material: Q235B, specification: 2.5*1250, price: 1500. The parsing accuracy of the present invention is therefore higher.
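The text-block parsing described above can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation disclosed here: the regular expressions, the contents of `EXHAUSTIVE_DICT`, and the function names `classify` and `parse_line` are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the actual rules would come from the resolution rules module.
SPEC_RE = re.compile(r"^\d+(?:\.\d+)?(?:\*\d+(?:\.\d+)?){1,2}$")  # e.g. 2.5*1250
SPEC_PRICE_RE = re.compile(
    r"^(\d+(?:\.\d+)?(?:\*\d+(?:\.\d+)?){1,2})=(\d+(?:\.\d+)?)$"  # e.g. 2.5*1250=1500
)
MATERIAL_RE = re.compile(r"^Q\d{3}[A-E]?$")                       # e.g. Q235B

# Stand-in for the exhaustive dictionary: example enumerations per item of information.
EXHAUSTIVE_DICT = {
    "steel_mill": {"Baosteel", "Ansteel"},
    "category": {"coil", "plate"},
}


def classify(block):
    """Return (item, value) pairs for one text block: regex first, dictionary second."""
    match = SPEC_PRICE_RE.match(block)
    if match:
        return [("spec", match.group(1)), ("price", match.group(2))]
    if SPEC_RE.match(block):
        return [("spec", block)]
    if MATERIAL_RE.match(block):
        return [("material", block)]
    for item, values in EXHAUSTIVE_DICT.items():
        if block in values:
            return [(item, block)]
    return [("unknown", block)]


def parse_line(line):
    """Split a line at whitespace and emit one record per specification found."""
    shared, records = {}, []
    for block in line.split():
        for item, value in classify(block):
            if item == "spec":
                records.append({"spec": value})      # specification is the main item
            elif item == "price" and records:
                records[-1]["price"] = value         # price belongs to the latest spec
            elif item != "unknown":
                shared[item] = value                 # e.g. material applies to every spec
    return [{**shared, **record} for record in records]
```

Under these assumptions, a row with two specifications yields two records that share the same material, and a block such as 2.5*1250=1500 is separated into a specification and a price.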
In addition, the resolution rules of the prior art only support the specification as the main node, whereas the resolution rules of the present invention support the specification as the main node and the thickness as the secondary main node. That is, when the specification is empty, the resolution rules of the present invention assemble the specification as thickness * width * length.
In addition, it is worth noting that the pre-parsing operation performed in step S142 is only responsible for reading the source material list; it does not include completion, cleaning (or deduplication), or splitting.
Step S143: The resolution server performs a formatting operation on the pending data structure.
Referring to Fig. 4, step S143 further comprises:
Step S1431: The resolution server performs a completion operation on the items of information in the pending data structure.
In this step, missing items of information are completed according to the resolution rules. First, completion is attempted from the same row: if the row contains a field that matches an enumeration of the missing item of information, the item is filled with that enumeration. Second, if the same row cannot supply the item, completion is attempted from the comment line above the table area of the excel document, using logic similar to the same-row completion. Finally, if the item still cannot be completed, it is inherited from the previous row.
Another resolution rule for completing missing items of information is: if the specification is empty and the thickness, width, and length are not empty, the thickness, width, and length are assembled into the specification.
In addition, in the present embodiment, the completion operation covers category, material, specification, steel mill, and warehouse.
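The completion order described for step S1431 (same row, then the comment line above the table, then the previous row), together with the rule that assembles an empty specification from thickness, width, and length, might look like the sketch below. The field names, the `enums` mapping, and the substring matching are assumptions chosen for illustration only.

```python
# Items of information that the completion operation covers.
COMPLETABLE = ("category", "material", "spec", "steel_mill", "warehouse")


def find_enum(texts, enum_values):
    """Return the first enumeration value that occurs inside any of the texts."""
    for text in texts:
        for value in enum_values:
            if value in text:
                return value
    return None


def complete(rows, header_notes, enums):
    """Fill missing items row by row, following the priority order described above."""
    previous = {}
    for row in rows:
        # Assemble the specification from thickness * width * length when it is empty.
        dims = ("thickness", "width", "length")
        if not row.get("spec") and all(row.get(k) for k in dims):
            row["spec"] = "*".join(str(row[k]) for k in dims)
        for item in COMPLETABLE:
            if row.get(item):
                continue
            candidates = enums.get(item, ())
            free_text = [str(v) for k, v in row.items() if k not in COMPLETABLE and v]
            value = (
                find_enum(free_text, candidates)        # 1. other fields of the same row
                or find_enum(header_notes, candidates)  # 2. comment line above the table
                or previous.get(item)                   # 3. inherit from the previous row
            )
            if value:
                row[item] = value
        previous = row
    return rows
```

Keeping the three sources in one `or` chain makes the priority explicit: a later source is consulted only when every earlier one fails.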
Step S1432: The resolution server performs a cleaning operation on the duplicate items of information in the pending data structure.
There are four resolution rules for deduplication (or cleaning). One of them deduplicates along four dimensions: category, material, specification, and steel mill. If all four fields of two records in the source material set are equal, the records are treated as duplicates and only one is retained.
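The four-dimension deduplication rule can be illustrated with a short Python sketch. The record layout (dictionaries with `category`, `material`, `spec`, and `steel_mill` keys) is an assumption; the other three deduplication rules are not described in the text and are not shown.

```python
def dedupe(records, keys=("category", "material", "spec", "steel_mill")):
    """Keep the first record for each combination of the four key fields."""
    seen = set()
    unique = []
    for record in records:
        signature = tuple(record.get(key) for key in keys)
        if signature not in seen:
            seen.add(signature)
            unique.append(record)
    return unique
```

Fields outside the four keys, such as a price, do not affect the comparison in this sketch, so the first occurrence of a combination wins.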
Step S1433: The resolution server performs a splitting operation on the items of information in the pending data structure.
Through the settings of the configuration table in the database of the resolution server, category splitting, steel mill splitting, warehouse splitting, and material splitting can be implemented in addition to specification splitting. The order and combination of the splits can likewise be set through the configuration table in the database of the resolution server.
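A configurable split of this kind could be sketched as follows. The text does not say how multi-valued fields are represented, so the example assumes, purely for illustration, that a field holding several values separates them with "/" and that the configuration table reduces to an ordered list of field names (`split_order`).

```python
def split_records(records, split_order, sep="/"):
    """Expand multi-valued fields into separate records, one field at a time.

    split_order plays the role of the configuration table: it lists which
    fields to split and in what order (a hypothetical simplification).
    """
    for field in split_order:
        expanded = []
        for record in records:
            parts = [p.strip() for p in str(record.get(field, "")).split(sep) if p.strip()]
            if len(parts) <= 1:
                expanded.append(record)  # nothing to split for this field
            else:
                expanded.extend({**record, field: part} for part in parts)
        records = expanded
    return records
```

Changing the order or the membership of `split_order` changes which splits run and in what sequence, which mirrors the configurable order and combination mentioned above.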
Step S144: The resolution server outputs the data file in the standard format.
Step S150: The resolution server sends an analysis result to the called side, and updates the mark of the corresponding task request to be resolved in the second server.
In this step, when the resolution server completes the parsing operation, it sends the analysis result to the called side. At the same time, the resolution server updates the mark of the corresponding task request to be resolved in the second server, to indicate that the parsing operation for that task request has been completed. In addition, when the resolution server cannot notify the called side, it records a notification mark in the second server and periodically connects to the second server according to the state of that mark, so as to send the analysis result to the called side again.
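The mark update and the retry of failed notifications in step S150 can be pictured with the following sketch. It models the second server as an in-memory dictionary and the called side as a `notify` callable that raises `ConnectionError` when unreachable; both are stand-ins assumed for the example, as are the field names `status` and `notified`.

```python
def finish_task(task_store, task_id, result, notify):
    """Mark the task as parsed and try to deliver the analysis result once."""
    task = task_store[task_id]
    task["status"] = "parsed"       # mark of the task request to be resolved
    task["result"] = result
    try:
        notify(result)
        task["notified"] = True
    except ConnectionError:
        task["notified"] = False    # notification mark kept for a later retry


def retry_notifications(task_store, notify):
    """Intended to run periodically: resend results whose notification failed."""
    for task in task_store.values():
        if task.get("status") == "parsed" and task.get("notified") is False:
            try:
                notify(task["result"])
                task["notified"] = True
            except ConnectionError:
                pass                # still unreachable; leave the mark for the next run
```

In a deployment, `retry_notifications` would be driven by a timer or scheduler; the scheduling itself is omitted here.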
In addition, in this step the resolution server sends the data file to a preset file server. This file server may be the same server as the file server previously used to store the source material list, or a different server. When the called side requires it, the resolution server can send the data file to the file server as a document in excel format, for use by the (external) called side.
In addition, another embodiment of the present invention provides a data analysis system for steel trade industry stock resources, as shown in Fig. 5.
The system includes: a resource list analysis request module 510, used to make the called side send a resource list analysis request to the first server; a task requests generation module 520 to be resolved, connected to the resource list analysis request module 510 and used to make the first server generate a task request to be resolved in an asynchronous manner and store the task request to be resolved to the second server; a source material list acquisition module 530, connected to the task requests generation module 520 to be resolved and used to make the resolution server periodically retrieve the task request to be resolved from the second server and obtain from the file server the source material list corresponding to the task request to be resolved; a Reference data file generation module 540, connected to the source material list acquisition module 530 and used to make the resolution server parse the source material list into a data file in a standard format by means of a built-in resolution rules module and store the data file to the file server; and an analysis result generation module 550, connected to the Reference data file generation module 540 and used to make the resolution server send an analysis result to the called side and update the mark of the corresponding task request to be resolved in the second server.
In another embodiment, the task requests generation module 520 to be resolved is further used to make the generated task request to be resolved include a unique task identification number. The task identification number is used to distinguish different task requests to be resolved, so that after the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
In addition, the task requests generation module 520 to be resolved is further used to make the generated task request to be resolved also include a source-information identifier, which is used to distinguish different called sides.
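A task request carrying a unique task identification number and a source-information identifier could be modelled as in the sketch below. The class name `ParseTask`, its fields, and the use of a UUID as the identification number are assumptions for illustration; the text only requires that the number be unique.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ParseTask:
    """A task request to be resolved, as stored on the second server (hypothetical layout)."""
    source: str                      # source-information identifier: which called side
    list_address: str                # where the source material list is stored
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # unique task number
    status: str = "pending"


def build_result(task, data_file_address):
    """Analysis result returned to the called side once parsing has finished."""
    return {
        "task_id": task.task_id,        # lets the called side match result to request
        "source": task.source,
        "address": data_file_address,   # where the standard-format data file can be fetched
    }
```

With this shape, the called side can look up the data file using the address in the result and confirm through `task_id` that the file belongs to the request it submitted.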
Referring to Fig. 6, in another embodiment, the Reference data file generation module 540 further comprises: a source material list pretreatment unit 541, used to make the resolution server perform a pretreatment operation on the source material list; a pending data structure acquiring unit 542, connected to the source material list pretreatment unit 541 and used, after the pretreatment operation, to make the resolution server perform a data extraction operation on the source material list so as to obtain a standard pending data structure; a data structure format unit 543, connected to the pending data structure acquiring unit 542 and used to make the resolution server perform a formatting operation on the pending data structure; and a data file output unit 544, connected to the data structure format unit 543 and used to make the resolution server output the data file in the standard format.
Referring to Fig. 7 and Fig. 8, the source material list pretreatment unit 541 further comprises: a word document judgment sub-unit 5411, used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is a word document; and a word document parsing sub-unit 5412, connected to the word document judgment sub-unit 5411 and used, when the source material list is judged to be a word document, to load the word analysis program, convert the word document into a text document, and call the pending data structure acquiring unit 542.
The pending data structure acquiring unit 542 further comprises: an excel document judgment sub-unit 5421, used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is an excel document; an excel document parsing sub-unit 5422, connected to the excel document judgment sub-unit 5421 and used, when the source material list is judged to be an excel document, to load the excel analysis program, read the excel document by means of POI, convert it into the pending data structure, and call the data structure format unit 543; a text document judgment sub-unit 5423, connected to the excel document judgment sub-unit 5421 and used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is a text document; a text document parsing sub-unit 5424, connected to the text document judgment sub-unit 5423 and used, when the source material list is judged to be a text document, to load the text analysis program, read the text document line by line, convert it into the pending data structure by means of regular expressions and the exhaustive dictionary, and call the data structure format unit 543; and a parsing exception message transmission sub-unit 5425, connected to the text document judgment sub-unit 5423 and used, when the source material list is judged not to be a text document, to make the resolution server send a parsing exception message to the second server.
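The chain of judgments across the sub-units (word document, then excel document, then text document, otherwise a parsing exception) amounts to a dispatch on document type. The sketch below assumes, for illustration only, that the type is judged from the file extension and that the three handlers are supplied by the caller; the text itself only says the judgment uses preset resolution rules loaded into memory.

```python
from pathlib import Path

WORD_EXT = {".doc", ".docx"}
EXCEL_EXT = {".xls", ".xlsx"}
TEXT_EXT = {".txt"}


class UnsupportedResourceList(Exception):
    """Raised when the list is not a word, excel, or text document."""


def route(path, convert_word, parse_excel, parse_text):
    """Pick the parser for a source material list (all handlers are hypothetical)."""
    suffix = Path(path).suffix.lower()
    if suffix in WORD_EXT:
        # Pretreatment: a word document is first converted to a text document.
        path = convert_word(path)
        suffix = Path(path).suffix.lower()
    if suffix in EXCEL_EXT:
        return parse_excel(path)
    if suffix in TEXT_EXT:
        return parse_text(path)
    # Neither excel nor text: the caller should report a parsing exception.
    raise UnsupportedResourceList(path)
```

The code that catches `UnsupportedResourceList` would be the place to send the parsing exception message to the second server.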
Referring to Fig. 9, the Reference data file generation module 540 further comprises: an item of information completion unit 5431, used to make the resolution server perform a completion operation on the items of information in the pending data structure; an item of information cleaning unit 5432, connected to the item of information completion unit 5431 and used to make the resolution server perform a cleaning operation on the duplicate items of information in the pending data structure; and an item of information split unit 5433, connected to the item of information cleaning unit 5432 and used to make the resolution server perform a splitting operation on the items of information in the pending data structure.
The data analysis method for steel trade industry stock resources of the embodiment of the present invention obtains the resource list to be resolved through a unified interface, and uses a built-in resolution rules module to parse the original resource list into a data file in a standard format for use by multiple called sides. Parsing is thereby decoupled, so that the resolution rules of the individual called sides do not affect one another, and resource lists in multiple formats as well as very large files are supported. The method also features real-time maintenance, high accuracy, good efficiency, and strong extensibility.
The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (14)
1. A data analysis method for steel trade industry stock resources, characterized in that it comprises the following steps:
(1) a called side sends a resource list analysis request to a first server;
(2) the first server generates a task request to be resolved in an asynchronous manner, and stores the task request to be resolved to a second server;
(3) a resolution server periodically retrieves the task request to be resolved from the second server, and obtains from a file server the source material list corresponding to the task request to be resolved;
(4) the resolution server parses the source material list into a data file in a standard format by means of a built-in resolution rules module, and stores the data file to the file server;
(5) the resolution server sends an analysis result to the called side, and updates the mark of the corresponding task request to be resolved in the second server.
2. The data analysis method according to claim 1, characterized in that, in step (2), the generated task request to be resolved includes a unique task identification number, the task identification number being used to distinguish different task requests to be resolved, so that after the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
3. The data analysis method according to claim 1, characterized in that, in step (2), the generated task request to be resolved further includes a source-information identifier, the source-information identifier being used to distinguish different called sides.
4. The data analysis method according to claim 1, characterized in that step (4) further comprises:
(41) the resolution server performs a pretreatment operation on the source material list;
(42) after the pretreatment operation, the resolution server performs a data extraction operation on the source material list to obtain a standard pending data structure;
(43) the resolution server performs a formatting operation on the pending data structure;
(44) the resolution server outputs the data file in the standard format.
5. The data analysis method according to claim 4, characterized in that step (41) further comprises:
(411) the resolution server judges, by means of preset resolution rules loaded into memory, whether the source material list is a word document; if so, step (412) is performed; if not, step (42) is performed;
(412) the word analysis program is loaded, the word document is converted into a text document, and the method returns to step (42).
6. The data analysis method according to claim 4, characterized in that step (42) further comprises:
(421) the resolution server judges, by means of preset resolution rules loaded into memory, whether the source material list is an excel document; if so, step (422) is performed; if not, step (423) is performed directly;
(422) the excel analysis program is loaded, the excel document is read by means of POI and converted into the pending data structure, and the method returns to step (43);
(423) the resolution server judges, by means of preset resolution rules loaded into memory, whether the source material list is a text document; if so, step (424) is performed; if not, the resolution server sends a parsing exception message to the second server;
(424) the text analysis program is loaded, the text document is read line by line and converted into the pending data structure by means of regular expressions and the exhaustive dictionary, and the method returns to step (43).
7. The data analysis method according to claim 4, characterized in that step (43) further comprises:
(431) the resolution server performs a completion operation on the items of information in the pending data structure;
(432) the resolution server performs a cleaning operation on the duplicate items of information in the pending data structure;
(433) the resolution server performs a splitting operation on the items of information in the pending data structure.
8. A data analysis system for steel trade industry stock resources, characterized in that it comprises:
a resource list analysis request module, used to make a called side send a resource list analysis request to a first server;
a task requests generation module to be resolved, connected to the resource list analysis request module and used to make the first server generate a task request to be resolved in an asynchronous manner and store the task request to be resolved to a second server;
a source material list acquisition module, connected to the task requests generation module to be resolved and used to make a resolution server periodically retrieve the task request to be resolved from the second server and obtain from a file server the source material list corresponding to the task request to be resolved;
a Reference data file generation module, connected to the source material list acquisition module and used to make the resolution server parse the source material list into a data file in a standard format by means of a built-in resolution rules module and store the data file to the file server;
an analysis result generation module, connected to the Reference data file generation module and used to make the resolution server send an analysis result to the called side and update the mark of the corresponding task request to be resolved in the second server.
9. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the task requests generation module to be resolved is further used to make the generated task request to be resolved include a unique task identification number, the task identification number being used to distinguish different task requests to be resolved, so that after the resolution server sends the analysis result to the called side, the called side identifies the task identification number in the analysis result and, according to the address information in the analysis result, retrieves the data file corresponding to the task identification number from the corresponding file server.
10. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the task requests generation module to be resolved is further used to make the generated task request to be resolved also include a source-information identifier, the source-information identifier being used to distinguish different called sides.
11. The data analysis system for steel trade industry stock resources according to claim 8, characterized in that the Reference data file generation module further comprises:
a source material list pretreatment unit, used to make the resolution server perform a pretreatment operation on the source material list;
a pending data structure acquiring unit, connected to the source material list pretreatment unit and used, after the pretreatment operation, to make the resolution server perform a data extraction operation on the source material list so as to obtain a standard pending data structure;
a data structure format unit, connected to the pending data structure acquiring unit and used to make the resolution server perform a formatting operation on the pending data structure;
a data file output unit, connected to the data structure format unit and used to make the resolution server output the data file in the standard format.
12. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the source material list pretreatment unit further comprises:
a word document judgment sub-unit, used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is a word document;
a word document parsing sub-unit, connected to the word document judgment sub-unit and used, when the source material list is judged to be a word document, to load the word analysis program, convert the word document into a text document, and call the pending data structure acquiring unit.
13. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the pending data structure acquiring unit further comprises:
an excel document judgment sub-unit, used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is an excel document;
an excel document parsing sub-unit, connected to the excel document judgment sub-unit and used, when the source material list is judged to be an excel document, to load the excel analysis program, read the excel document by means of POI, convert it into the pending data structure, and call the data structure format unit;
a text document judgment sub-unit, connected to the excel document judgment sub-unit and used to make the resolution server judge, by means of preset resolution rules loaded into memory, whether the source material list is a text document;
a text document parsing sub-unit, connected to the text document judgment sub-unit and used, when the source material list is judged to be a text document, to load the text analysis program, read the text document line by line, convert it into the pending data structure by means of regular expressions and the exhaustive dictionary, and call the data structure format unit;
a parsing exception message transmission sub-unit, connected to the text document judgment sub-unit and used, when the source material list is judged not to be a text document, to make the resolution server send a parsing exception message to the second server.
14. The data analysis system for steel trade industry stock resources according to claim 11, characterized in that the Reference data file generation module further comprises:
an item of information completion unit, used to make the resolution server perform a completion operation on the items of information in the pending data structure;
an item of information cleaning unit, connected to the item of information completion unit and used to make the resolution server perform a cleaning operation on the duplicate items of information in the pending data structure;
an item of information split unit, connected to the item of information cleaning unit and used to make the resolution server perform a splitting operation on the items of information in the pending data structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710722845.5A CN107562701A (en) | 2017-08-22 | 2017-08-22 | A kind of data analysis method and its system of steel trade industry stock resource |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107562701A true CN107562701A (en) | 2018-01-09 |
Family
ID=60976626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710722845.5A Pending CN107562701A (en) | 2017-08-22 | 2017-08-22 | A kind of data analysis method and its system of steel trade industry stock resource |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107562701A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110119472A (en) * | 2019-05-22 | 2019-08-13 | 欧冶云商股份有限公司 | Steel product search method and system applied to the network platform |
CN112800049A (en) * | 2021-04-06 | 2021-05-14 | 航天神舟智慧***技术有限公司 | EXCEL data source cleaning method and system based on big data, electronic device and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104572898A (en) * | 2014-12-22 | 2015-04-29 | 上海钢富电子商务有限公司 | Data analysis method and data analysis system for steel trade industry spot commodity resource |
CN104679819A (en) * | 2014-12-22 | 2015-06-03 | 上海钢富电子商务有限公司 | Data analysis method and system of spot resources for steel trading industry |
2017-08-22 CN CN201710722845.5A patent/CN107562701A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180109 |