CN109785099A - Method and system for automatically processing business data information
- Publication number
- CN109785099A (application CN201811612300.XA)
- Authority
- CN
- China
- Prior art keywords
- file
- message
- data
- brand
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method and system for automatically processing business data information, comprising: cleaning the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file; converting the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file; associating the item information data file with the converted brand information data file, and extracting data based on brand information as required to obtain a second item information file; obtaining keyword information for each item, adding it to the second item information file, and performing part-of-speech tagging on the keyword information; and filtering the second item information file according to a preset keyword information filtering rule, and storing the filtered item information file in an index database as a complete information file.
Description
Technical field
The present invention relates to the field of big data technology, and more particularly to a method and system for automatically processing business data information.
Background
According to the regulations of the State Taxation Administration, every category of item has a corresponding tax classification code, which is mainly used to distinguish the industry category of goods and the tax rate of each category of item. In daily life, however, these tax rates are hard to obtain, or are easily filled in wrongly through operator error. To achieve intelligent invoicing, a method is needed that can intelligently identify information such as an item's tax classification code and tax rate from the item's information.
Summary of the invention
The present invention proposes a method and system for automatically processing business data information, to solve the problem of how to process business data information automatically.
To solve the above problem, according to one aspect of the present invention, a method for automatically processing business data information is provided, characterized in that the method comprises:
cleaning the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file;
converting the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file;
associating the item information data file with the converted brand information data file to obtain a first item information file, and extracting data based on brand information as required to obtain a second item information file;
obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information;
filtering out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet the requirements, and storing the filtered item information file in an index database as a complete information file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity, and total amount; the field information of the brand information data file includes: item code, Chinese brand, and English brand.
Preferably, converting the field information in the brand information data file according to the preset conversion rule to obtain the converted brand information data file comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding that English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding that Chinese brand information directly to the Chinese brand field.
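As an illustration only, the conversion rule above might be sketched in Python as follows. The brand dictionary contents and the name fill_brand_fields are hypothetical and are not taken from the disclosure; the sketch assumes that a missing brand field is represented by None or an empty string.

```python
from typing import Dict, Optional, Tuple

# Hypothetical brand dictionary (assumption): Chinese brand -> English brand.
ZH_TO_EN: Dict[str, str] = {"华为": "Huawei", "联想": "Lenovo"}
# Reverse lookup, keyed by lower-cased English brand for case-insensitive matching.
EN_TO_ZH: Dict[str, str] = {en.lower(): zh for zh, en in ZH_TO_EN.items()}


def fill_brand_fields(
    zh: Optional[str], en: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Fill in whichever brand field is missing, if a counterpart is known."""
    if zh and not en:
        # Only a Chinese brand is present: look up its English counterpart.
        en = ZH_TO_EN.get(zh)  # stays None when no counterpart is known
    elif en and not zh:
        # Only an English brand is present: look up its Chinese counterpart.
        zh = EN_TO_ZH.get(en.lower())
    return zh, en
```

Records whose brand has no known counterpart are returned unchanged, which matches the "if so" wording of the rule.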
Preferably, extracting data based on brand information as required to obtain the second item information file comprises:
extracting based on brand information as required, and storing the successfully extracted business record data as the second item information file;
manually extracting the business record data that was not successfully extracted.
Preferably, manually extracting the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the percentage of the volume of unsuccessfully extracted business record data relative to the total volume of business record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
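The decision described above can be illustrated with a small Python sketch. It is only one possible reading: the text joins the two conditions with "or", which the sketch follows by default, while the hypothetical require_both flag covers a stricter reading in which both conditions must hold. The function name, parameter names, and the assumption that a reviewer supplies the contains_brand flag are all illustrative.

```python
def needs_manual_extraction(
    unextracted_count: int,
    total_count: int,
    contains_brand: bool,
    threshold_pct: float,
    require_both: bool = False,
) -> bool:
    """Decide whether unsuccessfully extracted records go to manual extraction.

    unextracted_count: number of records with no brand extracted
    total_count:       number of records in the first item information file
    contains_brand:    whether the unextracted records still contain brand
                       information (assumed to be judged by a reviewer)
    threshold_pct:     preset percentage threshold, e.g. 5.0 for 5 %
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    pct = 100.0 * unextracted_count / total_count
    over_threshold = pct >= threshold_pct
    if require_both:
        return bool(contains_brand and over_threshold)
    return bool(contains_brand or over_threshold)
```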
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
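A minimal Python sketch of such a filtering rule is given below. It assumes that each keyword is stored together with its part-of-speech tag as a (keyword, tag) pair and that "contains" means substring containment; the sample sensitive word and the name filter_keywords are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

TaggedKeyword = Tuple[str, str]  # (keyword, part-of-speech tag)


def filter_keywords(
    tagged: Iterable[TaggedKeyword], sensitive_words: Set[str]
) -> List[TaggedKeyword]:
    """Drop every tagged keyword that contains any preset sensitive word."""
    return [
        (word, tag)
        for word, tag in tagged
        if not any(s in word for s in sensitive_words)
    ]
```

For example, with the sensitive word set {"特价"}, the keyword "特价手机" would be dropped while "手机" would be kept.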
Preferably, storing the filtered item information file in the index database as a complete information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that Elasticsearch can be called through an external API to perform information queries.
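The disclosure does not specify the layout of the JSON file. As one hedged illustration, the Python sketch below formats records as newline-delimited JSON in the shape accepted by the Elasticsearch bulk API (an action line followed by a document line). The index name "items", the record field names, and the function name to_bulk_lines are assumptions made for the example.

```python
import json
from typing import Dict, Iterable


def to_bulk_lines(records: Iterable[Dict], index_name: str = "items") -> str:
    """Format records as newline-delimited JSON for a bulk import.

    Each record yields two lines: an "index" action line and the document.
    """
    lines = []
    for rec in records:
        # Use the item code as the document id (assumption for this sketch).
        action = {"index": {"_index": index_name, "_id": rec["item_code"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(rec, ensure_ascii=False))
    # The bulk format requires the payload to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting text could be written to a file and imported into the index database; the import call itself is omitted here because it depends on the deployment.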
According to another aspect of the present invention, a system for automatically processing business data information is provided, characterized in that the system comprises:
a data cleaning unit, configured to clean the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file;
an information conversion unit, configured to convert the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file;
a data extraction unit, configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to extract data based on brand information as required to obtain a second item information file;
a part-of-speech tagging unit, configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information;
a data integration unit, configured to filter out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet the requirements, and to store the filtered item information file in an index database as a complete information file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity, and total amount; the field information of the brand information data file includes: item code, Chinese brand, and English brand.
Preferably, the information conversion unit converting the field information in the brand information data file according to the preset conversion rule to obtain the converted brand information data file comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding that English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding that Chinese brand information directly to the Chinese brand field.
Preferably, the data extraction unit extracting data based on brand information as required to obtain the second item information file comprises:
extracting based on brand information as required, and storing the successfully extracted business record data as the second item information file;
manually extracting the business record data that was not successfully extracted.
Preferably, manually extracting the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the percentage of the volume of unsuccessfully extracted business record data relative to the total volume of business record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, the data integration unit storing the filtered item information file in the index database as a complete information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that Elasticsearch can be called through an external API to perform information queries.
The present invention provides a method and system for automatically processing business data information, comprising: cleaning the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file; converting the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file; associating the item information data file with the converted brand information data file, and extracting data based on brand information as required to obtain a second item information file; obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information; and filtering out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet the requirements, and storing the filtered item information file in an index database as a complete information file. The technical solution of the present invention uses the distributed computing framework MapReduce to clean the acquired raw business data according to rigorous cleaning rules, so that data cleaning is as close to error-free as possible and program compatibility problems are avoided. Supplementing brand information and performing manual extraction ensures the accuracy of the data. Part-of-speech tagging is performed with Zhang Huaping's part-of-speech tagging tool, which ensures the accuracy of word segmentation. Filtering according to the preset keyword information filtering rule yields the complete information file that is stored in the index database. Business data information is thus processed automatically and accurately, saving time and effort.
Brief description of the drawings
Exemplary embodiments of the present invention can be understood more fully by reference to the following drawings:
Fig. 1 is a flow chart of a method 100 for automatically processing business data information according to an embodiment of the present invention; and
Fig. 2 is a schematic structural diagram of a system 200 for automatically processing business data information according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are now described with reference to the drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the present invention to those of ordinary skill in the art. The terms used in the exemplary embodiments illustrated in the drawings do not limit the present invention. In the drawings, identical units/elements are denoted by identical reference numerals.
Unless otherwise indicated, the terms used herein (including scientific and technical terms) have the meanings commonly understood by those of ordinary skill in the art. It will further be understood that terms defined in commonly used dictionaries should be interpreted as having meanings consistent with the context of the related field, and should not be interpreted in an idealized or overly formal sense.
Fig. 1 is a flow chart of a method 100 for automatically processing business data information according to an embodiment of the present invention. As shown in Fig. 1, the method provided by embodiments of the present invention uses the distributed computing framework MapReduce to clean the acquired raw business data according to rigorous cleaning rules, so that data cleaning is as close to error-free as possible and program compatibility problems are avoided. Supplementing brand information and performing manual extraction ensures the accuracy of the data. Part-of-speech tagging is performed with Zhang Huaping's part-of-speech tagging tool, which ensures the accuracy of word segmentation. Filtering according to the preset keyword information filtering rule yields the complete information file that is stored in the index database, so that business data information is processed automatically and accurately, saving time and effort. The method provided by embodiments of the present invention starts at step 101, in which the acquired raw business data is cleaned using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity, and total amount; the field information of the brand information data file includes: item code, Chinese brand, and English brand.
In embodiments of the present invention, currently popular machine learning algorithms are combined during data training. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a class. To find rules in massive data, MapReduce is mainly used to train on the data, running on a complete big data platform. MapReduce is a programming model for parallel operations on large-scale data sets (larger than 1 TB). Its main ideas, "Map" and "Reduce", are borrowed from functional programming languages, together with features borrowed from vector programming languages. It greatly helps programmers who are not familiar with distributed parallel programming to run their programs on distributed systems. In current software implementations, a Map function is specified to map a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function is specified to ensure that all mapped key-value pairs sharing the same key are grouped together.
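To make the Map and Reduce roles concrete, the following Python sketch simulates the model in a single process. It is not a distributed implementation and does not use any real MapReduce framework; the input format "item_code,item_name" and all function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    inputs: Iterable[Any],
    mapper: Callable[[Any], Iterator[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Single-process simulation of the Map, shuffle, and Reduce phases."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for item in inputs:
        # Map phase: each input yields zero or more (key, value) pairs.
        for key, value in mapper(item):
            # Shuffle phase: values that share a key are grouped together.
            groups[key].append(value)
    # Reduce phase: each key's grouped values are combined into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (item_code, 1) for a line of the form 'item_code,item_name'."""
    code, _name = line.split(",", 1)
    yield code, 1


def sum_reducer(_key: str, values: List[int]) -> int:
    """Sum the counts emitted for one item code."""
    return sum(values)
```

Here map_reduce(lines, count_mapper, sum_reducer) counts how many raw records share each item code; in a real cluster the same mapper and reducer logic would be distributed across nodes by the framework.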
After the acquired raw business data is cleaned using the distributed computing framework MapReduce, the resulting item information data file and brand information data file are coding-r-00000 and cate-r-00000, respectively. The fields in the coding-r-00000 file include: item code, item name, item quantity, and total amount; the fields in the cate-r-00000 file include: item code, Chinese brand, and English brand.
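The cleaning rules themselves are not given in the text. Purely as an illustration, the Python sketch below assumes a pipe-delimited raw record with six fields (item code, item name, quantity, total amount, Chinese brand, English brand) and a few simple validity checks, and splits each valid record into an item row and a brand row corresponding to the two output files. The raw format, the checks, and the name clean_records are all assumptions.

```python
from typing import Dict, Iterable, List, Tuple

ItemRow = Tuple[str, str, int, float]  # item code, item name, quantity, total amount
BrandRow = Tuple[str, str, str]        # item code, Chinese brand, English brand


def clean_records(raw_lines: Iterable[str]) -> Tuple[List[ItemRow], List[BrandRow]]:
    """Split raw lines into item rows and brand rows, dropping invalid lines."""
    items: List[ItemRow] = []
    brands: Dict[str, BrandRow] = {}
    for line in raw_lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # wrong number of fields
        code, name, qty, amount, zh, en = fields
        if not code or not name:
            continue  # item code and item name are required
        try:
            qty_val, amount_val = int(qty), float(amount)
        except ValueError:
            continue  # non-numeric quantity or amount
        items.append((code, name, qty_val, amount_val))
        if zh or en:
            # Keep the first brand row seen for each item code.
            brands.setdefault(code, (code, zh, en))
    return items, list(brands.values())
```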
Preferably, in step 102, the field information in the brand information data file is converted according to a preset conversion rule to obtain a converted brand information data file.
Preferably, converting the field information in the brand information data file according to the preset conversion rule to obtain the converted brand information data file comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding that English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding that Chinese brand information directly to the Chinese brand field.
Preferably, in step 103, the item information data file is associated with the converted brand information data file to obtain a first item information file, and data is extracted based on brand information as required to obtain a second item information file.
Preferably, extracting data based on brand information as required to obtain the second item information file comprises:
extracting based on brand information as required, and storing the successfully extracted business record data as the second item information file;
manually extracting the business record data that was not successfully extracted.
Preferably, manually extracting the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the percentage of the volume of unsuccessfully extracted business record data relative to the total volume of business record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
In embodiments of the present invention, extracting brands requires maintaining a brand dictionary that contains Chinese brands and English brands and allows them to be associated with each other. The maintenance program mainly performs duplicate filtering and anomaly-detection filtering on the dictionary table. First, the field information in the brand information data file is converted according to the preset conversion rule to obtain the converted brand information data file. Then, the item information data file is associated with the converted brand information data file to obtain the first item information file brand.dic, whose fields include: item code, Chinese brand, English brand, item name, quantity, and total amount.
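The association step can be pictured as a join on the item code. The Python sketch below is a minimal in-memory illustration, assuming tuple-shaped rows and a left join that keeps items without a brand row (with empty brand fields); the text does not state how unmatched items are handled, and the name join_items_with_brands is hypothetical.

```python
from typing import Iterable, List, Tuple

ItemRow = Tuple[str, str, int, float]              # code, name, quantity, total amount
BrandRow = Tuple[str, str, str]                    # code, Chinese brand, English brand
JoinedRow = Tuple[str, str, str, str, int, float]  # code, zh, en, name, quantity, amount


def join_items_with_brands(
    items: Iterable[ItemRow], brands: Iterable[BrandRow]
) -> List[JoinedRow]:
    """Left-join item rows with brand rows on the item code."""
    brand_by_code = {code: (zh, en) for code, zh, en in brands}
    joined: List[JoinedRow] = []
    for code, name, qty, amount in items:
        # Items without a brand row keep empty brand fields (assumption).
        zh, en = brand_by_code.get(code, ("", ""))
        joined.append((code, zh, en, name, qty, amount))
    return joined
```

The output field order follows the one listed above: item code, Chinese brand, English brand, item name, quantity, and total amount.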
Brands can then be extracted by a MapReduce task. After execution, the task produces the second item information file Multiple coding-r-00000 and the file of unsuccessful extractions, still-r-00000. The fields of the Multiple coding-r-00000 file include: item code, Chinese brand, English brand, item name, quantity, and total amount; it holds the records from which a brand was successfully extracted. The fields of the still-r-00000 file include: item code, item name, quantity, and total amount; it holds the records from which no brand was successfully extracted.
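How a brand is recognized in a record is not spelled out in the text. One plausible reading, sketched below in Python as an assumption, is a dictionary lookup against the item name: a record whose name contains a known Chinese or English brand goes to the successfully extracted output, and every other record goes to the unsuccessful output (the role played by still-r-00000). The matching strategy and the name extract_brands are illustrative only.

```python
from typing import Iterable, List, Tuple

ItemRow = Tuple[str, str, int, float]                 # code, name, quantity, total amount
ExtractedRow = Tuple[str, str, str, str, int, float]  # code, zh, en, name, quantity, amount


def extract_brands(
    items: Iterable[ItemRow], brand_dict: List[Tuple[str, str]]
) -> Tuple[List[ExtractedRow], List[ItemRow]]:
    """Split items into (brand extracted, brand not extracted).

    brand_dict holds (Chinese brand, English brand) pairs; English names are
    matched case-insensitively against the item name.
    """
    extracted: List[ExtractedRow] = []
    still: List[ItemRow] = []
    for code, name, qty, amount in items:
        lowered = name.lower()
        match = next(
            (
                (zh, en)
                for zh, en in brand_dict
                if (zh and zh in name) or (en and en.lower() in lowered)
            ),
            None,
        )
        if match is not None:
            extracted.append((code, match[0], match[1], name, qty, amount))
        else:
            still.append((code, name, qty, amount))
    return extracted, still
```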
For data from which no brand was extracted, manual extraction is performed to supplement the dictionary filtering. However, if the goods genuinely have no brand, or the number of records in the still-r-00000 file is very small compared with the total data volume, that is, if each business record in the unsuccessfully extracted business record data genuinely contains no brand information, or the percentage of the volume of unsuccessfully extracted business record data relative to the total volume of business record data in the first item information file is less than the preset percentage threshold, then no manual extraction is performed.
Preferably, in step 104, keyword information is obtained for each item, the keyword information is added to the second item information file, and part-of-speech tagging is performed on the keyword information.
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, in step 105, keyword information in the second item information file that does not meet the requirements is filtered out according to the preset keyword information filtering rule, and the filtered item information file is stored in an index database as a complete information file.
Preferably, storing the filtered item information file in the index database as a complete information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that Elasticsearch can be called through an external API to perform information queries.
In embodiments of the present invention, Zhang Huaping's part-of-speech tagging tool is used to tag the keyword information, and keyword information that does not meet the requirements is filtered out. The filtered item information file serves as the complete item information file, which is organized by code into a JSON file and stored in the index database, because the index database supports imports in this JSON format; Elasticsearch can then be called through an external API to query the required results. All of the ETL processes involved above are handled with the MapReduce framework.
Fig. 2 is a schematic structural diagram of a system 200 for automatically processing business data information according to an embodiment of the present invention. As shown in Fig. 2, the system 200 provided by embodiments of the present invention comprises: a data cleaning unit 201, an information conversion unit 202, a data extraction unit 203, a part-of-speech tagging unit 204, and a data integration unit 205. Preferably, the data cleaning unit 201 is configured to clean the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity, and total amount; the field information of the brand information data file includes: item code, Chinese brand, and English brand.
Preferably, the information conversion unit 202 is configured to convert the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file.
Preferably, the information conversion unit 202 converting the field information in the brand information data file according to the preset conversion rule to obtain the converted brand information data file comprises: for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding that English brand information directly to the English brand field; for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding that Chinese brand information directly to the Chinese brand field.
Preferably, the data extraction unit 203 is configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to extract data based on brand information as required to obtain a second item information file.
Preferably, the data extraction unit 203 extracting data based on brand information as required to obtain the second item information file comprises: extracting based on brand information as required, and storing the successfully extracted business record data as the second item information file; and manually extracting the business record data that was not successfully extracted.
Preferably, manually extracting the business record data that was not successfully extracted comprises: determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the percentage of the volume of unsuccessfully extracted business record data relative to the total volume of business record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
Preferably, the part-of-speech tagging unit 204 is configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information.
Preferably, the preset keyword information filtering rule comprises: filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, the data integration unit 205 is configured to filter out, according to the preset keyword information filtering rule, keyword information in the second item information file that does not meet the requirements, and to store the filtered item information file in an index database as a complete information file.
Preferably, the data integration unit storing the filtered item information file in the index database as a complete information file comprises: formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that Elasticsearch can be called through an external API to perform information queries.
The system 200 for automatically processing business data information according to this embodiment of the present invention corresponds to the method 100 for automatically processing business data information according to another embodiment of the present invention, and is not described again here.
The present invention has been described with reference to a small number of embodiments. However, as is known to those skilled in the art, and as defined by the appended patent claims, embodiments other than those disclosed above equally fall within the scope of the present invention.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/the/said [device, component, etc.]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein need not be performed in the exact order disclosed, unless explicitly stated.
Claims (14)
1. A method for automatically processing business data information, characterized in that the method comprises:
cleaning the acquired raw business data using the distributed computing framework MapReduce to obtain an item information data file and a brand information data file;
converting the field information in the brand information data file according to a preset conversion rule to obtain a converted brand information data file;
associating the item information data file with the converted brand information data file to obtain a first item information file, and extracting data based on brand information as required to obtain a second item information file;
obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information;
filtering out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet the requirements, and storing the filtered item information file in an index database as a complete information file.
2. the method according to claim 1, wherein the field information of the item information data file includes:
Article code, Item Title, number of articles and Amount in Total;The field information of the brand message data file includes: article
Coding, Chinese brand and English brand.
3. The method according to claim 2, characterized in that performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for service data having only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field;
for service data having only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
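As a hedged illustration only, the conversion rule of claim 3 could be expressed as a lookup in both directions. The dictionary `cn_to_en`, the field names `brand_cn` and `brand_en`, and the function name `fill_brands` are hypothetical; the claim does not specify how the correspondence between Chinese and English brands is stored.

```python
def fill_brands(records, cn_to_en):
    """Fill in the missing brand field when a known counterpart exists.

    records:  list of dicts with (assumed) keys "brand_cn" and "brand_en"
    cn_to_en: assumed mapping from Chinese brand name to English brand name
    """
    en_to_cn = {en: cn for cn, en in cn_to_en.items()}
    converted = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        cn, en = rec.get("brand_cn"), rec.get("brand_en")
        if cn and not en and cn in cn_to_en:
            rec["brand_en"] = cn_to_en[cn]  # only Chinese brand present
        elif en and not cn and en in en_to_cn:
            rec["brand_cn"] = en_to_cn[en]  # only English brand present
        converted.append(rec)
    return converted
```

Records whose brand has no known counterpart pass through unchanged, which matches the "if so" condition of the claim.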
4. The method according to claim 1, characterized in that performing data extraction based on brand information as required, to obtain the second item information file, comprises:
performing extraction based on brand information as required, and storing the successfully extracted service record data as the second item information file;
performing manual extraction on the service record data that was not successfully extracted.
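One possible reading of the association step of claim 1 together with the extraction of claim 4 is sketched below, purely as an example. It assumes that the two files are joined on the item code and that "extraction as required" means selecting records whose brand belongs to a requested set; the field names and the functions `associate` and `extract_by_brand` are hypothetical.

```python
def associate(items, brands):
    """Join item records with brand records on the (assumed) "code" field."""
    brand_by_code = {b["code"]: b for b in brands}
    merged = []
    for item in items:
        brand = brand_by_code.get(item["code"], {})
        merged.append({
            **item,
            "brand_cn": brand.get("brand_cn", ""),
            "brand_en": brand.get("brand_en", ""),
        })
    return merged  # stands in for the first item information file


def extract_by_brand(first_file, wanted_brands):
    """Split records into successfully extracted and not yet extracted."""
    extracted, pending = [], []
    for rec in first_file:
        if rec["brand_cn"] in wanted_brands or rec["brand_en"] in wanted_brands:
            extracted.append(rec)
        else:
            pending.append(rec)  # candidates for manual extraction
    return extracted, pending
```

In this sketch `extracted` corresponds to the second item information file, and `pending` holds the records left for manual extraction.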
5. The method according to claim 4, characterized in that performing manual extraction on the service record data that was not successfully extracted comprises:
determining whether each service record among the service record data not successfully extracted contains brand information, or whether the percentage of the data volume of the service record data not successfully extracted relative to the total data volume of the service record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted service record data in the second item information file; otherwise, performing no manual extraction.
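The decision in claim 5 can be illustrated with a short sketch under one assumed interpretation: if the unextracted share reaches the threshold, all unextracted records go to manual extraction; otherwise only those records that still contain brand information do. The function name, the field names, and this particular reading of the "or" condition are assumptions.

```python
def select_for_manual_extraction(pending, total_count, threshold_pct):
    """Return the unextracted records that should be extracted manually.

    pending:       records that automatic extraction did not capture
    total_count:   number of records in the first item information file
    threshold_pct: preset percentage threshold, e.g. 20 for 20 %
    """
    if total_count <= 0:
        return []
    share_pct = 100.0 * len(pending) / total_count
    if share_pct >= threshold_pct:
        return list(pending)  # too much data was missed: review everything
    # Below the threshold: review only records that carry brand information.
    return [r for r in pending if r.get("brand_cn") or r.get("brand_en")]
```

For example, with 2 unextracted records out of 100 and a 5 % threshold, only the unextracted records that carry a brand are returned.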
6. The method according to claim 1, characterized in that the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
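A minimal sketch of the filtering rule of claim 6 follows. It assumes that tagged keywords are held as `(word, tag)` pairs and that "contains" means substring containment; both are assumptions, as is the function name `filter_keywords`. The part-of-speech tags in the usage example are supplied by hand rather than produced by a tagger.

```python
def filter_keywords(tagged_keywords, sensitive_words):
    """Drop every tagged keyword that contains any preset sensitive word.

    tagged_keywords: list of (word, part_of_speech_tag) pairs (assumed shape)
    sensitive_words: iterable of preset sensitive words
    """
    sensitive = list(sensitive_words)
    return [
        (word, tag)
        for word, tag in tagged_keywords
        if not any(s in word for s in sensitive)
    ]
```

For instance, `filter_keywords([("cola", "n"), ("fake-cola", "n")], ["fake"])` keeps only `("cola", "n")`.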
7. The method according to claim 1, characterized in that storing the filtered item information file in the index library as the integrated information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to perform information queries.
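For illustration, the formatting step of claim 7 might produce a newline-delimited JSON body of the kind accepted by the Elasticsearch bulk endpoint, in which each document line is preceded by an action line. The index name `items`, the use of the item code as document identifier, and the function name `to_bulk_body` are assumptions; the sketch only builds the text and does not contact a server.

```python
import json


def to_bulk_body(records, index_name="items"):
    """Serialize records as newline-delimited JSON for a bulk import.

    Each record yields two lines: an action line naming the target index
    and document id, then the document itself.
    """
    lines = []
    for rec in records:
        action = {"index": {"_index": index_name, "_id": rec["code"]}}
        lines.append(json.dumps(action))
        # ensure_ascii=False keeps Chinese brand names readable in the file
        lines.append(json.dumps(rec, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # the bulk format ends with a newline
```

The resulting text could then be written to a file and submitted to the index library by whatever import tool the deployment uses.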
8. A system for automatically processing service data information, characterized in that the system comprises:
a data cleaning unit, configured to perform data cleaning on acquired raw service data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file;
an information conversion unit, configured to perform information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file;
a data extraction unit, configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to perform data extraction based on brand information as required, to obtain a second item information file;
a part-of-speech tagging unit, configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information;
a data integration unit, configured to filter out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet requirements, and to store the filtered item information file in an index library as an integrated information file.
9. The system according to claim 8, characterized in that the field information of the item information data file comprises: item code, item name, item quantity, and total amount; and the field information of the brand information data file comprises: item code, Chinese brand, and English brand.
10. The system according to claim 9, characterized in that the information conversion unit performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for service data having only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field;
for service data having only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
11. The system according to claim 8, characterized in that the data extraction unit performing data extraction based on brand information as required, to obtain the second item information file, comprises:
performing extraction based on brand information as required, and storing the successfully extracted service record data as the second item information file;
performing manual extraction on the service record data that was not successfully extracted.
12. The system according to claim 11, characterized in that performing manual extraction on the service record data that was not successfully extracted comprises:
determining whether each service record among the service record data not successfully extracted contains brand information, or whether the percentage of the data volume of the service record data not successfully extracted relative to the total data volume of the service record data in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted service record data in the second item information file; otherwise, performing no manual extraction.
13. The system according to claim 8, characterized in that the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
14. The system according to claim 8, characterized in that the data integration unit storing the filtered item information file in the index library as the integrated information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to perform information queries.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811612300.XA CN109785099B (en) | 2018-12-27 | 2018-12-27 | Method and system for automatically processing service data information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811612300.XA CN109785099B (en) | 2018-12-27 | 2018-12-27 | Method and system for automatically processing service data information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109785099A (en) | 2019-05-21 |
CN109785099B (en) | 2021-07-06 |
Family
ID=66497751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811612300.XA (Active, CN109785099B (en)) | Method and system for automatically processing service data information | 2018-12-27 | 2018-12-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109785099B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020143862A1 (en) * | 2000-05-19 | 2002-10-03 | Atitania Ltd. | Method and apparatus for transferring information between a source and a destination on a network |
WO2007036932A2 (en) * | 2005-09-27 | 2007-04-05 | Zetapoint Ltd. | Data table management system and methods useful therefor |
CN101866331A (en) * | 2009-12-24 | 2010-10-20 | 北京信息科技大学 | Conversion method and device of XML (Extensible Markup Language) documents of different languages |
CN102880709A (en) * | 2012-09-28 | 2013-01-16 | 用友软件股份有限公司 | Data warehouse management system and data warehouse management method |
CN106649455A (en) * | 2016-09-24 | 2017-05-10 | 孙燕群 | Big data development standardized systematic classification and command set system |
CN108241677A (en) * | 2016-12-26 | 2018-07-03 | 航天信息股份有限公司 | A kind of method and system for the tax revenue sorting code number for obtaining commodity |
CN108415980A (en) * | 2018-02-09 | 2018-08-17 | 平安科技(深圳)有限公司 | Question and answer data processing method, electronic device and storage medium |
Non-Patent Citations (1)
Title |
---|
VISHAL SHUKLA: "《Elasticsearch集成Hadoop最佳实践》", 30 June 2017, Tsinghua University Press * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111552730A (en) * | 2020-04-28 | 2020-08-18 | 杭州数梦工场科技有限公司 | Data distribution method and device, electronic equipment and storage medium |
CN111552730B (en) * | 2020-04-28 | 2024-01-26 | 杭州数梦工场科技有限公司 | Data distribution method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109785099B (en) | 2021-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110543374B | | Centralized data coordination using artificial intelligence mechanism |
CN107885874A | | Data query method and apparatus, computer equipment and computer-readable recording medium |
US20040158562A1 | | Data quality system |
CN110990546B | | Intelligent question-answer corpus updating method and device |
CN106296195A | | A kind of Risk Identification Method and device |
CN109598517B | | Commodity clearance processing, object processing and category prediction method and device thereof |
CN109582772A | | Contract information extracting method, device, computer equipment and storage medium |
CN109522417A | | A kind of trading company's abstracting method of company name |
CN107357785A | | Theme feature word abstracting method and system, feeling polarities determination methods and system |
CN109492104A | | Training method, classification method, system, equipment and the medium of intent classifier model |
CN109447273A | | Model training method, advertisement recommended method, relevant apparatus, equipment and medium |
CN113722483A | | Topic classification method, device, equipment and storage medium |
CN106095745A | | Transaction record extracting method based on log and system thereof |
CN110990711A | | WeChat public number recommendation algorithm and system based on machine learning |
CN110427604A | | Table integration method and device |
CN109902157A | | A kind of training sample validation checking method and device |
CN110019820A | | Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history |
CN116245097A | | Method for training entity recognition model, entity recognition method and corresponding device |
CN112035449A | | Data processing method and device, computer equipment and storage medium |
CN108228787A | | According to the method and apparatus of multistage classification processing information |
CN106997350A | | A kind of method and device of data processing |
CN109785099A | | A kind of method and system that service data information is handled automatically |
CN108874780A | | A kind of segmentation methods system |
CN109271479A | | A kind of resume structuring processing method |
CN109685103A | | A kind of text Multi-label learning method based on broad sense K mean algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB02 | Change of applicant information | Address after: Room 3106, floor 31, building a, No. 2, South Zhongguancun Street, Haidian District, Beijing 100086; Applicant after: ELE-CLOUD INFORMATION TECHNOLOGY Co.,Ltd.; Address before: 100195, Beijing, Haidian District apricot Road, No. 18; Applicant before: ELE-CLOUD INFORMATION TECHNOLOGY Co.,Ltd. |
 | GR01 | Patent grant | |