CN109785099A - Method and system for automatically processing business data information - Google Patents

Method and system for automatically processing business data information

Info

Publication number
CN109785099A
Authority
CN
China
Prior art keywords
file
message
data
brand
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811612300.XA
Other languages
Chinese (zh)
Other versions
CN109785099B (en
Inventor
陈懿
李泽然
张泽
李浩浩
尤培海
白光佩
苏瑞文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Elephant Hui Yun Information Technology Co Ltd
Original Assignee
Elephant Hui Yun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elephant Hui Yun Information Technology Co Ltd
Priority to CN201811612300.XA
Publication of CN109785099A
Application granted
Publication of CN109785099B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and system for automatically processing business data information, comprising: performing data cleaning on acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file; performing information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file; associating the item information data file with the converted brand information data file, and performing data extraction based on brand information as required, to obtain a second item information file; obtaining keyword information for each item, adding it to the second item information file, and performing part-of-speech tagging on the keyword information; and filtering the second item information file according to a preset keyword information filtering rule, and storing the filtered item information file in an index library as a complete information file.

Description

Method and system for automatically processing business data information
Technical field
The present invention relates to the field of big data technology, and more particularly to a method and system for automatically processing business data information.
Background art
According to regulations of the State Taxation Administration, every category of goods has a corresponding tax classification code, which is mainly used to distinguish the industry category of a product and the tax rate for each category of goods. In daily life these tax rates are difficult to look up, and operator error makes it easy to fill them in incorrectly. To achieve intelligent invoicing, a method is therefore needed that can intelligently identify information such as an item's goods classification code and tax rate from the item's information.
Summary of the invention
The present invention proposes a method and system for automatically processing business data information, to solve the problem of how to process business data information automatically.
To solve the above problem, according to one aspect of the invention, a method for automatically processing business data information is provided, characterized in that the method comprises:
performing data cleaning on acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file;
performing information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file;
associating the item information data file with the converted brand information data file to obtain a first item information file, and performing data extraction based on brand information as required, to obtain a second item information file;
obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information;
filtering out keyword information in the second item information file that does not meet requirements, according to a preset keyword information filtering rule, and storing the filtered item information file in an index library as a complete information file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity and total amount; the field information of the brand information data file includes: item code, Chinese brand and English brand.
Preferably, performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
Preferably, performing data extraction based on brand information as required, to obtain the second item information file, comprises:
performing extraction based on brand information as required, and storing the successfully extracted business record data as the second item information file;
performing manual extraction on the business record data that was not successfully extracted.
Preferably, performing manual extraction on the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the volume of unsuccessfully extracted business record data, as a percentage of the total volume of business record data in the first item information file, is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, storing the filtered item information file in the index library as the complete information file comprises:
arranging the filtered item information file into a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to query information.
According to another aspect of the invention, a system for automatically processing business data information is provided, characterized in that the system comprises:
a data cleaning unit, configured to perform data cleaning on acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file;
an information conversion unit, configured to perform information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file;
a data extraction unit, configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to perform data extraction based on brand information as required, to obtain a second item information file;
a part-of-speech tagging unit, configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information;
a data integration unit, configured to filter out keyword information in the second item information file that does not meet requirements, according to a preset keyword information filtering rule, and to store the filtered item information file in an index library as a complete information file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity and total amount; the field information of the brand information data file includes: item code, Chinese brand and English brand.
Preferably, the information conversion unit performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
Preferably, the data extraction unit performing data extraction based on brand information as required, to obtain the second item information file, comprises:
performing extraction based on brand information as required, and storing the successfully extracted business record data as the second item information file;
performing manual extraction on the business record data that was not successfully extracted.
Preferably, performing manual extraction on the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the volume of unsuccessfully extracted business record data, as a percentage of the total volume of business record data in the first item information file, is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, the data integration unit storing the filtered item information file in the index library as the complete information file comprises:
arranging the filtered item information file into a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to query information.
The present invention provides a method and system for automatically processing business data information, comprising: performing data cleaning on acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file; performing information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file; associating the item information data file with the converted brand information data file, and performing data extraction based on brand information as required, to obtain a second item information file; obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information; and filtering out keyword information in the second item information file that does not meet requirements, according to a preset keyword information filtering rule, and storing the filtered item information file in an index library as a complete information file. The technical solution of the invention uses the distributed computing framework MapReduce to clean the acquired raw business data according to rigorous cleaning rules, keeping data cleaning as close to error-free as possible and avoiding program compatibility problems; supplementing brand information and performing manual extraction ensures the accuracy of the data; part-of-speech tagging is performed with Zhang Huaping's part-of-speech tagging tool, which ensures accurate word segmentation; and filtering according to the preset keyword information filtering rule yields a complete information file that is stored in the index library, so that business data information is processed accurately and automatically, saving time and effort.
Brief description of the drawings
Exemplary embodiments of the invention can be understood more fully by reference to the following drawings:
Fig. 1 is a flowchart of a method 100 for automatically processing business data information according to an embodiment of the invention; and
Fig. 2 is a schematic structural diagram of a system 200 for automatically processing business data information according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the invention are now described with reference to the drawings. The invention may, however, be implemented in many different forms and is not limited to the embodiments described here; these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the drawings does not limit the invention. In the drawings, identical units/elements are given identical reference numerals.
Unless otherwise stated, the terms used here (including scientific and technical terms) have the meanings commonly understood by those skilled in the art. It should further be understood that terms defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their context in the relevant field, and not in an idealized or overly formal sense.
Fig. 1 is a flowchart of a method 100 for automatically processing business data information according to an embodiment of the invention. As shown in Fig. 1, the method provided by the embodiment uses the distributed computing framework MapReduce to clean the acquired raw business data according to rigorous cleaning rules, keeping data cleaning as close to error-free as possible and avoiding program compatibility problems; supplementing brand information and performing manual extraction ensures the accuracy of the data; part-of-speech tagging is performed with Zhang Huaping's part-of-speech tagging tool, which ensures accurate word segmentation; and filtering according to the preset keyword information filtering rule yields a complete information file that is stored in the index library, so that business data information is processed accurately and automatically, saving time and effort. The method provided by the embodiment starts at step 101, in which data cleaning is performed on the acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity and total amount; the field information of the brand information data file includes: item code, Chinese brand and English brand.
In an embodiment of the invention, data training is combined with currently popular machine learning algorithms. A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents a test outcome, and each leaf node represents a class; it is used to find rules in massive data. The data is mainly trained with MapReduce technology on a complete big data platform. MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). Its central concepts, "Map" and "Reduce", are borrowed from functional programming languages, along with features borrowed from vector programming languages. It makes it much easier for programmers with no experience of distributed parallel programming to run their own programs on a distributed system. In current software implementations, a Map function is specified to map a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function is specified to ensure that all mapped key-value pairs sharing the same key are grouped together.
After data cleaning is performed on the acquired raw business data using the distributed computing framework MapReduce, the item information data file and the brand information data file obtained are coding-r-00000 and cate-r-00000 respectively. The fields of the coding-r-00000 file are: item code, item name, item quantity and total amount; the fields of the cate-r-00000 file are: item code, Chinese brand and English brand.
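The map, shuffle and reduce phases described above can be illustrated with a small single-process sketch. It is not the patent's implementation: the raw row layout, the validation rules and the choice to merge duplicate item codes by summing quantity and amount are all assumptions made for illustration, since the source does not spell out its cleaning rules. The function names (`map_record`, `shuffle`, `reduce_group`, `clean`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows:
# (item code, item name, quantity, total amount, Chinese brand, English brand)
RAW = [
    ("1001", " 牛奶 ", "2", "10.0", "蒙牛", ""),
    ("1001", "牛奶", "1", "5.0", "蒙牛", ""),
    ("", "bad row", "1", "1.0", "", ""),         # no item code: dropped
    ("1002", "shoes", "x", "99.0", "", "Nike"),  # non-numeric quantity: dropped
    ("1003", "shoes", "1", "99.0", "", "Nike"),
]


def map_record(row):
    """Map: validate one raw row and emit (key, value) pairs for both outputs."""
    code, name, qty, amount, zh, en = (field.strip() for field in row)
    if not code:
        return  # assumed rule: rows without an item code are discarded
    try:
        qty_n, amount_n = int(qty), float(amount)
    except ValueError:
        return  # assumed rule: rows with malformed numbers are discarded
    yield ("item", code), (name, qty_n, amount_n)
    yield ("brand", code), (zh, en)


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: collapse one key group into a single output record."""
    kind, code = key
    if kind == "item":
        name = values[0][0]
        return (code, name, sum(v[1] for v in values), sum(v[2] for v in values))
    zh = next((v[0] for v in values if v[0]), "")
    en = next((v[1] for v in values if v[1]), "")
    return (code, zh, en)


def clean(rows):
    """Run map, shuffle and reduce; return (item records, brand records)."""
    pairs = (pair for row in rows for pair in map_record(row))
    items, brands = [], []
    for key, values in sorted(shuffle(pairs).items()):
        (items if key[0] == "item" else brands).append(reduce_group(key, values))
    return items, brands
```

Calling `clean(RAW)` returns two lists whose columns match the field lists given for coding-r-00000 and cate-r-00000. On a real cluster the same three phases would be distributed across nodes by the framework rather than run in one process.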
Preferably, in step 102, information conversion is performed on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file.
Preferably, performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field;
for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
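The two conversion rules above amount to a bidirectional lookup in a brand dictionary. A minimal sketch follows, assuming brand records are dictionaries with hypothetical keys `code`, `zh_brand` and `en_brand`, and that the dictionary is supplied as a Chinese-to-English mapping; neither the record layout nor the exact-match lookup is specified in the source.

```python
def fill_missing_brands(records, zh_to_en):
    """Fill the empty brand field of each record from a brand dictionary.

    records:  list of dicts with keys 'code', 'zh_brand', 'en_brand' (assumed layout)
    zh_to_en: dict mapping a Chinese brand name to its English brand name
    """
    en_to_zh = {en: zh for zh, en in zh_to_en.items()}
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        zh, en = rec.get("zh_brand", ""), rec.get("en_brand", "")
        if zh and not en:
            # Only Chinese brand present: add the English counterpart if known.
            rec["en_brand"] = zh_to_en.get(zh, "")
        elif en and not zh:
            # Only English brand present: add the Chinese counterpart if known.
            rec["zh_brand"] = en_to_zh.get(en, "")
        filled.append(rec)
    return filled
```

Records whose brand has no counterpart in the dictionary keep an empty field, which matches the "if so" condition in the rules: nothing is added when no corresponding brand exists.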
Preferably, in step 103, the item information data file and the converted brand information data file are associated to obtain a first item information file, and data extraction is performed based on brand information as required, to obtain a second item information file.
Preferably, performing data extraction based on brand information as required, to obtain the second item information file, comprises:
performing extraction based on brand information as required, and storing the successfully extracted business record data as the second item information file;
performing manual extraction on the business record data that was not successfully extracted.
Preferably, performing manual extraction on the business record data that was not successfully extracted comprises:
determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the volume of unsuccessfully extracted business record data, as a percentage of the total volume of business record data in the first item information file, is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
In an embodiment of the invention, brand extraction requires maintaining a brand dictionary that contains Chinese brands and English brands and allows them to be associated with each other. Maintenance mainly consists of duplicate filtering and anomaly-detection filtering of the dictionary table. First, information conversion is performed on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file. Then the item information data file and the converted brand information data file are associated to obtain the first item information file, brand.dic, whose fields are: item code, Chinese brand, English brand, item name, quantity and total amount.
Extraction by brand can be carried out as a MapReduce task. When the task completes, it produces the second item information file, multiple coding-r-00000, and the file of unsuccessfully extracted records, still-r-00000. The fields of the multiple coding-r-00000 file are: item code, Chinese brand, English brand, item name, quantity and total amount; it holds the records whose brand was successfully extracted. The fields of the still-r-00000 file are: item code, item name, quantity and total amount; it holds the records whose brand was not successfully extracted.
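The association and extraction steps can be sketched as a join on item code followed by a split into two outputs. This is a single-process illustration only, and it assumes that a record counts as "successfully extracted" when the join leaves it with at least one non-empty brand field; the source does not state the exact success criterion. The function names `associate` and `split_by_brand` are hypothetical.

```python
def associate(items, brands):
    """Join item records with brand records on item code.

    items:  list of (code, name, quantity, amount)
    brands: list of (code, zh_brand, en_brand)
    Returns records shaped like the first item information file:
    (code, zh_brand, en_brand, name, quantity, amount).
    """
    brand_by_code = {code: (zh, en) for code, zh, en in brands}
    return [
        (code, *brand_by_code.get(code, ("", "")), name, qty, amount)
        for code, name, qty, amount in items
    ]


def split_by_brand(first_file):
    """Split joined records into extracted and not-yet-extracted outputs.

    Assumed rule: extraction succeeds when either brand field is non-empty.
    """
    extracted, remaining = [], []
    for code, zh, en, name, qty, amount in first_file:
        if zh or en:
            extracted.append((code, zh, en, name, qty, amount))
        else:
            remaining.append((code, name, qty, amount))
    return extracted, remaining
```

The two return values carry the same columns as the successfully extracted file and still-r-00000 described above: the first keeps both brand fields, the second drops them.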
Data whose brand was not successfully extracted is handled by manual extraction, which supplements the dictionary filtering. However, if the goods genuinely have no brand, or the number of records in still-r-00000 is very small compared with the total data volume (that is, every business record in the unsuccessfully extracted data genuinely contains no brand information, or the volume of unsuccessfully extracted business record data as a percentage of the total volume of business record data in the first item information file is below the preset percentage threshold), no manual extraction is performed.
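The decision of whether to send the remaining records for manual extraction reduces to one percentage comparison plus a judgment about whether those records carry any brand at all. A minimal sketch, assuming that judgment is supplied by the caller as a boolean (the source does not say how it is made) and using the hypothetical name `needs_manual_extraction`:

```python
def needs_manual_extraction(unextracted_count, total_count, threshold_pct,
                            records_contain_brand):
    """Decide whether unextracted records go to manual extraction.

    unextracted_count:     number of records whose brand was not extracted
    total_count:           number of records in the first item information file
    threshold_pct:         preset percentage threshold, e.g. 10.0 for 10 %
    records_contain_brand: caller's judgment that the unextracted records
                           do contain brand information (assumed input)
    """
    if total_count <= 0:
        return False  # nothing to process
    pct = 100.0 * unextracted_count / total_count
    return records_contain_brand or pct >= threshold_pct
```

For example, with 5 unextracted records out of 100 and a 10 % threshold, the share is 5 %, so manual extraction is skipped unless the records are known to contain brand information.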
Preferably, in step 104, keyword information is obtained for each item, the keyword information is added to the second item information file, and part-of-speech tagging is performed on the keyword information.
Preferably, the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, in step 105, keyword information in the second item information file that does not meet requirements is filtered out according to the preset keyword information filtering rule, and the filtered item information file is stored in the index library as a complete information file.
Preferably, storing the filtered item information file in the index library as the complete information file comprises:
arranging the filtered item information file into a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to query information.
In an embodiment of the invention, the keyword information is part-of-speech tagged with Zhang Huaping's part-of-speech tagging tool, keyword information that does not meet requirements is filtered out, and the filtered item information file is taken as the complete item information file. The complete information file is arranged by code into a JSON file and stored in the index library, because the index library supports import in JSON format, so that Elasticsearch can be called through an external API to query the required results. All ETL processes involved above are handled with the MapReduce framework.
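The filtering and JSON arrangement steps can be sketched as follows. The sketch assumes keywords arrive as (word, part-of-speech tag) pairs from whatever tagger is used, treats a keyword as sensitive when it contains any word from a hypothetical sensitive-word set, and formats the result as newline-delimited JSON in the action-plus-document layout accepted by the Elasticsearch bulk API. The index name, the record keys and the function names are illustrative assumptions, and no Elasticsearch client is called.

```python
import json


def filter_keywords(tagged_keywords, sensitive_words):
    """Drop tagged keywords that contain any preset sensitive word.

    tagged_keywords: list of (word, pos_tag) pairs
    sensitive_words: set of strings (hypothetical rule set)
    """
    return [
        (word, tag)
        for word, tag in tagged_keywords
        if not any(s in word for s in sensitive_words)
    ]


def to_bulk_ndjson(records, index_name, sensitive_words):
    """Arrange filtered records as bulk-import NDJSON text.

    records: list of dicts with at least 'code' and 'keywords' (assumed layout)
    Each record yields an action line followed by a document line.
    """
    lines = []
    for rec in records:
        doc = dict(rec)
        doc["keywords"] = [
            {"word": word, "pos": tag}
            for word, tag in filter_keywords(rec["keywords"], sensitive_words)
        ]
        action = {"index": {"_index": index_name, "_id": doc["code"]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # bulk bodies end with a newline
```

The returned text could then be written to a file or sent as the body of a bulk request; how the import is actually performed in the described system is not stated in the source.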
Fig. 2 is a schematic structural diagram of a system 200 for automatically processing business data information according to an embodiment of the invention. As shown in Fig. 2, the system 200 provided by the embodiment comprises: a data cleaning unit 201, an information conversion unit 202, a data extraction unit 203, a part-of-speech tagging unit 204 and a data integration unit 205. Preferably, the data cleaning unit 201 is configured to perform data cleaning on acquired raw business data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file.
Preferably, the field information of the item information data file includes: item code, item name, item quantity and total amount; the field information of the brand information data file includes: item code, Chinese brand and English brand.
Preferably, the information conversion unit 202 is configured to perform information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file.
Preferably, the information conversion unit 202 performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises: for business data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information and, if so, adding the corresponding English brand information directly to the English brand field; for business data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information and, if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
Preferably, the data extraction unit 203 is configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to perform data extraction based on brand information as required, to obtain a second item information file.
Preferably, the data extraction unit 203 performing data extraction based on brand information as required, to obtain the second item information file, comprises: performing extraction based on brand information as required, and storing the successfully extracted business record data as the second item information file; and performing manual extraction on the business record data that was not successfully extracted.
Preferably, performing manual extraction on the business record data that was not successfully extracted comprises: determining whether each business record in the unsuccessfully extracted business record data contains brand information, or whether the volume of unsuccessfully extracted business record data, as a percentage of the total volume of business record data in the first item information file, is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted business record data in the second item information file; otherwise, performing no manual extraction.
Preferably, the part-of-speech tagging unit 204 is configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information.
Preferably, the preset keyword information filtering rule comprises: filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
Preferably, the data integration unit 205 is configured to filter out keyword information in the second item information file that does not meet requirements, according to the preset keyword information filtering rule, and to store the filtered item information file in the index library as a complete information file.
Preferably, the data integration unit storing the filtered item information file in the index library as the complete information file comprises: arranging the filtered item information file into a JSON-format file, and importing the JSON-format file into the index library, so that Elasticsearch can be called through an external API to query information.
The system 200 for automatically processing business data information of this embodiment corresponds to the method 100 for automatically processing business data information of another embodiment of the invention, and details are not repeated here.
The invention has been described with reference to a small number of embodiments. However, as is known to those skilled in the art, and as defined by the appended patent claims, embodiments other than those disclosed above fall equally within the scope of the invention.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/the/said [device, component, etc.]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.

Claims (14)

1. A method for automatically processing service data information, characterized in that the method comprises:
performing data cleaning on acquired original service data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file;
performing information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file;
associating the item information data file with the converted brand information data file to obtain a first item information file, and performing data extraction based on brand information according to requirements, to obtain a second item information file;
obtaining keyword information for each item, adding the keyword information to the second item information file, and performing part-of-speech tagging on the keyword information;
filtering out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet requirements, and storing the filtered item information file into an index database as an integrated information file.
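By way of illustration only, the data cleaning step can be pictured as a map phase that parses and validates raw lines, followed by a reduce phase that groups records by item code and emits one row for each of the two output files. The sketch below imitates that flow in plain Python rather than on a MapReduce cluster. The comma-separated six-field input layout, the rule of dropping malformed lines, and the rule of collapsing duplicates per item code are all assumptions made for the example; the claim does not specify the cleaning rules. The names `map_record`, `reduce_records`, and `run_cleaning` are hypothetical.

```python
from collections import defaultdict


def map_record(raw_line):
    """Map step: parse one raw line and emit (item_code, fields).

    Assumed layout: code,name,quantity,amount,brand_cn,brand_en.
    Lines with the wrong field count or an empty code are dropped.
    """
    parts = [p.strip() for p in raw_line.split(",")]
    if len(parts) != 6 or not parts[0]:
        return []
    return [(parts[0], parts)]


def reduce_records(item_code, values):
    """Reduce step: keep the first record seen for an item code and
    split it into an item row and a brand row."""
    code, name, qty, amount, brand_cn, brand_en = values[0]
    return (code, name, qty, amount), (code, brand_cn, brand_en)


def run_cleaning(raw_lines):
    """Run map, group-by-key (shuffle), and reduce in a single process."""
    groups = defaultdict(list)
    for line in raw_lines:
        for key, value in map_record(line):
            groups[key].append(value)
    item_file, brand_file = [], []
    for key in sorted(groups):
        item_row, brand_row = reduce_records(key, groups[key])
        item_file.append(item_row)
        brand_file.append(brand_row)
    return item_file, brand_file
```

In a real deployment the map and reduce functions would run as distributed tasks and the grouping would be done by the framework's shuffle phase; only the shape of the computation is shown here.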
2. The method according to claim 1, characterized in that the field information of the item information data file comprises: item code, item name, item quantity, and total amount; and the field information of the brand information data file comprises: item code, Chinese brand, and English brand.
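As a minimal sketch of how the two files could be associated, the example below models each file as a list of records and joins them on the item code. Using the item code as the join key is an assumption, made because it is the only field the two files share. The class names `ItemRecord` and `BrandRecord` and the function `associate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemRecord:
    item_code: str
    item_name: str
    quantity: float
    total_amount: float


@dataclass
class BrandRecord:
    item_code: str
    brand_cn: Optional[str]
    brand_en: Optional[str]


def associate(items, brands):
    """Left-join item records with brand records on item_code.

    Items without a matching brand record keep None in both brand fields.
    """
    brand_by_code = {b.item_code: b for b in brands}
    merged = []
    for item in items:
        brand = brand_by_code.get(item.item_code)
        merged.append({
            "item_code": item.item_code,
            "item_name": item.item_name,
            "quantity": item.quantity,
            "total_amount": item.total_amount,
            "brand_cn": brand.brand_cn if brand else None,
            "brand_en": brand.brand_en if brand else None,
        })
    return merged
```

The list returned by `associate` plays the role of the first item information file in this sketch.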
3. The method according to claim 2, characterized in that performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for service data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information, and if so, adding the corresponding English brand information directly to the English brand field;
for service data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information, and if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
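The two conversion rules above can be sketched as a dictionary lookup in each direction. The mapping tables `CN_TO_EN` and `EN_TO_CN`, the field names `brand_cn` and `brand_en`, and the function `complete_brands` are hypothetical; where the brand correspondences come from is not stated in the claim.

```python
def complete_brands(record, cn_to_en, en_to_cn):
    """Fill in the missing brand field when a known counterpart exists.

    Records that already have both fields, or whose brand has no known
    counterpart, are returned unchanged (as a copy).
    """
    result = dict(record)
    cn, en = result.get("brand_cn"), result.get("brand_en")
    if cn and not en and cn in cn_to_en:
        result["brand_en"] = cn_to_en[cn]
    elif en and not cn and en in en_to_cn:
        result["brand_cn"] = en_to_cn[en]
    return result


# Example correspondence table; the reverse table is derived from it.
CN_TO_EN = {"华为": "Huawei"}
EN_TO_CN = {en: cn for cn, en in CN_TO_EN.items()}
```

For instance, `complete_brands({"brand_cn": "华为", "brand_en": None}, CN_TO_EN, EN_TO_CN)` fills the English field, while a record whose only brand is absent from both tables is left as it was.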
4. The method according to claim 1, characterized in that performing data extraction based on brand information according to requirements, to obtain the second item information file, comprises:
performing extraction based on brand information according to requirements, and storing the successfully extracted service records as the second item information file;
performing manual extraction on the service records that were not successfully extracted.
5. The method according to claim 4, characterized in that performing manual extraction on the service records that were not successfully extracted comprises:
determining whether each of the unsuccessfully extracted service records contains brand information, or whether the percentage of the data volume of the unsuccessfully extracted service records relative to the total data volume of service records in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted service records in the second item information file; otherwise, performing no manual extraction.
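One possible reading of the extraction and manual-extraction conditions is sketched below. It assumes that the requirement is expressed as a set of target brands, that a record is extracted automatically when one of its brand fields matches a target brand, and that the first condition means every unsuccessfully extracted record carries some brand value. None of these details is fixed by the claim wording, and the function names are hypothetical.

```python
def has_brand(record):
    """True when the record carries a Chinese or an English brand value."""
    return bool(record.get("brand_cn") or record.get("brand_en"))


def split_by_brand(records, wanted_brands):
    """Separate records whose brand matches a wanted brand from the rest."""
    extracted, failed = [], []
    for rec in records:
        brands = {rec.get("brand_cn"), rec.get("brand_en")}
        if brands & wanted_brands:
            extracted.append(rec)
        else:
            failed.append(rec)
    return extracted, failed


def needs_manual_extraction(failed, total_count, threshold_pct):
    """Decide whether the unsuccessfully extracted records go to manual review.

    Manual review is triggered when every such record contains brand
    information, or when their share of all records in the first item
    information file is at least threshold_pct.
    """
    if not failed or total_count == 0:
        return False
    all_have_brand = all(has_brand(rec) for rec in failed)
    failed_pct = 100.0 * len(failed) / total_count
    return all_have_brand or failed_pct >= threshold_pct
```

Records returned in `extracted` would form the second item information file; records recovered by manual review would be appended to it afterwards.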
6. The method according to claim 1, characterized in that the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
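A minimal sketch of such a filtering rule follows, assuming that tagged keywords are represented as `(keyword, pos_tag)` pairs and that "contains" means plain substring matching. Both the representation and the function name `filter_keywords` are illustrative assumptions.

```python
def filter_keywords(tagged_keywords, sensitive_words):
    """Drop (keyword, pos_tag) pairs whose keyword contains a sensitive word."""
    return [
        (word, tag)
        for word, tag in tagged_keywords
        if not any(s in word for s in sensitive_words)
    ]
```

For example, with the sensitive word set `{"fake"}`, the pair `("fake phone", "n")` is removed while `("phone", "n")` is kept.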
7. The method according to claim 1, characterized in that storing the filtered item information file into the index database as the integrated information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that information can be queried by calling Elasticsearch through an external API.
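As an illustration of the formatting step, the sketch below serializes filtered records into the newline-delimited JSON body accepted by the Elasticsearch bulk endpoint, where each document line is preceded by an `index` action line. The index name, the choice of `item_code` as the document ID, and the function name `to_bulk_ndjson` are assumptions; the claim only states that a JSON-format file is imported into the index database.

```python
import json


def to_bulk_ndjson(records, index_name, id_field="item_code"):
    """Serialize records as an Elasticsearch bulk request body.

    Each record yields two lines: an action line naming the target index
    and document ID, then the document itself. The body ends with a
    newline, as the bulk endpoint requires.
    """
    lines = []
    for rec in records:
        action = {"index": {"_index": index_name, "_id": rec[id_field]}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(rec, ensure_ascii=False))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

The resulting string could be written to a file or sent in a POST request to the cluster's `_bulk` endpoint; sending is left out here so that the example does not depend on a running cluster.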
8. A system for automatically processing service data information, characterized in that the system comprises:
a data cleaning unit, configured to perform data cleaning on acquired original service data using the distributed computing framework MapReduce, to obtain an item information data file and a brand information data file;
an information conversion unit, configured to perform information conversion on the field information in the brand information data file according to a preset conversion rule, to obtain a converted brand information data file;
a data extraction unit, configured to associate the item information data file with the converted brand information data file to obtain a first item information file, and to perform data extraction based on brand information according to requirements, to obtain a second item information file;
a part-of-speech tagging unit, configured to obtain keyword information for each item, add the keyword information to the second item information file, and perform part-of-speech tagging on the keyword information;
a data integration unit, configured to filter out, according to a preset keyword information filtering rule, keyword information in the second item information file that does not meet requirements, and to store the filtered item information file into an index database as an integrated information file.
9. The system according to claim 8, characterized in that the field information of the item information data file comprises: item code, item name, item quantity, and total amount; and the field information of the brand information data file comprises: item code, Chinese brand, and English brand.
10. The system according to claim 9, characterized in that the information conversion unit performing information conversion on the field information in the brand information data file according to the preset conversion rule, to obtain the converted brand information data file, comprises:
for service data that has only Chinese brand information, determining whether the Chinese brand information has corresponding English brand information, and if so, adding the corresponding English brand information directly to the English brand field;
for service data that has only English brand information, determining whether the English brand information has corresponding Chinese brand information, and if so, adding the corresponding Chinese brand information directly to the Chinese brand field.
11. The system according to claim 8, characterized in that the data extraction unit performing data extraction based on brand information according to requirements, to obtain the second item information file, comprises:
performing extraction based on brand information according to requirements, and storing the successfully extracted service records as the second item information file;
performing manual extraction on the service records that were not successfully extracted.
12. The system according to claim 11, characterized in that performing manual extraction on the service records that were not successfully extracted comprises:
determining whether each of the unsuccessfully extracted service records contains brand information, or whether the percentage of the data volume of the unsuccessfully extracted service records relative to the total data volume of service records in the first item information file is greater than or equal to a preset percentage threshold; if so, performing manual extraction and storing the extracted service records in the second item information file; otherwise, performing no manual extraction.
13. The system according to claim 8, characterized in that the preset keyword information filtering rule comprises:
filtering out part-of-speech-tagged keyword information that contains preset sensitive words.
14. The system according to claim 8, characterized in that the data integration unit storing the filtered item information file into the index database as the integrated information file comprises:
formatting the filtered item information file to obtain a JSON-format file, and importing the JSON-format file into the index database, so that information can be queried by calling Elasticsearch through an external API.
CN201811612300.XA 2018-12-27 2018-12-27 Method and system for automatically processing service data information Active CN109785099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811612300.XA CN109785099B (en) 2018-12-27 2018-12-27 Method and system for automatically processing service data information

Publications (2)

Publication Number Publication Date
CN109785099A true CN109785099A (en) 2019-05-21
CN109785099B CN109785099B (en) 2021-07-06

Family

ID=66497751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811612300.XA Active CN109785099B (en) 2018-12-27 2018-12-27 Method and system for automatically processing service data information

Country Status (1)

Country Link
CN (1) CN109785099B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020143862A1 (en) * 2000-05-19 2002-10-03 Atitania Ltd. Method and apparatus for transferring information between a source and a destination on a network
WO2007036932A2 (en) * 2005-09-27 2007-04-05 Zetapoint Ltd. Data table management system and methods useful therefor
CN101866331A (en) * 2009-12-24 2010-10-20 北京信息科技大学 Conversion method and device of XML (Extensible Markup Language) documents of different languages
CN102880709A (en) * 2012-09-28 2013-01-16 用友软件股份有限公司 Data warehouse management system and data warehouse management method
CN106649455A (en) * 2016-09-24 2017-05-10 孙燕群 Big data development standardized systematic classification and command set system
CN108241677A (en) * 2016-12-26 2018-07-03 航天信息股份有限公司 A kind of method and system for the tax revenue sorting code number for obtaining commodity
CN108415980A (en) * 2018-02-09 2018-08-17 平安科技(深圳)有限公司 Question and answer data processing method, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
VISHAL SHUKLA: "《Elasticsearch集成Hadoop最佳实践》", 30 June 2017, 清华大学出版社 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111552730A (en) * 2020-04-28 2020-08-18 杭州数梦工场科技有限公司 Data distribution method and device, electronic equipment and storage medium
CN111552730B (en) * 2020-04-28 2024-01-26 杭州数梦工场科技有限公司 Data distribution method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109785099B (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 3106, floor 31, building a, No. 2, South Zhongguancun Street, Haidian District, Beijing 100086

Applicant after: ELE-CLOUD INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 100195, Beijing, Haidian District apricot Road, No. 18

Applicant before: ELE-CLOUD INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant