CN107025233A - Method and device for processing data features - Google Patents

Info

Publication number
CN107025233A
Authority
CN
China
Prior art keywords
field
feature
sample
class
plaintext
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610066847.9A
Other languages
Chinese (zh)
Other versions
CN107025233B (en)
Inventor
张研
杨冠军
蒋程诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen yunwangwandian e-commerce Co.,Ltd.
Original Assignee
Suning Commerce Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Commerce Group Co Ltd
Priority to CN201610066847.9A
Publication of CN107025233A
Application granted
Publication of CN107025233B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the invention disclose a method and device for processing data features, relating to the technical field of big data processing, which can reduce the cost of data extraction and improve its accuracy. The method includes: acquiring a plaintext sample from a business log, the plaintext sample including at least a special field and a feature field, where the special field includes fields used to represent an execution command and an operation instruction; obtaining feature plaintext from the feature field according to a preconfigured feature class, and recording a sample signature, where special fields with identical content correspond to the same sample signature; extracting one special field corresponding to the sample signature, and splicing the obtained feature plaintext onto that special field to obtain a spliced field; and outputting the spliced field as a feature sample. The invention is applicable to data feature extraction in big data processing.

Description

Method and device for processing data features
Technical field
The present invention relates to the technical field of big data processing, and in particular to a method and device for processing data features.
Background
With the development of Internet technology, the volume of online data is growing exponentially. To cope with the processing of massive data, many big data processing schemes have been developed to extract the required information from massive data.
Data from different fields and of different types differ greatly in dimension and format, and data sources are intricate, so many computing resources are needed to screen and extract the required information from massive data. Existing schemes mainly work on text or on data tables, and extract the effective data features with a program written in some programming language, thereby implementing data extraction.
However, the data features on which a data table relies are rather limited, which makes it difficult to accurately describe the data that the user actually needs and degrades subsequent data analysis and modeling. In particular, in business data processing systems with a very high refresh frequency, such as advertising systems, models must be updated frequently for large-scale, multi-dimensional advertising data; the cost is very high, yet the accuracy of data extraction remains low.
Summary of the invention
Embodiments of the present invention provide a method and device for processing data features, which can reduce the cost of data extraction and improve the accuracy of data extraction.
To achieve the above objective, embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a method for processing data features, including:
acquiring a plaintext sample from a business log, the plaintext sample including at least a special field and a feature field, where the special field includes fields used to represent an execution command and an operation instruction;
obtaining feature plaintext from the feature field according to a preconfigured feature class, and recording a sample signature, where special fields with identical content correspond to the same sample signature;
extracting one special field corresponding to the sample signature, and splicing the obtained feature plaintext onto that special field to obtain a spliced field;
outputting the spliced field as a feature sample.
With reference to the first aspect, in a first possible implementation of the first aspect, acquiring the plaintext sample from the business log includes:
reading the plaintext fields in the business log;
rejecting fields of a first type from the plaintext fields; and/or converting the characters of fields of a second type in the plaintext fields into a specified format;
storing, through a MapReduce framework, the fields remaining after the rejection and/or conversion in memory in map form.
With reference to the first aspect, in a second possible implementation of the first aspect, obtaining the feature plaintext from the feature field according to the preconfigured feature class includes:
sequentially reading the fields in the feature class, where each field in the feature class has the same content as at least one field in the plaintext sample;
according to the content of the fields in the feature class, sequentially reading the fields with the same content from the plaintext sample as the feature fields;
recording the feature fields sequentially read from the plaintext sample in a feature set.
With reference to the second possible implementation of the first aspect, in a third possible implementation, outputting the spliced field as a feature sample includes:
importing, through the MapReduce framework, the feature sample and the feature set into the reduce stage;
and recording the feature fields sequentially read from the plaintext sample in the feature set includes: outputting identical feature fields read from the plaintext sample to the same computing node.
With reference to the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
reading a base feature class, and updating the base feature class through a reflection mechanism;
using the most recently updated base feature class as the preconfigured feature class.
In a second aspect, an embodiment of the present invention provides a device for processing data features, including:
an extraction unit, configured to acquire a plaintext sample from a business log, the plaintext sample including at least a special field and a feature field, where the special field includes fields used to represent an execution command and an operation instruction;
an identification unit, configured to obtain feature plaintext from the feature field according to a preconfigured feature class, and to record a sample signature, where special fields with identical content correspond to the same sample signature;
a splicing unit, configured to extract one special field corresponding to the sample signature, and to splice the obtained feature plaintext onto that special field to obtain a spliced field;
an output unit, configured to output the spliced field as a feature sample.
With reference to the second aspect, in a first possible implementation of the second aspect, the device further includes a pre-processing unit, configured to read the plaintext fields in the business log; to reject fields of a first type from the plaintext fields and/or convert the characters of fields of a second type in the plaintext fields into a specified format; and then to store, through a MapReduce framework, the fields remaining after the rejection and/or conversion in memory in map form.
With reference to the second aspect, in a second possible implementation of the second aspect, the identification unit is specifically configured to sequentially read the fields in the feature class, where each field in the feature class has the same content as at least one field in the plaintext sample; to sequentially read, according to the content of the fields in the feature class, the fields with the same content from the plaintext sample as the feature fields; and then to record the feature fields sequentially read from the plaintext sample in a feature set.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the output unit is specifically configured to import, through the MapReduce framework, the feature sample and the feature set into the reduce stage, and to output identical feature fields read from the plaintext sample to the same computing node.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the device further includes a feature class management unit, configured to read a base feature class and update the base feature class through a reflection mechanism, and to use the most recently updated base feature class as the preconfigured feature class.
The method and device for processing data features provided by the embodiments of the present invention obtain feature plaintext from the feature field of a plaintext sample according to a preconfigured feature class and record a sample signature, extract one special field corresponding to the sample signature, splice the feature plaintext onto the special field, and then output the spliced field as a feature sample to be used for data extraction. Compared with the prior art, the embodiments extract the required features from massive data, which addresses the difficulty in the prior art of extracting large-scale, multi-dimensional data and alleviates the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the system architecture provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for processing data features provided by an embodiment of the present invention;
Fig. 3a, Fig. 3b and Fig. 3c are each a schematic structural diagram of the device for processing data features provided by an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solutions of the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the invention, and are not to be construed as limiting it. Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in this specification refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. Those skilled in the art will understand that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood as having a meaning consistent with their meaning in the context of the prior art and, unless defined as herein, will not be interpreted in an idealized or overly formal sense.
This embodiment may adopt a distributed processing architecture based on MapReduce (also called a MapReduce framework); the specific architecture of the MapReduce framework used in this embodiment may be as shown in Fig. 1. During execution, the data to be processed is held in memory in map form. The framework relied on is specifically the Hadoop-based MapReduce framework. For feature extraction, the feature fields and special fields of the data are extracted and output in the map stage, and identical feature fields are accumulated in the reduce stage. For samples, sampling is performed in the map stage, and the feature samples in which sample signatures have been recorded are output in the reduce stage.
An embodiment of the present invention provides a method for processing data features which, as shown in Fig. 2, includes:
S1: acquire a plaintext sample from a business log.
The plaintext sample includes at least a special field and a feature field, and the special field includes fields used to represent an execution command and an operation instruction. The business log may be log data recorded while a business system is running, for example the log data recorded while an advertisement delivery system is running. The plaintext sample may be the unencrypted characters in the business log. The acquired plaintext sample may specifically be text in tab-separated form, and includes special fields used to represent "impression" and "click", for example "show" and "clk".
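As a minimal sketch of step S1, the following Python fragment splits one tab-separated log line into special fields and feature fields. The column layout (the first two columns hold the "show" and "clk" fields, the rest are feature fields) and the names `parse_sample` and `PlaintextSample` are assumptions made for illustration; the source does not specify the layout.

```python
from typing import NamedTuple

# Assumed layout (not given in the source): the first two tab-separated
# columns are the special fields; all remaining columns are feature fields.
SPECIAL_FIELD_COUNT = 2


class PlaintextSample(NamedTuple):
    special: tuple   # e.g. ("show", "clk")
    features: tuple  # remaining columns


def parse_sample(line: str) -> PlaintextSample:
    """Split one tab-separated log line into special and feature fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) <= SPECIAL_FIELD_COUNT:
        raise ValueError("a sample needs special fields and at least one feature field")
    return PlaintextSample(
        tuple(columns[:SPECIAL_FIELD_COUNT]),
        tuple(columns[SPECIAL_FIELD_COUNT:]),
    )


sample = parse_sample("show\tclk\tA\tB\tC\tD\n")
```

Here `sample.special` holds the two special fields and `sample.features` the four feature fields; a line with no feature field is rejected rather than silently accepted.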
Specifically, steps S1 to S4 may be performed by the server of the map stage in the MapReduce framework.
S2: obtain feature plaintext from the feature field according to the preconfigured feature class, and record a sample signature.
In this embodiment, the map-stage server reads the preconfigured feature class. The feature class includes fields configured in a set order, and each field in the feature class has the same content as at least one field in the plaintext sample. The map-stage server reads the input plaintext sample in key-value form and, according to the preconfigured feature class, holds it in memory in map form. The memory referred to in this embodiment may be the memory of the user's local device or the memory of the map-stage server.
The map-stage server may first strip from the plaintext sample the special fields used to represent "impression" and "click", and then extract the feature fields from the plaintext sample in turn according to the field contents described in the preconfigured feature class. A sample signature corresponds to a plaintext sample, and the special fields representing "impression" and "click" are often repeated many times within a plaintext sample; special fields with identical content in the same plaintext sample therefore correspond to the same sample signature. The sample signature may be assigned by the server when the plaintext sample is held in memory in map form, or may be preconfigured in the plaintext sample.
S3: extract one special field corresponding to the sample signature, and splice the obtained feature plaintext onto that special field to obtain a spliced field.
For example, for the plaintext sample "show clk A ..., show clk B ..., show clk C ..., show clk D",
the special field is "show clk" and the feature fields are "A B C D". The following features can therefore be obtained: A show clk, B show clk, C show clk, D show clk. Splicing then yields the spliced field "show clk feaA feaB feaC feaD".
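The splicing in S3 can be sketched as follows. The source does not say how a sample signature is computed, so hashing the special-field content with SHA-256 is purely an assumption, as are the names `sample_signature` and `splice`; the only property taken from the text is that identical special-field content maps to the same signature.

```python
import hashlib


def sample_signature(special_field: str) -> str:
    """Assumed signature: identical special-field content gives the same value."""
    return hashlib.sha256(special_field.encode("utf-8")).hexdigest()


def splice(records):
    """records: iterable of (special_field, feature_plaintext) pairs.

    Keeps one special field per signature and appends every feature
    plaintext recorded under that signature, in arrival order.
    """
    grouped = {}  # signature -> (special_field, [feature plaintexts])
    for special, feature in records:
        signature = sample_signature(special)
        grouped.setdefault(signature, (special, []))[1].append(feature)
    return [" ".join([special, *features]) for special, features in grouped.values()]


records = [
    ("show clk", "feaA"),
    ("show clk", "feaB"),
    ("show clk", "feaC"),
    ("show clk", "feaD"),
]
spliced = splice(records)
```

With the four records above, the repeated "show clk" special field is kept once and `spliced` contains the single field "show clk feaA feaB feaC feaD", matching the example in the text.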
S4: output the spliced field as a feature sample.
The map-stage server may output the feature sample to the server of the reduce stage.
In this embodiment, feature extraction requires obtaining feature plaintext from the feature field in the map stage according to the preconfigured feature class. The preconfigured feature class can be obtained through the reflection mechanism in Java. As a result, for general requirements the user does not need to develop a feature extraction program based on data tables as in the prior art; for specific requirements, the user only needs to use the feature extraction framework of this embodiment (that is, the MapReduce framework running the flow of this embodiment) to extract the required features from massive data according to the preconfigured feature class.
The reflection mechanism used in this embodiment means that which class needs to be loaded is not determined at compile time; the specific class is loaded only when the program runs, so that the structural attributes of the class can be obtained. In other words, a class that is unknown at compile time can be used. For example, after a class is loaded, the Java virtual machine automatically generates a Class object, through which information such as the declarations and definitions of the class's methods, members and constructors loaded into the virtual machine can be obtained. As a concrete example, the process of obtaining the preconfigured feature class through the Java reflection mechanism may include:
Using the Java reflection mechanism, a feature class factory class (Feature) is defined, for example as shown in the following code:
The class name of the feature class is then configured in the user's own business configuration for feature extraction; configuring multiple slots with multiple features is supported, and the classes need not be loaded in advance.
Afterwards, when invoked, the user configuration is parsed to obtain the feature class name by slot number, and the feature parsing class is obtained by reflection and handed to the feature extractor to extract features. Feature extraction service classes of any type can be added according to specific business requirements: the feature class name is configured in the configuration file, and during feature extraction each slot uses the feature class that the user has written. Further, pre-processing classes are handled in the same way, with a separately defined pre-processing factory class that also uses the Java reflection mechanism.
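The idea of resolving a feature class from a configured name at run time can be illustrated with a short Python analogue. This is not the patent's Java listing, which does not appear in this text; the subclass lookup below stands in for what resolving a class by name and instantiating it would do in Java, and the class names `PrefixFeature` and `SlotTaggedFeature` and the slot-to-name configuration are hypothetical.

```python
class Feature:
    """Base class and factory; concrete feature classes override extract()."""

    def __init__(self, slot: int):
        self.slot = slot

    def extract(self, value: str) -> str:
        raise NotImplementedError

    @staticmethod
    def create(class_name: str, slot: int) -> "Feature":
        """Find a subclass by its configured name at run time and instantiate it."""
        for cls in Feature.__subclasses__():
            if cls.__name__ == class_name:
                return cls(slot)
        raise LookupError(f"unknown feature class: {class_name}")


class PrefixFeature(Feature):  # hypothetical user-written feature class
    def extract(self, value: str) -> str:
        return f"fea{value}"


class SlotTaggedFeature(Feature):  # hypothetical user-written feature class
    def extract(self, value: str) -> str:
        return f"{self.slot}:{value}"


# Hypothetical configuration: slot number -> feature class name.
config = {1: "PrefixFeature", 2: "SlotTaggedFeature"}
extractors = {slot: Feature.create(name, slot) for slot, name in config.items()}
```

The factory never names a concrete class in its own code, so adding a new feature class only requires writing the subclass and listing its name in the configuration, which is the property the text attributes to the reflection-based design.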
The method for processing data features provided by this embodiment obtains feature plaintext from the feature field of a plaintext sample according to a preconfigured feature class and records a sample signature, extracts one special field corresponding to the sample signature, splices the feature plaintext onto the special field, and then outputs the spliced field as a feature sample to be used for data extraction. Compared with the prior art, this embodiment extracts the required features from massive data, which addresses the difficulty in the prior art of extracting large-scale, multi-dimensional data and alleviates the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
In this embodiment, the fields in the plaintext sample may be pre-processed, for example in the map-stage server, either on the plaintext sample already held in memory in map form or before the plaintext sample is stored in memory. For example, characters encoded with schemes such as URL-ENCODE or base64 may be processed, and pre-processing such as half-width/full-width conversion and English upper/lower case conversion may be performed; user-defined pre-processing may also be included. Acquiring the plaintext sample from the business log therefore includes:
Reading the plaintext fields in the business log. Rejecting fields of a first type from the plaintext fields, and/or converting the characters of fields of a second type in the plaintext fields into a specified format. Storing, through the MapReduce framework, the fields remaining after the rejection and/or conversion in memory in map form.
Here, a field of the first type is a field that contains data errors or cannot be read, or a character used to represent specific content (for example, characters representing a modification date, separators and the like). A field of the second type is a field whose characters can be converted, for example by half-width/full-width conversion or English upper/lower case conversion, into a specified format; the specified format is either preset by the user or stored in advance in the map-stage server.
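A rough sketch of this pre-processing, using only the Python standard library, is given below. The rejection rule (empty fields and fields containing the Unicode replacement character) and the target format (URL-decoded, half-width, lower case) are assumptions chosen for illustration; the source leaves both to the user or the server configuration.

```python
import unicodedata
from urllib.parse import unquote


def is_rejected(field: str) -> bool:
    """Assumed rule for first-type fields: empty, or containing U+FFFD."""
    return field == "" or "\ufffd" in field


def normalize(field: str) -> str:
    """Assumed specified format: URL-decoded, half-width, lower case."""
    decoded = unquote(field)
    # NFKC maps full-width Latin letters and digits to their half-width forms.
    half_width = unicodedata.normalize("NFKC", decoded)
    return half_width.lower()


def preprocess(fields):
    """Reject first-type fields, then convert the remaining fields."""
    return [normalize(field) for field in fields if not is_rejected(field)]


# "\uff21\uff22\uff23" is the full-width form of "ABC".
cleaned = preprocess(["\uff21\uff22\uff23", "Hello%20World", "", "bad\ufffdbytes"])
```

In this example the empty field and the field with a decoding error are dropped, and the two remaining fields are converted to "abc" and "hello world".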
In this embodiment, obtaining the feature plaintext from the feature field according to the preconfigured feature class includes:
Sequentially reading the fields in the feature class. According to the content of the fields in the feature class, sequentially reading the fields with the same content from the plaintext sample as the feature fields. Then recording the feature fields sequentially read from the plaintext sample in a feature set.
Each field in the feature class has the same content as at least one field in the plaintext sample. Specifically, the map-stage server obtains a new set of plaintext samples, initializes the preconfigured feature classes to be used for extraction, and, according to the features that the configuration requires to be extracted, calls the feature classes one by one to perform feature extraction. For example:
The plaintext sample is: "show clk A B C D";
The preconfigured feature classes include:
feaclass=featureclass1;dpd=A;slot=1,
feaclass=featureclass2;dpd=B;slot=2,
feaclass=featureclass3;dpd=C;slot=3,
feaclass=featureclass4;dpd=D;slot=4,
The server may initialize featureclass1, featureclass2, featureclass3 and featureclass4, and then extract the features feaA, feaB and so on up to feaD in the configured order. The server thus has the extracted feature set {feaA, feaB, feaC, feaD} and the plaintext sample show clk A B C D. The server performs the splicing according to the relationship between the special field and the feature fields; the relationships between fields may include {feaA show clk ...}. The completed splicing yields one feature sample: show clk feaA feaB feaC feaD.
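The worked example can be traced with a small Python sketch. The reading of `dpd` as the name of the sample field that a feature class depends on, the `fea` prefix rule, and the helper names `parse_config` and `extract_features` are assumptions inferred from the example rather than stated in the source.

```python
def parse_config(lines):
    """Parse 'feaclass=...;dpd=...;slot=...' lines into dicts, sorted by slot."""
    entries = []
    for line in lines:
        pairs = (item.split("=", 1) for item in line.strip().rstrip(",").split(";"))
        entry = {key.strip().lower(): value.strip() for key, value in pairs}
        entry["slot"] = int(entry["slot"])
        entries.append(entry)
    return sorted(entries, key=lambda entry: entry["slot"])


def extract_features(sample_fields, entries):
    """Emit 'fea<field>' for each configured entry whose dpd field is in the sample."""
    present = set(sample_fields)
    return [f"fea{entry['dpd']}" for entry in entries if entry["dpd"] in present]


config_lines = [
    "feaclass=featureclass1;dpd=A;slot=1,",
    "feaclass=featureclass2;dpd=B;slot=2,",
    "feaclass=featureclass3;dpd=C;slot=3,",
    "feaclass=featureclass4;dpd=D;slot=4,",
]
fields = "show clk A B C D".split()
special, feature_fields = fields[:2], fields[2:]
feature_set = extract_features(feature_fields, parse_config(config_lines))
feature_sample = " ".join(special + feature_set)
```

Sorting by slot keeps the extraction order tied to the configuration rather than to the order of fields in the sample, so `feature_set` lists feaA through feaD in slot order and `feature_sample` is the spliced string from the example.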
In this embodiment, outputting the spliced field as a feature sample includes:
Importing, through the MapReduce framework, the feature sample and the feature set into the reduce stage. Recording the feature fields sequentially read from the plaintext sample in the feature set includes: outputting identical feature fields read from the plaintext sample to the same computing node.
For example, this embodiment may use the Hadoop MapReduce framework: the map-stage server performs S1 to S4 and then outputs the execution results (which include the feature samples and the feature set) to the reduce-stage server. Specifically, a feature sample is output directly to the reduce stage without further processing, whereas for the feature set the bucketing principle of the MapReduce framework is used to assign identical features to the same computing node. When the reduce-stage server receives a feature sample, it outputs the feature sample directly; when it receives the feature set, it accumulates the show and clk values corresponding to the feature set and then outputs the result.
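The bucketing and accumulation described here can be simulated in memory as follows. This is a sketch, not Hadoop code: the CRC32-based `partition` function stands in for whatever partitioner the framework uses, and the (show, clk) count pairs attached to each feature are an assumed representation of the map output.

```python
import zlib
from collections import defaultdict


def partition(feature: str, num_reducers: int) -> int:
    """Deterministic bucket: equal features always land on the same reducer."""
    return zlib.crc32(feature.encode("utf-8")) % num_reducers


def reduce_features(pairs):
    """pairs: iterable of (feature, (show, clk)); returns summed counts per feature."""
    sums = defaultdict(lambda: [0, 0])
    for feature, (show, clk) in pairs:
        sums[feature][0] += show
        sums[feature][1] += clk
    return {feature: tuple(counts) for feature, counts in sums.items()}


# Assumed map output: one (feature, (show, clk)) pair per occurrence.
map_output = [("feaA", (1, 0)), ("feaB", (1, 1)), ("feaA", (1, 1))]

buckets = defaultdict(list)
for feature, counts in map_output:
    buckets[partition(feature, 3)].append((feature, counts))

totals = {}
for bucket in buckets.values():
    # Each bucket plays the role of one reduce-stage computing node.
    totals.update(reduce_features(bucket))
```

Because every occurrence of a feature lands in the same bucket, each simulated node sees all the counts for its features and no partial sums need to be merged across nodes.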
This embodiment further includes:
Reading a base feature class, and updating the base feature class through the reflection mechanism.
Using the most recently updated base feature class as the preconfigured feature class.
An embodiment of the present invention also provides a device for processing data features which, when applied in a MapReduce framework, may specifically run in the map-stage server. As shown in Fig. 3a, the device includes:
An extraction unit, configured to acquire a plaintext sample from a business log, the plaintext sample including at least a special field and a feature field, where the special field includes fields used to represent an execution command and an operation instruction.
An identification unit, configured to obtain feature plaintext from the feature field according to a preconfigured feature class, and to record a sample signature, where special fields with identical content correspond to the same sample signature.
A splicing unit, configured to extract one special field corresponding to the sample signature, and to splice the obtained feature plaintext onto that special field to obtain a spliced field.
An output unit, configured to output the spliced field as a feature sample.
In this embodiment, the identification unit is specifically configured to sequentially read the fields in the feature class, where each field in the feature class has the same content as at least one field in the plaintext sample; to sequentially read, according to the content of the fields in the feature class, the fields with the same content from the plaintext sample as the feature fields; and then to record the feature fields sequentially read from the plaintext sample in a feature set.
In this embodiment, the output unit is specifically configured to import, through the MapReduce framework, the feature sample and the feature set into the reduce stage, and to output identical feature fields read from the plaintext sample to the same computing node.
Further, as shown in Fig. 3b, the device also includes a pre-processing unit, configured to read the plaintext fields in the business log; to reject fields of a first type from the plaintext fields and/or convert the characters of fields of a second type in the plaintext fields into a specified format; and then to store, through the MapReduce framework, the fields remaining after the rejection and/or conversion in memory in map form.
Further, as shown in Fig. 3c, the device also includes a feature class management unit, configured to read a base feature class and update the base feature class through the reflection mechanism, and to use the most recently updated base feature class as the preconfigured feature class.
The device for processing data features provided by this embodiment obtains feature plaintext from the feature field of a plaintext sample according to a preconfigured feature class and records a sample signature, extracts one special field corresponding to the sample signature, splices the feature plaintext onto the special field, and then outputs the spliced field as a feature sample to be used for data extraction. Compared with the prior art, this embodiment extracts the required features from massive data, which addresses the difficulty in the prior art of extracting large-scale, multi-dimensional data and alleviates the need for frequent model updates, thereby reducing the cost of data extraction and improving its accuracy.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The device embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments. Those of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like. The above are only specific embodiments of the present invention, but the protection scope of the invention is not limited to them; any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. The protection scope of the invention shall therefore be subject to the protection scope of the claims.

Claims (10)

1. a kind of processing method of data characteristics, it is characterised in that including:
From business log acquisition plaintext sample, the plaintext sample at least includes special field and feature field, The special field includes the field for being used to represent to perform order and operational order;
According to the feature class being pre-configured with, feature is obtained in plain text from the feature field, and records sample signature, Wherein, the same sample signature of content identical special field correspondence;
A special field of the correspondence sample signature is extracted, and by acquired feature in plain text, splicing is extremely One special field, obtains spliced field;
The spliced field is exported as feature samples.
2. The method according to claim 1, characterized in that obtaining the plaintext sample from the business log comprises:
reading the plaintext fields in the business log;
removing fields of a first type from the plaintext fields; and/or converting the characters of fields of a second type in the plaintext fields into a specified format;
storing, through a MapReduce framework, the fields remaining after the removal and/or conversion in memory in Map form.
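The preprocessing in claim 2 can be sketched as follows, under stated assumptions. Which fields count as the "first type" (removed) and the "second type" (converted), the `key=value` log layout, and the epoch-seconds-to-date conversion are all hypothetical choices for this example. The MapReduce framework itself is omitted; a Python dict plays the role of the in-memory map.

```python
from datetime import datetime, timezone

# Assumptions for illustration only.
FIRST_TYPE = {"debug_info"}  # fields to remove
SECOND_TYPE = {"ts"}         # fields whose characters are converted


def preprocess(line, sep="\t"):
    """Parse one 'key=value' log line into a dict (the in-memory map)."""
    record = {}
    for item in line.rstrip("\n").split(sep):
        key, _, value = item.partition("=")
        if key in FIRST_TYPE:
            continue  # removal step
        if key in SECOND_TYPE:
            # Conversion step: epoch seconds -> YYYY-MM-DD in UTC.
            moment = datetime.fromtimestamp(int(value), tz=timezone.utc)
            value = moment.strftime("%Y-%m-%d")
        record[key] = value
    return record
```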
3. The method according to claim 1, characterized in that obtaining feature plaintext from the feature fields according to the pre-configured feature class comprises:
sequentially reading the fields in the feature class, wherein each field in the feature class has the same content as at least one field in the plaintext sample;
sequentially reading from the plaintext sample, according to the content of the fields in the feature class, the fields with the same content as the feature fields;
recording the feature fields sequentially read from the plaintext sample in a feature set.
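A minimal sketch of the matching in claim 3, assuming the feature class is an ordinary class whose attributes name the log fields to be treated as feature fields. The class `FeatureClass`, its attributes, and the list used as the feature set are hypothetical.

```python
class FeatureClass:
    """Hypothetical pre-configured feature class: each attribute names a
    log field that should be treated as a feature field."""
    city = "city"
    device = "device"


def feature_field_names(feature_class):
    """Read the fields of the feature class in definition order."""
    return [
        value
        for key, value in vars(feature_class).items()
        if not key.startswith("_")
    ]


def collect_feature_fields(feature_class, plaintext_sample):
    """Record every sample field that matches a feature-class field."""
    feature_set = []
    for name in feature_field_names(feature_class):
        if name in plaintext_sample:
            feature_set.append((name, plaintext_sample[name]))
    return feature_set
```

Fields of the plaintext sample that the feature class does not list (for example the special fields) are simply skipped.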
4. The method according to claim 3, characterized in that outputting the spliced field as a feature sample comprises:
importing, through a MapReduce framework, the feature sample and the feature set into the Reduce stage;
and recording the feature fields sequentially read from the plaintext sample in the feature set comprises: outputting identical feature fields read from the plaintext sample to the same computing node.
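Routing identical feature fields to the same computing node is what a deterministic partitioner does in a MapReduce shuffle. The sketch below imitates that step without a real framework; the CRC32-modulo rule and the function names are assumptions, not something the claim specifies.

```python
import zlib


def node_for(feature_field, num_nodes):
    """Deterministic partition: equal feature fields map to the same node."""
    return zlib.crc32(feature_field.encode("utf-8")) % num_nodes


def shuffle(feature_fields, num_nodes):
    """Distribute feature fields over num_nodes simulated computing nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for field in feature_fields:
        nodes[node_for(field, num_nodes)].append(field)
    return nodes
```

Because the node index depends only on the field content, every occurrence of a given feature field arrives at one node, where it can be counted or deduplicated locally in the Reduce stage.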
5. The method according to claim 1, characterized by further comprising:
reading a basic feature class, and updating the basic feature class through a reflection mechanism;
using the most recently updated basic feature class as the pre-configured feature class.
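Claim 5 does not say how the reflection mechanism is applied. As one possible reading, the sketch below uses Python's built-in `setattr` and `vars` to add feature fields to a class at run time, so that new features can be configured without editing the extraction code. `BaseFeatureClass` and both helper functions are hypothetical.

```python
class BaseFeatureClass:
    """Hypothetical basic feature class with one initial feature field."""
    city = "city"


def update_feature_class(feature_class, new_fields):
    """Add or overwrite feature fields at run time via reflection."""
    for attr, field_name in new_fields.items():
        setattr(feature_class, attr, field_name)
    return feature_class


def configured_fields(feature_class):
    """List the feature fields currently configured on the class."""
    return sorted(
        value
        for key, value in vars(feature_class).items()
        if not key.startswith("_")
    )
```

After the update, the same class object serves as the pre-configured feature class for the next extraction run.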
6. A device for processing data features, characterized by comprising:
an extraction unit, configured to obtain a plaintext sample from a business log, the plaintext sample including at least special fields and feature fields, the special fields including fields used to represent an execution order and an operation instruction;
a recognition unit, configured to obtain feature plaintext from the feature fields according to a pre-configured feature class, and to record a sample signature, wherein special fields with identical content correspond to the same sample signature;
a splicing unit, configured to extract one special field corresponding to the sample signature, and to splice the obtained feature plaintext to the one special field to obtain a spliced field;
an output unit, configured to output the spliced field as a feature sample.
7. The device according to claim 6, characterized by further comprising a preprocessing unit, configured to read the plaintext fields in the business log; to remove fields of a first type from the plaintext fields, and/or to convert the characters of fields of a second type in the plaintext fields into a specified format; and then to store, through a MapReduce framework, the fields remaining after the removal and/or conversion in memory in Map form.
8. The device according to claim 6, characterized in that the recognition unit is specifically configured to sequentially read the fields in the feature class, wherein each field in the feature class has the same content as at least one field in the plaintext sample; to sequentially read from the plaintext sample, according to the content of the fields in the feature class, the fields with the same content as the feature fields; and then to record the feature fields sequentially read from the plaintext sample in a feature set.
9. The device according to claim 8, characterized in that the output unit is specifically configured to import, through a MapReduce framework, the feature sample and the feature set into the Reduce stage; and to output identical feature fields read from the plaintext sample to the same computing node.
10. The device according to claim 6, characterized by further comprising a feature class management unit, configured to read a basic feature class and update the basic feature class through a reflection mechanism; and to use the most recently updated basic feature class as the pre-configured feature class.
CN201610066847.9A 2016-01-29 2016-01-29 Data feature processing method and device Active CN107025233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610066847.9A CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610066847.9A CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Publications (2)

Publication Number Publication Date
CN107025233A (en) 2017-08-08
CN107025233B (en) 2020-04-28

Family

ID=59524525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610066847.9A Active CN107025233B (en) 2016-01-29 2016-01-29 Data feature processing method and device

Country Status (1)

Country Link
CN (1) CN107025233B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934628A (en) * 2019-03-08 2019-06-25 智者四海(北京)技术有限公司 Characteristic processing method and device
CN111224743A (en) * 2018-11-23 2020-06-02 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium
CN111461253A (en) * 2020-04-17 2020-07-28 浙江百应科技有限公司 Automatic feature extraction system and method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079074A (en) * 2007-07-26 2007-11-28 杭州华三通信技术有限公司 Data storage and retrieving method and system
CN101483553A (en) * 2009-02-24 2009-07-15 中兴通讯股份有限公司 Audit apparatus and method for customer network behavior
CN103473306A (en) * 2013-09-10 2013-12-25 北京思特奇信息技术股份有限公司 Method and system for adopting structured query language (SQL) mark substitution method to achieve data self-extraction
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device
CN104717085A (en) * 2013-12-16 2015-06-17 ***通信集团湖南有限公司 Log parsing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079074A (en) * 2007-07-26 2007-11-28 杭州华三通信技术有限公司 Data storage and retrieving method and system
CN101483553A (en) * 2009-02-24 2009-07-15 中兴通讯股份有限公司 Audit apparatus and method for customer network behavior
CN103473306A (en) * 2013-09-10 2013-12-25 北京思特奇信息技术股份有限公司 Method and system for adopting structured query language (SQL) mark substitution method to achieve data self-extraction
CN104717085A (en) * 2013-12-16 2015-06-17 ***通信集团湖南有限公司 Log parsing method and device
CN104050269A (en) * 2014-06-23 2014-09-17 上海帝联信息科技股份有限公司 Log compression method and device and log decompression method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111224743A (en) * 2018-11-23 2020-06-02 中兴通讯股份有限公司 Detection method, terminal and computer readable storage medium
CN109934628A (en) * 2019-03-08 2019-06-25 智者四海(北京)技术有限公司 Characteristic processing method and device
CN109934628B (en) * 2019-03-08 2021-03-19 智者四海(北京)技术有限公司 Feature processing method and device
CN111461253A (en) * 2020-04-17 2020-07-28 浙江百应科技有限公司 Automatic feature extraction system and method

Also Published As

Publication number Publication date
CN107025233B (en) 2020-04-28

Similar Documents

Publication Publication Date Title
US10157177B2 (en) System and method for extracting entities in electronic documents
US11762926B2 (en) Recommending web API's and associated endpoints
US20100121883A1 (en) Reporting language filtering and mapping to dimensional concepts
US20210125082A1 (en) Operative enterprise application recommendation generated by cognitive services from unstructured requirements
US10324895B2 (en) Generating index entries in source files
US20180314688A1 (en) Instant translation of user interfaces of a web application
US9298689B2 (en) Multiple template based search function
CN107025233A (en) A kind of processing method and processing device of data characteristics
Ravulavaru Google Cloud AI Services Quick Start Guide: Build Intelligent Applications with Google Cloud AI Services
Ray et al. Review of cloud-based natural language processing services and tools for chatbots
CN112925523B (en) Object comparison method, device, equipment and computer readable medium
CN104408198A (en) Method and device for acquiring webpage contents
CN111898762B (en) Deep learning model catalog creation
Bhandarkar et al. Text summarization using combination of sequence-to-sequence model with attention approach
CN110929085B (en) System and method for processing electric customer service message generation model sample based on meta-semantic decomposition
Sankar et al. The Applied AI and Natural Language Processing Workshop: Explore practical ways to transform your simple projects into powerful intelligent applications
Sahin et al. Text summarization
US20150324333A1 (en) Systems and methods for automatically generating hyperlinks
CN111860862A (en) Performing hierarchical simplification of learning models
Exman et al. Apogee: Application Ontology Generation with Size Optimization
EP4303719A1 (en) Automated generation of web applications based on wireframe metadata generated from user requirements
US11556591B2 (en) Tenant-isolated custom annotations for search within a public corpus
US11966725B2 (en) Microservice termination while maintaining high availability
CN117743698B (en) Network malicious handwriting recognition method and system based on AI large model
US10474750B1 (en) Multiple information classes parsing and execution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200326

Address after: 210042 No. 1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing City, Jiangsu Province

Applicant after: Suning Cloud Computing Co.,Ltd.

Address before: 210042 Nanjing Province, Xuanwu District, Jiangsu Suning Avenue, Suning headquarters, No. 1

Applicant before: SUNING COMMERCE GROUP Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210602

Address after: 518001 unit 3510-131, Luohu business center, 2028 Shennan East Road, Chengdong community, Dongmen street, Luohu District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen yunwangwandian e-commerce Co.,Ltd.

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210042

Patentee before: Suning Cloud Computing Co.,Ltd.
