CN105912674A - Method, device and system for noise reduction and classification of data - Google Patents

Method, device and system for noise reduction and classification of data

Publication number
CN105912674A
Authority
CN
China
Prior art keywords
data
feature
noise reducing
described feature
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610227851.9A
Other languages
Chinese (zh)
Inventor
李光辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd
Original Assignee
JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JINGSHUO CENTURY TECHNOLOGY (BEIJING) Co Ltd
Priority to CN201610227851.9A
Publication of CN105912674A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, device and system for noise reduction and classification of data. The device comprises a data management module, a feature generation module and a data processing module. The data management module is configured to import one or more data sources, integrate the data in the one or more data sources, divide the data into multiple data groups and store the data groups in a database. The feature generation module is configured to generate a feature library comprising multiple feature groups, each feature group being generated by the following steps: generating multiple features, each feature corresponding to the processing of one or more data groups and comprising a feature condition and one or more key contents; setting the relations among the multiple features to generate the feature group, the feature group comprising the multiple features and the relations among them; and setting the data processing type corresponding to the feature group. The data processing module selects features, feature groups or the feature library to mark noise in and/or classify the data stored in the database. The method, device and system provided by the invention can be used for mass data processing.

Description

Data noise reduction and classification method, device and system
Technical field
The present invention relates to the field of data processing, and in particular to a data noise reduction and classification method, device and system.
Background
In the era of big data, the demand for data processing keeps growing. When mass data is analysed statistically, however, some interfering data usually has to be removed; for example, when mass social-media data is processed, a large amount of noise data is inevitably present. Automatic noise-reduction and cleaning mechanisms based on semantics, summaries and similar techniques are generally kept fairly coarse so that useful data is not deleted by mistake, so further fine-grained processing still has to be done manually.
In addition, there is no dedicated tool for the manual part of noise-reduction cleaning of mass data. Personnel who perform data noise reduction and classification must meet various technical requirements; for example, they need knowledge of data programming and database instructions. At the same time, the large data volume places demands on hardware performance, which prevents big data processing work from being spread more widely.
Summary of the invention
To overcome the above defects of the prior art, the present invention provides a data noise reduction and classification method, device and system that facilitate mass data processing.
According to one aspect of the present invention, a data noise reduction and classification device is provided, comprising: a data management module configured to import one or more data sources, integrate the data in the one or more data sources, and divide the data into multiple data groups stored in a database; a feature generation module configured to generate a feature library, the feature library comprising multiple feature groups, each feature group being generated as follows: generating multiple features, each feature corresponding to the processing of one or more data groups and comprising a feature condition and one or more key contents; setting the relations between the multiple features to generate the feature group, the feature group comprising the multiple features and the relations between them; and setting the data processing type corresponding to the feature group; and a data processing module that selects the features, the feature groups or the feature library to mark noise in and/or classify the data stored in the database.
According to another aspect of the present invention, a data noise reduction and classification system is also provided, comprising: a database storing the data of one or more data sources; multiple data noise reduction and classification devices as described above, each further comprising a feature library management and sharing module configured to manage the feature library of that device and to share feature libraries with other data noise reduction and classification devices; and a primary processing device configured to distribute data processing tasks to the multiple data noise reduction and classification devices for execution.
Preferably, the primary processing device is one of the multiple data noise reduction and classification devices.
Preferably, each data noise reduction and classification device further comprises a network module configured to communicate with the primary processing device and the database.
According to yet another aspect of the present invention, a data noise reduction and classification method is also provided, comprising: importing one or more data sources, integrating the data in the one or more data sources, and dividing the data into multiple data groups stored in a database; generating a feature library, the feature library comprising multiple feature groups, each feature group being generated as follows: generating multiple features, each feature corresponding to the processing of one or more data groups and comprising a feature condition and one or more key contents; setting the relations between the multiple features to generate the feature group, the feature group comprising the multiple features and the relations between them; and setting the data processing type corresponding to the feature group; and selecting the features, the feature groups or the feature library to process the data stored in the database.
Preferably, the data stored in the database is text, audio, pictures or video, and the key content is correspondingly a keyword, key audio, a key picture or key video.
Preferably, the feature library, the feature groups and the features each include a name and annotation information, and selecting the features, the feature groups or the feature library to process the data stored in the database further comprises displaying the names and annotation information of the feature library, the feature groups and the features.
Preferably, the feature conditions include: contains, does not contain, greater than, less than, equal to, greater than or equal to, less than or equal to, begins with the corresponding key content, ends with the corresponding key content, similar to the corresponding key content, and dissimilar to the corresponding key content.
Preferably, the data processing types include: deletion, marking noise, and data classification.
Compared with the prior art, the present invention has the following advantages:
1. Hardware cost is reduced: noise reduction and classification of mass data can be carried out on a low-specification computer.
2. After noise reduction and classification, the data can be used for various kinds of data analysis. This reduces time cost and offers great flexibility: there is no need to perform data mining, or to research and write a dedicated software system, for each specific data analysis, so the results are more reusable.
3. Multiple data noise reduction and classification devices perform noise reduction and classification in parallel, which improves efficiency and makes effective use of idle devices in the system.
4. Labour cost for noise reduction and classification of mass data is reduced, because the personnel performing it do not need knowledge of data programming or database instructions.
Brief description of the drawings
The above and other features and advantages of the present invention will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
Fig. 1 is a flow chart of the data noise reduction and classification method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of data storage according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the data noise reduction and classification device according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the data noise reduction and classification system according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a data noise reduction and classification device in the data noise reduction and classification system of Fig. 4.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. Identical reference numerals in the drawings denote identical or similar structures, so their repeated description is omitted.
To solve the prior-art problems that noise reduction and classification of mass data place high demands on the hardware of the data processing equipment and on the expertise of the data processing personnel, the present invention provides a data noise reduction and classification method. Fig. 1 shows a flow chart of the method according to an embodiment of the present invention and specifically shows three steps.
Step S110: Import one or more data sources, integrate the data in the one or more data sources, and divide the data into multiple data groups stored in a database.
Specifically, importing one or more data sources may include accessing a social networking site's server to obtain a data source, importing related web page data from a search engine's server using one or more keywords as a data source, importing data from a local server, and so on. Those skilled in the art can implement further ways of importing data sources. After the one or more data sources are imported, the data in them is integrated. For example, the data from multiple data sources is stored in the same storage format. As another example, the data from multiple data sources is given a preliminary classification and stored by data type, such as text, audio, image and video. After integration, the data in the one or more data sources is divided into multiple data groups and stored in the database.
Specifically, the data can be stored in the database in the form of a data table, as shown in Fig. 2. In the data table of Fig. 2, each column represents one data group. In some variations, a data group can be a row. In other variations, a data group can consist of multiple columns or multiple rows. In Fig. 2, T1, T2, T3, etc. in column A (group A data) are text data; P1, P2, P3, etc. in column B (group B data) are image data; A1, A2, A3, etc. in column C (group C data) are audio data; and V1, V2, V3, etc. in column D (group D data) are video data.
Besides the grouping by data type used in the data table of Fig. 2, data can also be grouped by other attributes. For example, data can be grouped by data source: column A holds data obtained from the Weibo server, column B holds data obtained from the WeChat server, column C holds data obtained from the Baidu server, and so on. As another example, data can be grouped by topic: column A holds the data obtained by searching Baidu with the search term "A", column B the data obtained by searching Baidu with the search term "B", and column C the data obtained by searching Baidu with the search term "C".
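The grouping described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (the `source`, `type` and `content` keys) and the `group_records` helper are hypothetical and do not come from the patent text.

```python
from collections import defaultdict


def group_records(records, key):
    """Split integrated records into data groups keyed by one attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["content"])
    return dict(groups)


# Records from several sources, already stored in one common format
# (hypothetical layout, for illustration only).
records = [
    {"source": "weibo", "type": "text", "content": "T1"},
    {"source": "wechat", "type": "image", "content": "P1"},
    {"source": "weibo", "type": "audio", "content": "A1"},
    {"source": "baidu", "type": "text", "content": "T2"},
]

by_type = group_records(records, "type")      # grouping by data type, as in Fig. 2
by_source = group_records(records, "source")  # grouping by data source
```

The same helper covers grouping by topic if each record carries the search term that produced it.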
Besides data tables, the data in the one or more data sources can also be stored in other ways, and those skilled in the art can implement further variations. For example, data can also be read or stored in shards, pages or separate tables. These variations all fall within the protection scope of the present invention.
Step S120: Generate a feature library.
Specifically, the present invention generates the feature library as follows:
Step 1: Create a feature library. When creating the feature library, its name, the category it belongs to and an annotation can also be entered. The name lets the operator tell different feature libraries apart; the category and the annotation let the operator know the specific function of the feature library.
Step 2: Create a feature group under the newly created feature library. When creating the feature group, its name and an annotation can also be entered. The name lets the operator tell different feature groups apart; the annotation lets the operator know the specific function of the feature group. Specifically, a feature group includes multiple features and the relations between them. The multiple features of a feature group are created as follows:
Step I: Create a new blank feature.
Step II: Select and specify the data group to be processed.
Step III: Select or enter a feature condition and set one or more key contents.
The feature conditions include: contains, does not contain, greater than, less than, equal to, greater than or equal to, less than or equal to, begins with the corresponding key content, ends with the corresponding key content, similar to the corresponding key content, and dissimilar to the corresponding key content.
Specifically, since the data to be processed can be text, audio, pictures or video, the key content can correspondingly be a keyword, key audio, a key picture or key video. In one embodiment, when the data to be processed is text, the key content can be a keyword. For example, combined with the above feature conditions, the feature can be that one or more groups of data begin with a set "keyword" or end with a set "keyword". In other embodiments, when the data to be processed is pictures, the key content can be a keyword or a key picture. For example, combined with the above feature conditions, the feature can be that one or more groups of data are similar to a "keyword", are similar to a "key picture", or contain content similar to a "key picture". Features for audio and video data can be created likewise. Specifically, images can be judged by recognising and analysing the pixel distribution, the shapes and the contours in the image. Audio can be judged by analysing speech recognition results, voiceprints, audio intensity and the like. Similarly, video data can be recognised and analysed by combining the image and audio judgments.
Step IV: Add a new feature following Steps I to III, and select the relation between the feature added in Step IV and the previous feature. The relation can be "or" or "and".
Step V: Repeat the above steps to add further features as needed.
Step VI: Set the data processing type of the feature group. The data processing type can be deletion, marking noise and/or data classification. Specifically, marking noise may include defining data as noise and defining data as non-noise. Data classification may include assigning the data to group XX under major category XX, storing the data in directory XX, and so on. Data classification may also include classification multiplexing. Classification multiplexing means that, when data is classified according to the data processing type and one piece of data satisfies multiple conditions, that piece of data is stored repeatedly in multiple categories. If multiplexing is not used, the data is handled according to the first matching data processing type.
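The difference between classification with and without multiplexing can be illustrated with a minimal Python sketch. The `classify` function, the rule list and the category names are assumptions made for the example; the patent does not prescribe this interface.

```python
def classify(item, rules, multiplex=False):
    """Return the categories assigned to an item.

    rules is an ordered list of (predicate, category) pairs.
    With multiplex=True the item goes into every matching category;
    otherwise only the first matching category is used.
    """
    matches = [category for predicate, category in rules if predicate(item)]
    if multiplex:
        return matches
    return matches[:1]


# Hypothetical rules and sample text, for illustration only.
rules = [
    (lambda text: "phone" in text, "electronics"),
    (lambda text: "discount" in text, "promotions"),
]
post = "phone discount this week"

with_multiplexing = classify(post, rules, multiplex=True)
first_match_only = classify(post, rules)
```

With multiplexing the sample post is stored under both categories; without it, only the first matching rule decides where the post goes.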
A feature group generated by Steps I to VI above may, for example, be: when, in a given row of a data table, the column A data contains "certified products" and the column B data does not contain "purchase", or the value in the column C data is less than "10", and the column D data begins with "Taobao", then that row of data is defined as noise.
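The example feature group above can be sketched in Python as follows. Several points are assumptions rather than statements of the patent: the class and field names are hypothetical, only some of the listed feature conditions are covered (similarity is omitted because the text does not define a similarity measure), a feature is taken to require its condition for every key content, and the "and"/"or" relations are applied strictly left to right, each one linking a feature to the result of the features before it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical mapping from condition names to checks.
CONDITIONS: Dict[str, Callable] = {
    "contains": lambda value, key: key in value,
    "not_contains": lambda value, key: key not in value,
    "greater_than": lambda value, key: value > key,
    "less_than": lambda value, key: value < key,
    "equals": lambda value, key: value == key,
    "starts_with": lambda value, key: value.startswith(key),
    "ends_with": lambda value, key: value.endswith(key),
}


@dataclass
class Feature:
    data_group: str     # column (data group) the feature applies to
    condition: str      # one of the keys of CONDITIONS
    key_contents: list  # one or more key contents

    def matches(self, row: dict) -> bool:
        check = CONDITIONS[self.condition]
        # Assumption: the condition must hold for every key content.
        return all(check(row[self.data_group], key) for key in self.key_contents)


@dataclass
class FeatureGroup:
    features: List[Feature]
    relations: List[str]   # relations[i] links features[i + 1] to what precedes it
    processing_type: str   # e.g. "mark_noise"

    def matches(self, row: dict) -> bool:
        result = self.features[0].matches(row)
        # Assumption: relations are applied left to right, without precedence.
        for relation, feature in zip(self.relations, self.features[1:]):
            current = feature.matches(row)
            result = (result and current) if relation == "and" else (result or current)
        return result


group = FeatureGroup(
    features=[
        Feature("A", "contains", ["certified products"]),
        Feature("B", "not_contains", ["purchase"]),
        Feature("C", "less_than", [10]),
        Feature("D", "starts_with", ["Taobao"]),
    ],
    relations=["and", "or", "and"],
    processing_type="mark_noise",
)

noisy_row = {"A": "certified products here", "B": "free shipping", "C": 25, "D": "Taobao store"}
clean_row = {"A": "certified products here", "B": "free shipping", "C": 3, "D": "Other store"}
```

A different reading of the example, in which "and" binds more tightly than "or", would need an explicit expression tree instead of the left-to-right loop.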
More specifically, only three levels (feature, feature group and feature library) are shown above. Those skilled in the art can set up more levels according to actual data processing needs; for example, a feature document can be added, each feature document including multiple feature libraries.
The data noise reduction and classification method provided by the present invention mainly assists manual processing of mass data and is used to execute precise batch operations on mass data. While features are being written, the data content therefore needs to be checked in real time, together with the effect of executing the features written so far. In one embodiment of the present invention, dynamic sampling is used: the result of executing the features on a sample assists the user in writing them. To make the sample more effective, the method preferably uses the leading features of the feature group to extract the sample precisely; the sample size can range from 50,000 items to 5% of the total data volume. In other embodiments of the present invention, if the data is pictures, audio or video, no sampling is needed.
Step S130: Select features, feature groups or a feature library to process the data stored in the database.
Specifically, the user can make the selection according to the displayed names and annotation information of the feature library, feature groups and features. The user therefore only needs to perform steps such as entering and selecting, and carries out noise reduction and classification of the data directly according to the textual content. Without any knowledge of programming or database instructions, the user can complete the initial noise reduction and classification of mass data.
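One way to picture the names and annotations that guide the user's selection is a small Python sketch. The class names, fields and the `describe` helper are hypothetical; the text only states that libraries, groups and features carry a name and annotation information that is displayed for selection.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureGroupInfo:
    name: str
    annotation: str


@dataclass
class FeatureLibraryInfo:
    name: str
    category: str
    annotation: str
    groups: List[FeatureGroupInfo] = field(default_factory=list)


def describe(library: FeatureLibraryInfo) -> List[str]:
    """Return display lines showing names and annotations for selection."""
    lines = [f"{library.name} [{library.category}]: {library.annotation}"]
    for group in library.groups:
        lines.append(f"  {group.name}: {group.annotation}")
    return lines


# Hypothetical library content, for illustration only.
library = FeatureLibraryInfo(
    name="Social noise",
    category="noise reduction",
    annotation="Marks advertising posts",
    groups=[FeatureGroupInfo("Ads", "Posts that look like adverts")],
)
display_lines = describe(library)
```

A user interface could show these lines in a list so that the operator picks a library or group by its description rather than by writing queries.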
Specifically, testing can be carried out at any time during the generation of the feature library, the feature groups and the features, so that the user can adjust the features according to the execution results. In addition, the execution and testing of the feature library, feature groups and/or features can be computed according to how the data is stored, with progress recorded, so that execution can be paused, resumed or terminated at any time. Progress and remaining time can also be estimated from the data volume, for example from the amount of data already processed and the time already spent.
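The estimate of progress and remaining time from the processed data volume and the elapsed time can be sketched as follows. The function name and the assumption of a roughly constant processing rate are illustrative choices, not details given in the patent.

```python
def estimate_progress(processed, total, elapsed_seconds):
    """Return (fraction_done, estimated_seconds_remaining).

    Assumes the remaining data is processed at the average rate
    observed so far. Returns None for the remaining time when no
    rate can be computed yet.
    """
    if processed <= 0 or elapsed_seconds <= 0:
        return 0.0, None
    fraction_done = processed / total
    rate = processed / elapsed_seconds          # items per second so far
    remaining_seconds = (total - processed) / rate
    return fraction_done, remaining_seconds


fraction, remaining = estimate_progress(processed=250, total=1000, elapsed_seconds=50.0)
```

Because the progress is derived only from counters, it can be stored when a run is paused and recomputed when the run is resumed.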
Specifically, data that has been processed in step S130 can be fed into step S130 again; such iterative processing makes the noise reduction and classification progressively finer. The feature library, feature groups and features can be reused. For a data source with a stable supply, an automatic processing mechanism can be set up and executed automatically.
Specifically, data can also be denoised and classified through distributed collaboration. For example, multiple devices that perform data noise reduction and classification belong to the same local area network. When a device in the network that can perform data noise reduction and classification is idle, tasks can be distributed to it in the background, so that idle computing capacity is used cooperatively for distributed processing of mass data and the data processing speed increases. When multiple data noise reduction and classification devices process data collaboratively, different users or devices can also authorise each other to use and call their feature libraries.
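Distributing work to idle devices could look like the following Python sketch. The round-robin policy, the `distribute` function and the device names are assumptions for illustration; the patent only says that tasks are handed to idle devices in the background.

```python
def distribute(chunks, devices):
    """Assign data chunks round-robin to the devices that are idle.

    devices maps a device name to its state ("idle" or "busy").
    """
    idle = [name for name, state in devices.items() if state == "idle"]
    if not idle:
        raise RuntimeError("no idle device available")
    assignment = {name: [] for name in idle}
    for index, chunk in enumerate(chunks):
        assignment[idle[index % len(idle)]].append(chunk)
    return assignment


# Hypothetical devices on one local area network and five data chunks.
devices = {"pc-1": "idle", "pc-2": "busy", "pc-3": "idle"}
assignment = distribute([0, 1, 2, 3, 4], devices)
```

A real scheduler would also need to react when a device becomes busy again, which this sketch leaves out.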
Corresponding to the above method, the present invention also provides a data noise reduction and classification device, as shown in Fig. 3. The data noise reduction and classification device 200 includes a data management module 210, a feature generation module 220 and a data processing module 230.
The data management module 210 performs step S110 above and is configured to import one or more data sources, integrate the data in the one or more data sources, and divide the data into multiple data groups stored in a database.
The feature generation module 220 performs step S120 above and is configured to generate a feature library. The feature library includes multiple feature groups, and each feature group includes multiple features.
The data processing module 230 performs step S130 above and is configured to select features, feature groups or a feature library to mark noise in and/or classify the data stored in the database.
The data noise reduction and classification device 200 provided by the present invention can be integrated into electronic devices such as ordinary office computers running an operating system on an x86 or x64 processor architecture, or mobile devices. For clarity, Fig. 3 shows only three modules of the data noise reduction and classification device 200; those skilled in the art can implement further modules based on this description, which are not detailed here.
To enable distributed data noise reduction and classification, the present invention also provides a data noise reduction and classification system, described here with reference to Fig. 4 and Fig. 5. The system includes a database 400, multiple data noise reduction and classification devices 300 and a primary processing device 500. This embodiment shows three data noise reduction and classification devices 300A, 300B and 300C, but the number of devices 300 in the system is not limited to three. The primary processing device 500 can be one of the multiple data noise reduction and classification devices 300 or another electronic device. A data noise reduction and classification device 300 can be integrated into electronic devices such as low-specification computers and mobile devices. The database 400, the multiple devices 300 and the primary processing device 500 communicate by wired or wireless means. Optionally, the database 400, the multiple devices 300 and the primary processing device 500 are located in the same local area network. In some variations, they can also be located in different local area networks.
Specifically, the database 400 stores the data of one or more data sources. Each data noise reduction and classification device 300 includes a data management module 310 configured to perform step S110 above, a feature generation module 320 configured to perform step S120 above, a data processing module 330 configured to perform step S130 above, and a feature library management and sharing module 340. The feature library management and sharing module 340 is configured to manage the feature library of its device 300 and to share feature libraries with other devices 300. It is also configured to manage the usage permissions of each feature library. Each device 300 further includes a network module 350, which is configured to communicate with the primary processing device and the database. The primary processing device 500 is configured to distribute data processing tasks to the multiple data noise reduction and classification devices 300 for execution.
In one embodiment, each data noise reduction and classification device 300 includes a feature library management and sharing module 340. An idle data noise reduction and classification device 300 can act as the primary processing device 500 and distribute tasks to the other devices 300. In other words, in this embodiment the device 300 that acts as the primary processing device 500 can change according to the actual situation. In a variation, one data noise reduction and classification device 300 is designated as the primary processing device 500 to distribute tasks to the other devices 300. In this variation, the device 300 that acts as the primary processing device 500 is fixed. For example, an electronic device with a relatively high specification that integrates a data noise reduction and classification device 300 can be designated as the primary processing device 500.
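The two ways of choosing the primary processing device described above can be sketched in Python. Everything in the sketch is an assumption made for illustration: the device records, the `config_score` field standing in for "relatively high specification", and the rule of taking the first idle device.

```python
def choose_primary(devices, fixed=None):
    """Pick the device that acts as the primary processing device.

    With fixed set, that designated device is always used (the fixed
    variation). Otherwise the first idle device takes the role, so the
    primary can change from run to run.
    """
    if fixed is not None:
        return fixed
    idle = [device["name"] for device in devices if device["idle"]]
    return idle[0] if idle else None


def highest_configured(devices):
    """Name of the device with the highest (hypothetical) configuration score."""
    return max(devices, key=lambda device: device["config_score"])["name"]


# Hypothetical state of the three devices shown in Fig. 4.
devices = [
    {"name": "300A", "idle": False, "config_score": 3},
    {"name": "300B", "idle": True, "config_score": 8},
    {"name": "300C", "idle": True, "config_score": 5},
]

rotating_primary = choose_primary(devices)
fixed_primary = choose_primary(devices, fixed=highest_configured(devices))
```

In the rotating case a different device may become primary once device 300B is busy, while the fixed case always returns the designated device regardless of load.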
Compared with the prior art, the present invention has the following advantages:
1. It reduces hardware cost, so that noise reduction and classification of massive data can be carried out on low-configuration computers.
2. Once data has undergone noise reduction and classification, it can be used for a variety of data analyses. This reduces time cost and offers high flexibility: there is no need to perform data mining, or to research and write dedicated software, for each specific data analysis, so the results are more reusable.
3. Multiple data noise reduction and classification devices perform noise reduction and classification in parallel, which improves efficiency and makes effective use of idle devices in the system.
4. It reduces the labor cost of noise reduction and classification of massive data, because the personnel performing it do not need knowledge of programming or database commands.
The exemplary embodiments of the present invention have been particularly shown and described above. It should be understood that the present invention is not limited to the disclosed embodiments; on the contrary, it is intended to cover various modifications and equivalent replacements included within the scope of the claims.

Claims (9)

1. A data noise reduction and classification device, characterized by comprising:
a data management module, configured to import one or more data sources, integrate the data in the one or more data sources, and divide the data into multiple datasets in a database;
a feature generation module, configured to generate a feature library, the feature library comprising multiple feature groups, each feature group being generated as follows:
generating multiple features, each feature corresponding to the processing of one or more datasets, and each feature comprising a feature condition and one or more key contents;
setting the relations between the multiple features to generate the feature group, the feature group comprising the multiple features and the relations between the multiple features;
setting the data processing type corresponding to the feature group;
a data processing module, configured to select the feature, the feature group or the feature library to mark noise in and/or classify the data stored in the database.
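The nesting of feature, feature group and feature library recited in claim 1 can be pictured with the following data-structure sketch. All class and field names are hypothetical, and modelling the relation between features as a single "and"/"or" string is an assumption made for illustration; the claim only requires that a relation be set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """One feature: a condition, its key contents, and the datasets it processes."""
    name: str
    condition: str            # e.g. "contains"
    key_contents: List[str]   # one or more key contents
    datasets: List[str]       # one or more dataset names


@dataclass
class FeatureGroup:
    """Several features plus the relation between them and a processing type."""
    name: str
    features: List[Feature]
    relation: str             # assumption: "and" or "or"
    processing_type: str      # e.g. "delete", "mark_noise", "classify"


@dataclass
class FeatureLibrary:
    """A feature library holding multiple feature groups."""
    name: str
    groups: List[FeatureGroup] = field(default_factory=list)

    def groups_for_dataset(self, dataset: str) -> List[FeatureGroup]:
        """Return the groups in which at least one feature targets `dataset`."""
        return [
            group for group in self.groups
            if any(dataset in feature.datasets for feature in group.features)
        ]
```

A library built this way can be handed to a processing step as a whole, or a single group or feature can be picked out of it, which mirrors the three selection levels named in the claim.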
2. A data noise reduction and classification system, characterized by comprising:
a database, storing the data of one or more data sources;
multiple data noise reduction and classification devices as claimed in claim 1, each data noise reduction and classification device further comprising:
a feature library management and sharing module, configured to manage the feature library of its own data noise reduction and classification device and to share feature libraries with other data noise reduction and classification devices;
a main processing device, configured to distribute data processing tasks to the multiple data noise reduction and classification devices for execution.
3. The data noise reduction and classification system as claimed in claim 2, characterized in that the main processing device is one of the multiple data noise reduction and classification devices.
4. The data noise reduction and classification system as claimed in claim 2, characterized in that each data noise reduction and classification device further comprises:
a network module, configured to communicate with the main processing device and the database.
5. A data noise reduction and classification method, characterized by comprising:
importing one or more data sources, integrating the data in the one or more data sources, and dividing the data into multiple datasets in a database;
generating a feature library, the feature library comprising multiple feature groups, each feature group being generated as follows:
generating multiple features, each feature corresponding to the processing of one or more datasets, and each feature comprising a feature condition and one or more key contents;
setting the relations between the multiple features to generate the feature group, the feature group comprising the multiple features and the relations between the multiple features;
setting the data processing type corresponding to the feature group;
selecting the feature, the feature group or the feature library to process the data stored in the database.
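The import, integrate and divide steps of the method can be sketched as below. This is a simplified illustration under stated assumptions: records are plain dictionaries, datasets are fixed-size chunks (the claim does not say how the division is made), and the processing step uses a single hypothetical "contains" keyword check rather than a full feature library.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def integrate(sources: Iterable[Iterable[Record]]) -> List[Record]:
    """Merge the records of all data sources, tagging each with its source index."""
    merged: List[Record] = []
    for index, source in enumerate(sources):
        for record in source:
            merged.append({**record, "source": index})
    return merged


def split_into_datasets(records: List[Record], size: int) -> List[List[Record]]:
    """Divide the integrated records into datasets of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [records[start:start + size] for start in range(0, len(records), size)]


def mark_noise(datasets: List[List[Record]], keyword: str) -> List[List[Record]]:
    """Process each dataset in turn, marking records whose text contains `keyword`."""
    return [
        [{**record, "noise": keyword in str(record.get("text", ""))} for record in dataset]
        for dataset in datasets
    ]
```

Processing one dataset at a time keeps the amount of data held in memory bounded by the chosen dataset size.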
6. The data noise reduction and classification method as claimed in claim 5, characterized in that
the data stored in the database are text, audio, pictures or video;
the key content is a keyword, key audio, a key picture or key video.
7. The data noise reduction and classification method as claimed in claim 5, characterized in that the feature library, the feature group and the feature each include a name and annotation information of the corresponding feature library, feature group and feature, and selecting the feature, the feature group or the feature library to process the data stored in the database further comprises:
displaying the names and annotation information of the feature library, the feature group and the feature.
8. The data noise reduction and classification method as claimed in claim 5, characterized in that the feature condition includes: contains, does not contain, greater than, less than, equal to, greater than or equal to, less than or equal to, begins with the corresponding key content, ends with the corresponding key content, is similar to the corresponding key content, and is dissimilar to the corresponding key content.
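For text data, the feature conditions listed in claim 8 could be evaluated roughly as in the sketch below. The condition identifiers, the `matches` function, numeric comparison via `float`, string equality for "equal to", and the use of `difflib.SequenceMatcher` with a 0.8 threshold for similarity are all assumptions made for illustration; the claim does not define how similarity is measured.

```python
import operator
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # assumption: the claim gives no similarity measure


def _similar(value: str, key: str) -> bool:
    """Treat two strings as similar when their match ratio reaches the threshold."""
    return SequenceMatcher(None, value, key).ratio() >= SIMILARITY_THRESHOLD


def _numeric(compare):
    """Build a condition that compares value and key content as numbers."""
    return lambda value, key: compare(float(value), float(key))


CONDITIONS = {
    "contains": lambda value, key: key in value,
    "not_contains": lambda value, key: key not in value,
    "greater_than": _numeric(operator.gt),
    "less_than": _numeric(operator.lt),
    "equal": lambda value, key: value == key,  # assumption: string equality
    "greater_or_equal": _numeric(operator.ge),
    "less_or_equal": _numeric(operator.le),
    "starts_with": lambda value, key: value.startswith(key),
    "ends_with": lambda value, key: value.endswith(key),
    "similar": _similar,
    "not_similar": lambda value, key: not _similar(value, key),
}


def matches(value: str, condition: str, key_content: str) -> bool:
    """Return True if `value` satisfies `condition` against one key content."""
    if condition not in CONDITIONS:
        raise ValueError(f"unknown feature condition: {condition}")
    return CONDITIONS[condition](value, key_content)
```

Keeping the conditions in a lookup table means a feature only needs to store the condition name and its key contents, so users can compose features without writing code.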
9. The data noise reduction and classification method as claimed in claim 5, characterized in that the data processing type includes: deletion, noise marking, and data classification.
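The three data processing types of claim 9 can be illustrated with the sketch below. The function name `apply_processing`, the type identifiers, and the `noise` and `category` record fields are hypothetical; the predicate stands in for whatever feature, feature group or feature library has been selected.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]
PROCESSING_TYPES = ("delete", "mark_noise", "classify")


def apply_processing(
    records: List[Record],
    predicate: Callable[[Record], bool],
    processing_type: str,
    label: Optional[str] = None,
) -> List[Record]:
    """Apply one processing type to every record that satisfies `predicate`.

    Records that do not match are returned unchanged (as copies).
    """
    if processing_type not in PROCESSING_TYPES:
        raise ValueError(f"unknown processing type: {processing_type}")
    result: List[Record] = []
    for record in records:
        updated = dict(record)
        if predicate(record):
            if processing_type == "delete":
                continue  # drop the matching record
            if processing_type == "mark_noise":
                updated["noise"] = True
            else:  # "classify"
                updated["category"] = label
        result.append(updated)
    return result
```

Marking noise instead of deleting keeps the original data available, so the same dataset can later be processed again with a different feature library.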

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610227851.9A CN105912674A (en) 2016-04-13 2016-04-13 Method, device and system for noise reduction and classification of data

Publications (1)

Publication Number Publication Date
CN105912674A true CN105912674A (en) 2016-08-31

Family

ID=56746702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610227851.9A Pending CN105912674A (en) 2016-04-13 2016-04-13 Method, device and system for noise reduction and classification of data

Country Status (1)

Country Link
CN (1) CN105912674A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1573784A (en) * 2003-06-04 2005-02-02 微软公司 Origination/destination features and lists for spam prevention
CN1852268A (en) * 2005-10-19 2006-10-25 华为技术有限公司 Junk-mail preventing method and system
CN101178721A (en) * 2007-10-12 2008-05-14 北京拓尔思信息技术有限公司 Method for classifying and managing useful poser information in forum
CN101216839A (en) * 2008-01-17 2008-07-09 中兴通讯股份有限公司 Network data centralization method and apparatus
CN102737126A (en) * 2012-06-19 2012-10-17 合肥工业大学 Classification rule mining method under cloud computing environment
US20140149380A1 (en) * 2012-11-26 2014-05-29 Yahoo! Inc. Methods and apparatuses for document processing at distributed processing nodes
CN104333549A (en) * 2014-10-28 2015-02-04 福建师范大学 Data package filtering method applied to distributive firewall system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100041 Beijing, Shijingshan District Xing Xing street, building 30, room 3, building 9, room 9014

Applicant after: Jing Shuo Technology (Beijing) Limited by Share Ltd

Address before: 100010 Beijing city Dongcheng District bamboo rod alley No. 1 9 floor room 1007

Applicant before: JINGSHUO CENTURY TECHNOLOGY (BEIJING) CO., LTD.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160831