CN103049568A - Method for classifying documents in mass document library - Google Patents
Method for classifying documents in mass document library
- Publication number
- CN103049568A (application number CN201210593096.8A)
- Authority
- CN
- China
- Prior art keywords
- document
- keyword
- term
- word
- category
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210593096.8A CN103049568B (en) | 2012-12-31 | 2012-12-31 | Method for classifying documents in mass document library |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103049568A (en) | 2013-04-17 |
CN103049568B (en) | 2016-05-18 |
Family
ID=48062208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210593096.8A (Active) CN103049568B (en) | Method for classifying documents in mass document library | 2012-12-31 | 2012-12-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103049568B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714051A (en) * | 2013-12-30 | 2014-04-09 | 传神联合(北京)信息技术有限公司 | Pre-processing method of documents to be translated |
CN103729350A (en) * | 2013-12-30 | 2014-04-16 | 武汉传神信息技术有限公司 | Multi-dimension preprocessing method for files to be translated |
CN103729344A (en) * | 2013-12-30 | 2014-04-16 | 传神联合(北京)信息技术有限公司 | Method for labeling statements in document manuscript |
CN103955449A (en) * | 2014-04-21 | 2014-07-30 | 安一恒通(北京)科技有限公司 | Target sample positioning method and device |
CN104615772A (en) * | 2015-02-16 | 2015-05-13 | 重庆大学 | Text evaluation data specialization level analyzing method for electronic commerce |
CN104679733A (en) * | 2013-11-26 | 2015-06-03 | ***通信集团公司 | Voice conversation translation method, device and system |
CN104778371A (en) * | 2015-04-21 | 2015-07-15 | 天脉聚源(北京)传媒科技有限公司 | Method and device for evaluating document content speciality |
CN106484788A (en) * | 2016-09-19 | 2017-03-08 | 合肥清浊信息科技有限公司 | Patent search system based on industry keyword |
WO2017117781A1 (en) * | 2016-01-07 | 2017-07-13 | 马岩 | Network information classification method and system |
CN107798074A (en) * | 2017-09-29 | 2018-03-13 | 汤东澜 | Information processing method and server |
CN107992633A (en) * | 2018-01-09 | 2018-05-04 | 国网福建省电力有限公司 | Electronic document automatic classification method and system based on keyword feature |
CN108182182A (en) * | 2017-12-27 | 2018-06-19 | 传神语联网网络科技股份有限公司 | Document matching process, device and computer readable storage medium in translation database |
CN108572942A (en) * | 2018-04-20 | 2018-09-25 | 北京深度智耀科技有限公司 | Method and apparatus for creating hyperlinks |
CN109543023A (en) * | 2018-09-29 | 2019-03-29 | 中国石油化工股份有限公司石油勘探开发研究院 | Document classification method and system based on trie and LCS algorithm |
CN109871433A (en) * | 2019-02-21 | 2019-06-11 | 北京奇艺世纪科技有限公司 | Method, device, equipment and medium for calculating relevance between document and topic |
CN111552766A (en) * | 2019-02-11 | 2020-08-18 | 国际商业机器公司 | Characterizing references applied on reference graphs using machine learning |
CN111782601A (en) * | 2020-06-08 | 2020-10-16 | 北京海泰方圆科技股份有限公司 | Electronic file processing method and device, electronic equipment and machine readable medium |
CN112015884A (en) * | 2020-08-28 | 2020-12-01 | 欧冶云商股份有限公司 | Method and device for extracting keywords of user visiting data and storage medium |
WO2021139466A1 (en) * | 2020-01-06 | 2021-07-15 | 北京大米科技有限公司 | Topic word determination method for text, device, storage medium, and terminal |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040139059A1 (en) * | 2002-12-31 | 2004-07-15 | Conroy William F. | Method for automatic deduction of rules for matching content to categories |
CN101593200A (en) * | 2009-06-19 | 2009-12-02 | 淮海工学院 | Chinese Web page classification method based on the keyword frequency analysis |
2012-12-31: CN application CN201210593096.8A, patent CN103049568B (en), status: Active
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104679733A (en) * | 2013-11-26 | 2015-06-03 | ***通信集团公司 | Voice conversation translation method, device and system |
CN104679733B (en) * | 2013-11-26 | 2018-02-23 | ***通信集团公司 | Voice conversation translation method, device and system |
CN103729350A (en) * | 2013-12-30 | 2014-04-16 | 武汉传神信息技术有限公司 | Multi-dimension preprocessing method for files to be translated |
CN103729344A (en) * | 2013-12-30 | 2014-04-16 | 传神联合(北京)信息技术有限公司 | Method for labeling statements in document manuscript |
CN103714051A (en) * | 2013-12-30 | 2014-04-09 | 传神联合(北京)信息技术有限公司 | Pre-processing method of documents to be translated |
CN103714051B (en) * | 2013-12-30 | 2016-05-18 | 传神联合(北京)信息技术有限公司 | Pre-processing method of documents to be translated |
CN103729344B (en) * | 2013-12-30 | 2016-08-31 | 传神联合(北京)信息技术有限公司 | Method for labeling statements in document manuscript |
CN103729350B (en) * | 2013-12-30 | 2017-01-04 | 语联网(武汉)信息技术有限公司 | Multi-dimension preprocessing method for files to be translated |
CN103955449A (en) * | 2014-04-21 | 2014-07-30 | 安一恒通(北京)科技有限公司 | Target sample positioning method and device |
CN104615772B (en) * | 2015-02-16 | 2017-11-03 | 重庆大学 | Text evaluation data specialization level analyzing method for electronic commerce |
CN104615772A (en) * | 2015-02-16 | 2015-05-13 | 重庆大学 | Text evaluation data specialization level analyzing method for electronic commerce |
CN104778371A (en) * | 2015-04-21 | 2015-07-15 | 天脉聚源(北京)传媒科技有限公司 | Method and device for evaluating document content speciality |
WO2017117781A1 (en) * | 2016-01-07 | 2017-07-13 | 马岩 | Network information classification method and system |
CN106484788A (en) * | 2016-09-19 | 2017-03-08 | 合肥清浊信息科技有限公司 | Patent search system based on industry keyword |
CN107798074A (en) * | 2017-09-29 | 2018-03-13 | 汤东澜 | Information processing method and server |
CN108182182A (en) * | 2017-12-27 | 2018-06-19 | 传神语联网网络科技股份有限公司 | Document matching process, device and computer readable storage medium in translation database |
CN107992633B (en) * | 2018-01-09 | 2021-07-27 | 国网福建省电力有限公司 | Automatic electronic document classification method and system based on keyword features |
CN107992633A (en) * | 2018-01-09 | 2018-05-04 | 国网福建省电力有限公司 | Electronic document automatic classification method and system based on keyword feature |
CN108572942A (en) * | 2018-04-20 | 2018-09-25 | 北京深度智耀科技有限公司 | Method and apparatus for creating hyperlinks |
CN109543023A (en) * | 2018-09-29 | 2019-03-29 | 中国石油化工股份有限公司石油勘探开发研究院 | Document classification method and system based on trie and LCS algorithm |
CN111552766A (en) * | 2019-02-11 | 2020-08-18 | 国际商业机器公司 | Characterizing references applied on reference graphs using machine learning |
CN111552766B (en) * | 2019-02-11 | 2024-03-01 | 国际商业机器公司 | Using machine learning to characterize reference relationships applied on reference graphs |
CN109871433B (en) * | 2019-02-21 | 2021-07-23 | 北京奇艺世纪科技有限公司 | Method, device, equipment and medium for calculating relevance between document and topic |
CN109871433A (en) * | 2019-02-21 | 2019-06-11 | 北京奇艺世纪科技有限公司 | Method, device, equipment and medium for calculating relevance between document and topic |
WO2021139466A1 (en) * | 2020-01-06 | 2021-07-15 | 北京大米科技有限公司 | Topic word determination method for text, device, storage medium, and terminal |
CN111782601A (en) * | 2020-06-08 | 2020-10-16 | 北京海泰方圆科技股份有限公司 | Electronic file processing method and device, electronic equipment and machine readable medium |
CN112015884A (en) * | 2020-08-28 | 2020-12-01 | 欧冶云商股份有限公司 | Method and device for extracting keywords of user visiting data and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN103049568B (en) | 2016-05-18 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN103049568A (en) | Method for classifying documents in mass document library | |
CN102129451B (en) | Method for clustering data in image retrieval system | |
CN109885773B (en) | Personalized article recommendation method, system, medium and equipment | |
CN101593200B (en) | Method for classifying Chinese webpages based on keyword frequency analysis | |
CN103258000B (en) | Method and device for clustering high-frequency keywords in webpages | |
CN107862070B (en) | Online classroom discussion short text instant grouping method and system based on text clustering | |
Yang et al. | Discovering topic representative terms for short text clustering | |
CN103823838B (en) | A kind of method of multi-format document typing and comparison | |
CN105022827A (en) | Field subject-oriented Web news dynamic aggregation method | |
CN104915447A (en) | Method and device for tracing hot topics and confirming keywords | |
CN103838756A (en) | Method and device for determining pushed information | |
Xie et al. | Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb | |
CN108647322B (en) | Method for identifying similarity of mass Web text information based on word network | |
CN102750379B (en) | Fast character string matching method based on filtering type | |
CN107844493B (en) | File association method and system | |
CN103106245A (en) | Method which is used for classifying translation manuscript in automatic fragmentation mode and based on large-scale term corpus | |
Konow et al. | Faster and smaller inverted indices with treaps | |
Culpepper et al. | Efficient in-memory top-k document retrieval | |
CN102314497A (en) | Method and equipment for identifying body contents of markup language files | |
CN104408033A (en) | Text message extracting method and system | |
CN107784110A (en) | Index establishing method and device | |
CN115563313A (en) | Knowledge graph-based document book semantic retrieval system | |
CN109885641B (en) | Method and system for searching Chinese full text in database | |
CN103778206A (en) | Method for providing network service resources | |
CN103646029A (en) | Similarity calculation method for blog articles |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventor after: Jiang Chao; Zhang Pi. Inventor before: Jiang Chao |
COR | Change of bibliographic data | |
C56 | Change in the name or address of the patentee | |
CP03 | Change of name, title or address | Address after: 430070 East Lake Hubei Development Zone, Optics Valley Software Park, a phase of the west, South Lake Road South, Optics Valley Software Park, No. 2, No. 5, layer 205, six. Patentee after: Language network (Wuhan) Information Technology Co., Ltd. Address before: 430073 East Lake Hubei Development Zone, Optics Valley Software Park, a phase of the west, South Lake Road South, Optics Valley Software Park, No. 2, No. 5, layer 205, six. Patentee before: Wuhan Transn Information Technology Co., Ltd. |
PE01 | Entry into force of the registration of the contract for pledge of patent right | |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Method for classifying documents in mass document library. Effective date of registration: 2018-11-15. Granted publication date: 2016-05-18. Pledgee: Bank of Communications Co., Ltd. Wuhan Branch of Hubei Free Trade Experimental Zone. Pledgor: Language network (Wuhan) Information Technology Co., Ltd. Registration number: 2018420000061 |
PC01 | Cancellation of the registration of the contract for pledge of patent right | |
PC01 | Cancellation of the registration of the contract for pledge of patent right | Date of cancellation: 2020-06-17. Granted publication date: 2016-05-18. Pledgee: Bank of Communications Co.,Ltd. Wuhan Branch of Hubei Free Trade Experimental Zone. Pledgor: IOL (WUHAN) INFORMATION TECHNOLOGY Co.,Ltd. Registration number: 2018420000061 |