CN101248433B - Matching engine with signature generation and relevance detection - Google Patents
Matching engine with signature generation and relevance detection
- Publication number
- CN101248433B (application number CN2006800227288A)
- Authority
- CN
- China
- Prior art keywords
- document
- character
- token
- signature
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Claims (12)
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US67931405P | 2005-05-09 | 2005-05-09 | |
US60/679,314 | 2005-05-09 | ||
US11/361,447 US7747642B2 (en) | 2005-05-09 | 2006-02-24 | Matching engine for querying relevant documents |
US11/361,447 | 2006-02-24 | ||
US11/361,340 | 2006-02-24 | ||
US11/361,340 US7516130B2 (en) | 2005-05-09 | 2006-02-24 | Matching engine with signature generation |
PCT/US2006/017846 WO2006122086A2 (en) | 2005-05-09 | 2006-05-08 | Matching engine with signature generation and relevance detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101248433A (zh) | 2008-08-20 |
CN101248433B (zh) | 2010-09-01 |
Family ID: 37397221
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006800227288A Active CN101248433B (zh) | 2005-05-09 | 2006-05-08 | Matching engine with signature generation and relevance detection |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP5072832B2 (zh) |
CN (1) | CN101248433B (zh) |
WO (1) | WO2006122086A2 (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7516130B2 (en) * | 2005-05-09 | 2009-04-07 | Trend Micro, Inc. | Matching engine with signature generation |
US7860853B2 (en) * | 2007-02-14 | 2010-12-28 | Provilla, Inc. | Document matching engine using asymmetric signature generation |
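JP5372853B2 (ja) | 2010-07-08 | 2013-12-18 | Hitachi, Ltd. | Digital sequence feature amount calculation method and digital sequence feature amount calculation device |
JP5617674B2 (ja) * | 2011-02-14 | 2014-11-05 | NEC Corporation | Inter-document similarity calculation device, inter-document similarity calculation method, and inter-document similarity calculation program |
CN107798637A (zh) * | 2016-08-30 | 2018-03-13 | Beijing Gridsum Technology Co., Ltd. | Method and device for obtaining judgment documents in which similar cases were decided differently |
CN112580108B (zh) * | 2020-12-10 | 2024-04-19 | Shenzhen Securities Information Co., Ltd. | Signature and *** integrity verification method and computer device |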
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5325091A (en) * | 1992-08-13 | 1994-06-28 | Xerox Corporation | Text-compression technique using frequency-ordered array of word-number mappers |
CN1369839A (zh) * | 2001-02-16 | 2002-09-18 | 意蓝科技股份有限公司 | Document relevance determination *** and method |
US6584470B2 (en) * | 2001-03-01 | 2003-06-24 | Intelliseek, Inc. | Multi-layered semiotic mechanism for answering natural language questions using document retrieval combined with information extraction |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
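JP2758826B2 (ja) * | 1994-03-02 | 1998-05-28 | Ricoh Co., Ltd. | Document retrieval device |
JPH09293079A (ja) * | 1996-04-18 | 1997-11-11 | Internatl Business Mach Corp <Ibm> | Information retrieval method, information retrieval device, and storage medium storing an information retrieval program |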
EP0961210A1 (en) * | 1998-05-29 | 1999-12-01 | Xerox Corporation | Signature file based semantic caching of queries |
US6493709B1 (en) * | 1998-07-31 | 2002-12-10 | The Regents Of The University Of California | Method and apparatus for digitally shredding similar documents within large document sets in a data processing environment |
JP2002269116A (ja) * | 2001-03-13 | 2002-09-20 | Ricoh Co Ltd | Document retrieval system and program |
JP3719666B2 (ja) * | 2001-07-12 | 2005-11-24 | Matsushita Electric Industrial Co., Ltd. | Document collation device |
US7139756B2 (en) * | 2002-01-22 | 2006-11-21 | International Business Machines Corporation | System and method for detecting duplicate and similar documents |
2006
- 2006-05-08 WO PCT/US2006/017846 patent/WO2006122086A2/en active Application Filing
- 2006-05-08 CN CN2006800227288A patent/CN101248433B/zh active Active
- 2006-05-08 JP JP2008511259A patent/JP5072832B2/ja active Active
Also Published As
Publication number | Publication date |
---|---|
WO2006122086A3 (en) | 2007-03-29 |
JP5072832B2 (ja) | 2012-11-14 |
JP2008541272A (ja) | 2008-11-20 |
CN101248433A (zh) | 2008-08-20 |
WO2006122086A2 (en) | 2006-11-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7747642B2 (en) | Matching engine for querying relevant documents | |
US7516130B2 (en) | Matching engine with signature generation | |
US7860853B2 (en) | Document matching engine using asymmetric signature generation | |
US7461056B2 (en) | Text mining apparatus and associated methods | |
US8781817B2 (en) | Phrase based document clustering with automatic phrase extraction | |
Treeratpituk et al. | Disambiguating authors in academic publications using random forests | |
US7424421B2 (en) | Word collection method and system for use in word-breaking | |
CN1728142B (zh) | Method and device for phrase identification in an information retrieval *** | |
US20050021545A1 (en) | Very-large-scale automatic categorizer for Web content | |
US8266150B1 (en) | Scalable document signature search engine | |
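CN101248433B (zh) | Matching engine with signature generation and relevance detection | |
CN101933017B (zh) | Document retrieval device, document retrieval ***, and document retrieval method | |
JP4426041B2 (ja) | Information retrieval method using category factors | |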
Sohrabi et al. | Finding similar documents using frequent pattern mining methods | |
Orlando et al. | Seed: A framework for extracting social events from press news | |
Soualmia et al. | Matching health information seekers' queries to medical terms | |
Carmel et al. | Morphological disambiguation for Hebrew search systems | |
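JPH1166086A (ja) | Similar document retrieval device and similar document retrieval method | |
CN112700830B (zh) | Method, device, and storage medium for extracting structured information from electronic medical records | |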
Patra et al. | A novel word clustering and cluster merging technique for named entity recognition | |
Alajmi et al. | DACS Dewey index-based Arabic Document Categorization System | |
Wei et al. | Improving database quality through eliminating duplicate records | |
Ling et al. | Mining generalized query patterns from web logs | |
Cisłak | Full-text and Keyword Indexes for String Searching | |
Tsay et al. | A scalable approach for Chinese term extraction |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: TREND MICRO INCORPORATED (JAPAN); Free format text: FORMER OWNER: TREND MICRO INC.; Effective date: 20120720 |
C41 | Transfer of patent application or patent right or utility model | |
C56 | Change in the name or address of the patentee | Owner name: TREND MICRO INC.; Free format text: FORMER NAME: TREND MICRO SHANJING CORPORATION. Owner name: TREND MICRO SHANJING CORPORATION; Free format text: FORMER NAME: PROVILLA, INC. |
CP03 | Change of name, title or address | Address after: California, USA; Patentee after: TREND MICRO INCORPORATED; Address before: Delaware; Patentee before: TREND MICRO SHANJING CORPORATION. Address after: Delaware; Patentee after: TREND MICRO SHANJING CORPORATION; Address before: California, USA; Patentee before: Provilla, Inc. |
TR01 | Transfer of patent right | Effective date of registration: 20120720; Address after: Tokyo, Japan; Patentee after: TREND MICRO INCORPORATED (JAPAN); Address before: California, USA; Patentee before: TREND MICRO INCORPORATED |