CN109062898A - Feature word deduplication method, apparatus, device, and storage medium - Google Patents
Feature word deduplication method, apparatus, device, and storage medium
- Publication number
- CN109062898A (application number CN201810852217.3A)
- Authority
- CN
- China
- Prior art keywords
- phrase
- word
- value
- feature
- words
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 56
- 238000012545 processing Methods 0.000 claims description 23
- 230000011218 segmentation Effects 0.000 claims description 20
- 238000004422 calculation algorithm Methods 0.000 claims description 11
- 238000004590 computer program Methods 0.000 claims description 11
- 239000011159 matrix material Substances 0.000 claims description 8
- 238000012216 screening Methods 0.000 claims description 4
- 238000004364 calculation method Methods 0.000 abstract description 14
- 238000000605 extraction Methods 0.000 description 17
- 238000005516 engineering process Methods 0.000 description 14
- 238000010586 diagram Methods 0.000 description 13
- 239000000284 extract Substances 0.000 description 12
- 230000006870 function Effects 0.000 description 7
- 230000008569 process Effects 0.000 description 7
- 230000006854 communication Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 238000005065 mining Methods 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 3
- 241000208340 Araliaceae Species 0.000 description 2
- 101100042793 Gallus gallus SMC2 gene Proteins 0.000 description 2
- 235000005035 Panax pseudoginseng ssp. pseudoginseng Nutrition 0.000 description 2
- 235000003140 Panax quinquefolius Nutrition 0.000 description 2
- 230000009471 action Effects 0.000 description 2
- 235000008434 ginseng Nutrition 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810852217.3A CN109062898A (zh) | 2018-07-27 | 2018-07-27 | Feature word deduplication method, apparatus, device, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810852217.3A CN109062898A (zh) | 2018-07-27 | 2018-07-27 | Feature word deduplication method, apparatus, device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109062898A (zh) | 2018-12-21 |
Family
ID=64831434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810852217.3A Pending CN109062898A (zh) | Feature word deduplication method, apparatus, device, and storage medium | 2018-07-27 | 2018-07-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109062898A (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102411568A (zh) * | 2010-09-20 | 2012-04-11 | 苏州同程旅游网络科技有限公司 | Chinese word segmentation method based on a tourism-industry feature lexicon |
US20160188554A1 (en) * | 2014-12-30 | 2016-06-30 | Chengnan Liu | Method for generating random content for an article |
CN106528508A (zh) * | 2016-10-27 | 2017-03-22 | 乐视控股(北京)有限公司 | Method and apparatus for determining duplicate text |
CN108132930A (zh) * | 2017-12-27 | 2018-06-08 | 曙光信息产业(北京)有限公司 | Feature word extraction method and apparatus |
CN108304384A (zh) * | 2018-01-29 | 2018-07-20 | 上海名轩软件科技有限公司 | Word splitting method and device |
- 2018-07-27: CN application CN201810852217.3A (published as CN109062898A), status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
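CN107798136B (zh) | | Deep-learning-based entity relation extraction method, apparatus, and server |
CN110929038B (zh) | | Knowledge-graph-based entity linking method, apparatus, device, and storage medium |
US10360294B2 (en) | | Methods and systems for efficient and accurate text extraction from unstructured documents |
CN111767716B (zh) | | Method, apparatus, and computer device for determining multi-level industry information of an enterprise |
US20150100308A1 (en) | | Automated Formation of Specialized Dictionaries |
CN110377886A (zh) | | Project duplicate-checking method, apparatus, device, and storage medium |
CN111241389A (zh) | | Matrix-based sensitive word filtering method, apparatus, electronic device, and storage medium |
CN110032650B (zh) | | Method, apparatus, and electronic device for generating training sample data |
Lepage | | Analogies between binary images: Application to Chinese characters |
CN107220307A (zh) | | Web page search method and apparatus |
CN110147425A (zh) | | Keyword extraction method, apparatus, computer device, and storage medium |
CN110020312A (zh) | | Method and apparatus for extracting the main text of a web page |
Hussein | | Visualizing document similarity using n-grams and latent semantic analysis |
CN113722472B (zh) | | Technical literature information extraction method, system, and storage medium |
CN115017315A (zh) | | Frontier topic identification method, system, and computer device |
JP5869948B2 (ja) | | Passage segmentation method, apparatus, and program |
CN113449063B (zh) | | Method and apparatus for constructing a document structure information retrieval library |
CN109062898A (zh) | | Feature word deduplication method, apparatus, device, and storage medium |
Rofiq | | Indonesian news extractive text summarization using latent semantic analysis |
KR20070118154A (ko) | | Information processing apparatus and method, and program recording medium |
CN113468339A (zh) | | Knowledge-graph-based label extraction method, system, electronic device, and medium |
KR100659370B1 (ko) | | Method for building a document DB by thesaurus matching, and information retrieval method |
Balaji et al. | | Finding related research papers using semantic and co-citation proximity analysis |
Büchler et al. | | Scaling historical text re-use |
CN112395429A (zh) | | Graph-neural-network-based HS code determination, push, and application method, system, and storage medium |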
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2020-12-21. Address after: No.31 Yanqi street, Yanqi Economic Development Zone, Huairou District, Beijing. Applicant after: Beijing Huihong Technology Co.,Ltd. Address before: Room 107, building 2, Olympic Village street, Chaoyang District, Beijing. Applicant before: HANERGY MOBILE ENERGY HOLDING GROUP Co.,Ltd. |
TA01 | Transfer of patent application right | Effective date of registration: 2021-11-09. Address after: No.31 Yanqi street, Yanqi Economic Development Zone, Huairou District, Beijing. Applicant after: Dongjun new energy Co.,Ltd. Address before: No.31 Yanqi street, Yanqi Economic Development Zone, Huairou District, Beijing. Applicant before: Beijing Huihong Technology Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-12-21 |