WO2008144964A8 - Detecting name entities and new words - Google Patents

Publication number
WO2008144964A8
Authority
WO
WIPO (PCT)
Prior art keywords
new words
text string
name entities
input entry
detecting name
Application number
PCT/CN2007/001755
Other languages
French (fr)
Other versions
WO2008144964A1 (en)
Inventor
Jun Wu
Zheng Huang
Xin Zheng
Dekang Lin
Hangjun Ye
Yingyu Wan
Po Zhang
Original Assignee
Google Inc
Jun Wu
Zheng Huang
Xin Zheng
Dekang Lin
Hangjun Ye
Yingyu Wan
Po Zhang
Priority date
2007-06-01
Filing date
2007-06-01
Publication date
2009-02-12
Application filed by Google Inc, Jun Wu, Zheng Huang, Xin Zheng, Dekang Lin, Hangjun Ye, Yingyu Wan, Po Zhang
Priority to CN200780100123A (CN101815996A)
Priority to US12/602,646 (US20100180199A1)
Priority to KR1020097027483A (KR20100029221A)
Priority to PCT/CN2007/001755 (WO2008144964A1)
Priority to TW097139051A (TW201015348A)
Publication of WO2008144964A1
Publication of WO2008144964A8

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

Various aspects can be implemented for detecting name entities and/or new words from input entries. In general, one aspect can be a method that includes receiving an input entry comprising a text string. The method also includes identifying segmentation information from the input entry. The method further includes generating a candidate text string from the text string of the input entry based on the segmentation information. Other implementations of this aspect include corresponding systems, apparatus, and processing engines.
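The three steps named in the abstract (receive an input entry, identify its segmentation information, generate candidate text strings) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `InputEntry` type, the representation of segmentation information as character offsets, the `lexicon` of known words, and the rule of joining adjacent segments that are not already in the lexicon.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class InputEntry:
    """Hypothetical input entry: a text string plus segmentation offsets."""
    text: str
    boundaries: List[int]  # assumed form of the "segmentation information"


def segments(entry: InputEntry) -> List[str]:
    """Split the entry's text at its recorded boundary offsets."""
    inner = sorted(b for b in set(entry.boundaries) if 0 < b < len(entry.text))
    cuts = [0] + inner + [len(entry.text)]
    return [entry.text[a:b] for a, b in zip(cuts, cuts[1:])]


def candidate_strings(entry: InputEntry, lexicon: Set[str],
                      max_segments: int = 3) -> List[str]:
    """Join runs of 2..max_segments adjacent segments; keep those not in the lexicon."""
    segs = segments(entry)
    found: List[str] = []
    for start in range(len(segs)):
        last = min(start + max_segments, len(segs))
        for end in range(start + 2, last + 1):
            joined = "".join(segs[start:end])
            if joined not in lexicon and joined not in found:
                found.append(joined)
    return found


# Illustrative data only: an entry segmented as "new" | "york" | "city".
entry = InputEntry(text="newyorkcity", boundaries=[3, 7])
known_words = {"new", "york", "city"}
print(segments(entry))
print(candidate_strings(entry, known_words))
```

In this sketch a candidate is simply any run of adjacent segments that the lexicon does not already contain; a real detector would presumably also score candidates, for example by how often they recur across many input entries, before treating them as name entities or new words.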

Priority Applications (5)

Application Number Publication Number Priority Date Filing Date Title
CN200780100123A CN101815996A (en) 2007-06-01 2007-06-01 Detect name entities and neologisms
US12/602,646 US20100180199A1 (en) 2007-06-01 2007-06-01 Detecting name entities and new words
KR1020097027483A KR20100029221A (en) 2007-06-01 2007-06-01 Detecting name entities and new words
PCT/CN2007/001755 WO2008144964A1 (en) 2007-06-01 2007-06-01 Detecting name entities and new words
TW097139051A TW201015348A (en) 2007-06-01 2008-10-09 Detecting name entities and new words

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/CN2007/001755 WO2008144964A1 (en) 2007-06-01 2007-06-01 Detecting name entities and new words

Publications (2)

Publication Number Publication Date
WO2008144964A1 (en) 2008-12-04
WO2008144964A8 (en) 2009-02-12

Family

ID=40074547

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/CN2007/001755 WO2008144964A1 (en) 2007-06-01 2007-06-01 Detecting name entities and new words

Country Status (5)

Country Link
US (1) US20100180199A1 (en)
KR (1) KR20100029221A (en)
CN (1) CN101815996A (en)
TW (1) TW201015348A (en)
WO (1) WO2008144964A1 (en)

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7917355B2 (en) * 2007-08-23 2011-03-29 Google Inc. Word detection
US7983902B2 (en) * 2007-08-23 2011-07-19 Google Inc. Domain dictionary creation by detection of new topic words using divergence value comparison
US8091023B2 (en) * 2007-09-28 2012-01-03 Research In Motion Limited Handheld electronic device and associated method enabling spell checking in a text disambiguation environment
JP5379155B2 (en) * 2007-12-06 2013-12-25 グーグル・インコーポレーテッド CJK name detection
US8214346B2 (en) * 2008-06-27 2012-07-03 Cbs Interactive Inc. Personalization engine for classifying unstructured documents
US9009591B2 (en) * 2008-12-11 2015-04-14 Microsoft Corporation User-specified phrase input learning
CN101901235B (en) * 2009-05-27 2013-03-27 国际商业机器公司 Method and system for document processing
KR101638442B1 (en) * 2009-11-24 2016-07-12 한국전자통신연구원 Method and apparatus for segmenting chinese sentence
US20110184723A1 (en) * 2010-01-25 2011-07-28 Microsoft Corporation Phonetic suggestion engine
US8402032B1 (en) 2010-03-25 2013-03-19 Google Inc. Generating context-based spell corrections of entity names
CN102411563B (en) * 2010-09-26 2015-06-17 阿里巴巴集团控股有限公司 Method, device and system for identifying target words
US8438011B2 (en) 2010-11-30 2013-05-07 Microsoft Corporation Suggesting spelling corrections for personal names
CN102682763B (en) * 2011-03-10 2014-07-16 北京三星通信技术研究有限公司 Method, device and terminal for correcting named entity vocabularies in voice input text
US8630989B2 (en) 2011-05-27 2014-01-14 International Business Machines Corporation Systems and methods for information extraction using contextual pattern discovery
US10176168B2 (en) * 2011-11-15 2019-01-08 Microsoft Technology Licensing, Llc Statistical machine translation based search query spelling correction
US9348479B2 (en) 2011-12-08 2016-05-24 Microsoft Technology Licensing, Llc Sentiment aware user interface customization
US9378290B2 (en) * 2011-12-20 2016-06-28 Microsoft Technology Licensing, Llc Scenario-adaptive input method editor
CN104428734A (en) 2012-06-25 2015-03-18 微软公司 Input method editor application platform
US8959109B2 (en) 2012-08-06 2015-02-17 Microsoft Corporation Business intelligent in-document suggestions
EP2891078A4 (en) 2012-08-30 2016-03-23 Microsoft Technology Licensing Llc Feature-based candidate selection
CN103678336B (en) * 2012-09-05 2017-04-12 阿里巴巴集团控股有限公司 Method and device for identifying entity words
CN102929862B (en) * 2012-11-06 2015-06-10 深圳市宜搜科技发展有限公司 New word acquiring method and system
CN103870449B (en) * 2012-12-10 2018-06-12 百度国际科技(深圳)有限公司 The on-line automatic method and electronic device for excavating neologisms
US9031829B2 (en) 2013-02-08 2015-05-12 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US8996355B2 (en) 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for reviewing histories of text messages from multi-user multi-lingual communications
US10650103B2 (en) 2013-02-08 2020-05-12 Mz Ip Holdings, Llc Systems and methods for incentivizing user feedback for translation processing
US9600473B2 (en) 2013-02-08 2017-03-21 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US8996352B2 (en) 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for correcting translations in multi-user multi-lingual communications
US8990068B2 (en) 2013-02-08 2015-03-24 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US8996353B2 (en) * 2013-02-08 2015-03-31 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
US9298703B2 (en) 2013-02-08 2016-03-29 Machine Zone, Inc. Systems and methods for incentivizing user feedback for translation processing
US9231898B2 (en) 2013-02-08 2016-01-05 Machine Zone, Inc. Systems and methods for multi-user multi-lingual communications
EP3030982A4 (en) 2013-08-09 2016-08-03 Microsoft Technology Licensing Llc Input method editor providing language assistance
US20150317393A1 (en) * 2014-04-30 2015-11-05 Cerner Innovation, Inc. Patient search with common name data store
US9372848B2 (en) 2014-10-17 2016-06-21 Machine Zone, Inc. Systems and methods for language detection
US10162811B2 (en) 2014-10-17 2018-12-25 Mz Ip Holdings, Llc Systems and methods for language detection
US10765956B2 (en) 2016-01-07 2020-09-08 Machine Zone Inc. Named entity recognition on chat data
JP6897168B2 (en) * 2017-03-06 2021-06-30 富士フイルムビジネスイノベーション株式会社 Information processing equipment and information processing programs
US11586810B2 (en) * 2017-06-26 2023-02-21 Microsoft Technology Licensing, Llc Generating responses in automated chatting
US10769387B2 (en) 2017-09-21 2020-09-08 Mz Ip Holdings, Llc System and method for translating chat messages
CN111353308A (en) * 2018-12-20 2020-06-30 北京深知无限人工智能研究院有限公司 Named entity recognition method, device, server and storage medium
US11042580B2 (en) * 2018-12-30 2021-06-22 Paypal, Inc. Identifying false positives between matched words
JP7139271B2 (en) * 2019-03-20 2022-09-20 ヤフー株式会社 Information processing device, information processing method, and program
US20220261092A1 (en) * 2019-05-24 2022-08-18 Krishnamoorthy VENKATESA Method and device for inputting text on a keyboard
US11393455B2 (en) 2020-02-28 2022-07-19 Rovi Guides, Inc. Methods for natural language model training in natural language understanding (NLU) systems
US11574127B2 (en) 2020-02-28 2023-02-07 Rovi Guides, Inc. Methods for natural language model training in natural language understanding (NLU) systems
US11392771B2 (en) 2020-02-28 2022-07-19 Rovi Guides, Inc. Methods for natural language model training in natural language understanding (NLU) systems
US11626103B2 (en) 2020-02-28 2023-04-11 Rovi Guides, Inc. Methods for natural language model training in natural language understanding (NLU) systems
CN112861534B (en) * 2021-01-18 2023-07-21 北京奇艺世纪科技有限公司 Object name recognition method and device

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5893133A (en) * 1995-08-16 1999-04-06 International Business Machines Corporation Keyboard for a system and method for processing Chinese language text
US5832478A (en) * 1997-03-13 1998-11-03 The United States Of America As Represented By The National Security Agency Method of searching an on-line dictionary using syllables and syllable count
US6640006B2 (en) * 1998-02-13 2003-10-28 Microsoft Corporation Word segmentation in chinese text
CN1143232C (en) * 1998-11-30 2004-03-24 皇家菲利浦电子有限公司 Automatic segmentation of text
JP2001043221A (en) * 1999-07-29 2001-02-16 Matsushita Electric Ind Co Ltd Chinese word dividing device
CN1226717C (en) * 2000-08-30 2005-11-09 国际商业机器公司 Automatic new term fetch method and system
US7076731B2 (en) * 2001-06-02 2006-07-11 Microsoft Corporation Spelling correction system and method for phrasal strings using dictionary looping
US7136805B2 (en) * 2002-06-11 2006-11-14 Fuji Xerox Co., Ltd. System for distinguishing names of organizations in Asian writing systems
CN100555276C (en) * 2004-01-15 2009-10-28 中国科学院计算技术研究所 A kind of detection method of Chinese new words and detection system thereof
US7424421B2 (en) * 2004-03-03 2008-09-09 Microsoft Corporation Word collection method and system for use in word-breaking
US20080077570A1 (en) * 2004-10-25 2008-03-27 Infovell, Inc. Full Text Query and Search Systems and Method of Use
US20070067157A1 (en) * 2005-09-22 2007-03-22 International Business Machines Corporation System and method for automatically extracting interesting phrases in a large dynamic corpus
CN100405371C (en) * 2006-07-25 2008-07-23 北京搜狗科技发展有限公司 Method and system for abstracting new word

Also Published As

Publication number Publication date
CN101815996A (en) 2010-08-25
US20100180199A1 (en) 2010-07-15
WO2008144964A1 (en) 2008-12-04
TW201015348A (en) 2010-04-16
KR20100029221A (en) 2010-03-16

Similar Documents

Publication Publication Date Title
WO2008144964A8 (en) Detecting name entities and new words
WO2009026193A3 (en) System and method for search
WO2008011142A3 (en) Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet
WO2007143223A3 (en) System and method for entity based information categorization
WO2009111721A3 (en) Voice recognition grammar selection based on context
WO2008057474A3 (en) Methods and systems for analyzing data in media material having a layout
TW200709635A (en) Method and apparatus for certificate roll-over
WO2007106806A3 (en) Methods and apparatus for using radar to monitor audiences in media environments
GB2465094A (en) Method and system for data context service
WO2008107305A3 (en) Search-based word segmentation method and device for language without word boundary tag
WO2007115079A3 (en) Expanded snippets
EP2284731A3 (en) Personalized search engine based on special keyword placement
WO2006039398A8 (en) Methods and systems for selecting a language for text segmentation
MY141679A (en) Method for facilitating shale shaker operation
WO2006125138A3 (en) Searching a database including prioritizing results based on historical data
WO2007146876A3 (en) Methods and apparatus to meter content exposure using closed caption information
WO2005006283A3 (en) Rendering advertisements with documents having one or more topics using user topic interest information
WO2008051750A3 (en) Associating geographic-related information with objects
WO2007149341A3 (en) System to associate a demographic to a user of an electronic system
WO2009026189A3 (en) Methods and apparatus for providing location data with variable validity and quality
EP1895460A3 (en) Methods and apparatus for managing RFID and other data
WO2008118568A3 (en) In-line high-throughput contraband detection system
WO2008051783A3 (en) Context-free grammar
WO2008046063A3 (en) Methods and apparatuses for searching and categorizing messages within a network system
WO2008030510A3 (en) System and method for weighted search and advertisement placement

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 200780100123.0

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07721328

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 12602646

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20097027483

Country of ref document: KR

Kind code of ref document: A

122 EP: PCT application non-entry in European phase

Ref document number: 07721328

Country of ref document: EP

Kind code of ref document: A1