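WO2015050321A8 - Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof - Google Patents

Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof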

Info

Publication number
WO2015050321A8
Authority
WO
WIPO (PCT)
Prior art keywords
morpheme
alignment
expressions
method therefor
corpus
Prior art date
Application number
PCT/KR2014/007959
Other languages
English (en)
French (fr)
Other versions
WO2015050321A1 (ko)
Inventor
지창진
Original Assignee
주식회사 시스트란인터내셔널
Priority date
Filing date
Publication date
Application filed by 주식회사 시스트란인터내셔널
Priority to US15/026,275 (US10282413B2)
Priority to JP2016546716A (JP6532088B2)
Priority to CN201480054951.5A (CN105593845B)
Publication of WO2015050321A1
Publication of WO2015050321A8

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

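Disclosed are a device for generating an aligned corpus based on unsupervised-learning alignment and a method therefor, and a device for morphological analysis of destructive expressions using the aligned corpus and a morphological analysis method therefor. The morphological analysis device includes a knowledge database and an analyzer. The knowledge database stores multiple pieces of knowledge information used for language-specific morphological analysis. It includes a morpheme dictionary, which stores morpheme information corresponding to normal expressions, and an aligned corpus, which stores normal-expression information corresponding to destructive expressions. Here a destructive expression is one that is misspelled or that has not been normalized and standardized. The analyzer performs morphological analysis on an input word phrase (eojeol) using the knowledge database and outputs the analysis result. When no morpheme for the input word phrase is found in the morpheme dictionary, the analyzer uses the aligned corpus to find the normal expression corresponding to the destructive expression contained in the input word phrase, and then performs the morphological analysis.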
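The lookup order described in the abstract (morpheme dictionary first, aligned corpus as a fallback) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `MORPHEME_DICTIONARY`, `ALIGNED_CORPUS`, `normalize`, and `analyze` are hypothetical, the English toy entries and tags are invented for illustration, and the simple substring replacement stands in for whatever matching the actual analyzer uses.

```python
from typing import Dict, List, Optional, Tuple

Morphemes = List[Tuple[str, str]]  # (surface form, tag) pairs; tags are made up

# Hypothetical toy morpheme dictionary: normal expression -> morpheme information.
MORPHEME_DICTIONARY: Dict[str, Morphemes] = {
    "thanks": [("thank", "NOUN"), ("s", "PLURAL")],
    "tomorrow": [("tomorrow", "NOUN")],
}

# Hypothetical toy aligned corpus: destructive expression -> normal expression.
ALIGNED_CORPUS: Dict[str, str] = {
    "thx": "thanks",
    "2moro": "tomorrow",
}


def normalize(word: str) -> Optional[str]:
    """Replace the first known destructive expression contained in the word."""
    for destructive, normal in ALIGNED_CORPUS.items():
        if destructive in word:
            return word.replace(destructive, normal)
    return None  # no destructive expression recognized


def analyze(word: str) -> Optional[Morphemes]:
    """Dictionary lookup first; fall back to the aligned corpus on a miss."""
    morphemes = MORPHEME_DICTIONARY.get(word)
    if morphemes is not None:
        return morphemes
    normal = normalize(word)
    if normal is None:
        return None
    # Analyze the recovered normal expression with the same dictionary.
    return MORPHEME_DICTIONARY.get(normal)


if __name__ == "__main__":
    for token in ["thanks", "thx", "2moro", "xyz"]:
        print(token, "->", analyze(token))
```

In this sketch the fallback only runs on a dictionary miss, so normal expressions never pay the cost of the aligned-corpus search.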
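PCT/KR2014/007959 2013-10-02 2014-08-27 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof WO2015050321A1 (ko)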

Priority Applications (3)

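Application Number Publication Number Priority Date Filing Date Title
US15/026,275 US10282413B2 (en) 2013-10-02 2014-08-27 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof
JP2016546716A JP6532088B2 (ja) 2013-10-02 2014-08-27 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof
CN201480054951.5A CN105593845B (zh) 2013-10-02 2014-08-27 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof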

Applications Claiming Priority (2)

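Application Number Publication Number Priority Date Filing Date Title
KR20130118062A KR101509727B1 (ko) 2013-10-02 2013-10-02 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof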
KR10-2013-0118062 2013-10-02

Publications (2)

Publication Number Publication Date
WO2015050321A1 (ko) 2015-04-09
WO2015050321A8 (ko) 2015-05-14

Family

ID=52778882

Family Applications (1)

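Application Number Publication Number Priority Date Filing Date Title
PCT/KR2014/007959 WO2015050321A1 (ko) 2013-10-02 2014-08-27 Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof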

Country Status (5)

Country Link
US (1) US10282413B2 (ko)
JP (1) JP6532088B2 (ko)
KR (1) KR101509727B1 (ko)
CN (1) CN105593845B (ko)
WO (1) WO2015050321A1 (ko)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
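JP6466138B2 (ja) * 2014-11-04 2019-02-06 株式会社東芝 Foreign-language sentence creation support device, method, and program
KR101702055B1 (ko) 2015-06-23 2017-02-13 (주)아크릴 Deep-learning-based morpheme analysis device and method of operating a morpheme analysis application
KR101839121B1 (ko) * 2015-09-14 2018-04-26 네이버 주식회사 System and method for correcting user queries
CN108205757B (zh) * 2016-12-19 2022-05-27 创新先进技术有限公司 Method and device for verifying the legitimacy of electronic payment services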
US10635862B2 (en) * 2017-12-21 2020-04-28 City University Of Hong Kong Method of facilitating natural language interactions, a method of simplifying an expression and a system thereof
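CN109815476B (zh) * 2018-12-03 2023-03-24 国网浙江省电力有限公司杭州供电公司 Word vector representation method based on joint statistics of Chinese morphemes and pinyin
KR102199835B1 (ko) * 2018-12-31 2021-01-07 주식회사 엘솔루 Language correction system, method therefor, and method for training a language correction model in the system
KR102352163B1 (ko) 2019-11-26 2022-01-19 고려대학교 산학협력단 Method for diagnosing language proficiency using EEG measurement technology
CN113343719B (zh) * 2021-06-21 2023-03-14 哈尔滨工业大学 Method for acquiring an unsupervised bilingual translation dictionary by co-training with different word embedding models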

Family Cites Families (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5477448A (en) * 1994-06-01 1995-12-19 Mitsubishi Electric Research Laboratories, Inc. System for correcting improper determiners
US6708311B1 (en) * 1999-06-17 2004-03-16 International Business Machines Corporation Method and apparatus for creating a glossary of terms
US7010479B2 (en) * 2000-07-26 2006-03-07 Oki Electric Industry Co., Ltd. Apparatus and method for natural language processing
GB2366893B (en) * 2000-09-08 2004-06-16 Roke Manor Research Improvements in or relating to word processor systems or the like
US7043422B2 (en) 2000-10-13 2006-05-09 Microsoft Corporation Method and apparatus for distribution-based language model adaptation
JP4947861B2 (ja) * 2001-09-25 2012-06-06 キヤノン株式会社 Natural language processing apparatus, control method therefor, and program
US7610189B2 (en) * 2001-10-18 2009-10-27 Nuance Communications, Inc. Method and apparatus for efficient segmentation of compound words using probabilistic breakpoint traversal
FR2841355B1 (fr) * 2002-06-24 2008-12-19 Airbus France Method and device for producing an abbreviated form of any term used in an alarm message intended to be displayed on a screen in the cockpit of an aircraft
JP2005100335A (ja) 2003-09-01 2005-04-14 Advanced Telecommunication Research Institute International Machine translation apparatus, machine translation computer program, and computer
US20050131931A1 (en) * 2003-12-11 2005-06-16 Sanyo Electric Co., Ltd. Abstract generation method and program product
JP2005251115A (ja) * 2004-03-08 2005-09-15 Shogakukan Inc Associative search system and associative search method
US7406416B2 (en) 2004-03-26 2008-07-29 Microsoft Corporation Representation of a deleted interpolation N-gram language model in ARPA standard format
JP3998668B2 (ja) * 2004-07-14 2007-10-31 沖電気工業株式会社 Morphological analysis device, method, and program
KR100735308B1 (ko) * 2005-08-30 2007-07-03 경북대학교 산학협력단 Recording medium storing a program for automatic word spacing of short messages
US7747427B2 (en) * 2005-12-05 2010-06-29 Electronics And Telecommunications Research Institute Apparatus and method for automatic translation customized for documents in restrictive domain
US8170868B2 (en) * 2006-03-14 2012-05-01 Microsoft Corporation Extracting lexical features for classifying native and non-native language usage style
CA2675208A1 (en) * 2007-01-10 2008-07-17 National Research Council Of Canada Means and method for automatic post-editing of translations
US9465791B2 (en) * 2007-02-09 2016-10-11 International Business Machines Corporation Method and apparatus for automatic detection of spelling errors in one or more documents
US8332207B2 (en) * 2007-03-26 2012-12-11 Google Inc. Large language models in machine translation
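JP2008287406A (ja) * 2007-05-16 2008-11-27 Sony Corp Information processing apparatus, information processing method, program, and recording medium
KR100911834B1 (ko) * 2007-12-11 2009-08-13 한국전자통신연구원 Method and apparatus for correcting translation errors using error-correction patterns in a translation system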
US8229728B2 (en) * 2008-01-04 2012-07-24 Fluential, Llc Methods for using manual phrase alignment data to generate translation models for statistical machine translation
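JP2009245308A (ja) * 2008-03-31 2009-10-22 Fujitsu Ltd Document proofreading support program, document proofreading support method, and document proofreading support apparatus
KR101496885B1 (ko) * 2008-04-07 2015-02-27 삼성전자주식회사 System and method for sentence word spacing
KR100961717B1 (ko) * 2008-09-16 2010-06-10 한국전자통신연구원 Method and apparatus for detecting machine translation errors using a parallel corpus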
US20100076764A1 (en) * 2008-09-19 2010-03-25 General Motors Corporation Method of dialing phone numbers using an in-vehicle speech recognition system
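JP4701292B2 (ja) * 2009-01-05 2011-06-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Computer system, method, and computer program for creating a term dictionary from named entities or technical terms contained in text data
JP5436868B2 (ja) 2009-01-13 2014-03-05 Kddi株式会社 Correct-answer determination device, correct-answer determination system, correct-answer determination method, and correct-answer determination program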
EP2405423B1 (en) * 2009-03-03 2013-09-11 Mitsubishi Electric Corporation Voice recognition device
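JP2010257021A (ja) 2009-04-22 2010-11-11 Kddi Corp Sentence correction device, sentence correction system, sentence correction method, and sentence correction program
KR101027791B1 (ko) * 2009-08-11 2011-04-07 주식회사 케피코 Mount structure for a direct-injection fuel rail
KR101250900B1 (ko) 2009-08-17 2013-04-04 한국전자통신연구원 Apparatus and method for statistical HMM part-of-speech tagging based on document information learning
KR20110061209A (ko) * 2009-12-01 2011-06-09 한국전자통신연구원 Apparatus for generating post-processing knowledge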
US9020805B2 (en) * 2010-09-29 2015-04-28 International Business Machines Corporation Context-based disambiguation of acronyms and abbreviations
JP5392228B2 (ja) * 2010-10-14 2014-01-22 株式会社Jvcケンウッド Program search device and program search method
US8316030B2 (en) * 2010-11-05 2012-11-20 Nextgen Datacom, Inc. Method and system for document classification or search using discrete words
US9164983B2 (en) * 2011-05-27 2015-10-20 Robert Bosch Gmbh Broad-coverage normalization system for social media language
US20130103390A1 (en) * 2011-10-21 2013-04-25 Atsushi Fujita Method and apparatus for paraphrase acquisition
US9501759B2 (en) * 2011-10-25 2016-11-22 Microsoft Technology Licensing, Llc Search query and document-related data translation
US9311286B2 (en) * 2012-01-25 2016-04-12 International Business Machines Corporation Intelligent automatic expansion/contraction of abbreviations in text-based electronic communications
US9785631B2 (en) * 2012-03-16 2017-10-10 Entit Software Llc Identification and extraction of acronym/definition pairs in documents
JP5870790B2 (ja) * 2012-03-19 2016-03-01 富士通株式会社 Sentence proofreading device and sentence proofreading method
US9659059B2 (en) * 2012-07-20 2017-05-23 Salesforce.Com, Inc. Matching large sets of words
KR20150024188A (ko) * 2013-08-26 2015-03-06 삼성전자주식회사 Method for changing text data corresponding to voice data and electronic device therefor

Also Published As

Publication number Publication date
US20160217122A1 (en) 2016-07-28
CN105593845A (zh) 2016-05-18
KR101509727B1 (ko) 2015-04-07
JP6532088B2 (ja) 2019-06-19
US10282413B2 (en) 2019-05-07
JP2016538666A (ja) 2016-12-08
WO2015050321A1 (ko) 2015-04-09
CN105593845B (zh) 2018-04-17

Similar Documents

Publication Publication Date Title
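WO2015050321A8 (ko) Device for generating aligned corpus based on unsupervised-learning alignment, method thereof, device for analyzing destructive expression morpheme using aligned corpus, and method for analyzing morpheme thereof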
GB2542288A (en) Enhancing reading accuracy, efficiency and retention
RU2013156495A (ru) Разрешение семантической неоднозначности при помощи семантического классификатора
GB2553233A (en) Techniques for providing visual translation cards including contextually relevant definitions and examples
Shin et al. Assessing the relative roles of vocabulary and syntactic knowledge in reading comprehension
Ojha et al. The RGNLP machine translation systems for WAT 2018
Droganova Building a dependency parsing model for Russian with maltparser and Mystem tagset
KR20160050652A (ko) 신규 언어의 트리뱅크를 구축하는 방법
Zhao et al. Grammatical planning scope in sentence production: Evidence from Chinese sentences.
Prastikawati Error analysis and its significance for English foreign teachers
SIRAIT An analysis on grammatical errors in communication made by the employees at Golden Virgo Hotel Batam
Moonga An analysis of written english errors made by grade 11 pupils in a multilingual context: a case of selected schools in Kabwe and monze districts of Zambia
Aung A lexicon based sentiment analyzer framework for student-teacher textual comments
Ferreira Luz et al. Semantic Parsing: Syntactic assurance to target sentence using LSTM Encoder CFG-Decoder
Yan China's Top New Words
Whanchit Persuasive Features in Reviews Written by EFL Students
Kithulgoda Should We Say „This Is Wrong‟?; Impact of Explicit Corrective Feedback on Language Accuracy
Septiani et al. PRONUNCIATION ERROR PERFORMANCE OF ENGLISH DEPARTMENT STUDENTS OF MUHAMMADIYAH UNIVERSITY OF MALANG
Nagy Automatic Detection of Multiword Expressions with Dependency Parsers on Different Languages
Ge Automatic scoring of english writing based on joint of lexical and phrasal features
CHEN Teaching development grants final and financial report: Developing a corpus-based online pronunciation learning system for Cantonese learners of Mandarin and Japanese
An General Error Analysis and Teaching Mode Reform in Chinese Grammar Teaching----with students majored in Chinese Language of Yili Normal University as an example
Saptayani et al. Grammatical Errors on Narrative Writing Committed by 8th Grade Students in SMP N 5 Singaraja in Academic Year 2014/2015
朱虹 On the Analysis of Tag Questions in English Teaching
Bozşahin Verbal Categories in Turkish Sign Language

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14851343

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 15026275

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2016546716

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 14851343

Country of ref document: EP

Kind code of ref document: A1