
KR102165940B1 - System and method for providing CBMR-based music identifying service using note

Info

Publication number: KR102165940B1
Application number: KR1020190024699A
Authority: KR
Inventors: 전대덕
Original assignee: 전대덕
Priority date: 2019-03-04
Filing date: 2019-03-04
Publication date: 2020-10-14
Also published as: KR20200106328A

Abstract

A system for providing a music search service using CBMR-based sound is provided. The system includes a user terminal and a music search service providing server. The user terminal outputs a music search event, input through voice recognition or a user interface, on the screen on which sound source content is streamed or played; collects the sound source content as an audio file, extracts it as a query keyword for the music search, and transmits it; and receives and outputs feedback corresponding to the input. The server includes a receiving unit that receives the music search event output from the user terminal, a streaming unit that collects the sound source content output from the user terminal and streams it in real time, a search unit that performs content-based music retrieval (CBMR) using part or all of the sound source content received through the real-time stream as the keyword, and a transmission unit that transmits the result to the user terminal as a feedback response.

Description

System and method for providing a music search service using CBMR-based sound {SYSTEM AND METHOD FOR PROVIDING CBMR BASED MUSIC IDENTIFYING SERVICE USING NOTE}

The present invention relates to a method for providing a music search service using CBMR-based sound, and provides a system and method for searching for a sound source using sound.

Providing users with convenient and effective access to rapidly growing volumes of multimedia data is a core element of content-based information systems. Content-based music retrieval (CBMR) searches sheet music, audio, symbolic text, or music files by having the user sing directly into a microphone, play an audio file, enter a melody as notes on a virtual keyboard or staff, or type scale degrees or intervals into a text box. It is thus distinguished from conventional music search, which looks up lyrics as text or searches by metadata such as the song title. CBMR is useful not only for ordinary users who want to find a song title, score, or audio file from a melody, but also for recommending music with a similar mood and for judging plagiarism in the copyright management of sound sources.

Content-based music search methods using melody have been researched and developed. As related prior art, Korean Patent Publication No. 2002-0053979 (published July 6, 2002) provides a method of searching music data over a network so that a remote user can, via the Internet and the web, look up and retrieve a desired song from music data stored on a server based on its melody. To enable content-based search by the song itself (the melody), that method classifies pieces of music into categories; stores the score of each piece in a music database together with the actual audio sample data; receives from the user an audio sample and a search command for content-based music search; analyzes the audio sample to generate a score and analyzes the search command to set search criteria and conditions; performs a content-based music search against the music database using the generated score according to those criteria and conditions; and converts the search results into a document that can be shown to the user.

However, when the music itself is recorded and used as the keyword, or a melody is entered directly, to search a music database, highly accurate recognition is possible because the same music is input, but the approach can be used only when the music serving as the key can be recorded. In addition, content-based sound source search requires either recording the music or melody, storing it on the user terminal, and uploading the stored sound source, or, offline, turning on a microphone and recording while the sound source plays. Online, a separate program for extracting the sound source is needed, and even where one exists, recording may fail because copyright-protection DRM is installed. Since the sound source cannot be extracted at the very step of inputting the music as a keyword, the technology cannot even be used, which makes it difficult to commercialize, and even if it is commercialized, its convenience is markedly reduced.

An embodiment of the present invention can provide a system and method for providing a music search service using CBMR-based sound that: when a sound source is input to search for music online or offline, outputs a search interface overlaid on the interface in which the sound source is playing; automatically extracts the sound source being streamed from within the search interface and temporarily stores it as a keyword for the search, so that no copyright infringement arises from storage; enables content-based music search through web crawling using the temporarily stored keyword, without building a database; and, even if the user did not manage to launch the music search interface in time, extracts the time at which the sound source was playing and the URL or place where it was played and compares them against metadata, thereby also providing a hybrid search. However, the technical problems to be solved by this embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, an embodiment of the present invention includes a user terminal and a music search service providing server. The user terminal outputs a music search event, input through voice recognition or a user interface, on the screen on which sound source content is streamed or played; collects the sound source content as an audio file, extracts it as a query keyword for the music search, and transmits it; and receives and outputs feedback corresponding to the input. The music search service providing server includes a receiving unit that receives the music search event output from the user terminal, a streaming unit that collects the sound source content output from the user terminal and streams it in real time, a search unit that performs content-based music retrieval (CBMR) using part or all of the sound source content received through the real-time stream as the keyword, and a transmission unit that transmits the result to the user terminal as a feedback response.

Another embodiment of the present invention includes: when a music search event, input through voice recognition or a user interface, is output on the screen of a user terminal on which sound source content is streamed or played, receiving the sound source content, collected as an audio file, by real-time streaming; performing content-based music retrieval (CBMR) using part or all of the sound source content received through the real-time stream as a keyword; and transmitting the result to the user terminal as a feedback response.

According to any one of the above means for solving the problem, when a sound source is input to search for music online or offline, a search interface is output overlaid on the interface in which the sound source is playing; the sound source being streamed is automatically extracted from within the search interface and temporarily stored as a keyword for the search, so that no copyright infringement arises from storage; content-based music search is possible through web crawling using the temporarily stored keyword, without building a database; and, even if the user did not manage to launch the music search interface in time, a hybrid search can be provided that extracts the time at which the sound source was playing and the URL or place where it was played and finds the music by comparison against metadata.
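The metadata side of the hybrid search described above can be illustrated with a small lookup. This is only a sketch under stated assumptions: the specification does not say where the comparison metadata comes from or how it is structured, so the `PlayLogEntry` record, the `lookup_by_context` function, and the idea of a play log keyed by source and time are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class PlayLogEntry:
    """Hypothetical metadata record: what was playing, where, and when."""
    source: str        # URL, TV channel, or place identifier (assumed format)
    start: datetime
    end: datetime
    title: str
    artist: str


def lookup_by_context(log: List[PlayLogEntry], source: str,
                      heard_at: datetime) -> Optional[PlayLogEntry]:
    """Return the entry playing on `source` at time `heard_at`, if any."""
    for entry in log:
        if entry.source == source and entry.start <= heard_at < entry.end:
            return entry
    return None
```

A caller would pass the time and URL or place extracted by the terminal; a miss (`None`) is where a content-based search on the audio itself would take over.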

FIG. 1 is a diagram illustrating a system for providing a music search service using CBMR-based sound according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the music search service providing server included in the system of FIG. 1.
FIG. 3 is a diagram illustrating an example implementation of the music search service using CBMR-based sound according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the process by which data is transmitted and received among the components of the system of FIG. 1 according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method for providing a music search service using CBMR-based sound according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. Parts irrelevant to the description are omitted from the drawings for clarity, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. When a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about," "substantially," and the like, as used throughout the specification, mean at or near the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent unscrupulous infringers from unfairly exploiting disclosures in which exact or absolute figures are stated to aid understanding of the invention. As used throughout the specification, the expressions "step of (doing) ~" and "step of ~" do not mean "step for ~".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or an individual's identification information, which serves as the terminal's identifying data.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a system for providing a music search service using CBMR-based sound according to an embodiment of the present invention. Referring to FIG. 1, the system (1) may include at least one user terminal (100), a music search service providing server (300), and at least one sound source providing server (400). The system (1) of FIG. 1 is, however, only one embodiment of the present invention, and the invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected through a network (200). For example, as shown in FIG. 1, the at least one user terminal (100) may be connected to the music search service providing server (300) and the at least one sound source providing server (400) through the network (200). The music search service providing server (300) may be connected to the at least one user terminal (100) and the at least one sound source providing server (400) through the network (200). The at least one sound source providing server (400) may likewise be connected to the music search service providing server (300) through the network (200).

Here, the network refers to a connection structure that allows information to be exchanged among nodes such as a plurality of terminals and servers. Examples of such a network include, but are not limited to, RF, a 3GPP (3rd Generation Partnership Project) network, an LTE (Long Term Evolution) network, a 5GPP (5th Generation Partnership Project) network, a WiMAX (Worldwide Interoperability for Microwave Access) network, the Internet, a LAN (Local Area Network), a wireless LAN (Wireless Local Area Network), a WAN (Wide Area Network), a PAN (Personal Area Network), a Bluetooth network, an NFC network, a satellite broadcasting network, an analog broadcasting network, and a DMB (Digital Multimedia Broadcasting) network.

In the following, the term "at least one" is defined to cover both the singular and the plural; even where the term "at least one" is absent, it is self-evident that each component may exist in the singular or plural and may denote the singular or plural. Whether each component is provided singly or plurally may vary depending on the embodiment.

The at least one user terminal (100) may be a terminal that searches for music using a web page, app page, program, or application related to the music search service using CBMR-based sound. Rather than searching metadata, the user terminal (100) may extract a sound source being played on the user terminal (100) itself or on another terminal, transmit it in real time to the music search service providing server (300), and receive information on the retrieved music as feedback. Searching by metadata is, of course, not excluded. The user terminal (100) may include a voice recognition interface so that the search interface can be launched by a voice signal, and may output the search interface overlaid on the screen on which content is being played.

Here, the at least one user terminal (100) may be implemented as a computer capable of accessing a remote server or terminal over a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The at least one user terminal (100) may also be implemented as a terminal capable of accessing a remote server or terminal over a network. For example, it may be a wireless communication device that ensures portability and mobility, including any kind of handheld wireless communication device such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC.

The music search service providing server (300) may be a server that provides a web page, app page, program, or application for the music search service using CBMR-based sound. The server (300) may search for a sound source on a CBMR basis using the sound source content streamed in real time from the user terminal (100), and extract its metadata. Depending on how the search query is input, the server (300) may perform QBE (query by example), QBH (query by humming/singing), QBP (query by playing), QBN (query by music notation), or QBC (query by contour) searches, and may search for a sound using automatic transcription of audio music, audio beat tracking, melody extraction and note recognition from polyphonic music, chord or key detection, tempo extraction, similarity matching techniques, humming-query search methods, and the like. Rather than using the sound source content streamed from the user terminal (100) as-is, the server (300) may generate score data from the sound source content and retrieve the sound source's metadata by comparing it with pre-stored score data, which can minimize the cost of building a database and the consumption of networking and computing resources during search. When the user terminal (100) requests a search for a sound source output offline, the server (300) may also search using context information about the offline situation. For example, for an OST (original soundtrack) output from a TV, the server may use the TV channel number, the time at which the sound source was output, and the TV's metadata, so that it searches by metadata without analyzing the sound source itself, or performs a hybrid search that combines the sound source with metadata. Furthermore, when the user wants to listen again to, or purchase, the sound source content found from the user terminal (100), the server (300) may forward the search result to the at least one sound source providing server (400) so that the user need not run a separate search in order to listen or purchase.
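The comparison of generated score data against stored score data, in the query-by-contour (QBC) style mentioned above, can be sketched as follows. This is an illustration only, not the matching method of the specification, which does not name an algorithm. It assumes the melody has already been transcribed to MIDI note numbers, encodes each melody as an up/down/repeat contour string, and ranks stored scores by Levenshtein edit distance; the function names and the `score_db` layout are hypothetical.

```python
from typing import Dict, List, Tuple


def contour(pitches: List[int]) -> str:
    """Encode a pitch sequence as a U (up) / D (down) / R (repeat) string."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            steps.append("U")
        elif cur < prev:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def search_scores(query_pitches: List[int],
                  score_db: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Rank stored scores by contour distance to the query (closest first)."""
    q = contour(query_pitches)
    ranked = [(title, edit_distance(q, contour(p))) for title, p in score_db.items()]
    return sorted(ranked, key=lambda item: item[1])
```

Because only the direction of each interval is compared, a query sung in a different key still matches its song; the trade-off is that short contours are ambiguous, so a practical system would likely combine this with rhythm or interval size.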

Here, the music search service providing server (300) may be implemented as a computer capable of accessing a remote server or terminal over a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser.

The at least one sound source providing server (400) may be a server of a sound source provider, copyright holder, or authorized distributor that uses a web page, app page, program, or application related to the music search service using CBMR-based sound. When the music search service providing server (300) requests information on a sound source, the sound source providing server (400) may provide it; and when the music search service providing server (300) requests that a sound source be provided to the user terminal (100), the sound source providing server (400) may enable real-time streaming of the sound source content to the user terminal (100) or provide a payment interface so that the content can be purchased.

Here, the at least one sound source providing server (400) may be implemented as a computer capable of accessing a remote server or terminal over a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop equipped with a web browser. The at least one sound source providing server (400) may also be implemented as a terminal capable of accessing a remote server or terminal over a network. For example, it may be a wireless communication device that ensures portability and mobility, including any kind of handheld wireless communication device such as a navigation device, a PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or WiBro (Wireless Broadband Internet) terminal, a smartphone, a smart pad, or a tablet PC.

FIG. 2 is a block diagram illustrating the music search service providing server included in the system of FIG. 1, and FIG. 3 is a diagram illustrating an example implementation of the music search service using CBMR-based sound according to an embodiment of the present invention.

Referring to FIG. 2, the music search service providing server (300) may include a receiving unit (310), a streaming unit (320), a search unit (330), a transmission unit (340), a preprocessing unit (350), and a database (360).
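How the receiving, streaming, search, and transmission units might hand data to one another can be sketched as below. This is a minimal illustration under assumptions, not the server of FIG. 2: the class and method names are hypothetical, the search unit is stood in for by a caller-supplied function, and the preprocessing unit (350) and database (360) are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SearchEvent:
    """Hypothetical music search event sent by a user terminal."""
    terminal_id: str
    trigger: str  # e.g. "voice" or "overlay"


@dataclass
class MusicSearchServer:
    # Stand-in for the search unit (330): audio bytes in, metadata or None out.
    search_fn: Callable[[bytes], Optional[Dict[str, str]]]
    sessions: Dict[str, bytearray] = field(default_factory=dict)

    def receive_event(self, event: SearchEvent) -> None:
        """Receiving unit (310): open a session for the terminal."""
        self.sessions[event.terminal_id] = bytearray()

    def stream_chunk(self, terminal_id: str, chunk: bytes) -> None:
        """Streaming unit (320): append audio held only in memory."""
        self.sessions[terminal_id].extend(chunk)

    def finish(self, terminal_id: str) -> Dict[str, object]:
        """Search unit (330) plus transmission unit (340): search, then
        build the feedback response and discard the buffered audio."""
        audio = bytes(self.sessions.pop(terminal_id))
        metadata = self.search_fn(audio)
        return {"terminal_id": terminal_id,
                "found": metadata is not None,
                "metadata": metadata or {}}
```

Dropping the session in `finish` mirrors the idea of keeping the streamed audio only temporarily; whether the actual server does this is not stated at this point in the text.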

When the music search service providing server (300) according to an embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits an application, program, app page, web page, or the like for the music search service using CBMR-based sound to the at least one user terminal (100) and the at least one sound source providing server (400), these may install or open that application, program, app page, or web page. A service program may also be run on the at least one user terminal (100) and the at least one sound source providing server (400) by means of a script executed in a web browser. Here, a web browser is a program that enables use of the World Wide Web (WWW) service by receiving and displaying hypertext written in HTML (Hypertext Markup Language); examples include Netscape, Explorer, and Chrome. An application means an application program on a terminal and includes, for example, an app executed on a mobile terminal (smartphone).

Referring to FIG. 2, the receiving unit 310 may receive a music search event output from the user terminal 100. Here, the user terminal 100 may output a music search event, input through voice recognition or a user interface, on the screen on which sound source content is being streamed or played, collect the sound source content as an audio file, extract it as a query keyword for music search, and transmit it. The voice recognition interface or the user interface may be an interface for searching for music. For example, when a user makes a query, the voice recognition interface may preprocess the natural language and combine context information and meta information to determine whether the user's query should be answered or the user's command should be executed. The user interface may be an interface that floats on the user terminal 100 and is overlaid on the sound source content when the user launches the application directly or touches the application icon. Accordingly, without the user having to record, store, and then upload the sound source content, simply outputting the voice recognition interface or the user interface in real time makes it possible to extract the sound source content and upload or stream it to the receiving unit 310.

The query keyword may also be transmitted in various ways other than the one described above. For example, the user terminal 100 may overlay and output the music search event on the screen on which the sound source content is being output, and the time during which the sound source content is collected as an audio file may be a preset time or the time during which the overlaid music search event is touched on the user terminal 100. In the prior art, the program or application for the sound source content and the program or application for sound source search run separately, so the user has to record the audio signal output from the sound source content and then search again in the sound source search program. In the method according to an embodiment of the present invention, by contrast, suppose a video is playing on Facebook and the user is curious about the sound source content corresponding to the audio signal in that video; the music search event is overlaid on the screen on which Facebook is displayed, so that music can be searched on that screen without closing Facebook. Accordingly, the user can be kept from illegally acquiring a copyrighted sound source, and the user does not need many inputs or manipulations to switch back and forth between two applications or programs.

In addition, the user terminal 100 may output a music search event input through voice recognition or a user interface, collect sound source content input through a microphone built into or attached to the user terminal 100 as an audio file, extract it as a query keyword for music search, and transmit it to the music search service providing server 300. For example, suppose a user is at cafe A, likes the song currently playing, and wants to know its title. The user may say "What is the title of the song playing now?" or launch the music search application, so that the audio signal input through the microphone of the user terminal 100 is transmitted to the receiving unit 310. In summary, the user terminal 100 can handle not only a sound source output from the user terminal 100 itself but also a sound source output from another terminal or at an offline location; depending on the situation, it may automatically determine which source (internal or external) to select, and may extract the sound source by running the program (stereo mixer) or module (microphone) for collecting the determined source.

The streaming unit 320 may collect the sound source content output from the user terminal 100 and receive it as a real-time stream. When the networking resources of the user terminal 100 do not satisfy a preset resource amount, the user terminal 100 may itself generate score data from the collected sound source content so that only the score data is collected, because sound source content is larger than score data. Of course, if the networking resources of the user terminal 100 are sufficient to stream the sound source content itself, this method need not be used. Likewise, if the computing resources of the user terminal 100 are insufficient to extract a score from the sound source content, this method may not be performed. Accordingly, the streaming unit 320 may determine how to collect the query keyword in consideration of the computing resources and networking resources of the user terminal 100 and, according to that determination, flexibly choose among receiving the sound source content as a stream as-is, receiving sound source content whose size has been reduced by lowering its sound quality, or receiving sound source content converted into score data.
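The resource-dependent choice described for the streaming unit 320 could be sketched roughly as follows. This is only an illustration: the names `QueryMode` and `choose_query_mode`, the `cpu_score` scale, and both threshold values are hypothetical and do not come from the specification, which states only that networking and computing resources are compared against preset amounts.

```python
from enum import Enum


class QueryMode(Enum):
    FULL_STREAM = "full_stream"   # stream the sound source content as-is
    LOW_QUALITY = "low_quality"   # audio downsized by lowering its sound quality
    SCORE_DATA = "score_data"     # score data generated on the terminal


def choose_query_mode(bandwidth_kbps, cpu_score,
                      stream_min_kbps=256.0, transcribe_min_cpu=0.5):
    """Pick how the terminal delivers the query (all thresholds are assumed)."""
    # Enough network capacity: send the audio itself.
    if bandwidth_kbps >= stream_min_kbps:
        return QueryMode.FULL_STREAM
    # Weak network but enough computing power: transcribe locally, send the score.
    if cpu_score >= transcribe_min_cpu:
        return QueryMode.SCORE_DATA
    # Neither is sufficient: fall back to reduced-quality audio.
    return QueryMode.LOW_QUALITY
```

For example, `choose_query_mode(64, 0.9)` selects score data, while `choose_query_mode(64, 0.1)` falls back to reduced-quality audio.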

The search unit 330 may perform content-based music retrieval (CBMR) using part or all of the sound source content received by real-time streaming as a keyword. As content that can be used apart from bibliographic information, the search unit 330 may use six attributes: pitch, temporal, harmonic, timbral, editorial, and textual information. Pitch information is expressed by features such as pitch, interval, and melody contour. Pitch is the height of a note, that is, the fundamental frequency of the sound as perceived by a listener, and may be given as a frequency in vibrations per second (Hz) or in a notation such as A4 or B3. A4 usually denotes 440 Hz, and doubling the frequency yields the note one octave higher (A5). The difference between two pitches is called an interval and is expressed as an increase (+) or decrease (-) of up to 8 degrees, with a semitone counted as 1 degree. A melody is a temporal arrangement of several notes according to rules; the melody EDCEDC in C major and the melody BAGBAG in G major, which give the listener the same impression, have the same melody contour. The contour may mainly be expressed in Parsons code. Temporal information includes tempo, meter, pitch duration, and rhythm. Tempo may be expressed in beats per minute (♩ = 120 means 120 beats per minute) or by terms such as adagio or presto. Meter indicates which note value, and how many of them, make up one measure; 3/4 time means that three quarter notes make up one measure. Pitch duration is the length of a note, such as a quarter note or an eighth note. Rhythm is the flow of sound, rests included, that recurs according to the length, dynamics, and speed of the notes. Harmony is sound produced by sounding two or more pitches simultaneously, also called polyphony, as distinguished from monophony, in which no harmony occurs. Harmony is expressed as chords, such as triads and four-note chords. Timbre is the impression of a sound that a person has on hearing it; a sound can be attributed to a particular instrument because the timbre of a pitch differs from instrument to instrument. Scientifically, the timbre at a given instant is determined by the proportions (spectrum) in which high-frequency components at multiples of the fundamental are present, but the timbre changes between the onset of the sound and its decay. Because this change over time follows a pattern specific to each instrument, it can be identified from the change of the spectrum over time. Timbre can also be inferred from editorial information such as bowings, fingerings, and piano pedalings. Editorial information indicates performance instructions such as bowing, fingering, slurs, and dynamics (p, ff, crescendo, etc.), and textual information is the lyrics of a song. All six kinds of content-based music information may be used without any particular order, but in an embodiment of the present invention, pitch, temporal, and harmonic information may be given priority.
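The relation between note names and frequencies mentioned above (A4 at 440 Hz, one octave per doubling) can be illustrated with a small sketch. It assumes twelve-tone equal temperament and the common MIDI numbering in which A4 is note 69; neither assumption is stated in the specification, and the function names are hypothetical.

```python
# Semitone offsets of the natural notes within one octave (C = 0).
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Convert a name such as 'A4', 'B3' or 'C#5' to a MIDI-style note number."""
    letter, rest = name[0].upper(), name[1:]
    accidental = 0
    if rest and rest[0] in "#b":
        accidental = 1 if rest[0] == "#" else -1
        rest = rest[1:]
    # Assumed convention: C4 (middle C) is note 60, so A4 is note 69.
    return 12 * (int(rest) + 1) + SEMITONES[letter] + accidental


def note_to_hz(name, a4_hz=440.0):
    """Frequency under equal temperament; 12 semitones up doubles the frequency."""
    return a4_hz * 2.0 ** ((note_to_midi(name) - 69) / 12)
```

With these assumptions `note_to_hz("A5")` is twice `note_to_hz("A4")`, matching the octave relation in the text.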

The search unit 330 may also use symbol characters as index features. Notes are mainly represented as symbol characters by pitch or duration: for example, by pitch only, by duration only, or by pitch together with duration, although the representation is not limited to these. Pitch may be expressed in do-re-mi (solfège), as the absolute pitch used in MIDI files, or as a directed modulo-12 value that reduces the absolute pitch to 12 values. In the music format, octaves are written so that do in the middle octave is C4, do one octave higher is C5, and do one octave lower is C3. In ABC code, do in the middle octave is the uppercase letter C, do one octave higher is the lowercase letter c, do two octaves higher is c', and do one octave lower is written with a comma, as C,. Sharp and flat may each be represented by appending a symbol, such as "-", after the character string.
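The ABC-style octave marks described above can be decoded with a short sketch. It assumes that uppercase C corresponds to MIDI note 60 and handles natural notes only; the function names are hypothetical, and `pitch_class` is a plain modulo-12 reduction shown for illustration rather than the directed modulo-12 value named in the text.

```python
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def abc_to_midi(token):
    """Decode an ABC-style pitch token such as C, c, c' or C, (naturals only)."""
    letter = token[0]
    midi = 60 + NATURALS[letter.upper()]   # assumed: uppercase C = MIDI 60
    if letter.islower():
        midi += 12                         # lowercase letter: one octave higher
    for mark in token[1:]:
        if mark == "'":
            midi += 12                     # each apostrophe raises an octave
        elif mark == ",":
            midi -= 12                     # each comma lowers an octave
        else:
            raise ValueError(f"unsupported mark: {mark!r}")
    return midi


def pitch_class(midi):
    """Reduce an absolute pitch to one of 12 values."""
    return midi % 12
```

All four spellings of do in the text (`C`, `c`, `c'`, `C,`) reduce to the same pitch class, which is what makes the 12-value form octave-independent.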

Intervals may be represented as an exact interval, which expresses the difference between two adjacent pitches in degrees with an increase (+) or decrease (-), counting a semitone as 1 degree; as a key-relative value, which expresses the difference between the key of the melody and the pitch; in Parsons code, which records only whether the following note is higher than, lower than, or the same as the preceding note, as U(p), D(own), or R(epeat); or by a method that additionally records whether the rise or fall is larger or smaller than a threshold. In an embodiment of the present invention, because using pitch and temporal information together is efficient, as described above, it may be preferable to use intervals rather than contour, to use durations together with them, and to use the amount of pitch change and the ratio between two notes together.
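The interval representations above can be sketched as follows, using MIDI-style note numbers for pitch. The function names are hypothetical, the leading "*" that Parsons code conventionally carries is omitted, and reading the "ratio" of two notes as a duration ratio is an assumption made for this illustration.

```python
def exact_intervals(pitches):
    """Signed semitone difference between each pair of adjacent pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def parsons_code(pitches):
    """U(p), D(own) or R(epeat) for each adjacent pair of pitches."""
    return "".join("U" if d > 0 else "D" if d < 0 else "R"
                   for d in exact_intervals(pitches))


def interval_ratio_features(notes):
    """(pitch change, duration ratio) for adjacent (pitch, duration) notes."""
    return [(p2 - p1, d2 / d1) for (p1, d1), (p2, d2) in zip(notes, notes[1:])]


# EDCEDC in C major and BAGBAG in G major, as MIDI-style numbers.
edcedc = [64, 62, 60, 64, 62, 60]
bagbag = [71, 69, 67, 71, 69, 67]
same_contour = parsons_code(edcedc) == parsons_code(bagbag)
```

Both melodies yield the same Parsons string and the same exact intervals, which is why interval-based features are unaffected by transposition.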

To extract index features from an audio file and convert them into symbol characters, MIDI files are indexed as symbol characters, and the search unit 330 may take the audio query received from the streaming unit 320 through conversion of the audio file, melody separation and extraction, and conversion into symbol characters. An audio file such as WAV, PCM, or MP3 is first sampled to convert the continuous sound signal into discrete data values, after which acoustic signal features, audio fingerprinting information, or symbol character information may be extracted as index features. Signal features that may be used include loudness, pitch, tone (brightness and bandwidth), mel-frequency cepstral coefficients (MFCCs), and derivatives. Alternatively, audio fingerprinting may be used, in which the fingerprint represents the frequencies of high-intensity peaks in the time-frequency spectrum; it may be expressed as the peak frequencies (Hz) at several points over 10 seconds. Regardless of the method, longer queries give better results; for example, results are good when at least 15 seconds of sound source data is received as the query, so the stream may be received for at least 15 seconds. Alternatively, a humming query may be fingerprinted and searched by similarity (Euclidean distance) in a DB built by extracting only the vocals from music and indexing them by fingerprinting.
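A fingerprint lookup by Euclidean distance might look like the sketch below. It assumes that each fingerprint has already been reduced to a fixed-length list of peak frequencies in Hz; the song names, values, and function names are made up for illustration, and a real fingerprint index would be considerably more elaborate.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.dist(a, b)


def best_match(query, index):
    """Return the key of the indexed fingerprint closest to the query."""
    return min(index, key=lambda song: euclidean(query, index[song]))


# Hypothetical index: song id -> peak frequencies (Hz) at three time points.
fingerprints = {
    "song_a": [440.0, 660.0, 880.0],
    "song_b": [262.0, 330.0, 392.0],
}
hummed = [441.0, 655.0, 890.0]   # slightly inaccurate query
closest = best_match(hummed, fingerprints)
```

A distance-based lookup tolerates small deviations in the hummed query, at the cost of comparing against every indexed fingerprint.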

The process by which the search unit 330 extracts symbol character features from an audio file begins with extracting the melody from a polyphonic file: separating the singing from the accompaniment, separating the melody of the main instrument from that of the accompanying instruments, or separating the melody track from a multi-track MIDI file. In audio, melody (mid/high frequencies) and bass (low frequencies) can be distinguished in the frequency domain; in MIDI, tracks whose pitch values have a low mean and a large deviation may be regarded as accompaniment tracks and removed. Methods for extracting symbol character information such as pitch, interval, and duration from a humming query input through a microphone or from a music file include time-based methods using the zero-crossing rate (ZCR) or the autocorrelation function (ACF), frequency-based methods such as the fast Fourier transform (FFT), and probabilistic models such as the hidden Markov model (HMM). Probabilistic latent component analysis (PLCA) and REPET (REpeating Pattern Extraction Technique) may be used to extract the key of the query.
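Of the time-based methods listed above, the autocorrelation function (ACF) is simple enough to sketch in a few lines. The version below is a bare-bones illustration for a single clean frame; the function names, the search range of 80 to 1000 Hz, and the test tone are assumptions, and practical pitch trackers add windowing, normalization, and voicing decisions.

```python
import math


def estimate_pitch_acf(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame from its ACF peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(len(samples) - 1, int(sample_rate / fmin))
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # The lag of the strongest self-similarity is taken as the period.
    return sample_rate / best_lag


def sine(freq_hz, sample_rate, n):
    """Synthetic test tone standing in for a hummed note."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


frame = sine(200.0, 8000, 1024)
pitch_hz = estimate_pitch_acf(frame, 8000)   # close to 200 Hz for this tone
```

Running this per frame and mapping each estimate to the nearest note would give the pitch sequence from which intervals and durations are derived.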

Next, the search unit 330 divides the music into index units and normalizes them. One motif consists of two measures, and the number of notes in a measure is determined by the meter. Dividing music into index units starts from segmenting only the first motif, all motifs, or the theme melody. The theme melody is usually the first 2, 4, or 8 measures of a piece, but depending on the piece it may occur somewhere other than the beginning. A repeated melody is usually extracted as the theme melody, but the search unit 330 may also take the melodies that start after a rest of at least a certain length, cluster the motifs based on the similarity between them, and extract the melody representing each cluster as a theme melody. Theme melody search indexes only important or repeated melodies, which reduces the size of the index file and improves search efficiency. The search unit 330 may also use an N-gram index, which slides over the theme melody one note at a time to the right and takes each run of N notes as one index unit; interval information may mainly be used.
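The N-gram indexing step can be sketched as an inverted index over interval sequences. The song identifiers, the choice of N = 3, and the vote-counting lookup are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def ngrams(sequence, n):
    """Slide one element at a time and emit every run of n elements."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def build_index(songs, n=3):
    """Inverted index: interval n-gram -> set of song ids containing it."""
    index = defaultdict(set)
    for song_id, intervals in songs.items():
        for gram in ngrams(intervals, n):
            index[gram].add(song_id)
    return index


def search(index, query_intervals, n=3):
    """Rank songs by how many query n-grams they share (exact n-gram match)."""
    votes = Counter()
    for gram in ngrams(query_intervals, n):
        for song_id in index.get(gram, ()):
            votes[song_id] += 1
    return votes.most_common()


# Hypothetical theme melodies stored as semitone intervals.
themes = {"song_a": [-2, -2, 4, -2, -2], "song_b": [2, 2, 1, 2, 2]}
index = build_index(themes)
hits = search(index, [-2, 4, -2])
```

Indexing only theme melodies keeps the number of n-grams, and therefore the index file, small.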

The search unit 330 also needs to segment the audio query; it may segment it by time, for example into 10 ms or 20 ms frames, or by using N-grams, an HMM, or the like. To improve search performance, the search unit 330 may normalize the index strings and the query. Normalization means making the lengths of motifs match, and may be based on meter or on onset time. In meter-based normalization, for example, the durations of all notes are multiplied by 4 for a piece in 3/4 time and by 3 for a piece in 4/4 time, so that the motifs of the two pieces have the same length. Onset-time-based normalization splits a half note into four eighth notes or, conversely, merges (union) four eighth notes into one half note.
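The two normalizations above can be checked with a small worked example. Durations are counted in quarter-note beats, the common target of 12 units per measure follows from the factors 4 and 3 given in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def normalize_by_meter(durations, beats_per_measure, target=12):
    """Scale durations so every measure spans `target` units.

    With target=12 the factor is 4 for 3/4 time and 3 for 4/4 time.
    """
    factor = Fraction(target, beats_per_measure)
    return [Fraction(d) * factor for d in durations]


def split_note(duration, unit):
    """Split one note into equal notes of length `unit` (e.g. half -> eighths)."""
    count = Fraction(duration) / Fraction(unit)
    if count.denominator != 1:
        raise ValueError("duration is not a whole multiple of unit")
    return [Fraction(unit)] * int(count)


waltz_measure = normalize_by_meter([1, 1, 1], beats_per_measure=3)
common_measure = normalize_by_meter([1, 1, 1, 1], beats_per_measure=4)
eighths = split_note(2, Fraction(1, 2))   # half note -> four eighth notes
```

After scaling, one measure of either piece sums to 12 units, so motifs of two measures have equal length and can be compared position by position.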

The search unit 330 must then use the normalized query to find matching sound source content in the pre-stored database 600 or by web crawling, and may use a matching method to do so. Symbol characters or fingerprinting information built into an inverted index file can be useful for exact matching. Exact-match search may be used for Parsons code and pitch search, for N-gram search over pitch and rhythm, for fingerprint search, and so on. For a humming query, however, exact matching may fail to retrieve the relevant music because of a mismatched key or inaccurate pitch and duration, so the search unit 330 may take the inaccuracy of the query melody into account and use approximate (partial) matching rather than exact matching. Approximate matching techniques that may be used include LCS (longest common subsequence), edit distance, EMD (Earth Mover's Distance), geometric techniques, LS (linear scaling), and DTW (dynamic time warping). LCS ranks results by the number of characters that match between the query string and the index string, and may be used for Parsons code. Edit distance computes similarity as the minimum number of edit operations (deletion, insertion, substitution) required to convert the query string into a string in the index file. For polyphonic music search, a geometric pattern matching algorithm may be used, or cosine similarity based on a vector space model may be used. Of course, various CBMR methods other than those described above may also be used.
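Two of the approximate matching techniques named above, edit distance and LCS, can be sketched with the standard dynamic-programming recurrences applied to Parsons strings. The ranking helper and the sample index are hypothetical.

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank_by_edit_distance(query, index):
    """Order indexed song ids from most to least similar to the query."""
    return sorted(index, key=lambda song: edit_distance(query, index[song]))


# Hypothetical index of Parsons strings.
parsons_index = {"song_x": "UUDUU", "song_y": "DDUUD"}
ranking = rank_by_edit_distance("DDUDD", parsons_index)
```

A query that is off by one contour symbol still ranks the intended song first, which exact matching would miss.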

Once the song title and the like have been found by the methods described above, the transmission unit 340 may transmit the result to the user terminal 100 as a feedback response. People search for a song title because they want to own the song or hear more of it, and the transmission unit 340 may use this to mediate between the at least one sound source providing server 400 and the user terminal 100. That is, when reporting the song title, the transmission unit 340 may give the user terminal 100 the URL of a site or page where the song can continue to be streamed, so that the user terminal 100 connects to the at least one sound source providing server 400, or it may sell the sound source on consignment from the at least one sound source providing server 400. Most users are reluctant to buy sound sources not because 500 to 1,000 won per song is expensive, but because, owing to DRM, the purchase process requires running and installing too many security procedures, ActiveX controls, and the like. Even after ActiveX has been run and the program installed, the installation closes the web page, so the user has to search for the sound source again and repeat the payment process. Because obtaining a legitimate sound source means going through a process more tedious than an illegal site, several times over, even people willing to pay for a legal purchase go back to illegal sites. To prevent this harm, an embodiment of the present invention may be configured so that the sound source can be downloaded from the web page simply by using a pre-stored payment method, without installing a separate program. The transmission unit 340 may have any programs that must be installed be installed and run automatically in background mode, so that the user can purchase a legitimate sound source without any input or manipulation for additional installation.

When an audio signal input through the microphone of the user terminal 100 is set as the query keyword, the preprocessing unit 350 may perform preprocessing by removing noise contained in the audio signal and correcting distortion. An audio signal input through a microphone may contain noise, and MCRA (Minima-Controlled Recursive Averaging), which estimates the minimum noise spectrum from the noisy spectrum of the input sound source content, may be used. Using the estimated noise spectrum, a noise-suppression gain may be calculated with the OM-LSA (Optimally Modified Log-Spectral Amplitude) sound source estimation method, which is based on a Gaussian probability distribution, and applied to the input audio signal to remove the noise. However, the preprocessing is not limited to these methods, and various other methods may be applied.

The database 360 may build at least one piece of score data by extracting notes and beats from at least one piece of sound source content, and may map and store the score data together with the sound source content. The music search service providing server 300 may extract notes and beats from the audio file included in the query keyword transmitted from the user terminal 100 to generate score data, compare the score data of the user terminal 100 with the score data in the database, and transmit feedback to the user terminal 100.

Hereinafter, the operation of the music search service providing server configured as in FIG. 2 is described in detail using FIG. 3 as an example. However, this embodiment is only one of various embodiments of the present invention, and the invention is not limited to it.

Referring to FIG. 3, the music search service providing server 300 may receive a query about sound source content output within the user terminal 100 itself, as in (a) and (c), or about sound source content output outside the user terminal 100, as in (b). The music search service providing server 300 may receive a sound source search event through the voice recognition interface, as in (a) and (b), or as a selection event on a user interface displayed on the screen, as in (c). In addition, as in (b) and (c), the music search service providing server 300 may use not only the sound source itself but also context information, for example metadata such as a URL address or IPTV schedule information.

After a query has been received in one of the ways described above and the search result has been output, the music search service providing server 300 does not stop at feeding back the result. As in (d), it may stream and play the sound source corresponding to the search result, search for music videos related to the sound source, or present an interface for purchasing the sound source, so that the user can enjoy various services without searching further. In addition, the music search service providing server 300 may build a database, as in (e), for metadata search and for simple search using score data.

Matters not described with respect to the method for providing a music search service using CBMR-based sound of FIGS. 2 and 3 are the same as, or can be easily inferred from, what was described above with reference to FIG. 1, and their description is therefore omitted.

FIG. 4 is a diagram illustrating a process in which data is transmitted and received among the components included in the system of FIG. 1 for providing a music search service using a CBMR-based sound, according to an embodiment of the present invention. An example of this process is described below with reference to FIG. 4, but the present application is not to be construed as limited to this embodiment, and it is apparent to those skilled in the art that the process shown in FIG. 4 may be changed in accordance with the various embodiments described above.

Referring to FIG. 4, the music search service providing server 300 may collect various information, such as sound source content, metadata, and sound information, from at least one sound source providing server 400 (S4100) and build a database (S4200). When a driving signal is triggered at the user terminal 100 by voice or through an on-screen interface (S4400), the search interface is started (S4400), and an audio signal (sound source content) is extracted (S4500, S4700) and received by the server (S4710), the server runs a search using the received sound source as the keyword (S4800). If the audio is not being output by the user terminal 100 itself, the sound source may instead be extracted by activating the microphone (S4600, S4700).
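The terminal-side choice between capturing the terminal's own audio output and capturing through the microphone could be sketched as below. This is a minimal sketch under assumptions: the `Trigger`, `Source`, and `Query` types, the `build_query` function, and the two capture callbacks are hypothetical names introduced for illustration, and real audio capture is replaced by callables that return bytes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Trigger(Enum):
    VOICE = "voice"          # search started by voice recognition
    SCREEN_UI = "screen_ui"  # search started from an on-screen interface


class Source(Enum):
    DEVICE_OUTPUT = "device_output"  # audio the terminal itself is playing
    MICROPHONE = "microphone"        # audio playing outside the terminal


@dataclass
class Query:
    trigger: Trigger
    source: Source
    audio: bytes


def build_query(
    trigger: Trigger,
    terminal_is_playing: bool,
    capture_output: Callable[[], bytes],
    capture_mic: Callable[[], bytes],
) -> Query:
    """Pick the capture path and package the audio as the search query."""
    if terminal_is_playing:
        # Roughly S4500/S4700: take the audio signal from the terminal's own output.
        return Query(trigger, Source.DEVICE_OUTPUT, capture_output())
    # Roughly S4600/S4700: fall back to the microphone for external audio.
    return Query(trigger, Source.MICROPHONE, capture_mic())
```

Passing the capture functions in as parameters keeps the decision logic testable without audio hardware; the resulting `Query` is what would be sent to the server for the search in S4800.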

The music search service providing server 300 then returns the result to the user terminal 100 (S4900), connects the user terminal 100 to at least one sound source providing server 400 so that streaming or purchase is possible (S4910), and, by forming a channel or acting as an intermediary, enables follow-up services to be provided to the user terminal 100 (S4920).

The order of the above-described steps (S4100 to S4920) is only an example and is not limiting. That is, the order of the steps (S4100 to S4920) may be changed, and some of the steps may be executed simultaneously or omitted.

Matters not described with respect to the method for providing a music search service using a CBMR-based sound of FIG. 4 are the same as, or can be easily inferred from, what was described above with reference to FIGS. 1 to 3, and their description is therefore omitted.

FIG. 5 is a flowchart illustrating a method for providing a music search service using a CBMR-based sound according to an embodiment of the present invention. Referring to FIG. 5, when a music search event, input through voice recognition or a user interface, is output at the user terminal on a screen where sound source content is being streamed or played, the music search service providing server collects the sound source content as an audio file and receives it by real-time streaming (S5100).

The music search service providing server then performs content-based music retrieval (CBMR) using part or all of the sound source content received by real-time streaming as the keyword (S5200), and transmits the result to the user terminal as a feedback response (S5300).
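The three server-side steps (S5100 to S5300) could be sketched as a single handler. This is a hedged illustration only: the function name `handle_search_event`, the pluggable `search` callback, the `min_bytes` threshold, and the shape of the returned dictionary are assumptions, and the actual retrieval algorithm is left to the callback.

```python
from typing import Callable, Dict, Iterable, Optional


def handle_search_event(
    chunks: Iterable[bytes],
    search: Callable[[bytes], Optional[str]],
    min_bytes: int = 4,
) -> Dict[str, object]:
    """Receive streamed audio, search on what has arrived so far, return feedback."""
    buffer = bytearray()
    for chunk in chunks:
        # S5100: receive the sound source content chunk by chunk.
        buffer.extend(chunk)
        if len(buffer) < min_bytes:
            continue  # not enough audio yet to attempt a search
        # S5200: use the part received so far as the search keyword.
        title = search(bytes(buffer))
        if title is not None:
            # S5300: feedback response for the user terminal.
            return {"found": True, "title": title, "bytes_used": len(buffer)}
    # Stream ended without a match: still send a feedback response.
    return {"found": False, "title": None, "bytes_used": len(buffer)}
```

Searching on the growing buffer reflects the "part or all" wording: the handler can answer as soon as a partial excerpt is enough to identify the content, and otherwise falls back to the whole stream.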

Matters not described with respect to the method for providing a music search service using a CBMR-based sound of FIG. 5 are the same as, or can be easily inferred from, what was described above with reference to FIGS. 1 to 4, and their description is therefore omitted.

The method for providing a music search service using a CBMR-based sound according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method for providing a music search service using a CBMR-based sound according to the embodiment of the present invention described above may be executed by an application installed by default on a terminal (which may include a program contained in a platform or operating system installed by default on the terminal), or by an application (i.e., a program) that the user installs directly on the master terminal through an application providing server such as an application store server or a web server related to the application or the service. In this sense, the method may be implemented as an application (i.e., a program) installed by default on the terminal or installed directly by the user, and may be recorded on a computer-readable recording medium of the terminal or the like.

The above description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is indicated by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A system for providing a music search service using a CBMR-based sound, comprising:
a user terminal that outputs a music search event, input through voice recognition or a user interface, on a screen where sound source content is streamed or played, collects the sound source content as an audio file, extracts it as a query keyword for music search and transmits it, and receives and outputs feedback corresponding to the input value; and
a music search service providing server including a receiving unit that receives the music search event output from the user terminal, a streaming unit that collects the sound source content output from the user terminal and streams it in real time, a search unit that performs content-based music retrieval (CBMR) using part or all of the sound source content received by the real-time streaming as a keyword, and a transmission unit that transmits the result to the user terminal as a feedback response.

The system of claim 1,
wherein the user terminal
overlays the music search event on the screen on which the sound source content is output, and
the time at which the sound source content is collected as an audio file is a preset time or the time at which the overlaid music search event is touched at the user terminal.

The system of claim 1,
wherein the user terminal
outputs a music search event input through voice recognition or a user interface, collects sound source content input through a microphone built into or external to the user terminal as an audio file, extracts it as a query keyword for music search, and transmits it to the music search service providing server.

(Deleted)

The system of claim 1,
wherein the music search service providing server
further includes a preprocessing unit that, when an audio signal input through the microphone of the user terminal is set as the query keyword, performs preprocessing by removing noise contained in the audio signal and correcting distortion.

The system of claim 1,
wherein the music search service providing server
further includes a database that builds at least one item of score data by extracting notes and beats from at least one item of sound source content, and that maps and stores the at least one item of score data and the at least one item of sound source content, and
generates score data by extracting notes and beats from the audio file included in the query keyword sent from the user terminal, compares the score data of the user terminal with the score data of the database, and transmits feedback to the user terminal.

A method for providing a music search service using a CBMR-based sound, executed in a music search service providing server, the method comprising:
when a music search event, input through voice recognition or a user interface, is output from a user terminal on a screen where sound source content is streamed or played, collecting the sound source content as an audio file and receiving it by real-time streaming;
performing content-based music retrieval (CBMR) using part or all of the sound source content received by the real-time streaming as a keyword; and
transmitting the result to the user terminal as a feedback response.